
[DOC][v2.0] Part1: Link Check #20487

Merged · 8 commits · Aug 9, 2021
2 changes: 1 addition & 1 deletion docs/python_docs/python/Makefile
@@ -49,7 +49,7 @@ html: $(OBJ)
cp -n -r ../_static build/ || true;
sphinx-autogen build/api/*.rst build/api/**/*.rst -t build/_templates/;
# make -C build linkcheck doctest html
-make -C build html;
+make -C build linkcheck html;
sed -i.bak 's/33\,150\,243/23\,141\,201/g' build/_build/html/_static/material-design-lite-1.3.0/material.blue-deep_orange.min.css;
make update_github_link

2 changes: 1 addition & 1 deletion docs/python_docs/python/tutorials/deploy/export/onnx.md
@@ -29,7 +29,7 @@ In this tutorial, we will learn how to use MXNet to ONNX exporter on pre-trained

To run the tutorial, you will need the following Python modules installed:
- [MXNet >= 1.3.0](https://mxnet.apache.org/get_started)
-- [onnx](https://github.com/onnx/onnx#installation) v1.2.1 (follow the install guide)
+- [onnx](https://github.com/onnx/onnx#user-content-installation) v1.2.1 (follow the install guide)

*Note:* The MXNet-ONNX importer and exporter follow version 7 of the ONNX operator set, which comes with ONNX v1.2.1.

@@ -33,7 +33,7 @@ Gluon provides State of the Art models for many of the standard tasks such as Cl

To complete this tutorial, you need:

-- [Build MXNet from source](https://mxnet.apache.org/get_started/ubuntu_setup#build-mxnet-from-source) with Python (Gluon) and C++ packages
+- [Build MXNet from source](https://mxnet.apache.org/get_started/build_from_source) with Python (Gluon) and C++ packages
- Learn the basics about Gluon with [A 60-minute Gluon Crash Course](https://gluon-crash-course.mxnet.io/)


@@ -93,15 +93,15 @@ Output:

As a rule of thumb, one should always implement custom layers by inheriting from `HybridBlock`. This allows for more flexibility and doesn't affect execution speed once hybridization is done.

-Unfortunately, at the moment of writing this tutorial, NLP-related layers such as [RNN](https://mxnet.apache.org/api/python/gluon/rnn.html#mxnet.gluon.rnn.RNN), [GRU](https://mxnet.apache.org/api/python/gluon/rnn.html#mxnet.gluon.rnn.GRU) and [LSTM](https://mxnet.apache.org/api/python/gluon/rnn.html#mxnet.gluon.rnn.LSTM) inherit directly from the `Block` class via the common `_RNNLayer` class. That means that networks with such layers cannot be hybridized. But this might change in the future, so stay tuned.
+Unfortunately, at the moment of writing this tutorial, NLP-related layers such as [RNN](../../../../api/gluon/rnn/index.rst#mxnet.gluon.rnn.RNN), [GRU](../../../../api/gluon/rnn/index.rst#mxnet.gluon.rnn.GRU) and [LSTM](../../../../api/gluon/rnn/index.rst#mxnet.gluon.rnn.LSTM) inherit directly from the `Block` class via the common `_RNNLayer` class. That means that networks with such layers cannot be hybridized. But this might change in the future, so stay tuned.

Note that hybridization has nothing to do with computation on a GPU. One can train both hybridized and non-hybridized networks on both CPU and GPU, though hybridized networks run faster. How much faster, however, is hard to say in advance.
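
To make this concrete, here is a minimal sketch of a custom layer built on `HybridBlock` (the `ScaleShift` layer below is a hypothetical example, not part of this tutorial):

```python
import mxnet as mx
from mxnet.gluon import nn

class ScaleShift(nn.HybridBlock):
    """Hypothetical example layer computing y = x * scale + shift."""
    def __init__(self, scale=2.0, shift=1.0, **kwargs):
        super(ScaleShift, self).__init__(**kwargs)
        self.scale = scale
        self.shift = shift

    def hybrid_forward(self, F, x):
        # F is mxnet.ndarray in imperative mode and mxnet.symbol after hybridize()
        return x * self.scale + self.shift

layer = ScaleShift()
layer.hybridize()  # compiles the forward pass into a static graph
print(layer(mx.nd.array([1.0, 2.0, 3.0])))  # -> [3. 5. 7.]
```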

## Adding a custom layer to a network

While it is possible to use custom layers on their own, they are rarely used separately. Most often they are combined with predefined layers to create a neural network, with the output of one layer serving as the input of another.

-Depending on which class you used as a base, you can use either the [Sequential](https://mxnet.apache.org/api/python/gluon/gluon.html#mxnet.gluon.nn.Sequential) or the [HybridSequential](https://mxnet.apache.org/api/python/gluon/gluon.html#mxnet.gluon.nn.HybridSequential) container to form a sequential neural network. By adding layers one by one, one adds dependencies of one layer's input on another layer's output. It is worth noting that both `Sequential` and `HybridSequential` containers inherit from `Block` and `HybridBlock` respectively.
+Depending on which class you used as a base, you can use either the [Sequential](../../../../api/gluon/nn/index.rst#mxnet.gluon.nn.Sequential) or the [HybridSequential](../../../../api/gluon/nn/index.rst#mxnet.gluon.nn.HybridSequential) container to form a sequential neural network. By adding layers one by one, one adds dependencies of one layer's input on another layer's output. It is worth noting that both `Sequential` and `HybridSequential` containers inherit from `Block` and `HybridBlock` respectively.

Below is an example of how to create a simple neural network with a custom layer. In this example, `NormalizationHybridLayer` takes the output of the `Dense(5)` layer as input and passes its own output on to the `Dense(1)` layer.

@@ -133,7 +133,7 @@ Output:

## Parameters of a custom layer

-Usually, a layer has a set of associated parameters, sometimes also referred to as weights. This is the internal state of a layer. Most often, these are the parameters that we want to learn during the backpropagation step, but sometimes they might just be constants we want to use during the forward pass. The parameters are usually represented by the [Parameter](https://mxnet.apache.org/api/python/gluon/gluon.html#mxnet.gluon.Parameter) class inside an Apache MXNet neural network.
+Usually, a layer has a set of associated parameters, sometimes also referred to as weights. This is the internal state of a layer. Most often, these are the parameters that we want to learn during the backpropagation step, but sometimes they might just be constants we want to use during the forward pass. The parameters are usually represented by the [Parameter](../../../../api/gluon/parameter.rst#gluon-parameter) class inside an Apache MXNet neural network.


@@ -246,11 +246,6 @@ net.export("lenet", epoch=1)

## Loading model parameters AND architecture from file

-### From a different frontend
-
-One of the main reasons to serialize model architecture into a JSON file is to load it from a different frontend like C, C++ or Scala. Here is a couple of examples:
-1. [Loading serialized Hybrid networks from C](https://github.com/apache/incubator-mxnet/blob/master/example/image-classification/predict-cpp/image-classification-predict.cc)
-2. [Loading serialized Hybrid networks from Scala](https://github.com/apache/incubator-mxnet/blob/master/scala-package/infer/src/main/scala/org/apache/mxnet/infer/ImageClassifier.scala)

### From Python
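
A minimal sketch of the Python route, assuming the `lenet-symbol.json` / `lenet-0001.params` files written by `net.export("lenet", epoch=1)` above, and a 28x28 single-channel input:

```python
import mxnet as mx
from mxnet import gluon

# Load the architecture (JSON) and parameters saved by net.export()
net = gluon.nn.SymbolBlock.imports("lenet-symbol.json", ["data"],
                                   "lenet-0001.params", ctx=mx.cpu())
out = net(mx.nd.random.uniform(shape=(1, 1, 28, 28)))
```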

@@ -236,7 +236,7 @@ The network would learn to minimize the distance between the two `A`'s and maxim

#### [CTC Loss](../../../../api/gluon/loss/index.rst#mxnet.gluon.loss.CTCLoss)

-CTC Loss is the [connectionist temporal classification loss](https://distill.pub/2017/ctc/). It is used to train recurrent neural networks with a variable time dimension. It learns the alignment and labelling of input sequences. It takes a sequence as input and gives probabilities for each timestep. For instance, in the following image the word is not well aligned with the 5 timesteps because of the different sizes of characters. CTC Loss finds the highest probability for each timestep, e.g. `t1` most likely represents a `C`. It combines the highest probabilities and returns the best path decoding. For an in-depth tutorial on how to use CTC-Loss in MXNet, check out this [example](https://github.com/apache/incubator-mxnet/tree/master/example/ctc).
+CTC Loss is the [connectionist temporal classification loss](https://distill.pub/2017/ctc/). It is used to train recurrent neural networks with a variable time dimension. It learns the alignment and labelling of input sequences. It takes a sequence as input and gives probabilities for each timestep. For instance, in the following image the word is not well aligned with the 5 timesteps because of the different sizes of characters. CTC Loss finds the highest probability for each timestep, e.g. `t1` most likely represents a `C`. It combines the highest probabilities and returns the best path decoding.

![ctc_loss](/_static/ctc_loss.png)
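
A minimal usage sketch of Gluon's `CTCLoss` (the shapes and label values here are illustrative, not from the tutorial):

```python
import mxnet as mx
from mxnet import gluon

ctc_loss = gluon.loss.CTCLoss()                 # default layouts: pred 'NTC', label 'NT'
pred = mx.nd.random.uniform(shape=(2, 20, 11))  # (batch, timesteps, alphabet size + blank)
label = mx.nd.array([[1, 2, 3, -1, -1],         # shorter labels are padded with -1
                     [4, 5, 6, 7, 8]])
print(ctc_loss(pred, label))                    # one loss value per batch element
```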

@@ -483,6 +483,6 @@ Summary
In this notebook, we have shown how to train a GNMT model on IWSLT 2015
English-Vietnamese using Gluon NLP toolkit. The complete training script
can be found
-`here <https://github.com/dmlc/gluon-nlp/blob/master/scripts/nmt/train_gnmt.py>`__.
+`here <https://github.com/dmlc/gluon-nlp/blob/v0.x/scripts/machine_translation/train_gnmt.py>`__.
The command to reproduce the result can be seen in the `nmt scripts
page <http://gluon-nlp.mxnet.io/scripts/index.html#machine-translation>`__.
@@ -21,7 +21,7 @@ In this tutorial, you will learn how to use the [Gluon Fit API](https://cwiki.ap

With the Fit API, you can train a deep learning model with a minimal amount of code. Just specify the network, the loss function and the data you want to train on. You don't need to worry about the boilerplate code that loops through the dataset in batches (often called the 'training loop'). Advanced users can still train with bespoke training loops, and many of these use cases will be covered by the Fit API.

-To demonstrate the Fit API, you will train an image classification model using the [ResNet-18](https://arxiv.org/abs/1512.03385) neural network architecture. The model will be trained using the [Fashion-MNIST dataset](https://research.zalando.com/welcome/mission/research-projects/fashion-mnist/).
+To demonstrate the Fit API, you will train an image classification model using the [ResNet-18](https://arxiv.org/abs/1512.03385) neural network architecture. The model will be trained using the [Fashion-MNIST dataset](https://github.com/zalandoresearch/fashion-mnist).
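
As a rough sketch of what this looks like (parameter names vary slightly across MXNet versions; `net`, `train_data_loader` and `ctx` are assumed to be defined as elsewhere in this tutorial):

```python
from mxnet import gluon
from mxnet.gluon.contrib.estimator import estimator

loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'adam', {'learning_rate': 0.001})
est = estimator.Estimator(net=net, loss=loss_fn, trainer=trainer, context=ctx)
est.fit(train_data=train_data_loader, epochs=2)  # the whole training loop in one call
```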

## Prerequisites

@@ -44,7 +44,7 @@ ctx = [mx.gpu(i) for i in range(gpu_count)] if gpu_count > 0 else mx.cpu()

## Dataset

-The [Fashion-MNIST](https://research.zalando.com/welcome/mission/research-projects/fashion-mnist/) dataset consists of fashion items divided into ten categories: t-shirt/top, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag and ankle boot.
+The [Fashion-MNIST](https://github.com/zalandoresearch/fashion-mnist) dataset consists of fashion items divided into ten categories: t-shirt/top, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag and ankle boot.

- It has 60,000 grayscale images of size 28 * 28 for training.
- It has 10,000 grayscale images of size 28 * 28 for testing/validation.
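
A minimal sketch of loading the dataset described above through Gluon's built-in dataset class (the batch size here is an arbitrary choice):

```python
import mxnet as mx
from mxnet import gluon

transform = gluon.data.vision.transforms.ToTensor()  # HWC uint8 -> CHW float32 in [0, 1]
train_set = gluon.data.vision.FashionMNIST(train=True).transform_first(transform)
train_loader = gluon.data.DataLoader(train_set, batch_size=256, shuffle=True)
x, y = next(iter(train_loader))
print(x.shape, y.shape)  # (256, 1, 28, 28) (256,)
```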
@@ -17,7 +17,7 @@

# Normalization Blocks

-When training deep neural networks there are a number of techniques that are thought to be essential for model convergence. One important area is deciding how to initialize the parameters of the network. Using techniques such as [Xavier](https://mxnet.apache.org/api/python/optimization/optimization.html#mxnet.initializer.Xavier) initialization, we can improve the gradient flow through the network at the start of training. Another important technique is normalization: i.e. scaling and shifting certain values towards a distribution with a mean of 0 (i.e. zero-centered) and a standard deviation of 1 (i.e. unit variance). Which values you normalize depends on the exact method used, as we'll see later on.
+When training deep neural networks there are a number of techniques that are thought to be essential for model convergence. One important area is deciding how to initialize the parameters of the network. Using techniques such as [Xavier](../../../../../api/initializer/index.rst#mxnet.initializer.Xavier) initialization, we can improve the gradient flow through the network at the start of training. Another important technique is normalization: i.e. scaling and shifting certain values towards a distribution with a mean of 0 (i.e. zero-centered) and a standard deviation of 1 (i.e. unit variance). Which values you normalize depends on the exact method used, as we'll see later on.
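
For instance, a minimal sketch of applying Xavier initialization to a network:

```python
import mxnet as mx
from mxnet import gluon, init

net = gluon.nn.Dense(10)
net.initialize(init.Xavier())                 # deferred until the first forward pass
y = net(mx.nd.random.uniform(shape=(4, 20)))  # shapes inferred here, weights drawn
```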

<p align="center">
<img src="./imgs/data_normalization.jpeg" alt="drawing" width="500"/>
@@ -44,9 +44,9 @@ print('Time for converting to numpy: %f sec' % (time() - start))

From the timings above, it seems as if converting to numpy takes a lot more time than multiplying two large matrices. That doesn't seem right.

-This is because, in MXNet, all operations are executed asynchronously. So, when `nd.dot(x, x)` returns, the matrix multiplication is not complete; it has only been queued for execution. However, [asnumpy](https://mxnet.apache.org/api/python/ndarray/ndarray.html?highlight=asnumpy#mxnet.ndarray.NDArray.asnumpy) has to wait for the result to be calculated in order to convert it to a numpy array on the CPU, hence taking longer. Other examples of 'blocking' operations include [asscalar](https://mxnet.apache.org/api/python/ndarray/ndarray.html?highlight=asscalar#mxnet.ndarray.NDArray.asscalar) and [wait_to_read](https://mxnet.apache.org/api/python/ndarray/ndarray.html?highlight=wait_to_read#mxnet.ndarray.NDArray.wait_to_read).
+This is because, in MXNet, all operations are executed asynchronously. So, when `nd.dot(x, x)` returns, the matrix multiplication is not complete; it has only been queued for execution. However, [asnumpy](../../../api/legacy/ndarray/ndarray.rst#mxnet.ndarray.NDArray.asnumpy) has to wait for the result to be calculated in order to convert it to a numpy array on the CPU, hence taking longer. Other examples of 'blocking' operations include [asscalar](../../../api/legacy/ndarray/ndarray.rst#mxnet.ndarray.NDArray.asscalar) and [wait_to_read](../../../api/legacy/ndarray/ndarray.rst#mxnet.ndarray.NDArray.wait_to_read).

-While it is possible to use [NDArray.waitall()](https://mxnet.apache.org/api/python/ndarray/ndarray.html?highlight=waitall#mxnet.ndarray.waitall) before and after operations to get the running time of operations, it is not a scalable method for measuring the running time of multiple sets of operations, especially in a [Sequential](https://mxnet.apache.org/api/python/gluon/gluon.html?highlight=sequential#mxnet.gluon.nn.Sequential) or hybridized network.
+While it is possible to use [NDArray.waitall()](../../../api/legacy/ndarray/ndarray.rst#mxnet.ndarray.waitall) before and after operations to get the running time of operations, it is not a scalable method for measuring the running time of multiple sets of operations, especially in a [Sequential](../../../api/gluon/nn/index.rst#mxnet.gluon.nn.Sequential) or hybridized network.
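
For a single block of operations, that `waitall` approach looks like this (a sketch, reusing a matrix product as above):

```python
import mxnet as mx
from time import time

x = mx.nd.random.uniform(shape=(2000, 2000))
mx.nd.waitall()  # drain any previously queued work first
start = time()
y = mx.nd.dot(x, x)
mx.nd.waitall()  # block until the multiplication has actually finished
print('Time for matrix multiplication: %f sec' % (time() - start))
```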

## The correct way to profile

6 changes: 6 additions & 0 deletions python/mxnet/numpy/fallback.py
@@ -135,6 +135,12 @@ def wrapper(*args, **kwargs):
            # remove unused reference
            new_fn_doc = new_fn_doc.replace(
                '.. [1] Wikipedia page: https://en.wikipedia.org/wiki/Trapezoidal_rule', '')
+        elif obj_name == "i0":
+            # replace broken link
+            new_fn_doc = new_fn_doc.replace(
+                '.. [3] http://kobesearch.cpan.org/htdocs/Math-Cephes/Math/Cephes.html',
+                '.. [3] https://metacpan.org/pod/distribution/Math-Cephes/lib/Math/Cephes.pod \
+                #i0:-Modified-Bessel-function-of-order-zero')
        setattr(fallback_mod, obj_name, get_func(onp_obj, new_fn_doc))
    else:
        setattr(fallback_mod, obj_name, onp_obj)
2 changes: 1 addition & 1 deletion python/mxnet/onnx/mx2onnx/_export_model.py
@@ -55,7 +55,7 @@ def export_model(sym, params, in_shapes=None, in_types=np.float32,
"""Exports the MXNet model file, passed as a parameter, into ONNX model.
Accepts both symbol/parameter objects and json/params file paths as input.
Operator support and coverage -
-https://github.com/apache/incubator-mxnet/tree/v1.x/python/mxnet/onnx#operator-support-matrix
+https://github.com/apache/incubator-mxnet/tree/v1.x/python/mxnet/onnx#user-content-operator-support-matrix

Parameters
----------
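
For reference, a hedged usage sketch of `export_model` (the checkpoint file names and input shape below are hypothetical, and the `onnx_file_path` keyword is assumed from the rest of the signature):

```python
import numpy as np
from mxnet.onnx import export_model  # assumes the exporter lives under mxnet.onnx as in this PR

# Hypothetical MXNet checkpoint files with a single (1, 3, 224, 224) float32 input
onnx_file = export_model('resnet-symbol.json', 'resnet-0000.params',
                         in_shapes=[(1, 3, 224, 224)],
                         in_types=np.float32,
                         onnx_file_path='resnet.onnx')
```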
6 changes: 1 addition & 5 deletions src/io/iter_mnist.cc
@@ -258,11 +258,7 @@ class MNISTIter: public IIterator<TBlobBatch> {
DMLC_REGISTER_PARAMETER(MNISTParam);

MXNET_REGISTER_IO_ITER(MNISTIter)
.describe(R"code(Iterating on the MNIST dataset.

One can download the dataset from http://yann.lecun.com/exdb/mnist/

)code" ADD_FILELINE)
.describe("Iterating on the MNIST dataset." ADD_FILELINE)
.add_arguments(MNISTParam::__FIELDS__())
.add_arguments(PrefetcherParam::__FIELDS__())
.set_body([]() {
2 changes: 1 addition & 1 deletion src/operator/svm_output.cc
@@ -90,7 +90,7 @@ MXNET_REGISTER_OP_PROPERTY(SVMOutput, SVMOutputProp)
.describe(R"code(Computes support vector machine based transformation of the input.

This tutorial demonstrates using SVM as output layer for classification instead of softmax:
-https://github.com/dmlc/mxnet/tree/master/example/svm_mnist.
+https://github.com/dmlc/mxnet/tree/v1.x/example/svm_mnist.

)code")
.add_argument("data", "NDArray-or-Symbol", "Input data for SVM transformation.")
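
A short sketch of wiring `SVMOutput` into a symbolic network (the layer size and hyperparameters below are illustrative):

```python
import mxnet as mx

data = mx.sym.Variable('data')
fc = mx.sym.FullyConnected(data=data, num_hidden=10)
# Hinge-loss output layer in place of softmax; margin and L2 coefficient are tunable
svm = mx.sym.SVMOutput(data=fc, margin=1.0, regularization_coefficient=1.0)
```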
4 changes: 2 additions & 2 deletions src/operator/tensor/matrix_op.cc
@@ -991,7 +991,7 @@ NNVM_REGISTER_OP(_backward_squeeze)
NNVM_REGISTER_OP(depth_to_space)
.describe(R"code(Rearranges(permutes) data from depth into blocks of spatial data.
Similar to ONNX DepthToSpace operator:
-https://github.com/onnx/onnx/blob/master/docs/Operators.md#DepthToSpace.
+https://github.com/onnx/onnx/blob/master/docs/Operators.md#user-content-depthtospace.
The output is a new tensor where the values from depth dimension are moved in spatial blocks
to height and width dimension. The reverse of this operation is ``space_to_depth``.

@@ -1043,7 +1043,7 @@ Example::
NNVM_REGISTER_OP(space_to_depth)
.describe(R"code(Rearranges(permutes) blocks of spatial data into depth.
Similar to ONNX SpaceToDepth operator:
-https://github.com/onnx/onnx/blob/master/docs/Operators.md#SpaceToDepth
+https://github.com/onnx/onnx/blob/master/docs/Operators.md#user-content-spacetodepth
The output is a new tensor where the values from height and width dimension are
moved to the depth dimension. The reverse of this operation is ``depth_to_space``.
.. math::
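
A round-trip sketch of the two operators (the input shape is chosen so the channel count is divisible by `block_size`^2):

```python
import mxnet as mx

x = mx.nd.arange(16).reshape((1, 4, 2, 2))  # (N, C, H, W)
y = mx.nd.depth_to_space(x, block_size=2)   # -> shape (1, 1, 4, 4)
z = mx.nd.space_to_depth(y, block_size=2)   # inverse: back to (1, 4, 2, 2)
assert (x == z).min().asscalar() == 1.0     # round trip recovers the input
```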