From 98f4382882cfeafc19ca90ddcb28ad4cdd31a536 Mon Sep 17 00:00:00 2001
From: barry-jin <barryjin1995@gmail.com>
Date: Tue, 3 Aug 2021 12:27:55 -0700
Subject: [PATCH 1/7] fix broken links

---
 .../getting-started/gluon_from_experiment_to_deployment.md  | 2 +-
 .../python/tutorials/packages/gluon/blocks/custom-layer.md  | 6 +++---
 .../tutorials/packages/gluon/blocks/save_load_params.md     | 5 -----
 .../python/tutorials/packages/gluon/loss/loss.md            | 2 +-
 .../python/tutorials/packages/gluon/text/gnmt.rst           | 2 +-
 .../tutorials/packages/gluon/training/fit_api_tutorial.md   | 4 ++--
 .../packages/gluon/training/normalization/index.md          | 2 +-
 .../python/tutorials/performance/backend/profiler.md        | 4 ++--
 src/operator/svm_output.cc                                  | 2 +-
 src/operator/tensor/matrix_op.cc                            | 4 ++--
 10 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/docs/python_docs/python/tutorials/getting-started/gluon_from_experiment_to_deployment.md b/docs/python_docs/python/tutorials/getting-started/gluon_from_experiment_to_deployment.md
index 8d583eb17290..2398d700bad0 100644
--- a/docs/python_docs/python/tutorials/getting-started/gluon_from_experiment_to_deployment.md
+++ b/docs/python_docs/python/tutorials/getting-started/gluon_from_experiment_to_deployment.md
@@ -33,7 +33,7 @@ Gluon provides State of the Art models for many of the standard tasks such as Cl
 
 To complete this tutorial, you need:
 
-- [Build MXNet from source](https://mxnet.apache.org/get_started/ubuntu_setup#build-mxnet-from-source) with Python(Gluon) and C++ Packages
+- [Build MXNet from source](https://mxnet.apache.org/get_started/build_from_source) with Python(Gluon) and C++ Packages
 - Learn the basics about Gluon with [A 60-minute Gluon Crash Course](https://gluon-crash-course.mxnet.io/)
 
 
diff --git a/docs/python_docs/python/tutorials/packages/gluon/blocks/custom-layer.md b/docs/python_docs/python/tutorials/packages/gluon/blocks/custom-layer.md
index 4f0de0df200c..54fbd7974a76 100644
--- a/docs/python_docs/python/tutorials/packages/gluon/blocks/custom-layer.md
+++ b/docs/python_docs/python/tutorials/packages/gluon/blocks/custom-layer.md
@@ -93,7 +93,7 @@ Output:
 
 As a rule of thumb, one should always implement custom layers by inheriting from `HybridBlock`. This allows to have more flexibility, and doesn't affect execution speed once hybridization is done. 
 
-Unfortunately, at the moment of writing this tutorial, NLP related layers such as [RNN](https://mxnet.apache.org/api/python/gluon/rnn.html#mxnet.gluon.rnn.RNN), [GRU](https://mxnet.apache.org/api/python/gluon/rnn.html#mxnet.gluon.rnn.GRU) and [LSTM](https://mxnet.apache.org/api/python/gluon/rnn.html#mxnet.gluon.rnn.LSTM) are directly inhereting from the `Block` class via common `_RNNLayer` class. That means that networks with such layers cannot be hybridized. But this might change in the future, so stay tuned.
+Unfortunately, at the moment of writing this tutorial, NLP related layers such as [RNN](../../../../api/gluon/rnn/index.rst#mxnet.gluon.rnn.RNN), [GRU](../../../../api/gluon/rnn/index.rst#mxnet.gluon.rnn.GRU) and [LSTM](../../../../api/gluon/rnn/index.rst#mxnet.gluon.rnn.LSTM) are directly inheriting from the `Block` class via common `_RNNLayer` class. That means that networks with such layers cannot be hybridized. But this might change in the future, so stay tuned.
 
 It is important to notice that hybridization has nothing to do with computation on GPU. One can train both hybridized and non-hybridized networks on both CPU and GPU, though hybridized networks would work faster. Though, it is hard to say in advance how much faster it is going to be.
 
@@ -101,7 +101,7 @@ It is important to notice that hybridization has nothing to do with computation
 
 While it is possible, custom layers are rarely used separately. Most often they are used with predefined layers to create a neural network. Output of one layer is used as an input of another layer.
 
-Depending on which class you used as a base one, you can use either [Sequential](https://mxnet.apache.org/api/python/gluon/gluon.html#mxnet.gluon.nn.Sequential) or [HybridSequential](https://mxnet.apache.org/api/python/gluon/gluon.html#mxnet.gluon.nn.HybridSequential) container to form a sequential neural network. By adding layers one by one, one adds dependencies of one layer's input from another layer's output. It is worth noting, that both `Sequential` and `HybridSequential` containers inherit from `Block` and `HybridBlock` respectively. 
+Depending on which class you used as a base one, you can use either [Sequential](../../../../api/gluon/nn/index.rst#mxnet.gluon.nn.Sequential) or [HybridSequential](../../../../api/gluon/nn/index.rst#mxnet.gluon.nn.HybridSequential) container to form a sequential neural network. By adding layers one by one, one adds dependencies of one layer's input from another layer's output. It is worth noting, that both `Sequential` and `HybridSequential` containers inherit from `Block` and `HybridBlock` respectively. 
 
 Below is an example of how to create a simple neural network with a custom layer. In this example, `NormalizationHybridLayer` gets as an input the output from `Dense(5)` layer and pass its output as an input to `Dense(1)` layer.
 
@@ -133,7 +133,7 @@ Output:
 
 ## Parameters of a custom layer
 
-Usually, a layer has a set of associated parameters, sometimes also referred as weights. This is an internal state of a layer. Most often, these parameters are the ones, that we want to learn during backpropogation step, but sometimes these parameters might be just constants we want to use during forward pass. The parameters are usually represented as [Parameter](https://mxnet.apache.org/api/python/gluon/gluon.html#mxnet.gluon.Parameter) class inside of Apache MXNet neural network.
+Usually, a layer has a set of associated parameters, sometimes also referred to as weights. This is an internal state of a layer. Most often, these parameters are the ones, that we want to learn during backpropagation step, but sometimes these parameters might be just constants we want to use during forward pass. The parameters are usually represented as [Parameter](../../../../api/gluon/parameter.rst#gluon-parameter) class inside of Apache MXNet neural network.
 
 
 ```{.python .input}
diff --git a/docs/python_docs/python/tutorials/packages/gluon/blocks/save_load_params.md b/docs/python_docs/python/tutorials/packages/gluon/blocks/save_load_params.md
index 11a0b5da1b9f..4aea789b6dbb 100644
--- a/docs/python_docs/python/tutorials/packages/gluon/blocks/save_load_params.md
+++ b/docs/python_docs/python/tutorials/packages/gluon/blocks/save_load_params.md
@@ -246,11 +246,6 @@ net.export("lenet", epoch=1)
 
 ## Loading model parameters AND architecture from file
 
-### From a different frontend
-
-One of the main reasons to serialize model architecture into a JSON file is to load it from a different frontend like C, C++ or Scala. Here is a couple of examples:
-1. [Loading serialized Hybrid networks from C](https://github.com/apache/incubator-mxnet/blob/master/example/image-classification/predict-cpp/image-classification-predict.cc)
-2. [Loading serialized Hybrid networks from Scala](https://github.com/apache/incubator-mxnet/blob/master/scala-package/infer/src/main/scala/org/apache/mxnet/infer/ImageClassifier.scala)
 
 ### From Python
 
diff --git a/docs/python_docs/python/tutorials/packages/gluon/loss/loss.md b/docs/python_docs/python/tutorials/packages/gluon/loss/loss.md
index a94c5c0cccfe..1cdf796decb6 100644
--- a/docs/python_docs/python/tutorials/packages/gluon/loss/loss.md
+++ b/docs/python_docs/python/tutorials/packages/gluon/loss/loss.md
@@ -236,7 +236,7 @@ The network would learn to minimize the distance between the two `A`'s and maxim
 
 #### [CTC Loss](../../../../api/gluon/loss/index.rst#mxnet.gluon.loss.CTCLoss)
 
-CTC Loss is the [connectionist temporal classification loss](https://distill.pub/2017/ctc/) . It is used to train recurrent neural networks with variable time dimension. It learns the alignment and labelling of input sequences. It takes a sequence as input and gives probabilities for each timestep. For instance, in the following image the word is not well aligned with the 5 timesteps because of the different sizes of characters. CTC Loss finds for each timestep the highest probability e.g. `t1` presents with high probability a `C`. It combines the highest probapilities and returns the best path decoding. For an in-depth tutorial on how to use CTC-Loss in MXNet, check out this [example](https://github.com/apache/incubator-mxnet/tree/master/example/ctc).
+CTC Loss is the [connectionist temporal classification loss](https://distill.pub/2017/ctc/). It is used to train recurrent neural networks with variable time dimension. It learns the alignment and labelling of input sequences. It takes a sequence as input and gives probabilities for each timestep. For instance, in the following image the word is not well aligned with the 5 timesteps because of the different sizes of characters. CTC Loss finds for each timestep the highest probability e.g. `t1` presents with high probability a `C`. It combines the highest probabilities and returns the best path decoding.
 
 ![ctc_loss](/_static/ctc_loss.png)
 
diff --git a/docs/python_docs/python/tutorials/packages/gluon/text/gnmt.rst b/docs/python_docs/python/tutorials/packages/gluon/text/gnmt.rst
index 1fc683b9b300..d3ac96f8a2cf 100644
--- a/docs/python_docs/python/tutorials/packages/gluon/text/gnmt.rst
+++ b/docs/python_docs/python/tutorials/packages/gluon/text/gnmt.rst
@@ -483,6 +483,6 @@ Summary
 In this notebook, we have shown how to train a GNMT model on IWSLT 2015
 English-Vietnamese using Gluon NLP toolkit. The complete training script
 can be found
-`here <https://github.com/dmlc/gluon-nlp/blob/master/scripts/nmt/train_gnmt.py>`__.
+`here <https://github.com/dmlc/gluon-nlp/blob/v0.x/scripts/machine_translation/train_gnmt.py>`__.
 The command to reproduce the result can be seen in the `nmt scripts
 page <http://gluon-nlp.mxnet.io/scripts/index.html#machine-translation>`__.
diff --git a/docs/python_docs/python/tutorials/packages/gluon/training/fit_api_tutorial.md b/docs/python_docs/python/tutorials/packages/gluon/training/fit_api_tutorial.md
index 06592aaf9b81..5dc052390c43 100644
--- a/docs/python_docs/python/tutorials/packages/gluon/training/fit_api_tutorial.md
+++ b/docs/python_docs/python/tutorials/packages/gluon/training/fit_api_tutorial.md
@@ -21,7 +21,7 @@ In this tutorial, you will learn how to use the [Gluon Fit API](https://cwiki.ap
 
 With the Fit API, you can train a deep learning model with a minimal amount of code. Just specify the network, loss function and the data you want to train on. You don't need to worry about the boiler plate code to loop through the dataset in batches (often called as 'training loop'). Advanced users can train with bespoke training loops, and many of these use cases will be covered by the Fit API.
 
-To demonstrate the Fit API, you will train an image classification model using the [ResNet-18](https://arxiv.org/abs/1512.03385) neural network architecture. The model will be trained using the [Fashion-MNIST dataset](https://research.zalando.com/welcome/mission/research-projects/fashion-mnist/).
+To demonstrate the Fit API, you will train an image classification model using the [ResNet-18](https://arxiv.org/abs/1512.03385) neural network architecture. The model will be trained using the [Fashion-MNIST dataset](https://github.com/zalandoresearch/fashion-mnist).
 
 ## Prerequisites
 
@@ -44,7 +44,7 @@ ctx = [mx.gpu(i) for i in range(gpu_count)] if gpu_count > 0 else mx.cpu()
 
 ## Dataset
 
-[Fashion-MNIST](https://research.zalando.com/welcome/mission/research-projects/fashion-mnist/) dataset consists of fashion items divided into ten categories: t-shirt/top, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag and ankle boot.
+[Fashion-MNIST](https://github.com/zalandoresearch/fashion-mnist) dataset consists of fashion items divided into ten categories: t-shirt/top, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag and ankle boot.
 
 - It has 60,000 grayscale images of size 28 * 28 for training.
 - It has 10,000 grayscale images of size 28 * 28 for testing/validation.
diff --git a/docs/python_docs/python/tutorials/packages/gluon/training/normalization/index.md b/docs/python_docs/python/tutorials/packages/gluon/training/normalization/index.md
index 8c18b6290bf6..4b5fdc4fc4b2 100644
--- a/docs/python_docs/python/tutorials/packages/gluon/training/normalization/index.md
+++ b/docs/python_docs/python/tutorials/packages/gluon/training/normalization/index.md
@@ -17,7 +17,7 @@
 
 # Normalization Blocks
 
-When training deep neural networks there are a number of techniques that are thought to be essential for model convergence. One important area is deciding how to initialize the parameters of the network. Using techniques such as [Xavier](https://mxnet.apache.org/api/python/optimization/optimization.html#mxnet.initializer.Xavier) initialization, we can can improve the gradient flow through the network at the start of training. Another important technique is normalization: i.e. scaling and shifting certain values towards a distribution with a mean of 0 (i.e. zero-centered) and a standard distribution of 1 (i.e. unit variance). Which values you normalize depends on the exact method used as we'll see later on.
+When training deep neural networks there are a number of techniques that are thought to be essential for model convergence. One important area is deciding how to initialize the parameters of the network. Using techniques such as [Xavier](../../../../../api/initializer/index.rst#mxnet.initializer.Xavier) initialization, we can improve the gradient flow through the network at the start of training. Another important technique is normalization: i.e. scaling and shifting certain values towards a distribution with a mean of 0 (i.e. zero-centered) and a standard deviation of 1 (i.e. unit variance). Which values you normalize depends on the exact method used as we'll see later on.
 
 <p align="center">
     <img src="./imgs/data_normalization.jpeg" alt="drawing" width="500"/>
diff --git a/docs/python_docs/python/tutorials/performance/backend/profiler.md b/docs/python_docs/python/tutorials/performance/backend/profiler.md
index 5585ccd64f2f..b241feb40637 100644
--- a/docs/python_docs/python/tutorials/performance/backend/profiler.md
+++ b/docs/python_docs/python/tutorials/performance/backend/profiler.md
@@ -44,9 +44,9 @@ print('Time for converting to numpy: %f sec' % (time() - start))
 
 From the timings above, it seems as if converting to numpy takes lot more time than multiplying two large matrices. That doesn't seem right.
 
-This is because, in MXNet, all operations are executed asynchronously. So, when `nd.dot(x, x)` returns, the matrix multiplication is not complete, it has only been queued for execution. However, [asnumpy](https://mxnet.apache.org/api/python/ndarray/ndarray.html?highlight=asnumpy#mxnet.ndarray.NDArray.asnumpy) has to wait for the result to be calculated in order to convert it to numpy array on CPU, hence taking a longer time. Other examples of 'blocking' operations include [asscalar](https://mxnet.apache.org/api/python/ndarray/ndarray.html?highlight=asscalar#mxnet.ndarray.NDArray.asscalar) and [wait_to_read](https://mxnet.apache.org/api/python/ndarray/ndarray.html?highlight=wait_to_read#mxnet.ndarray.NDArray.wait_to_read).
+This is because, in MXNet, all operations are executed asynchronously. So, when `nd.dot(x, x)` returns, the matrix multiplication is not complete, it has only been queued for execution. However, [asnumpy](../../../api/legacy/ndarray/ndarray.rst#mxnet.ndarray.NDArray.asnumpy) has to wait for the result to be calculated in order to convert it to numpy array on CPU, hence taking a longer time. Other examples of 'blocking' operations include [asscalar](../../../api/legacy/ndarray/ndarray.rst#mxnet.ndarray.NDArray.asscalar) and [wait_to_read](../../../api/legacy/ndarray/ndarray.rst#mxnet.ndarray.NDArray.wait_to_read).
 
-While it is possible to use [NDArray.waitall()](https://mxnet.apache.org/api/python/ndarray/ndarray.html?highlight=waitall#mxnet.ndarray.waitall) before and after operations to get running time of operations, it is not a scalable method to measure running time of multiple sets of operations, especially in a [Sequential](https://mxnet.apache.org/api/python/gluon/gluon.html?highlight=sequential#mxnet.gluon.nn.Sequential) or hybridized network.
+While it is possible to use [NDArray.waitall()](../../../api/ndarray/ndarray.rst#mxnet.ndarray.waitall) before and after operations to get running time of operations, it is not a scalable method to measure running time of multiple sets of operations, especially in a [Sequential](../../../api/gluon/nn/index.rst#mxnet.gluon.nn.Sequential) or hybridized network.
 
 ## The correct way to profile
 
diff --git a/src/operator/svm_output.cc b/src/operator/svm_output.cc
index a52aa4779176..06e1ba2e8aa2 100644
--- a/src/operator/svm_output.cc
+++ b/src/operator/svm_output.cc
@@ -90,7 +90,7 @@ MXNET_REGISTER_OP_PROPERTY(SVMOutput, SVMOutputProp)
 .describe(R"code(Computes support vector machine based transformation of the input.
 
 This tutorial demonstrates using SVM as output layer for classification instead of softmax:
-https://github.com/dmlc/mxnet/tree/master/example/svm_mnist.
+https://github.com/dmlc/mxnet/tree/v1.x/example/svm_mnist.
 
 )code")
 .add_argument("data", "NDArray-or-Symbol", "Input data for SVM transformation.")
diff --git a/src/operator/tensor/matrix_op.cc b/src/operator/tensor/matrix_op.cc
index 10305a35fc28..04e6afa54ead 100644
--- a/src/operator/tensor/matrix_op.cc
+++ b/src/operator/tensor/matrix_op.cc
@@ -991,7 +991,7 @@ NNVM_REGISTER_OP(_backward_squeeze)
 NNVM_REGISTER_OP(depth_to_space)
 .describe(R"code(Rearranges(permutes) data from depth into blocks of spatial data.
 Similar to ONNX DepthToSpace operator:
-https://github.com/onnx/onnx/blob/master/docs/Operators.md#DepthToSpace.
+https://github.com/onnx/onnx/blob/master/docs/Operators.md#depthtospace.
 The output is a new tensor where the values from depth dimension are moved in spatial blocks
 to height and width dimension. The reverse of this operation is ``space_to_depth``.
 
@@ -1043,7 +1043,7 @@ Example::
 NNVM_REGISTER_OP(space_to_depth)
 .describe(R"code(Rearranges(permutes) blocks of spatial data into depth.
 Similar to ONNX SpaceToDepth operator:
-https://github.com/onnx/onnx/blob/master/docs/Operators.md#SpaceToDepth
+https://github.com/onnx/onnx/blob/master/docs/Operators.md#spacetodepth
 The output is a new tensor where the values from height and width dimension are
 moved to the depth dimension. The reverse of this operation is ``depth_to_space``.
 .. math::

From efc621d6c26b554924e748d0dd4d2b788e4bfe8d Mon Sep 17 00:00:00 2001
From: barry-jin <barryjin1995@gmail.com>
Date: Tue, 3 Aug 2021 14:21:19 -0700
Subject: [PATCH 2/7] fix broken links

---
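Note on splitting the long `i0` reference URL in `python/mxnet/numpy/fallback.py`: a backslash continuation inside a Python string literal keeps the indentation of the following source line in the string, whereas adjacent string literals are joined with nothing in between. The sketch below illustrates the difference; the variable and function names are hypothetical and only the URL comes from this patch.

```python
# Illustrative only: names below are hypothetical, the URL is the one from the patch.
BASE = 'https://metacpan.org/pod/distribution/Math-Cephes/lib/Math/Cephes.pod'
ANCHOR = '#i0:-Modified-Bessel-function-of-order-zero'

# Backslash continuation inside the literal: the leading spaces of the next
# source line become part of the resulting string.
with_backslash = 'https://metacpan.org/pod/distribution/Math-Cephes/lib/Math/Cephes.pod \
        #i0:-Modified-Bessel-function-of-order-zero'

# Adjacent literals: the parser concatenates them without adding anything.
adjacent = ('https://metacpan.org/pod/distribution/Math-Cephes/lib/Math/Cephes.pod'
            '#i0:-Modified-Bessel-function-of-order-zero')


def has_embedded_whitespace(url):
    """Return True if any whitespace character appears inside the URL string."""
    return any(ch.isspace() for ch in url)


print(has_embedded_whitespace(with_backslash))
print(has_embedded_whitespace(adjacent))
```

With the backslash form, a reST parser would likely treat the URL as ending at the first space, so the `#i0...` fragment would read as trailing text rather than as part of the link.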
 docs/python_docs/python/tutorials/deploy/export/onnx.md | 2 +-
 python/mxnet/numpy/fallback.py                          | 6 ++++++
 src/io/iter_mnist.cc                                    | 4 ++--
 src/operator/tensor/matrix_op.cc                        | 4 ++--
 4 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/docs/python_docs/python/tutorials/deploy/export/onnx.md b/docs/python_docs/python/tutorials/deploy/export/onnx.md
index 60f3b2e6b0e6..7961abad7e2a 100644
--- a/docs/python_docs/python/tutorials/deploy/export/onnx.md
+++ b/docs/python_docs/python/tutorials/deploy/export/onnx.md
@@ -29,7 +29,7 @@ In this tutorial, we will learn how to use MXNet to ONNX exporter on pre-trained
 
 To run the tutorial you will need to have installed the following python modules:
 - [MXNet >= 1.3.0](https://mxnet.apache.org/get_started)
-- [onnx]( https://github.com/onnx/onnx#installation) v1.2.1 (follow the install guide)
+- [onnx]( https://github.com/onnx/onnx#user-content-installation) v1.2.1 (follow the install guide)
 
 *Note:* MXNet-ONNX importer and exporter follows version 7 of ONNX operator set which comes with ONNX v1.2.1.
 
diff --git a/python/mxnet/numpy/fallback.py b/python/mxnet/numpy/fallback.py
index 76ebf61c57bb..83bf67372517 100644
--- a/python/mxnet/numpy/fallback.py
+++ b/python/mxnet/numpy/fallback.py
@@ -135,6 +135,12 @@ def wrapper(*args, **kwargs):
             # remove unused reference
             new_fn_doc = new_fn_doc.replace(
                 '.. [1] Wikipedia page: https://en.wikipedia.org/wiki/Trapezoidal_rule', '')
+        elif obj_name == "i0":
+            # replace broken link
+            new_fn_doc = new_fn_doc.replace(
+                '.. [3] http://kobesearch.cpan.org/htdocs/Math-Cephes/Math/Cephes.html',
+                '.. [3] https://metacpan.org/pod/distribution/Math-Cephes/lib/Math/Cephes.pod'
+                '#i0:-Modified-Bessel-function-of-order-zero')
         setattr(fallback_mod, obj_name, get_func(onp_obj, new_fn_doc))
     else:
         setattr(fallback_mod, obj_name, onp_obj)
diff --git a/src/io/iter_mnist.cc b/src/io/iter_mnist.cc
index 0d5f96c0e193..92dda471bc85 100644
--- a/src/io/iter_mnist.cc
+++ b/src/io/iter_mnist.cc
@@ -258,11 +258,11 @@ class MNISTIter: public IIterator<TBlobBatch> {
 DMLC_REGISTER_PARAMETER(MNISTParam);
 
 MXNET_REGISTER_IO_ITER(MNISTIter)
-.describe(R"code(Iterating on the MNIST dataset.
+.describe("Iterating on the MNIST dataset.
 
 One can download the dataset from http://yann.lecun.com/exdb/mnist/
 
-)code" ADD_FILELINE)
+" ADD_FILELINE)
 .add_arguments(MNISTParam::__FIELDS__())
 .add_arguments(PrefetcherParam::__FIELDS__())
 .set_body([]() {
diff --git a/src/operator/tensor/matrix_op.cc b/src/operator/tensor/matrix_op.cc
index 04e6afa54ead..ab05cd1bc79d 100644
--- a/src/operator/tensor/matrix_op.cc
+++ b/src/operator/tensor/matrix_op.cc
@@ -991,7 +991,7 @@ NNVM_REGISTER_OP(_backward_squeeze)
 NNVM_REGISTER_OP(depth_to_space)
 .describe(R"code(Rearranges(permutes) data from depth into blocks of spatial data.
 Similar to ONNX DepthToSpace operator:
-https://github.com/onnx/onnx/blob/master/docs/Operators.md#depthtospace.
+https://github.com/onnx/onnx/blob/master/docs/Operators.md#user-content-depthtospace.
 The output is a new tensor where the values from depth dimension are moved in spatial blocks
 to height and width dimension. The reverse of this operation is ``space_to_depth``.
 
@@ -1043,7 +1043,7 @@ Example::
 NNVM_REGISTER_OP(space_to_depth)
 .describe(R"code(Rearranges(permutes) blocks of spatial data into depth.
 Similar to ONNX SpaceToDepth operator:
-https://github.com/onnx/onnx/blob/master/docs/Operators.md#spacetodepth
+https://github.com/onnx/onnx/blob/master/docs/Operators.md#user-content-spacetodepth
 The output is a new tensor where the values from height and width dimension are
 moved to the depth dimension. The reverse of this operation is ``depth_to_space``.
 .. math::

From 4550af6dd8fa5da8bde43fd60979aa0c9a539b22 Mon Sep 17 00:00:00 2001
From: barry-jin <barryjin1995@gmail.com>
Date: Tue, 3 Aug 2021 21:23:39 -0700
Subject: [PATCH 3/7] fix link warning

---
 .../python/tutorials/performance/backend/profiler.md            | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/python_docs/python/tutorials/performance/backend/profiler.md b/docs/python_docs/python/tutorials/performance/backend/profiler.md
index b241feb40637..e7891c7677b9 100644
--- a/docs/python_docs/python/tutorials/performance/backend/profiler.md
+++ b/docs/python_docs/python/tutorials/performance/backend/profiler.md
@@ -46,7 +46,7 @@ From the timings above, it seems as if converting to numpy takes lot more time t
 
 This is because, in MXNet, all operations are executed asynchronously. So, when `nd.dot(x, x)` returns, the matrix multiplication is not complete, it has only been queued for execution. However, [asnumpy](../../../api/legacy/ndarray/ndarray.rst#mxnet.ndarray.NDArray.asnumpy) has to wait for the result to be calculated in order to convert it to numpy array on CPU, hence taking a longer time. Other examples of 'blocking' operations include [asscalar](../../../api/legacy/ndarray/ndarray.rst#mxnet.ndarray.NDArray.asscalar) and [wait_to_read](../../../api/legacy/ndarray/ndarray.rst#mxnet.ndarray.NDArray.wait_to_read).
 
-While it is possible to use [NDArray.waitall()](../../../api/ndarray/ndarray.rst#mxnet.ndarray.waitall) before and after operations to get running time of operations, it is not a scalable method to measure running time of multiple sets of operations, especially in a [Sequential](../../../api/gluon/nn/index.rst#mxnet.gluon.nn.Sequential) or hybridized network.
+While it is possible to use [NDArray.waitall()](../../../api/legacy/ndarray/ndarray.rst#mxnet.ndarray.waitall) before and after operations to get running time of operations, it is not a scalable method to measure running time of multiple sets of operations, especially in a [Sequential](../../../api/gluon/nn/index.rst#mxnet.gluon.nn.Sequential) or hybridized network.
 
 ## The correct way to profile
 

From 89c06ed4c08ad46b238f8242d6eb4a4ab82d9880 Mon Sep 17 00:00:00 2001
From: barry-jin <barryjin1995@gmail.com>
Date: Wed, 4 Aug 2021 12:12:18 -0700
Subject: [PATCH 4/7] [DOC] Part1: link check

---
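Note on enabling `linkcheck` in the `html` target: the linkcheck builder contacts every external URL, so slow or flaky hosts may now affect the docs build. A minimal sketch of Sphinx `conf.py` settings that could be used to tune this, assuming the project wants to. The option names are standard Sphinx linkcheck options; the values and the ignored pattern are illustrative assumptions, not taken from the MXNet configuration.

```python
# conf.py fragment (illustrative values, not the MXNet settings)

# Seconds to wait for each request before giving up on a link.
linkcheck_timeout = 30

# Number of attempts before a link is reported as broken.
linkcheck_retries = 2

# Number of worker threads used to check links in parallel.
linkcheck_workers = 8

# Also verify that "#anchor" fragments exist in the target page.
linkcheck_anchors = True

# Regular expressions for URLs that should be skipped entirely
# (hypothetical example pattern).
linkcheck_ignore = [
    r'http://localhost:\d+/',
]
```

Skipping a host through `linkcheck_ignore` hides real breakage on that host as well, so such a list is usually kept short.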
 docs/python_docs/python/Makefile | 2 +-
 src/io/iter_mnist.cc             | 6 +-----
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/docs/python_docs/python/Makefile b/docs/python_docs/python/Makefile
index 154bcf275e89..e68891032b64 100644
--- a/docs/python_docs/python/Makefile
+++ b/docs/python_docs/python/Makefile
@@ -49,7 +49,7 @@ html: $(OBJ)
 	cp -n -r ../_static build/ || true;
 	sphinx-autogen build/api/*.rst build/api/**/*.rst   -t build/_templates/;
 	# make -C build linkcheck doctest html
-	make -C build html;
+	make -C build linkcheck html;
 	sed -i.bak 's/33\,150\,243/23\,141\,201/g'  build/_build/html/_static/material-design-lite-1.3.0/material.blue-deep_orange.min.css;
 	make update_github_link
 
diff --git a/src/io/iter_mnist.cc b/src/io/iter_mnist.cc
index 92dda471bc85..bf590170ca9f 100644
--- a/src/io/iter_mnist.cc
+++ b/src/io/iter_mnist.cc
@@ -258,11 +258,7 @@ class MNISTIter: public IIterator<TBlobBatch> {
 DMLC_REGISTER_PARAMETER(MNISTParam);
 
 MXNET_REGISTER_IO_ITER(MNISTIter)
-.describe("Iterating on the MNIST dataset.
-
-One can download the dataset from http://yann.lecun.com/exdb/mnist/
-
-" ADD_FILELINE)
+.describe("Iterating on the MNIST dataset." ADD_FILELINE)
 .add_arguments(MNISTParam::__FIELDS__())
 .add_arguments(PrefetcherParam::__FIELDS__())
 .set_body([]() {

From 4e0f89b0ff78b4a82d9d122e619a9b3b33ae3435 Mon Sep 17 00:00:00 2001
From: barry-jin <barryjin1995@gmail.com>
Date: Wed, 4 Aug 2021 13:38:06 -0700
Subject: [PATCH 5/7] update ONNX operator coverage link

---
 python/mxnet/contrib/onnx/mx2onnx/export_model.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/python/mxnet/contrib/onnx/mx2onnx/export_model.py b/python/mxnet/contrib/onnx/mx2onnx/export_model.py
index 2fc77604b9b6..9eb523cfb1d9 100644
--- a/python/mxnet/contrib/onnx/mx2onnx/export_model.py
+++ b/python/mxnet/contrib/onnx/mx2onnx/export_model.py
@@ -33,7 +33,8 @@ def export_model(sym, params, input_shape, input_type=np.float32,
     """Exports the MXNet model file, passed as a parameter, into ONNX model.
     Accepts both symbol,parameter objects as well as json and params filepaths as input.
     Operator support and coverage -
-    https://cwiki.apache.org/confluence/display/MXNET/ONNX+Operator+Coverage
+    https://github.com/apache/incubator-mxnet/tree/v1.x/\
+    python/mxnet/onnx#user-content-operator-support-matrix
 
     Parameters
     ----------

From eb245000352dd7e11a45efe0ed72779c68fb6d87 Mon Sep 17 00:00:00 2001
From: barry-jin <barryjin1995@gmail.com>
Date: Wed, 4 Aug 2021 15:33:55 -0700
Subject: [PATCH 6/7] use user-content anchor for ONNX operator support matrix link

---
 python/mxnet/onnx/mx2onnx/_export_model.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/mxnet/onnx/mx2onnx/_export_model.py b/python/mxnet/onnx/mx2onnx/_export_model.py
index fbfadde09508..7b9db5a11416 100644
--- a/python/mxnet/onnx/mx2onnx/_export_model.py
+++ b/python/mxnet/onnx/mx2onnx/_export_model.py
@@ -55,7 +55,7 @@ def export_model(sym, params, in_shapes=None, in_types=np.float32,
     """Exports the MXNet model file, passed as a parameter, into ONNX model.
     Accepts both symbol,parameter objects as well as json and params filepaths as input.
     Operator support and coverage -
-    https://github.com/apache/incubator-mxnet/tree/v1.x/python/mxnet/onnx#operator-support-matrix
+    https://github.com/apache/incubator-mxnet/tree/v1.x/python/mxnet/onnx#user-content-operator-support-matrix
 
     Parameters
     ----------

From 9c59603a3eee662a8a84def64c976950ee1dc2a1 Mon Sep 17 00:00:00 2001
From: barry-jin <barryjin1995@gmail.com>
Date: Wed, 4 Aug 2021 15:37:26 -0700
Subject: [PATCH 7/7] remove contrib/onnx/mx2onnx/export_model.py

---
 .../contrib/onnx/mx2onnx/export_model.py      | 101 ------------------
 1 file changed, 101 deletions(-)
 delete mode 100644 python/mxnet/contrib/onnx/mx2onnx/export_model.py

diff --git a/python/mxnet/contrib/onnx/mx2onnx/export_model.py b/python/mxnet/contrib/onnx/mx2onnx/export_model.py
deleted file mode 100644
index c9293d917098..000000000000
--- a/python/mxnet/contrib/onnx/mx2onnx/export_model.py
+++ /dev/null
@@ -1,101 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-# coding: utf-8
-#pylint: disable-msg=too-many-arguments
-
-"""Exports an MXNet model to the ONNX model format"""
-import logging
-import numpy as np
-
-from ....base import string_types
-from .... import symbol
-from .export_onnx import MXNetGraph
-from ._export_helper import load_module
-
-
-def export_model(sym, params, input_shape, input_type=np.float32,
-                 onnx_file_path='model.onnx', verbose=False, opset_version=None):
-    """Exports the MXNet model file, passed as a parameter, into ONNX model.
-    Accepts both symbol,parameter objects as well as json and params filepaths as input.
-    Operator support and coverage -
-    https://github.com/apache/incubator-mxnet/tree/v1.x/python/mxnet/onnx#user-content-operator-support-matrix
-
-    Parameters
-    ----------
-    sym : str or symbol object
-        Path to the json file or Symbol object
-    params : str or symbol object
-        Path to the params file or params dictionary. (Including both arg_params and aux_params)
-    input_shape : List of tuple
-        Input shape of the model e.g [(1,3,224,224)]
-    input_type : data type
-        Input data type e.g. np.float32
-    onnx_file_path : str
-        Path where to save the generated onnx file
-    verbose : Boolean
-        If true will print logs of the model conversion
-
-    Returns
-    -------
-    onnx_file_path : str
-        Onnx file path
-
-    Notes
-    -----
-    This method is available when you ``import mxnet.contrib.onnx``
-
-    """
-
-    try:
-        from onnx import helper, mapping
-        from onnx.defs import onnx_opset_version
-    except ImportError:
-        raise ImportError("Onnx and protobuf need to be installed. "
-                          + "Instructions to install - https://github.com/onnx/onnx")
-
-    converter = MXNetGraph()
-    if opset_version is None:
-        # default is to use latest opset version the onnx package supports
-        opset_version = onnx_opset_version()
-
-    data_format = np.dtype(input_type)
-    # if input parameters are strings(file paths), load files and create symbol parameter objects
-    if isinstance(sym, string_types) and isinstance(params, string_types):
-        logging.info("Converting json and weight file to sym and params")
-        sym_obj, params_obj = load_module(sym, params)
-        onnx_graph = converter.create_onnx_graph_proto(sym_obj, params_obj, input_shape,
-                                                       mapping.NP_TYPE_TO_TENSOR_TYPE[data_format],
-                                                       verbose=verbose, opset_version=opset_version)
-    elif isinstance(sym, symbol.Symbol) and isinstance(params, dict):
-        onnx_graph = converter.create_onnx_graph_proto(sym, params, input_shape,
-                                                       mapping.NP_TYPE_TO_TENSOR_TYPE[data_format],
-                                                       verbose=verbose, opset_version=opset_version)
-    else:
-        raise ValueError("Input sym and params should either be files or objects")
-
-    # Create the model (ModelProto)
-    onnx_model = helper.make_model(onnx_graph)
-
-    # Save model on disk
-    with open(onnx_file_path, "wb") as file_handle:
-        serialized = onnx_model.SerializeToString()
-        file_handle.write(serialized)
-        logging.info("Input shape of the model %s ", input_shape)
-        logging.info("Exported ONNX file %s saved to disk", onnx_file_path)
-
-    return onnx_file_path