From b53c517d9cf39902028a4dcfc8afc39aa18b4e12 Mon Sep 17 00:00:00 2001
From: Ayush Joshi
Date: Tue, 14 Nov 2023 12:15:43 +0530
Subject: [PATCH] Added best practices to train a neural network

---
 notebooks/ml/Machine_Learning.ipynb | 75 +++++++++++++++++++++++++++--
 1 file changed, 72 insertions(+), 3 deletions(-)

diff --git a/notebooks/ml/Machine_Learning.ipynb b/notebooks/ml/Machine_Learning.ipynb
index c7c5072..0432c5e 100644
--- a/notebooks/ml/Machine_Learning.ipynb
+++ b/notebooks/ml/Machine_Learning.ipynb
@@ -3,10 +3,11 @@
   {
    "cell_type": "markdown",
    "metadata": {
-    "id": "view-in-github"
+    "id": "view-in-github",
+    "colab_type": "text"
    },
    "source": [
-    "\"Open"
+    "\"Open"
    ]
   },
   {
@@ -3775,12 +3776,80 @@
     "* A set of biases, one for each node.\n",
     "* An activation function that transforms the output of each node in a layer. Different layers may have different activation functions."
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "# Training Neural Networks\n",
+    "\n",
+    "**Backpropagation** is the most common training algorithm for neural networks. It makes gradient descent feasible for multi-layer neural networks."
+   ],
+   "metadata": {
+    "id": "pz_VyGGXipWg"
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "## Best Practices\n",
+    "\n",
+    "This section explains backpropagation's failure cases and the most common way to regularize a neural network."
+   ],
+   "metadata": {
+    "id": "FliCK2_Njxd2"
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "### Failure Cases\n",
+    "\n",
+    "There are a number of common ways for backpropagation to go wrong.\n",
+    "\n",
+    "#### Vanishing Gradients\n",
+    "\n",
+    "The gradients for the lower layers (closer to the input) can become very small. In deep networks, computing these gradients can involve taking the product of many small terms.\n",
+    "\n",
+    "When the gradients vanish toward 0 for the lower layers, these layers train very slowly, or not at all.\n",
+    "\n",
+    "The ReLU activation function can help prevent vanishing gradients.\n",
+    "\n",
+    "#### Exploding Gradients\n",
+    "\n",
+    "If the weights in a network are very large, then the gradients for the lower layers involve products of many large terms. In this case you can have exploding gradients: gradients that get too large to converge.\n",
+    "\n",
+    "Batch normalization can help prevent exploding gradients, as can lowering the learning rate.\n",
+    "\n",
+    "#### Dead ReLU Units\n",
+    "\n",
+    "Once the weighted sum for a ReLU unit falls below 0, the ReLU unit can get stuck. It outputs 0 activation, contributing nothing to the network's output, and gradients can no longer flow through it during backpropagation. With a source of gradients cut off, the input to the ReLU may not ever change enough to bring the weighted sum back above 0.\n",
+    "\n",
+    "Lowering the learning rate can help keep ReLU units from dying."
+   ],
+   "metadata": {
+    "id": "ryqeNBaDj34M"
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "### Dropout Regularization\n",
+    "\n",
+    "Yet another form of regularization, called **Dropout**, is useful for neural networks. It works by randomly \"dropping out\" unit activations in a network for a single gradient step. The more you drop out, the stronger the regularization:\n",
+    "\n",
+    "* 0.0 = No dropout regularization.\n",
+    "* 1.0 = Drop out everything. The model learns nothing.\n",
+    "* Values between 0.0 and 1.0 = More useful."
+   ],
+   "metadata": {
+    "id": "ZNVls-ewkXoE"
+   }
   }
  ],
 "metadata": {
   "colab": {
    "provenance": [],
-   "toc_visible": true
+   "include_colab_link": true
   },
   "kernelspec": {
    "display_name": "Python 3 (ipykernel)",
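The markdown cells added by this patch describe ReLU, batch normalization, learning-rate tuning, and dropout only in prose. As a supplement to the patch (not part of it), the sketch below shows where each practice plugs into a model definition, assuming a TensorFlow/Keras setup; the layer sizes, the 0.2 dropout rate, and the 1e-3 learning rate are illustrative assumptions, not values taken from the notebook.

# Minimal sketch, assuming TensorFlow is available in the notebook environment.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    # ReLU activations help keep gradients in the lower layers from vanishing.
    tf.keras.layers.Dense(64, activation="relu"),
    # Batch normalization helps keep gradients from exploding.
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(64, activation="relu"),
    # Dropout rate sits between 0.0 (no regularization) and 1.0 (drop everything);
    # 0.2 randomly zeroes 20% of the activations on each gradient step.
    tf.keras.layers.Dropout(rate=0.2),
    tf.keras.layers.Dense(1),
])

# A modest learning rate also guards against exploding gradients and dead ReLU units.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="mse",
)

# Tiny synthetic regression problem, just to confirm the model trains end to end.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10)).astype("float32")
y = X.sum(axis=1, keepdims=True) + rng.normal(scale=0.1, size=(256, 1)).astype("float32")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)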