From b53c517d9cf39902028a4dcfc8afc39aa18b4e12 Mon Sep 17 00:00:00 2001
From: Ayush Joshi
Date: Tue, 14 Nov 2023 12:15:43 +0530
Subject: [PATCH] Added best practices to train a neural network

---
 notebooks/ml/Machine_Learning.ipynb | 75 +++++++++++++++++++++++++++--
 1 file changed, 72 insertions(+), 3 deletions(-)

diff --git a/notebooks/ml/Machine_Learning.ipynb b/notebooks/ml/Machine_Learning.ipynb
index c7c5072..0432c5e 100644
--- a/notebooks/ml/Machine_Learning.ipynb
+++ b/notebooks/ml/Machine_Learning.ipynb
@@ -3,10 +3,11 @@
   {
    "cell_type": "markdown",
    "metadata": {
-    "id": "view-in-github"
+    "id": "view-in-github",
+    "colab_type": "text"
    },
    "source": [
-    "\"Open"
+    "\"Open"
    ]
   },
   {
@@ -3775,12 +3776,80 @@
     "* A set of biases, one for each node.\n",
     "* An activation function that transforms the output of each node in a layer. Different layers may have different activation functions."
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "# Training Neural Networks\n",
+    "\n",
+    "**Backpropagation** is the most common training algorithm for neural networks. It makes gradient descent feasible for multi-layer neural networks."
+   ],
+   "metadata": {
+    "id": "pz_VyGGXipWg"
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "## Best Practices\n",
+    "\n",
+    "This section explains backpropagation's failure cases and the most common way to regularize a neural network."
+   ],
+   "metadata": {
+    "id": "FliCK2_Njxd2"
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "### Failure Cases\n",
+    "\n",
+    "There are a number of common ways for backpropagation to go wrong.\n",
+    "\n",
+    "#### Vanishing Gradients\n",
+    "\n",
+    "The gradients for the lower layers (closer to the input) can become very small. In deep networks, computing these gradients can involve taking the product of many small terms.\n",
+    "\n",
+    "When the gradients vanish toward 0 for the lower layers, these layers train very slowly, or not at all.\n",
+    "\n",
+    "The ReLU activation function can help prevent vanishing gradients.\n",
+    "\n",
+    "#### Exploding Gradients\n",
+    "\n",
+    "If the weights in a network are very large, then the gradients for the lower layers involve products of many large terms. In this case you can have exploding gradients: gradients that get too large to converge.\n",
+    "\n",
+    "Batch normalization can help prevent exploding gradients, as can lowering the learning rate.\n",
+    "\n",
+    "#### Dead ReLU Units\n",
+    "\n",
+    "Once the weighted sum for a ReLU unit falls below 0, the ReLU unit can get stuck. It outputs 0 activation, contributing nothing to the network's output, and gradients can no longer flow through it during backpropagation. With a source of gradients cut off, the input to the ReLU may not ever change enough to bring the weighted sum back above 0.\n",
+    "\n",
+    "Lowering the learning rate can help keep ReLU units from dying."
+   ],
+   "metadata": {
+    "id": "ryqeNBaDj34M"
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "### Dropout Regularization\n",
+    "\n",
+    "Yet another form of regularization, called **Dropout**, is useful for neural networks. It works by randomly \"dropping out\" unit activations in a network for a single gradient step. The more you drop out, the stronger the regularization:\n",
+    "\n",
+    "* 0.0 = No dropout regularization.\n",
+    "* 1.0 = Drop out everything. The model learns nothing.\n",
+    "* Values between 0.0 and 1.0 = More useful."
+   ],
+   "metadata": {
+    "id": "ZNVls-ewkXoE"
+   }
   }
  ],
 "metadata": {
   "colab": {
    "provenance": [],
-   "toc_visible": true
+   "include_colab_link": true
   },
   "kernelspec": {
    "display_name": "Python 3 (ipykernel)",
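The markdown cells added by this patch describe ReLU, batch normalization, learning-rate tuning, and dropout only in prose. As a supplement to the patch (not part of it), the sketch below shows where each practice plugs into a model definition, assuming a TensorFlow/Keras setup; the layer sizes, the 0.2 dropout rate, and the 1e-3 learning rate are illustrative assumptions, not values taken from the notebook.

# Minimal sketch, assuming TensorFlow is available in the notebook environment.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    # ReLU activations help keep gradients in the lower layers from vanishing.
    tf.keras.layers.Dense(64, activation="relu"),
    # Batch normalization helps keep gradients from exploding.
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(64, activation="relu"),
    # Dropout rate sits between 0.0 (no regularization) and 1.0 (drop everything);
    # 0.2 randomly zeroes 20% of the activations on each gradient step.
    tf.keras.layers.Dropout(rate=0.2),
    tf.keras.layers.Dense(1),
])

# A modest learning rate also guards against exploding gradients and dead ReLU units.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="mse",
)

# Tiny synthetic regression problem, just to confirm the model trains end to end.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10)).astype("float32")
y = X.sum(axis=1, keepdims=True) + rng.normal(scale=0.1, size=(256, 1)).astype("float32")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)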