diff --git a/doc/pub/catania/ipynb/catania.ipynb b/doc/pub/catania/ipynb/catania.ipynb
index a32f77d7..11ee8569 100644
--- a/doc/pub/catania/ipynb/catania.ipynb
+++ b/doc/pub/catania/ipynb/catania.ipynb
@@ -3,9 +3,7 @@
{
"cell_type": "markdown",
"id": "0cd93842",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"\n",
@@ -15,9 +13,7 @@
{
"cell_type": "markdown",
"id": "653c9ba3",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"# Mathematics of discriminative and generative deep learning, from deep neural networks to diffusion models\n",
"**Morten Hjorth-Jensen**, Department of Physics and Center for Computing in Science Education, University of Oslo, Norway and Department of Physics and Astronomy and Facility for Rare Isotope Beams, Michigan State University, East Lansing, Michigan, USA\n",
@@ -28,9 +24,7 @@
{
"cell_type": "markdown",
"id": "edbeb108",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Types of machine learning\n",
"\n",
@@ -47,9 +41,7 @@
{
"cell_type": "markdown",
"id": "08b3ff3a",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Main categories\n",
"Another way to categorize machine learning tasks is to consider the desired output of a system.\n",
@@ -65,9 +57,7 @@
{
"cell_type": "markdown",
"id": "acb8af9d",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Machine learning. A simple perspective on the interface between ML and Physics\n",
"\n",
@@ -81,9 +71,7 @@
{
"cell_type": "markdown",
"id": "df158234",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## ML in Nuclear Physics (or any field in physics)\n",
"\n",
@@ -97,9 +85,7 @@
{
"cell_type": "markdown",
"id": "2c394af4",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## The plethora of machine learning algorithms/methods\n",
"\n",
@@ -119,9 +105,7 @@
{
"cell_type": "markdown",
"id": "17f9e522",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Extrapolations and model interpretability\n",
"\n",
@@ -141,9 +125,7 @@
{
"cell_type": "markdown",
"id": "5d68e865",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Generative and discriminative models\n",
"\n",
@@ -159,9 +141,7 @@
{
"cell_type": "markdown",
"id": "d3540c12",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## [Dilute neutron star matter from neural-network quantum states by Fore et al, Physical Review Research 5, 033062 (2023)](https://journals.aps.org/prresearch/pdf/10.1103/PhysRevResearch.5.033062) at density $\\rho=0.04$ fm$^{-3}$\n",
"\n",
@@ -175,9 +155,7 @@
{
"cell_type": "markdown",
"id": "08fa3db3",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## The electron gas in three dimensions with $N=14$ electrons (Wigner-Seitz radius $r_s=2$ a.u.), [Gabriel Pescia, Jane Kim et al. arXiv.2305.07240,](https://doi.org/10.48550/arXiv.2305.07240)\n",
"\n",
@@ -191,9 +169,7 @@
{
"cell_type": "markdown",
"id": "cd28f343",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## What Is Generative Modeling?\n",
"\n",
@@ -214,9 +190,7 @@
{
"cell_type": "markdown",
"id": "58b00c74",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Example of generative modeling, [taken from Generative Deep Learning by David Foster](https://www.oreilly.com/library/view/generative-deep-learning/9781098134174/ch01.html)\n",
"\n",
@@ -230,9 +204,7 @@
{
"cell_type": "markdown",
"id": "cd7ea407",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Generative Modeling\n",
"\n",
@@ -256,9 +228,7 @@
{
"cell_type": "markdown",
"id": "165edfcd",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Generative Versus Discriminative Modeling\n",
"\n",
@@ -272,9 +242,7 @@
{
"cell_type": "markdown",
"id": "7b5105e2",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Example of discriminative modeling, [taken from Generative Deep Learning by David Foster](https://www.oreilly.com/library/view/generative-deep-learning/9781098134174/ch01.html)\n",
"\n",
@@ -288,9 +256,7 @@
{
"cell_type": "markdown",
"id": "d4b748ef",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Discriminative Modeling\n",
"\n",
@@ -308,9 +274,7 @@
{
"cell_type": "markdown",
"id": "6fe09ccb",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Taxonomy of generative deep learning, [taken from Generative Deep Learning by David Foster](https://www.oreilly.com/library/view/generative-deep-learning/9781098134174/ch01.html)\n",
"\n",
@@ -324,9 +288,7 @@
{
"cell_type": "markdown",
"id": "81270aed",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Good books with hands-on material and codes\n",
"* [Sebastian Raschka et al, Machine learning with Scikit-Learn and PyTorch](https://sebastianraschka.com/blog/2022/ml-pytorch-book.html)\n",
@@ -342,9 +304,7 @@
{
"cell_type": "markdown",
"id": "bfe945b6",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## More references\n",
"\n",
@@ -366,9 +326,7 @@
{
"cell_type": "markdown",
"id": "ba7ef646",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## What are the basic Machine Learning ingredients?\n",
"Almost every problem in ML and data science starts with the same ingredients:\n",
@@ -384,9 +342,7 @@
{
"cell_type": "markdown",
"id": "b834cc04",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Low-level machine learning, the family of ordinary least squares methods\n",
"\n",
@@ -400,9 +356,7 @@
{
"cell_type": "markdown",
"id": "da8627ae",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\boldsymbol{y}=f(\\boldsymbol{x})+\\boldsymbol{\\epsilon}.\n",
@@ -412,9 +366,7 @@
{
"cell_type": "markdown",
"id": "e849da7b",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Setting up the equations\n",
"\n",
@@ -430,9 +382,7 @@
{
"cell_type": "markdown",
"id": "65a452bd",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\boldsymbol{\\tilde{y}}= \\boldsymbol{X}\\boldsymbol{\\theta},\n",
@@ -442,9 +392,7 @@
{
"cell_type": "markdown",
"id": "9deb7a37",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## The objective/cost/loss function\n",
"\n",
@@ -454,9 +402,7 @@
{
"cell_type": "markdown",
"id": "38f213f1",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"C(\\boldsymbol{\\theta})=\\frac{1}{n}\\sum_{i=0}^{n-1}\\left(y_i-\\tilde{y}_i\\right)^2=\\frac{1}{n}\\left\\{\\left(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}}\\right)^T\\left(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}}\\right)\\right\\},\n",
@@ -466,9 +412,7 @@
{
"cell_type": "markdown",
"id": "45ae1136",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"or using the matrix $\\boldsymbol{X}$ and in a more compact matrix-vector notation as"
]
@@ -476,9 +420,7 @@
{
"cell_type": "markdown",
"id": "f66c26e8",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"C(\\boldsymbol{\\theta})=\\frac{1}{n}\\left\\{\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right)^T\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right)\\right\\}.\n",
@@ -488,9 +430,7 @@
{
"cell_type": "markdown",
"id": "dd3a5090",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"This function represents one of many possible ways to define the so-called cost function."
]
@@ -498,9 +438,7 @@
{
"cell_type": "markdown",
"id": "7872d11e",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Training solution\n",
"\n",
@@ -510,9 +448,7 @@
{
"cell_type": "markdown",
"id": "ad0213af",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\boldsymbol{X}^T\\boldsymbol{y} = \\boldsymbol{X}^T\\boldsymbol{X}\\boldsymbol{\\theta},\n",
@@ -522,9 +458,7 @@
{
"cell_type": "markdown",
"id": "d78419a1",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"and if the matrix $\\boldsymbol{X}^T\\boldsymbol{X}$ is invertible we have the optimal values"
]
@@ -532,9 +466,7 @@
{
"cell_type": "markdown",
"id": "fb800d02",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\hat{\\boldsymbol{\\theta}} =\\left(\\boldsymbol{X}^T\\boldsymbol{X}\\right)^{-1}\\boldsymbol{X}^T\\boldsymbol{y}.\n",
@@ -544,9 +476,7 @@
{
"cell_type": "markdown",
"id": "7f487392",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"We say we 'learn' the unknown parameters $\\boldsymbol{\\theta}$ from the last equation."
]
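The closed-form solution above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the notebook's own code: the synthetic data, the second-order polynomial design matrix and the names `theta_hat`, `theta_lstsq` and `mse` are all hypothetical choices made for the example.

```python
import numpy as np

# Hypothetical synthetic data: y = 2 + 3x - x^2 plus small Gaussian noise
rng = np.random.default_rng(2024)
n = 100
x = rng.uniform(0.0, 1.0, n)
y = 2.0 + 3.0 * x - 1.0 * x**2 + 0.01 * rng.normal(size=n)

# Design matrix X with columns 1, x, x^2 (an assumed polynomial model)
X = np.column_stack([np.ones(n), x, x**2])

# Solve the normal equations X^T X theta = X^T y without forming an explicit inverse
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Model prediction and the mean squared error cost function
y_tilde = X @ theta_hat
mse = np.mean((y - y_tilde) ** 2)

# Cross-check against NumPy's least-squares solver
theta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

print("theta_hat:", theta_hat)
print("MSE:", mse)
```

Solving the linear system with `np.linalg.solve` rather than inverting $\boldsymbol{X}^T\boldsymbol{X}$ is a common choice for numerical stability; when the matrix is close to singular, `np.linalg.lstsq` or a pseudoinverse is usually the safer route.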
@@ -554,9 +484,7 @@
{
"cell_type": "markdown",
"id": "e39f7c17",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Ridge and LASSO Regression\n",
"\n",
@@ -566,9 +494,7 @@
{
"cell_type": "markdown",
"id": "aca472b7",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"{\\displaystyle \\min_{\\boldsymbol{\\theta}\\in {\\mathbb{R}}^{p}}}\\frac{1}{n}\\left\\{\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right)^T\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right)\\right\\}.\n",
@@ -578,9 +504,7 @@
{
"cell_type": "markdown",
"id": "b7987ee7",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"or we can state it as"
]
@@ -588,9 +512,7 @@
{
"cell_type": "markdown",
"id": "8ed718ee",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"{\\displaystyle \\min_{\\boldsymbol{\\theta}\\in\n",
@@ -601,9 +523,7 @@
{
"cell_type": "markdown",
"id": "15bbe9c3",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"where we have used the definition of a norm-2 vector, that is"
]
@@ -611,9 +531,7 @@
{
"cell_type": "markdown",
"id": "7827172f",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\vert\\vert \\boldsymbol{x}\\vert\\vert_2 = \\sqrt{\\sum_i x_i^2}.\n",
@@ -623,9 +541,7 @@
{
"cell_type": "markdown",
"id": "52a8084f",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## From OLS to Ridge and Lasso\n",
"\n",
@@ -638,9 +554,7 @@
{
"cell_type": "markdown",
"id": "67569afe",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"{\\displaystyle \\min_{\\boldsymbol{\\theta}\\in\n",
@@ -651,9 +565,7 @@
{
"cell_type": "markdown",
"id": "be6c3913",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"which leads to the Ridge regression minimization problem where we\n",
"require that $\\vert\\vert \\boldsymbol{\\theta}\\vert\\vert_2^2\\le t$, where $t$ is\n",
@@ -663,9 +575,7 @@
{
"cell_type": "markdown",
"id": "8b9c9420",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Lasso regression\n",
"\n",
@@ -675,9 +585,7 @@
{
"cell_type": "markdown",
"id": "884822e5",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"C(\\boldsymbol{X},\\boldsymbol{\\theta})=\\frac{1}{n}\\vert\\vert \\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\vert\\vert_2^2+\\lambda\\vert\\vert \\boldsymbol{\\theta}\\vert\\vert_1,\n",
@@ -687,9 +595,7 @@
{
"cell_type": "markdown",
"id": "20fb19fc",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"we have a new optimization equation"
]
@@ -697,9 +603,7 @@
{
"cell_type": "markdown",
"id": "ba368a55",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"{\\displaystyle \\min_{\\boldsymbol{\\theta}\\in\n",
@@ -710,9 +614,7 @@
{
"cell_type": "markdown",
"id": "6de29ccc",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"which leads to Lasso regression. Lasso stands for least absolute shrinkage and selection operator. \n",
"Here we have defined the norm-1 as"
@@ -721,9 +623,7 @@
{
"cell_type": "markdown",
"id": "e083172b",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\vert\\vert \\boldsymbol{x}\\vert\\vert_1 = \\sum_i \\vert x_i\\vert.\n",
@@ -733,9 +633,7 @@
{
"cell_type": "markdown",
"id": "4dff529c",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Selected references\n",
"* [Mehta et al.](https://arxiv.org/abs/1803.08823) and [Physics Reports (2019)](https://www.sciencedirect.com/science/article/pii/S0370157319300766?via%3Dihub).\n",
@@ -756,9 +654,7 @@
{
"cell_type": "markdown",
"id": "7facb953",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Setting up the basic equations for neural networks\n",
"\n",
@@ -778,9 +674,7 @@
{
"cell_type": "markdown",
"id": "803a20ac",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Overarching view of a neural network\n",
"\n",
@@ -805,9 +699,7 @@
{
"cell_type": "markdown",
"id": "63aa6478",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Illustration of a single perceptron model and a multilayer FFNN\n",
"\n",
@@ -821,9 +713,7 @@
{
"cell_type": "markdown",
"id": "93b10615",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## The optimization problem\n",
"\n",
@@ -838,9 +728,7 @@
{
"cell_type": "markdown",
"id": "bcdc8f2e",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"C(\\boldsymbol{\\theta})=\\frac{1}{n}\\left\\{\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right)^T\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right)\\right\\}.\n",
@@ -850,9 +738,7 @@
{
"cell_type": "markdown",
"id": "63764264",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"This function represents one of many possible ways to define\n",
"the so-called cost function."
@@ -861,9 +747,7 @@
{
"cell_type": "markdown",
"id": "6ea4a585",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Weights and biases\n",
"\n",
@@ -880,9 +764,7 @@
{
"cell_type": "markdown",
"id": "f3d0b2f4",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Other ingredients of a neural network\n",
"\n",
@@ -909,9 +791,7 @@
{
"cell_type": "markdown",
"id": "72e9a70e",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Other parameters\n",
"\n",
@@ -923,9 +803,7 @@
{
"cell_type": "markdown",
"id": "24166378",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Why Feed Forward Neural Networks (FFNN)?\n",
"\n",
@@ -940,9 +818,7 @@
{
"cell_type": "markdown",
"id": "ad0e414e",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Universal approximation theorem\n",
"\n",
@@ -956,9 +832,7 @@
{
"cell_type": "markdown",
"id": "4a7b41e1",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\sigma(z) = \\left\\{\\begin{array}{cc} 1 & z\\rightarrow \\infty\\\\ 0 & z \\rightarrow -\\infty \\end{array}\\right.\n",
@@ -968,9 +842,7 @@
{
"cell_type": "markdown",
"id": "c4acf546",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"Given a continuous and deterministic function $F(\\boldsymbol{x})$ on the unit\n",
"cube in $d$-dimensions $F\\in [0,1]^d$, $x\\in [0,1]^d$ and a parameter\n",
@@ -982,9 +854,7 @@
{
"cell_type": "markdown",
"id": "e353dc32",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\vert F(\\boldsymbol{x})-f(\\boldsymbol{x};\\boldsymbol{\\Theta})\\vert < \\epsilon \\hspace{0.1cm} \\forall \\boldsymbol{x}\\in[0,1]^d.\n",
@@ -994,9 +864,7 @@
{
"cell_type": "markdown",
"id": "5a87dccb",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## The approximation theorem in words\n",
"\n",
@@ -1010,9 +878,7 @@
{
"cell_type": "markdown",
"id": "35491e6c",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\mathbb{E}[\\vert F(\\boldsymbol{x})\\vert^2] =\\int_{\\boldsymbol{x}\\in D} \\vert F(\\boldsymbol{x})\\vert^2p(\\boldsymbol{x})d\\boldsymbol{x} < \\infty.\n",
@@ -1022,9 +888,7 @@
{
"cell_type": "markdown",
"id": "b809821f",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"Then we have"
]
@@ -1032,9 +896,7 @@
{
"cell_type": "markdown",
"id": "774f0764",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\mathbb{E}[\\vert F(\\boldsymbol{x})-f(\\boldsymbol{x};\\boldsymbol{\\Theta})\\vert^2] =\\int_{\\boldsymbol{x}\\in D} \\vert F(\\boldsymbol{x})-f(\\boldsymbol{x};\\boldsymbol{\\Theta})\\vert^2p(\\boldsymbol{x})d\\boldsymbol{x} < \\epsilon.\n",
@@ -1044,9 +906,7 @@
{
"cell_type": "markdown",
"id": "3d7088d0",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## More on the general approximation theorem\n",
"\n",
@@ -1062,9 +922,7 @@
{
"cell_type": "markdown",
"id": "ec2718cd",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Class of functions we can approximate\n",
"\n",
@@ -1075,9 +933,7 @@
{
"cell_type": "markdown",
"id": "8ef486d5",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Simple example, fitting nuclear masses\n",
"\n",
@@ -1089,9 +945,7 @@
{
"cell_type": "markdown",
"id": "855549f0",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## First network example, simple perceptron with one input\n",
"\n",
@@ -1104,9 +958,7 @@
{
"cell_type": "markdown",
"id": "6467d797",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"z_1 = w_1x+b_1,\n",
@@ -1116,9 +968,7 @@
{
"cell_type": "markdown",
"id": "c2b8c002",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"where $w_1$ is the weight and $b_1$ is the bias. These are the\n",
"parameters we want to optimize. The output is $a_1=\\sigma(z_1)$ (see\n",
@@ -1130,9 +980,7 @@
{
"cell_type": "markdown",
"id": "45b6c18c",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"C(x;w_1,b_1)=\\frac{1}{2}(a_1-y)^2.\n",
@@ -1142,9 +990,7 @@
{
"cell_type": "markdown",
"id": "e6020883",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Optimizing the parameters\n",
"\n",
@@ -1158,9 +1004,7 @@
{
"cell_type": "markdown",
"id": "87360dbd",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\frac{\\partial C}{\\partial w_1} \\hspace{0.1cm}\\mathrm{and}\\hspace{0.1cm}\\frac{\\partial C}{\\partial b_1}.\n",
@@ -1170,9 +1014,7 @@
{
"cell_type": "markdown",
"id": "392dda6a",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"Using the chain rule we find"
]
@@ -1180,9 +1022,7 @@
{
"cell_type": "markdown",
"id": "e7ab5332",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_1-y)\\sigma_1'x,\n",
@@ -1192,9 +1032,7 @@
{
"cell_type": "markdown",
"id": "f1f1bcc8",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"and"
]
@@ -1202,9 +1040,7 @@
{
"cell_type": "markdown",
"id": "31c9897a",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_1-y)\\sigma_1',\n",
@@ -1214,9 +1050,7 @@
{
"cell_type": "markdown",
"id": "cacef68b",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"which we later will just define as"
]
@@ -1224,9 +1058,7 @@
{
"cell_type": "markdown",
"id": "a310325f",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}=\\delta_1.\n",
@@ -1236,9 +1068,7 @@
{
"cell_type": "markdown",
"id": "e440e4eb",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Implementing the simple perceptron model\n",
"\n",
@@ -1251,9 +1081,7 @@
{
"cell_type": "markdown",
"id": "c6e7a1c8",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"C(y,w_1,b_1)=\\frac{1}{2}(a_1-y)^2,\n",
@@ -1263,9 +1091,7 @@
{
"cell_type": "markdown",
"id": "445560fe",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"with $a_1$ the output from the network."
]
@@ -1274,11 +1100,16 @@
"cell_type": "code",
"execution_count": 1,
"id": "79d0d646",
- "metadata": {
- "collapsed": false,
- "editable": true
- },
- "outputs": [],
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "4.0022640019432767e-07\n"
+ ]
+ }
+ ],
"source": [
"%matplotlib inline\n",
"\n",
@@ -1337,9 +1168,7 @@
{
"cell_type": "markdown",
"id": "42993345",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"Running this code gives us acceptable results after some 40-50 iterations. Note that the results depend on the value of the learning rate."
]
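The dependence on the learning rate can be made concrete with a small experiment. The sketch below is not the notebook's cell: it assumes a sigmoid activation, and the input `x`, target `y`, the initial parameters and the helper `train` are hypothetical values and names chosen only for illustration. It applies the gradients $\partial C/\partial w_1=(a_1-y)\sigma_1'x$ and $\partial C/\partial b_1=(a_1-y)\sigma_1'$ for a fixed number of iterations and compares the final cost for two learning rates.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation (an assumption for this sketch)."""
    return 1.0 / (1.0 + np.exp(-z))

def train(x, y, eta, n_iter=50, w1=0.1, b1=0.1):
    """Plain gradient descent for one neuron; returns the final cost."""
    for _ in range(n_iter):
        a1 = sigmoid(w1 * x + b1)
        # delta_1 = dC/da_1 * da_1/dz_1, with sigma'(z) = a(1 - a) for the sigmoid
        delta1 = (a1 - y) * a1 * (1.0 - a1)
        w1 -= eta * delta1 * x
        b1 -= eta * delta1
    a1 = sigmoid(w1 * x + b1)
    return 0.5 * (a1 - y) ** 2

# Hypothetical single data point
x, y = 0.5, 0.8
initial_cost = 0.5 * (sigmoid(0.1 * x + 0.1) - y) ** 2

# Final cost after 50 iterations for two learning rates
costs = {eta: train(x, y, eta) for eta in (0.1, 1.0)}
print("initial cost:", initial_cost)
for eta, c in costs.items():
    print(f"eta = {eta}: cost after 50 iterations = {c}")
```

For this particular setup the larger learning rate reaches a lower cost within the same number of iterations; for other problems a learning rate that is too large can instead make the iterations overshoot or diverge.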
@@ -1347,9 +1176,7 @@
{
"cell_type": "markdown",
"id": "a9d5deec",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Exercise 1: Extensions to the above code\n",
"\n",
@@ -1368,9 +1195,7 @@
{
"cell_type": "markdown",
"id": "4d4929b1",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Adding a hidden layer\n",
"\n",
@@ -1384,9 +1209,7 @@
{
"cell_type": "markdown",
"id": "16680e62",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"z_1 = w_1a_0+b_1 \\hspace{0.1cm} \\wedge a_1 = \\sigma_1(z_1),\n",
@@ -1396,9 +1219,7 @@
{
"cell_type": "markdown",
"id": "4e6d767a",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"z_2 = w_2a_1+b_2 \\hspace{0.1cm} \\wedge a_2 = \\sigma_2(z_2),\n",
@@ -1408,9 +1229,7 @@
{
"cell_type": "markdown",
"id": "3263f740",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"and the cost function"
]
@@ -1418,9 +1237,7 @@
{
"cell_type": "markdown",
"id": "9c2a591a",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"C(x;\\boldsymbol{\\Theta})=\\frac{1}{2}(a_2-y)^2,\n",
@@ -1430,9 +1247,7 @@
{
"cell_type": "markdown",
"id": "f406da56",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"with $\\boldsymbol{\\Theta}=[w_1,w_2,b_1,b_2]$."
]
@@ -1440,9 +1255,7 @@
{
"cell_type": "markdown",
"id": "1793a059",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## The derivatives\n",
"\n",
@@ -1452,9 +1265,7 @@
{
"cell_type": "markdown",
"id": "f9194184",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\frac{\\partial C}{\\partial w_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial w_2}=(a_2-y)\\sigma_2'a_1=\\delta_2a_1,\n",
@@ -1464,9 +1275,7 @@
{
"cell_type": "markdown",
"id": "1e573596",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\frac{\\partial C}{\\partial b_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial b_2}=(a_2-y)\\sigma_2'=\\delta_2,\n",
@@ -1476,9 +1285,7 @@
{
"cell_type": "markdown",
"id": "44cd323c",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_2-y)\\sigma_2'w_2\\sigma_1'a_0,\n",
@@ -1488,9 +1295,7 @@
{
"cell_type": "markdown",
"id": "98b20680",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_2-y)\\sigma_2'w_2\\sigma_1'=\\delta_1.\n",
@@ -1500,9 +1305,7 @@
{
"cell_type": "markdown",
"id": "e0f06329",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"Can you generalize this to more than one hidden layer?"
]
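A quick way to check chain-rule expressions of this kind is to compare them with finite differences. The sketch below does that for the scalar network with one hidden node, using $\partial z_2/\partial a_1 = w_2$ so that $\delta_1=\delta_2 w_2\sigma_1'$. It assumes sigmoid activations in both layers; the input, target and parameter values are hypothetical, and the helper names `forward`, `cost` and `gradients` are introduced only for this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(a0, w1, b1, w2, b2):
    """Forward pass: returns the hidden activation a1 and the output a2."""
    a1 = sigmoid(w1 * a0 + b1)
    a2 = sigmoid(w2 * a1 + b2)
    return a1, a2

def cost(a0, y, w1, b1, w2, b2):
    _, a2 = forward(a0, w1, b1, w2, b2)
    return 0.5 * (a2 - y) ** 2

def gradients(a0, y, w1, b1, w2, b2):
    """Analytic gradients from the chain rule."""
    a1, a2 = forward(a0, w1, b1, w2, b2)
    delta2 = (a2 - y) * a2 * (1.0 - a2)      # (a2 - y) * sigma_2'
    delta1 = delta2 * w2 * a1 * (1.0 - a1)   # delta_2 * w2 * sigma_1'
    return {"w1": delta1 * a0, "b1": delta1, "w2": delta2 * a1, "b2": delta2}

# Hypothetical input, target and parameters
a0, y = 0.7, 0.3
params = dict(w1=0.4, b1=-0.2, w2=1.5, b2=0.1)

analytic = gradients(a0, y, **params)

# Central finite differences for each parameter
eps = 1e-6
numeric = {}
for name in params:
    plus, minus = dict(params), dict(params)
    plus[name] += eps
    minus[name] -= eps
    numeric[name] = (cost(a0, y, **plus) - cost(a0, y, **minus)) / (2 * eps)

max_diff = max(abs(analytic[k] - numeric[k]) for k in params)
print("largest |analytic - numeric| =", max_diff)
```

The same pattern extends to more hidden layers: each additional layer contributes one more factor $w_{l+1}\sigma_l'$ to the corresponding $\delta_l$.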
@@ -1510,9 +1313,7 @@
{
"cell_type": "markdown",
"id": "43161030",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Important observations\n",
"\n",
@@ -1526,9 +1327,7 @@
{
"cell_type": "markdown",
"id": "2ef1c290",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## The training\n",
"\n",
@@ -1538,9 +1337,7 @@
{
"cell_type": "markdown",
"id": "bd16d507",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"w_{i}\\leftarrow w_{i}- \\eta \\delta_i a_{i-1},\n",
@@ -1550,9 +1347,7 @@
{
"cell_type": "markdown",
"id": "3e536774",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"and"
]
@@ -1560,9 +1355,7 @@
{
"cell_type": "markdown",
"id": "379286f4",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"b_i \\leftarrow b_i-\\eta \\delta_i,\n",
@@ -1572,9 +1365,7 @@
{
"cell_type": "markdown",
"id": "de4009b3",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"where $\\eta$ is the learning rate.\n",
"\n",
@@ -1586,9 +1377,7 @@
{
"cell_type": "markdown",
"id": "7692a4d5",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Code example\n",
"\n",
@@ -1604,11 +1393,65 @@
"cell_type": "code",
"execution_count": 2,
"id": "d4814e9c",
- "metadata": {
- "collapsed": false,
- "editable": true
- },
- "outputs": [],
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[36.89563074]\n",
+ "[23.62323175]\n",
+ "[15.1251681]\n",
+ "[9.68402334]\n",
+ "[6.20020163]\n",
+ "[3.96963458]\n",
+ "[2.54150298]\n",
+ "[1.62714703]\n",
+ "[1.0417409]\n",
+ "[0.66694492]\n",
+ "[0.42699034]\n",
+ "[0.27336592]\n",
+ "[0.17501258]\n",
+ "[0.11204514]\n",
+ "[0.07173249]\n",
+ "[0.04592382]\n",
+ "[0.02940083]\n",
+ "[0.01882264]\n",
+ "[0.01205039]\n",
+ "[0.00771475]\n",
+ "[0.00493903]\n",
+ "[0.003162]\n",
+ "[0.00202433]\n",
+ "[0.00129599]\n",
+ "[0.0008297]\n",
+ "[0.00053118]\n",
+ "[0.00034006]\n",
+ "[0.00021771]\n",
+ "[0.00013938]\n",
+ "[8.92313548e-05]\n",
+ "[5.71263851e-05]\n",
+ "[3.6572612e-05]\n",
+ "[2.34139775e-05]\n",
+ "[1.49897504e-05]\n",
+ "[9.5965161e-06]\n",
+ "[6.14373934e-06]\n",
+ "[3.93325371e-06]\n",
+ "[2.51808934e-06]\n",
+ "[1.61209378e-06]\n",
+ "[1.03207075e-06]\n",
+ "[6.60737006e-07]\n",
+ "[4.23007231e-07]\n",
+ "[2.70811405e-07]\n",
+ "[1.73374853e-07]\n",
+ "[1.10995472e-07]\n",
+ "[7.10598715e-08]\n",
+ "[4.54928947e-08]\n",
+ "[2.91247848e-08]\n",
+ "[1.86458368e-08]\n",
+ "[1.19371605e-08]\n"
+ ]
+ }
+ ],
"source": [
"import numpy as np\n",
"# We use the Sigmoid function as activation function\n",
@@ -1677,9 +1520,7 @@
{
"cell_type": "markdown",
"id": "cd7ae3ed",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"We see that after a few iterations (the results do depend on the learning rate, however), we get an error which is rather small."
]
@@ -1687,9 +1528,7 @@
{
"cell_type": "markdown",
"id": "6bb9a0e1",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Exercise 2: Including more data\n",
"\n",
@@ -1705,9 +1544,7 @@
{
"cell_type": "markdown",
"id": "06563003",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Simple neural network and the back propagation equations\n",
"\n",
@@ -1722,9 +1559,7 @@
{
"cell_type": "markdown",
"id": "5a3fe09a",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"x_0 = a_0^{(0)} \\wedge x_1 = a_1^{(0)}.\n",
@@ -1734,9 +1569,7 @@
{
"cell_type": "markdown",
"id": "321b4a6d",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"The hidden layer (layer $(1)$) has nodes which yield the outputs $a_0^{(1)}$ and $a_1^{(1)}$, with weight $\\boldsymbol{w}$ and bias $\\boldsymbol{b}$ parameters"
]
@@ -1744,9 +1577,7 @@
{
"cell_type": "markdown",
"id": "3a8f136c",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"w_{ij}^{(1)}=\\left\\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)}\\right\\} \\wedge b^{(1)}=\\left\\{b_0^{(1)},b_1^{(1)}\\right\\}.\n",
@@ -1756,9 +1587,7 @@
{
"cell_type": "markdown",
"id": "bb33455e",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## The output layer\n",
"\n",
@@ -1768,9 +1597,7 @@
{
"cell_type": "markdown",
"id": "4b97a8d6",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"w_{i}^{(2)}=\\left\\{w_{0}^{(2)},w_{1}^{(2)}\\right\\} \\wedge b^{(2)}.\n",
@@ -1780,9 +1607,7 @@
{
"cell_type": "markdown",
"id": "cfbbb52d",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"Our output is $\\tilde{y}=a^{(2)}$ and we define a generic cost function $C(a^{(2)},y;\\boldsymbol{\\Theta})$ where $y$ is the target value (a scalar here).\n",
"The parameters we need to optimize are given by"
@@ -1791,9 +1616,7 @@
{
"cell_type": "markdown",
"id": "a3f8f887",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\boldsymbol{\\Theta}=\\left\\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)},w_{0}^{(2)},w_{1}^{(2)},b_0^{(1)},b_1^{(1)},b^{(2)}\\right\\}.\n",
@@ -1803,9 +1626,7 @@
{
"cell_type": "markdown",
"id": "a097c6e9",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Compact expressions\n",
"\n",
@@ -1816,9 +1637,7 @@
{
"cell_type": "markdown",
"id": "9ec6339d",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\begin{bmatrix}z_0^{(1)} \\\\ z_1^{(1)} \\end{bmatrix}=\\begin{bmatrix}w_{00}^{(1)} & w_{01}^{(1)}\\\\ w_{10}^{(1)} &w_{11}^{(1)} \\end{bmatrix}\\begin{bmatrix}a_0^{(0)} \\\\ a_1^{(0)} \\end{bmatrix}+\\begin{bmatrix}b_0^{(1)} \\\\ b_1^{(1)} \\end{bmatrix},\n",
@@ -1828,9 +1647,7 @@
{
"cell_type": "markdown",
"id": "b340d6f5",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"with outputs"
]
@@ -1838,9 +1655,7 @@
{
"cell_type": "markdown",
"id": "3bc67ced",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\begin{bmatrix}a_0^{(1)} \\\\ a_1^{(1)} \\end{bmatrix}=\\begin{bmatrix}\\sigma^{(1)}(z_0^{(1)}) \\\\ \\sigma^{(1)}(z_1^{(1)}) \\end{bmatrix}.\n",
@@ -1850,9 +1665,7 @@
{
"cell_type": "markdown",
"id": "fa8e855e",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Output layer\n",
"\n",
@@ -1862,9 +1675,7 @@
{
"cell_type": "markdown",
"id": "fec79799",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"z^{(2)} = w_{0}^{(2)}a_0^{(1)} +w_{1}^{(2)}a_1^{(1)}+b^{(2)},\n",
@@ -1874,9 +1685,7 @@
{
"cell_type": "markdown",
"id": "e92e9b1f",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"resulting in the output"
]
@@ -1884,9 +1693,7 @@
{
"cell_type": "markdown",
"id": "4b4e69bf",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"a^{(2)}=\\sigma^{(2)}(z^{(2)}).\n",
@@ -1896,9 +1703,7 @@
{
"cell_type": "markdown",
"id": "a660f211",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Explicit derivatives\n",
"\n",
@@ -1912,9 +1717,7 @@
{
"cell_type": "markdown",
"id": "1989b4fc",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\frac{\\partial C}{\\partial w_{i}^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial w_{i}^{(2)}}=\\delta^{(2)}a_i^{(1)},\n",
@@ -1924,9 +1727,7 @@
{
"cell_type": "markdown",
"id": "4a034c14",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"with"
]
@@ -1934,9 +1735,7 @@
{
"cell_type": "markdown",
"id": "274e1c4d",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\delta^{(2)}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n",
@@ -1946,9 +1745,7 @@
{
"cell_type": "markdown",
"id": "533ae9e7",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"and finally"
]
@@ -1956,9 +1753,7 @@
{
"cell_type": "markdown",
"id": "bea1c60a",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\frac{\\partial C}{\\partial b^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial b^{(2)}}=\\delta^{(2)}.\n",
@@ -1968,9 +1763,7 @@
{
"cell_type": "markdown",
"id": "5ce5b908",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Derivatives of the hidden layer\n",
"\n",
@@ -1980,9 +1773,7 @@
{
"cell_type": "markdown",
"id": "324b5c28",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\frac{\\partial C}{\\partial w_{00}^{(1)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n",
@@ -1993,9 +1784,7 @@
{
"cell_type": "markdown",
"id": "adff05cd",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"which, noting that"
]
@@ -2003,9 +1792,7 @@
{
"cell_type": "markdown",
"id": "5270d5fa",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"z^{(2)} =w_0^{(2)}a_0^{(1)}+w_1^{(2)}a_1^{(1)}+b^{(2)},\n",
@@ -2015,9 +1802,7 @@
{
"cell_type": "markdown",
"id": "c83a9ef4",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"allows us to rewrite"
]
@@ -2025,9 +1810,7 @@
{
"cell_type": "markdown",
"id": "747da5ad",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\frac{\\partial z^{(2)}}{\\partial z_0^{(1)}}\\frac{\\partial z_0^{(1)}}{\\partial w_{00}^{(1)}}=w_0^{(2)}\\frac{\\partial a_0^{(1)}}{\\partial z_0^{(1)}}a_0^{(1)}.\n",
@@ -2037,9 +1820,7 @@
{
"cell_type": "markdown",
"id": "db9aa150",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Final expression\n",
"Defining"
@@ -2048,9 +1829,7 @@
{
"cell_type": "markdown",
"id": "fe3ce041",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\delta_0^{(1)}=w_0^{(2)}\\frac{\\partial a_0^{(1)}}{\\partial z_0^{(1)}}\\delta^{(2)},\n",
@@ -2060,9 +1839,7 @@
{
"cell_type": "markdown",
"id": "1d7cbe9d",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"we have"
]
@@ -2070,9 +1847,7 @@
{
"cell_type": "markdown",
"id": "9e16a856",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\frac{\\partial C}{\\partial w_{00}^{(1)}}=\\delta_0^{(1)}a_0^{(1)}.\n",
@@ -2082,9 +1857,7 @@
{
"cell_type": "markdown",
"id": "095624ba",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"Similarly, we obtain"
]
@@ -2092,9 +1865,7 @@
{
"cell_type": "markdown",
"id": "bd65ea09",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\frac{\\partial C}{\\partial w_{01}^{(1)}}=\\delta_0^{(1)}a_1^{(1)}.\n",
@@ -2104,9 +1875,7 @@
{
"cell_type": "markdown",
"id": "008e4371",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Completing the list\n",
"\n",
@@ -2116,9 +1885,7 @@
{
"cell_type": "markdown",
"id": "f55b8eab",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\frac{\\partial C}{\\partial w_{10}^{(1)}}=\\delta_1^{(1)}a_0^{(1)},\n",
@@ -2128,9 +1895,7 @@
{
"cell_type": "markdown",
"id": "5126ab8e",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"and"
]
@@ -2138,9 +1903,7 @@
{
"cell_type": "markdown",
"id": "0f506e21",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\frac{\\partial C}{\\partial w_{11}^{(1)}}=\\delta_1^{(1)}a_1^{(1)},\n",
@@ -2150,9 +1913,7 @@
{
"cell_type": "markdown",
"id": "f09773bb",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"where we have defined"
]
@@ -2160,9 +1921,7 @@
{
"cell_type": "markdown",
"id": "13faa8b3",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\delta_1^{(1)}=w_1^{(2)}\\frac{\\partial a_1^{(1)}}{\\partial z_1^{(1)}}\\delta^{(2)}.\n",
@@ -2172,9 +1931,7 @@
{
"cell_type": "markdown",
"id": "a1b55a23",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Final expressions for the biases of the hidden layer\n",
"\n",
@@ -2184,9 +1941,7 @@
{
"cell_type": "markdown",
"id": "f544649c",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\frac{\\partial C}{\\partial b_{0}^{(1)}}=\\delta_0^{(1)},\n",
@@ -2196,9 +1951,7 @@
{
"cell_type": "markdown",
"id": "9406250d",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"and"
]
@@ -2206,9 +1959,7 @@
{
"cell_type": "markdown",
"id": "65f3005e",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\frac{\\partial C}{\\partial b_{1}^{(1)}}=\\delta_1^{(1)}.\n",
@@ -2218,9 +1969,7 @@
{
"cell_type": "markdown",
"id": "df7ed34c",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"As we will see below, these expressions can be generalized in a more compact form."
]
@@ -2228,9 +1977,7 @@
{
"cell_type": "markdown",
"id": "d4c8367a",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Gradient expressions\n",
"\n",
@@ -2241,9 +1988,7 @@
{
"cell_type": "markdown",
"id": "8153eaa3",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"w_{i}^{(2)}\\leftarrow w_{i}^{(2)}- \\eta \\delta^{(2)} a_{i}^{(1)},\n",
@@ -2253,9 +1998,7 @@
{
"cell_type": "markdown",
"id": "a6e2ae3c",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"and"
]
@@ -2263,9 +2006,7 @@
{
"cell_type": "markdown",
"id": "1d1a819f",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"b^{(2)} \\leftarrow b^{(2)}-\\eta \\delta^{(2)},\n",
@@ -2275,9 +2016,7 @@
{
"cell_type": "markdown",
"id": "ae342d50",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"and"
]
@@ -2285,9 +2024,7 @@
{
"cell_type": "markdown",
"id": "f10677c1",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"w_{ij}^{(1)}\\leftarrow w_{ij}^{(1)}- \\eta \\delta_{i}^{(1)} a_{j}^{(0)},\n",
@@ -2297,9 +2034,7 @@
{
"cell_type": "markdown",
"id": "dea5c92f",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"and"
]
@@ -2307,9 +2042,7 @@
{
"cell_type": "markdown",
"id": "208ad938",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"b_{i}^{(1)} \\leftarrow b_{i}^{(1)}-\\eta \\delta_{i}^{(1)},\n",
@@ -2319,9 +2052,7 @@
{
"cell_type": "markdown",
"id": "48f04f55",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"where $\\eta$ is the learning rate."
]
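The update rules above can be sketched for the network with two inputs, two hidden nodes and one output. This is a minimal illustration, not a definitive implementation: it assumes a sigmoid for $\sigma^{(1)}$, an identity output activation for $\sigma^{(2)}$ and the cost $C=\frac{1}{2}(a^{(2)}-y)^2$. These choices, the numerical values and all function names are assumptions made for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, w2, b2):
    """Forward pass; returns the hidden activations a^(1) and the output a^(2)."""
    z1 = [W1[i][0] * x[0] + W1[i][1] * x[1] + b1[i] for i in range(2)]
    a1 = [sigmoid(z) for z in z1]
    z2 = w2[0] * a1[0] + w2[1] * a1[1] + b2
    return a1, z2  # identity output activation (assumption): a^(2) = z^(2)


def cost(x, y, W1, b1, w2, b2):
    _, a2 = forward(x, W1, b1, w2, b2)
    return 0.5 * (a2 - y) ** 2


def gradients(x, y, W1, b1, w2, b2):
    """Gradients of C with respect to W1, b1, w2 and b2 for one sample."""
    a1, a2 = forward(x, W1, b1, w2, b2)
    # delta^(2) = dC/da^(2) * da^(2)/dz^(2); the second factor is 1 here.
    delta2 = a2 - y
    # delta_i^(1) = w_i^(2) * sigma'(z_i^(1)) * delta^(2), with sigma' = a(1-a).
    delta1 = [w2[i] * a1[i] * (1.0 - a1[i]) * delta2 for i in range(2)]
    # The hidden-layer weight gradient multiplies delta_i^(1) by the input x_j.
    grad_W1 = [[delta1[i] * x[j] for j in range(2)] for i in range(2)]
    grad_w2 = [delta2 * a1[i] for i in range(2)]
    return grad_W1, delta1, grad_w2, delta2


def gradient_step(x, y, W1, b1, w2, b2, eta):
    """One plain gradient-descent update with learning rate eta."""
    gW1, gb1, gw2, gb2 = gradients(x, y, W1, b1, w2, b2)
    W1 = [[W1[i][j] - eta * gW1[i][j] for j in range(2)] for i in range(2)]
    b1 = [b1[i] - eta * gb1[i] for i in range(2)]
    w2 = [w2[i] - eta * gw2[i] for i in range(2)]
    b2 = b2 - eta * gb2
    return W1, b1, w2, b2


# Illustrative values only.
x, y = [0.5, -1.0], 2.0
W1, b1 = [[0.1, -0.2], [0.4, 0.3]], [0.0, 0.1]
w2, b2 = [0.7, -0.5], 0.2

cost_before = cost(x, y, W1, b1, w2, b2)
W1_new, b1_new, w2_new, b2_new = gradient_step(x, y, W1, b1, w2, b2, eta=0.1)
cost_after = cost(x, y, W1_new, b1_new, w2_new, b2_new)
print(cost_before, cost_after)
```

A finite-difference check of individual gradient entries is a simple way to validate such a hand-written derivative before moving to automatic differentiation.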
@@ -2329,9 +2060,7 @@
{
"cell_type": "markdown",
"id": "fc04866a",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Exercise 3: Extended program\n",
"\n",
@@ -2341,9 +2070,7 @@
{
"cell_type": "markdown",
"id": "d8ba650b",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"y=f(x_0,x_1)=x_0^2+3x_0x_1+x_1^2+5.\n",
@@ -2353,9 +2080,7 @@
{
"cell_type": "markdown",
"id": "9fc12f59",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"We feed our network with $n=100$ entries $x_0$ and $x_1$. We have thus two features represented by these variable and an input matrix/design matrix $\\boldsymbol{X}\\in \\mathbf{R}^{n\\times 2}$"
]
@@ -2363,9 +2088,7 @@
{
"cell_type": "markdown",
"id": "53bffa05",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\boldsymbol{X}=\\begin{bmatrix} x_{00} & x_{01} \\\\ x_{00} & x_{01} \\\\ x_{10} & x_{11} \\\\ x_{20} & x_{21} \\\\ \\dots & \\dots \\\\ \\dots & \\dots \\\\ x_{n-20} & x_{n-21} \\\\ x_{n-10} & x_{n-11} \\end{bmatrix}.\n",
@@ -2375,9 +2098,7 @@
{
"cell_type": "markdown",
"id": "6203af4d",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"Write a code, based on the previous code examples, which takes as input these data and fit the above function.\n",
"You can extend your code to include automatic differentiation.\n",
@@ -2388,9 +2109,7 @@
{
"cell_type": "markdown",
"id": "16eb0b7d",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Getting serious, the back propagation equations for a neural network\n",
"\n",
@@ -2402,9 +2121,7 @@
{
"cell_type": "markdown",
"id": "f10f405a",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\frac{\\partial{\\cal C}((\\boldsymbol{\\Theta}^L)}{\\partial w_{jk}^L} = \\left(a_j^L - y_j\\right)a_j^L(1-a_j^L)a_k^{L-1},\n",
@@ -2414,9 +2131,7 @@
{
"cell_type": "markdown",
"id": "36bce326",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"Defining"
]
@@ -2424,9 +2139,7 @@
{
"cell_type": "markdown",
"id": "ca2b8428",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\delta_j^L = a_j^L(1-a_j^L)\\left(a_j^L - y_j\\right) = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)},\n",
@@ -2436,9 +2149,7 @@
{
"cell_type": "markdown",
"id": "ba48e670",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"and using the Hadamard product of two vectors we can write this as"
]
@@ -2446,9 +2157,7 @@
{
"cell_type": "markdown",
"id": "6f4abc32",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\boldsymbol{\\delta}^L = \\sigma'(\\hat{z}^L)\\circ\\frac{\\partial {\\cal C}}{\\partial (\\boldsymbol{a}^L)}.\n",
@@ -2458,9 +2167,7 @@
{
"cell_type": "markdown",
"id": "3e0c9116",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Analyzing the last results\n",
"\n",
@@ -2476,9 +2183,7 @@
{
"cell_type": "markdown",
"id": "a8270c3a",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## More considerations\n",
"\n",
@@ -2494,9 +2199,7 @@
{
"cell_type": "markdown",
"id": "36355d0d",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}\n",
@@ -2506,9 +2209,7 @@
{
"cell_type": "markdown",
"id": "40976099",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"With the definition of $\\delta_j^L$ we have a more compact definition of the derivative of the cost function in terms of the weights, namely"
]
@@ -2516,9 +2217,7 @@
{
"cell_type": "markdown",
"id": "17a9cb41",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\frac{\\partial{\\cal C}}{\\partial w_{jk}^L} = \\delta_j^La_k^{L-1}.\n",
@@ -2528,9 +2227,7 @@
{
"cell_type": "markdown",
"id": "5857bfb0",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Derivatives in terms of $z_j^L$\n",
"\n",
@@ -2540,9 +2237,7 @@
{
"cell_type": "markdown",
"id": "e351d319",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\delta_j^L =\\frac{\\partial {\\cal C}}{\\partial z_j^L}= \\frac{\\partial {\\cal C}}{\\partial a_j^L}\\frac{\\partial a_j^L}{\\partial z_j^L},\n",
@@ -2552,9 +2247,7 @@
{
"cell_type": "markdown",
"id": "1d820b48",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"which can also be interpreted as the partial derivative of the cost function with respect to the biases $b_j^L$, namely"
]
@@ -2562,9 +2255,7 @@
{
"cell_type": "markdown",
"id": "4fb8bc9c",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\delta_j^L = \\frac{\\partial {\\cal C}}{\\partial b_j^L}\\frac{\\partial b_j^L}{\\partial z_j^L}=\\frac{\\partial {\\cal C}}{\\partial b_j^L},\n",
@@ -2574,9 +2265,7 @@
{
"cell_type": "markdown",
"id": "5dca3cda",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"That is, the error $\\delta_j^L$ is exactly equal to the rate of change of the cost function as a function of the bias."
]
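The step behind this identification can be written out explicitly. Assuming the usual affine form of the pre-activation, $z_j^L=\sum_k w_{jk}^La_k^{L-1}+b_j^L$ (an assumption here, with the index convention taken from the weight gradient above), the bias enters linearly with unit coefficient, so that

$$
\frac{\partial z_j^L}{\partial b_j^L}=1 \quad\Rightarrow\quad \frac{\partial {\cal C}}{\partial b_j^L}=\frac{\partial {\cal C}}{\partial z_j^L}\frac{\partial z_j^L}{\partial b_j^L}=\delta_j^L.
$$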
@@ -2584,9 +2273,7 @@
{
"cell_type": "markdown",
"id": "5871156c",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Bringing it together\n",
"\n",
@@ -2596,9 +2283,7 @@
{
"cell_type": "markdown",
"id": "a7a60892",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"\n",
"
\n",
@@ -2614,9 +2299,7 @@
{
"cell_type": "markdown",
"id": "9b0bdf83",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"and"
]
@@ -2624,9 +2307,7 @@
{
"cell_type": "markdown",
"id": "a7e03a9c",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"\n",
"\n",
@@ -2642,9 +2323,7 @@
{
"cell_type": "markdown",
"id": "5624e6de",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"and"
]
@@ -2652,9 +2331,7 @@
{
"cell_type": "markdown",
"id": "a6a73788",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"\n",
"\n",
@@ -2670,9 +2347,7 @@
{
"cell_type": "markdown",
"id": "fa0c863b",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Final back propagating equation\n",
"\n",
@@ -2682,9 +2357,7 @@
{
"cell_type": "markdown",
"id": "7b9fd625",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\delta_j^l =\\frac{\\partial {\\cal C}}{\\partial z_j^l}.\n",
@@ -2694,9 +2367,7 @@
{
"cell_type": "markdown",
"id": "2d6110ec",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"We want to express this in terms of the equations for layer $l+1$."
]
@@ -2704,9 +2375,7 @@
{
"cell_type": "markdown",
"id": "98aee849",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Using the chain rule and summing over all $k$ entries\n",
"\n",
@@ -2716,9 +2385,7 @@
{
"cell_type": "markdown",
"id": "124cb42a",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\delta_j^l =\\sum_k \\frac{\\partial {\\cal C}}{\\partial z_k^{l+1}}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}}=\\sum_k \\delta_k^{l+1}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}},\n",
@@ -2728,9 +2395,7 @@
{
"cell_type": "markdown",
"id": "4dedc423",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"and recalling that"
]
@@ -2738,9 +2403,7 @@
{
"cell_type": "markdown",
"id": "fbead9d4",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"z_j^{l+1} = \\sum_{i=1}^{M_{l}}w_{ij}^{l+1}a_i^{l}+b_j^{l+1},\n",
@@ -2750,9 +2413,7 @@
{
"cell_type": "markdown",
"id": "89702b4f",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"with $M_l$ being the number of nodes in layer $l$, we obtain"
]
@@ -2760,9 +2421,7 @@
{
"cell_type": "markdown",
"id": "2fb650ad",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\delta_j^l =\\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l),\n",
@@ -2772,9 +2431,7 @@
{
"cell_type": "markdown",
"id": "b32a1fbe",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"This is our final equation.\n",
"\n",
@@ -2784,9 +2441,7 @@
{
"cell_type": "markdown",
"id": "5cac9518",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Setting up the back propagation algorithm\n",
"\n",
@@ -2807,9 +2462,7 @@
{
"cell_type": "markdown",
"id": "5c7f6367",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Setting up the back propagation algorithm, part 2\n",
"\n",
@@ -2819,9 +2472,7 @@
{
"cell_type": "markdown",
"id": "be6e74e5",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}.\n",
@@ -2831,9 +2482,7 @@
{
"cell_type": "markdown",
"id": "cea6dc43",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"Then we compute the back propagate error for each $l=L-1,L-2,\\dots,1$ as"
]
@@ -2841,9 +2490,7 @@
{
"cell_type": "markdown",
"id": "cd81fdd7",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l).\n",
@@ -2853,9 +2500,7 @@
{
"cell_type": "markdown",
"id": "a4e2076f",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Setting up the Back propagation algorithm, part 3\n",
"\n",
@@ -2867,9 +2512,7 @@
{
"cell_type": "markdown",
"id": "623d338d",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"w_{jk}^l\\leftarrow = w_{jk}^l- \\eta \\delta_j^la_k^{l-1},\n",
@@ -2879,9 +2522,7 @@
{
"cell_type": "markdown",
"id": "9a3168d0",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n",
@@ -2891,9 +2532,7 @@
{
"cell_type": "markdown",
"id": "02c62e59",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"with $\\eta$ being the learning rate."
]
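The three parts of the algorithm (feed forward, back propagation of the errors, gradient-descent update) can be sketched for a fully connected network with an arbitrary number of layers. This is a minimal sketch under stated assumptions: sigmoid activations in every layer and the quadratic cost, so that $\sigma'(z)=a(1-a)$ and $\partial{\cal C}/\partial a_j^L=a_j^L-y_j$. The layer sizes, the initialization, the learning rate and all function names are illustrative choices.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def feed_forward(x, weights, biases):
    """Return the activations a^0, ..., a^L, where a^0 is the input x."""
    activations = [x]
    for W, b in zip(weights, biases):
        a_prev = activations[-1]
        z = [sum(W[j][k] * a_prev[k] for k in range(len(a_prev))) + b[j]
             for j in range(len(b))]
        activations.append([sigmoid(zj) for zj in z])
    return activations


def cost(x, y, weights, biases):
    a_out = feed_forward(x, weights, biases)[-1]
    return 0.5 * sum((a - t) ** 2 for a, t in zip(a_out, y))


def back_propagate(x, y, weights, biases):
    """Return the activations and the errors delta^1, ..., delta^L."""
    activations = feed_forward(x, weights, biases)
    a_out = activations[-1]
    # Output error: delta_j^L = sigma'(z_j^L) * dC/da_j^L.
    deltas = [[a * (1.0 - a) * (a - t) for a, t in zip(a_out, y)]]
    # Hidden layers l = L-1, ..., 1; weights[l] stores w^{l+1}.
    for l in range(len(weights) - 1, 0, -1):
        W_next, a_l, d_next = weights[l], activations[l], deltas[0]
        delta = [sum(d_next[k] * W_next[k][j] for k in range(len(W_next)))
                 * a_l[j] * (1.0 - a_l[j]) for j in range(len(a_l))]
        deltas.insert(0, delta)
    return activations, deltas


def gradient_descent_step(x, y, weights, biases, eta):
    """Update weights and biases in place using the back-propagated errors."""
    activations, deltas = back_propagate(x, y, weights, biases)
    for l, (W, b) in enumerate(zip(weights, biases)):
        a_prev, d = activations[l], deltas[l]
        for j in range(len(b)):
            for k in range(len(a_prev)):
                W[j][k] -= eta * d[j] * a_prev[k]
            b[j] -= eta * d[j]


# Illustrative setup: 2 inputs, 3 hidden nodes, 1 output, one training sample.
random.seed(0)
sizes = [2, 3, 1]
weights = [[[random.gauss(0.0, 1.0) for _ in range(sizes[l])]
            for _ in range(sizes[l + 1])] for l in range(len(sizes) - 1)]
biases = [[0.0] * sizes[l + 1] for l in range(len(sizes) - 1)]
x, y = [0.2, -0.4], [0.8]

cost_start = cost(x, y, weights, biases)
for _ in range(200):
    gradient_descent_step(x, y, weights, biases, eta=0.5)
cost_end = cost(x, y, weights, biases)
print(cost_start, cost_end)
```

All errors are computed before any parameter is changed, so each update uses the weights of the current iteration, as the equations require.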
@@ -2901,9 +2540,7 @@
{
"cell_type": "markdown",
"id": "b029455b",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Updating the gradients\n",
"\n",
@@ -2913,9 +2550,7 @@
{
"cell_type": "markdown",
"id": "ea5337b3",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}sigma'(z_j^l),\n",
@@ -2925,9 +2560,7 @@
{
"cell_type": "markdown",
"id": "b475cb8d",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"we update the weights and the biases using gradient descent for each $l=L-1,L-2,\\dots,1$ and update the weights and biases according to the rules"
]
@@ -2935,9 +2568,7 @@
{
"cell_type": "markdown",
"id": "462823b9",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"w_{jk}^l\\leftarrow = w_{jk}^l- \\eta \\delta_j^la_k^{l-1},\n",
@@ -2947,9 +2578,7 @@
{
"cell_type": "markdown",
"id": "f23b093a",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n",
@@ -2959,9 +2588,7 @@
{
"cell_type": "markdown",
"id": "8e75c1c2",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## NN code\n",
"\n",
@@ -2971,9 +2598,7 @@
{
"cell_type": "markdown",
"id": "3505a76f",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Essential elements of generative models\n",
"\n",
@@ -2995,9 +2620,7 @@
{
"cell_type": "markdown",
"id": "093e6c22",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Probability model\n",
"\n",
@@ -3007,9 +2630,7 @@
{
"cell_type": "markdown",
"id": "ea632867",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"p(x_i,h_j;\\boldsymbol{\\Theta}) = \\frac{f(x_i,h_j;\\boldsymbol{\\Theta})}{Z(\\boldsymbol{\\Theta})},\n",
@@ -3019,9 +2640,7 @@
{
"cell_type": "markdown",
"id": "706cd35f",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"where $f(x_i,h_j;\\boldsymbol{\\Theta})$ is a function which we assume is larger or\n",
"equal than zero and obeys all properties required for a probability\n",
@@ -3033,9 +2652,7 @@
{
"cell_type": "markdown",
"id": "3dc14b73",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"Z(\\boldsymbol{\\Theta})=\\sum_{x_i\\in \\boldsymbol{X}}\\sum_{h_j\\in \\boldsymbol{H}} f(x_i,h_j;\\boldsymbol{\\Theta}).\n",
@@ -3045,9 +2662,7 @@
{
"cell_type": "markdown",
"id": "99e4989d",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Marginal and conditional probabilities\n",
"\n",
@@ -3057,9 +2672,7 @@
{
"cell_type": "markdown",
"id": "6f6aca14",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"p(x_i;\\boldsymbol{\\Theta}) = \\frac{\\sum_{h_j\\in \\boldsymbol{H}}f(x_i,h_j;\\boldsymbol{\\Theta})}{Z(\\boldsymbol{\\Theta})},\n",
@@ -3069,9 +2682,7 @@
{
"cell_type": "markdown",
"id": "d976441f",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"and"
]
@@ -3079,9 +2690,7 @@
{
"cell_type": "markdown",
"id": "219b5965",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"p(h_i;\\boldsymbol{\\Theta}) = \\frac{\\sum_{x_i\\in \\boldsymbol{X}}f(x_i,h_j;\\boldsymbol{\\Theta})}{Z(\\boldsymbol{\\Theta})}.\n",
@@ -3091,9 +2700,7 @@
{
"cell_type": "markdown",
"id": "d407496e",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Change of notation\n",
"\n",
@@ -3106,9 +2713,7 @@
{
"cell_type": "markdown",
"id": "09154c06",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"Z(\\boldsymbol{\\Theta})=\\sum_{x_i\\in \\boldsymbol{X}}\\sum_{h_j\\in \\boldsymbol{H}} f(x_i,h_j;\\boldsymbol{\\Theta}),\n",
@@ -3118,9 +2723,7 @@
{
"cell_type": "markdown",
"id": "60583422",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"changes to"
]
@@ -3128,9 +2731,7 @@
{
"cell_type": "markdown",
"id": "09d0c1d3",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"Z(\\boldsymbol{\\Theta})=\\sum_{\\boldsymbol{x}}\\sum_{\\boldsymbol{h}} f(\\boldsymbol{x},\\boldsymbol{h};\\boldsymbol{\\Theta}).\n",
@@ -3140,9 +2741,7 @@
{
"cell_type": "markdown",
"id": "10e07dcb",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"If we have a binary set of variable $x_i$ and $h_j$ and $M$ values of $x_i$ and $N$ values of $h_j$ we have in total $2^M$ and $2^N$ possible $\\boldsymbol{x}$ and $\\boldsymbol{h}$ configurations, respectively.\n",
"\n",
@@ -3153,9 +2752,7 @@
{
"cell_type": "markdown",
"id": "ec807692",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Optimization problem\n",
"\n",
@@ -3165,9 +2762,7 @@
{
"cell_type": "markdown",
"id": "b5d52ab5",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"p(\\boldsymbol{X};\\boldsymbol{\\Theta})=\\prod_{x_i\\in \\boldsymbol{X}}p(x_i;\\boldsymbol{\\Theta})=\\prod_{x_i\\in \\boldsymbol{X}}\\left(\\frac{\\sum_{h_j\\in \\boldsymbol{H}}f(x_i,h_j;\\boldsymbol{\\Theta})}{Z(\\boldsymbol{\\Theta})}\\right),\n",
@@ -3177,9 +2772,7 @@
{
"cell_type": "markdown",
"id": "f1c09bbf",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"which we rewrite as"
]
@@ -3187,9 +2780,7 @@
{
"cell_type": "markdown",
"id": "e8145abd",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"p(\\boldsymbol{X};\\boldsymbol{\\Theta})=\\frac{1}{Z(\\boldsymbol{\\Theta})}\\prod_{x_i\\in \\boldsymbol{X}}\\left(\\sum_{h_j\\in \\boldsymbol{H}}f(x_i,h_j;\\boldsymbol{\\Theta})\\right).\n",
@@ -3199,9 +2790,7 @@
{
"cell_type": "markdown",
"id": "d8a60c69",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Further simplifications\n",
"\n",
@@ -3211,9 +2800,7 @@
{
"cell_type": "markdown",
"id": "f87035b3",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"p(\\boldsymbol{X};\\boldsymbol{\\Theta})=\\frac{1}{Z(\\boldsymbol{\\Theta})}\\prod_{x_i\\in \\boldsymbol{X}}f(x_i;\\boldsymbol{\\Theta}),\n",
@@ -3223,9 +2810,7 @@
{
"cell_type": "markdown",
"id": "63061a84",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"where we used $p(x_i;\\boldsymbol{\\Theta}) = \\sum_{h_j\\in \\boldsymbol{H}}f(x_i,h_j;\\boldsymbol{\\Theta})$.\n",
"The optimization problem is then"
@@ -3234,9 +2819,7 @@
{
"cell_type": "markdown",
"id": "e90629e1",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"{\\displaystyle \\mathrm{arg} \\hspace{0.1cm}\\max_{\\boldsymbol{\\boldsymbol{\\Theta}}\\in {\\mathbb{R}}^{p}}} \\hspace{0.1cm}p(\\boldsymbol{X};\\boldsymbol{\\Theta}).\n",
@@ -3246,9 +2829,7 @@
{
"cell_type": "markdown",
"id": "4dfb4385",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Optimizing the logarithm instead\n",
"\n",
@@ -3260,9 +2841,7 @@
{
"cell_type": "markdown",
"id": "eae9ef99",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"{\\displaystyle \\mathrm{arg} \\hspace{0.1cm}\\max_{\\boldsymbol{\\boldsymbol{\\Theta}}\\in {\\mathbb{R}}^{p}}} \\hspace{0.1cm}\\log{p(\\boldsymbol{X};\\boldsymbol{\\Theta})},\n",
@@ -3272,9 +2851,7 @@
{
"cell_type": "markdown",
"id": "79e853e3",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"which leads to"
]
@@ -3282,9 +2859,7 @@
{
"cell_type": "markdown",
"id": "b5822deb",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\nabla_{\\boldsymbol{\\Theta}}\\log{p(\\boldsymbol{X};\\boldsymbol{\\Theta})}=0.\n",
@@ -3294,9 +2869,7 @@
{
"cell_type": "markdown",
"id": "3da9d274",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Expression for the gradients\n",
"\n",
@@ -3306,9 +2879,7 @@
{
"cell_type": "markdown",
"id": "0c362f5f",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\nabla_{\\boldsymbol{\\Theta}}\\log{p(\\boldsymbol{X};\\boldsymbol{\\Theta})}=\\nabla_{\\boldsymbol{\\Theta}}\\left(\\sum_{x_i\\in \\boldsymbol{X}}\\log{f(x_i;\\boldsymbol{\\Theta})}\\right)-\\nabla_{\\boldsymbol{\\Theta}}\\log{Z(\\boldsymbol{\\Theta})}=0.\n",
@@ -3318,9 +2889,7 @@
{
"cell_type": "markdown",
"id": "e4fbadf8",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"The first term is called the positive phase and we assume that we have a model for the function $f$ from which we can sample values. Below we will develop an explicit model for this.\n",
"The second term is called the negative phase and is the one which leads to more difficulties."
@@ -3329,9 +2898,7 @@
{
"cell_type": "markdown",
"id": "13aad1f3",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## The derivative of the partition function\n",
"\n",
@@ -3341,9 +2908,7 @@
{
"cell_type": "markdown",
"id": "6cf15217",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"Z(\\boldsymbol{\\Theta})=\\sum_{x_i\\in \\boldsymbol{X}}\\sum_{h_j\\in \\boldsymbol{H}} f(x_i,h_j;\\boldsymbol{\\Theta}),\n",
@@ -3353,9 +2918,7 @@
{
"cell_type": "markdown",
"id": "b7f2b57b",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"is in general the most problematic term. In principle both $x$ and $h$ can span large degrees of freedom, if not even infinitely many ones, and computing the partition function itself is often not desirable or even feasible. The above derivative of the partition function can however be written in terms of an expectation value which is in turn evaluated using Monte Carlo sampling and the theory of Markov chains, popularly shortened to MCMC (or just MC$^2$)."
]
@@ -3363,9 +2926,7 @@
{
"cell_type": "markdown",
"id": "84697aa7",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Explicit expression for the derivative\n",
"We can rewrite"
@@ -3374,9 +2935,7 @@
{
"cell_type": "markdown",
"id": "cff18fb0",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\nabla_{\\boldsymbol{\\Theta}}\\log{Z(\\boldsymbol{\\Theta})}=\\frac{\\nabla_{\\boldsymbol{\\Theta}}Z(\\boldsymbol{\\Theta})}{Z(\\boldsymbol{\\Theta})},\n",
@@ -3386,9 +2945,7 @@
{
"cell_type": "markdown",
"id": "b200258b",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"which reads in more detail"
]
@@ -3396,9 +2953,7 @@
{
"cell_type": "markdown",
"id": "78c3e330",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\nabla_{\\boldsymbol{\\Theta}}\\log{Z(\\boldsymbol{\\Theta})}=\\frac{\\nabla_{\\boldsymbol{\\Theta}} \\sum_{x_i\\in \\boldsymbol{X}}f(x_i;\\boldsymbol{\\Theta}) }{Z(\\boldsymbol{\\Theta})}.\n",
@@ -3408,9 +2963,7 @@
{
"cell_type": "markdown",
"id": "5c039452",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"We can rewrite the function $f$ (we have assumed that is larger or\n",
"equal than zero) as $f=\\exp{\\log{f}}$. We can then rewrite the last\n",
@@ -3420,9 +2973,7 @@
{
"cell_type": "markdown",
"id": "b1450211",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\nabla_{\\boldsymbol{\\Theta}}\\log{Z(\\boldsymbol{\\Theta})}=\\frac{ \\sum_{x_i\\in \\boldsymbol{X}} \\nabla_{\\boldsymbol{\\Theta}}\\exp{\\log{f(x_i;\\boldsymbol{\\Theta})}} }{Z(\\boldsymbol{\\Theta})}.\n",
@@ -3432,9 +2983,7 @@
{
"cell_type": "markdown",
"id": "dbf02775",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Final expression\n",
"\n",
@@ -3444,9 +2993,7 @@
{
"cell_type": "markdown",
"id": "aee826e9",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\nabla_{\\boldsymbol{\\Theta}}\\log{Z(\\boldsymbol{\\Theta})}=\\frac{ \\sum_{x_i\\in \\boldsymbol{X}}f(x_i;\\boldsymbol{\\Theta}) \\nabla_{\\boldsymbol{\\Theta}}\\log{f(x_i;\\boldsymbol{\\Theta})} }{Z(\\boldsymbol{\\Theta})},\n",
@@ -3456,9 +3003,7 @@
{
"cell_type": "markdown",
"id": "7c21e99c",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"which is the expectation value of $\\log{f}$"
]
@@ -3466,9 +3011,7 @@
{
"cell_type": "markdown",
"id": "5614ea79",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\nabla_{\\boldsymbol{\\Theta}}\\log{Z(\\boldsymbol{\\Theta})}=\\sum_{x_i\\sim p}p(x_i;\\boldsymbol{\\Theta}) \\nabla_{\\boldsymbol{\\Theta}}\\log{f(x_i;\\boldsymbol{\\Theta})},\n",
@@ -3478,9 +3021,7 @@
{
"cell_type": "markdown",
"id": "9c011fbe",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"that is"
]
@@ -3488,9 +3029,7 @@
{
"cell_type": "markdown",
"id": "47c4916b",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\nabla_{\\boldsymbol{\\Theta}}\\log{Z(\\boldsymbol{\\Theta})}=\\mathbb{E}(\\log{f(x_i;\\boldsymbol{\\Theta})}).\n",
@@ -3500,9 +3039,7 @@
{
"cell_type": "markdown",
"id": "c6519f09",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"This quantity is evaluated using Monte Carlo sampling, with Gibbs\n",
"sampling as the standard sampling rule."
@@ -3511,9 +3048,7 @@
{
"cell_type": "markdown",
"id": "e82bc48f",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Final expression for the gradients\n",
"\n",
@@ -3523,9 +3058,7 @@
{
"cell_type": "markdown",
"id": "0a42012d",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\nabla_{\\boldsymbol{\\Theta}}\\log{p(\\boldsymbol{X};\\boldsymbol{\\Theta})}=\\nabla_{\\boldsymbol{\\Theta}}\\left(\\sum_{x_i\\in \\boldsymbol{X}}\\log{f(x_i;\\boldsymbol{\\Theta})}\\right)-\\mathbb{E}_{x\\sim p}(\\log{f(x_i;\\boldsymbol{\\Theta})})=0.\n",
@@ -3535,9 +3068,7 @@
{
"cell_type": "markdown",
"id": "b1129c95",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Introducing the energy model\n",
"\n",
@@ -3547,9 +3078,7 @@
{
"cell_type": "markdown",
"id": "4fb7028c",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"p(\\boldsymbol{x},\\boldsymbol{h};\\boldsymbol{\\Theta}) = \\frac{f(\\boldsymbol{x},\\boldsymbol{h};\\boldsymbol{\\Theta})}{Z(\\boldsymbol{\\Theta})},\n",
@@ -3559,9 +3088,7 @@
{
"cell_type": "markdown",
"id": "4a25c045",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"where $f(\\boldsymbol{x},\\boldsymbol{h};\\boldsymbol{\\Theta})$ is given by a so-called energy model. If we assume that the random variables $x_i$ and $h_j$ take binary values only, for example $x_i,h_j=\\{0,1\\}$, we have a so-called binary-binary model where"
]
@@ -3569,9 +3096,7 @@
{
"cell_type": "markdown",
"id": "64f27aa4",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"f(\\boldsymbol{x},\\boldsymbol{h};\\boldsymbol{\\Theta})=-E(\\boldsymbol{x}, \\boldsymbol{h};\\boldsymbol{\\Theta}) = \\sum_{x_i\\in \\boldsymbol{X}} x_i a_i+\\sum_{h_j\\in \\boldsymbol{H}} b_j h_j + \\sum_{x_i\\in \\boldsymbol{X},h_j\\in\\boldsymbol{H}} x_i w_{ij} h_j,\n",
@@ -3581,9 +3106,7 @@
{
"cell_type": "markdown",
"id": "04553f9f",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"where the set of parameters are given by the biases and weights $\\boldsymbol{\\Theta}=\\{\\boldsymbol{a},\\boldsymbol{b},\\boldsymbol{W}\\}$.\n",
"**Note the vector notation** instead of $x_i$ and $h_j$ for $f$. The vectors $\\boldsymbol{x}$ and $\\boldsymbol{h}$ represent a specific instance of stochastic variables $x_i$ and $h_j$. These arrangements of $\\boldsymbol{x}$ and $\\boldsymbol{h}$ lead to a specific energy configuration."
@@ -3592,9 +3115,7 @@
{
"cell_type": "markdown",
"id": "15f0ce3a",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## More compact notation\n",
"\n",
@@ -3604,9 +3125,7 @@
{
"cell_type": "markdown",
"id": "b2a97150",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"p(\\boldsymbol{x},\\boldsymbol{h};\\boldsymbol{\\Theta}) = \\frac{\\exp{(\\boldsymbol{a}^T\\boldsymbol{x}+\\boldsymbol{b}^T\\boldsymbol{h}+\\boldsymbol{x}^T\\boldsymbol{W}\\boldsymbol{h})}}{Z(\\boldsymbol{\\Theta})},\n",
@@ -3616,9 +3135,7 @@
{
"cell_type": "markdown",
"id": "9c794d16",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"where the biases $\\boldsymbol{a}$ and $\\boldsymbol{b}$ and the weights defined by the matrix $\\boldsymbol{W}$ are the parameters we need to optimize."
]
@@ -3626,9 +3143,7 @@
{
"cell_type": "markdown",
"id": "a6986c04",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Examples of gradient expressions\n",
"\n",
@@ -3641,9 +3156,7 @@
{
"cell_type": "markdown",
"id": "de147884",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\frac{\\partial E(\\boldsymbol{x}, \\boldsymbol{h};\\boldsymbol{\\Theta})}{\\partial w_{ij}}=-x_ih_j,\n",
@@ -3653,9 +3166,7 @@
{
"cell_type": "markdown",
"id": "bd7c2f00",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"and"
]
@@ -3663,9 +3174,7 @@
{
"cell_type": "markdown",
"id": "5b2b56a2",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\frac{\\partial E(\\boldsymbol{x}, \\boldsymbol{h};\\boldsymbol{\\Theta})}{\\partial a_i}=-x_i,\n",
@@ -3675,9 +3184,7 @@
{
"cell_type": "markdown",
"id": "937d72bc",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"and"
]
@@ -3685,9 +3192,7 @@
{
"cell_type": "markdown",
"id": "cbec00b4",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\frac{\\partial E(\\boldsymbol{x}, \\boldsymbol{h};\\boldsymbol{\\Theta})}{\\partial b_j}=-h_j.\n",
@@ -3697,9 +3202,7 @@
{
"cell_type": "markdown",
"id": "757dfd72",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Network Elements, the energy function\n",
"\n",
@@ -3714,9 +3217,7 @@
{
"cell_type": "markdown",
"id": "3ee3e71c",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Defining different types of RBMs\n",
"\n",
@@ -3730,9 +3231,7 @@
{
"cell_type": "markdown",
"id": "91c70e4b",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\begin{align*}\n",
@@ -3744,9 +3243,7 @@
{
"cell_type": "markdown",
"id": "a7073293",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"where the binary values taken on by the nodes are most commonly 0 and 1."
]
@@ -3754,9 +3251,7 @@
{
"cell_type": "markdown",
"id": "772217ef",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Gaussian-binary RBM\n",
"\n",
@@ -3766,9 +3261,7 @@
{
"cell_type": "markdown",
"id": "a1488d82",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"$$\n",
"\\begin{align*}\n",
@@ -3780,9 +3273,7 @@
{
"cell_type": "markdown",
"id": "78aa59cf",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"This type of RBM is useful when we model continuous data (i.e., we wish $\\boldsymbol{x}$ to be continuous). The parameter $\\sigma_i^2$ represents a variance and is often simply set to one."
]
@@ -3790,9 +3281,7 @@
{
"cell_type": "markdown",
"id": "50b290c2",
- "metadata": {
- "editable": true
- },
+ "metadata": {},
"source": [
"## Code for RBMs using PyTorch"
]
@@ -3801,11 +3290,129 @@
"cell_type": "code",
"execution_count": 3,
"id": "c085169e",
- "metadata": {
- "collapsed": false,
- "editable": true
- },
- "outputs": [],
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: 'dlopen(/Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/torchvision/image.so, 0x0006): Symbol not found: __ZN3c1017RegisterOperatorsD1Ev\n",
+ " Referenced from: <2D1B8D5C-7891-3680-9CF9-F771AE880676> /Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/torchvision/image.so\n",
+ " Expected in: /Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/torch/lib/libtorch_cpu.dylib'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?\n",
+ " warn(\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz\n",
+ "Failed to download (trying next):\n",
+ "HTTP Error 403: Forbidden\n",
+ "\n",
+ "Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz\n",
+ "Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|██████████████████████████████████████████████████████████████████████████████████████████| 9912422/9912422 [00:02<00:00, 4337161.71it/s]\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw\n",
+ "\n",
+ "Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz\n",
+ "Failed to download (trying next):\n",
+ "HTTP Error 403: Forbidden\n",
+ "\n",
+ "Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz\n",
+ "Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|███████████████████████████████████████████████████████████████████████████████████████████████| 28881/28881 [00:00<00:00, 131135.28it/s]\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw\n",
+ "\n",
+ "Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz\n",
+ "Failed to download (trying next):\n",
+ "HTTP Error 403: Forbidden\n",
+ "\n",
+ "Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz\n",
+ "Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|██████████████████████████████████████████████████████████████████████████████████████████| 1648877/1648877 [00:01<00:00, 1168573.50it/s]\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw\n",
+ "\n",
+ "Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz\n",
+ "Failed to download (trying next):\n",
+ "HTTP Error 403: Forbidden\n",
+ "\n",
+ "Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz\n",
+ "Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|████████████████████████████████████████████████████████████████████████████████████████████████| 4542/4542 [00:00<00:00, 1005995.08it/s]\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw\n",
+ "\n",
+ "Training loss for 0 epoch: -8.471399307250977\n",
+ "Training loss for 1 epoch: -6.6942901611328125\n",
+ "Training loss for 2 epoch: -4.579244136810303\n",
+ "Training loss for 3 epoch: -3.2101542949676514\n",
+ "Training loss for 4 epoch: -2.2072556018829346\n",
+ "Training loss for 5 epoch: -1.5162527561187744\n",
+ "Training loss for 6 epoch: -1.025678277015686\n",
+ "Training loss for 7 epoch: -0.7097488045692444\n",
+ "Training loss for 8 epoch: -0.43329933285713196\n",
+ "Training loss for 9 epoch: -0.2937687039375305\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ "