
Commit

update
mhjensen committed Jan 30, 2024
1 parent 880245c commit d44582d
Showing 8 changed files with 705 additions and 290 deletions.
65 changes: 59 additions & 6 deletions doc/pub/week3/html/week3-bs.html
@@ -166,10 +166,13 @@
2,
None,
'optimizing-the-parameters'),
('First back propagation equation',
('Adding a hidden layer', 2, None, 'adding-a-hidden-layer'),
('The derivatives', 2, None, 'the-derivatives'),
('Getting serious, the back propagation equations for a neural '
'network',
2,
None,
'first-back-propagation-equation'),
'getting-serious-the-back-propagation-equations-for-a-neural-network'),
('Analyzing the last results',
2,
None,
@@ -357,7 +360,9 @@
<!-- navigation toc: --> <li><a href="#chain-rule" style="font-size: 80%;">Chain rule</a></li>
<!-- navigation toc: --> <li><a href="#first-network-example-simple-percepetron-with-one-input" style="font-size: 80%;">First network example, simple percepetron with one input</a></li>
<!-- navigation toc: --> <li><a href="#optimizing-the-parameters" style="font-size: 80%;">Optimizing the parameters</a></li>
<!-- navigation toc: --> <li><a href="#first-back-propagation-equation" style="font-size: 80%;">First back propagation equation</a></li>
<!-- navigation toc: --> <li><a href="#adding-a-hidden-layer" style="font-size: 80%;">Adding a hidden layer</a></li>
<!-- navigation toc: --> <li><a href="#the-derivatives" style="font-size: 80%;">The derivatives</a></li>
<!-- navigation toc: --> <li><a href="#getting-serious-the-back-propagation-equations-for-a-neural-network" style="font-size: 80%;">Getting serious, the back propagation equations for a neural network</a></li>
<!-- navigation toc: --> <li><a href="#analyzing-the-last-results" style="font-size: 80%;">Analyzing the last results</a></li>
<!-- navigation toc: --> <li><a href="#more-considerations" style="font-size: 80%;">More considerations</a></li>
<!-- navigation toc: --> <li><a href="#derivatives-in-terms-of-z-j-l" style="font-size: 80%;">Derivatives in terms of \( z_j^L \)</a></li>
@@ -979,7 +984,7 @@ <h2 id="forward-and-reverse-modes" class="anchor">Forward and reverse modes </h2
since we start by evaluating the derivatives at the end point and then
propagate backwards. This is the standard way of evaluating
derivatives (gradients) when optimizing the parameters of a neural
network). In the context of deep learning this is computationally
network. In the context of deep learning this is computationally
more efficient since the output of a neural network consists of either
one or only a few output variables.
</p>
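To make the reverse sweep concrete, here is a minimal Python sketch (the toy function, the variable names and the numbers are invented for the illustration, not taken from the notes): a single backward pass starting from the output returns all partial derivatives of a scalar-valued function at once.

```python
import numpy as np

# Toy function f(x0, x1) = exp(x0) * sin(x0 * x1), chosen only for illustration.
def forward(x0, x1):
    # Forward sweep: evaluate and store the intermediate variables.
    v1 = x0 * x1          # product node
    v2 = np.sin(v1)       # sine node
    v3 = np.exp(x0)       # exponential node
    f = v3 * v2           # output node
    return f, (v1, v2, v3)

def reverse_gradient(x0, x1):
    # Reverse sweep: start from df/df = 1 at the output and propagate backwards.
    f, (v1, v2, v3) = forward(x0, x1)
    df_dv3 = v2                          # f = v3 * v2
    df_dv2 = v3
    df_dv1 = df_dv2 * np.cos(v1)         # v2 = sin(v1)
    df_dx0 = df_dv3 * v3 + df_dv1 * x1   # v3 = exp(x0) and v1 = x0*x1 both depend on x0
    df_dx1 = df_dv1 * x0
    return f, np.array([df_dx0, df_dx1])

print(reverse_gradient(0.5, 1.2))  # value and full gradient from one backward pass
```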
@@ -1149,7 +1154,7 @@ <h2 id="automatic-differentiation" class="anchor">Automatic differentiation </h2
define the elementary functions \( g_i(x_{Pa(x_i)}) \) where \( x_{Pa(x_i)} \) are the parent nodes of the variable \( x_i \).
</p>

<p>In our case, we have for example for \( x_3=g_3(x_{Pa(x_i)})=\exp{a} \) that \( g_3=\exp{()} \) and \( x_{Pa(x_3)}=a \).</p>
<p>In our case, we have for example for \( x_3=g_3(x_{Pa(x_3)})=\exp{a} \), that \( g_3=\exp{()} \) and \( x_{Pa(x_3)}=a \).</p>
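The parent-node bookkeeping can also be made explicit in a few lines of Python. This is a small sketch (the graph layout and the extra node are invented for the example) where each node stores its elementary function \( g_i \) and the names of its parents \( Pa(x_i) \); evaluating the nodes in topological order reproduces \( x_3=\exp{a} \).

```python
import numpy as np

# Each node records its elementary function g_i and its parent nodes Pa(x_i).
graph = {
    "x3": {"g": np.exp, "parents": ["a"]},                      # x3 = exp(a)
    "x4": {"g": lambda a, x3: a * x3, "parents": ["a", "x3"]},  # invented extra node with two parents
}

values = {"a": 0.7}                     # leaf variable
for name in ["x3", "x4"]:               # topological order: parents before children
    node = graph[name]
    args = (values[p] for p in node["parents"])
    values[name] = node["g"](*args)

print(values)                           # {'a': 0.7, 'x3': exp(0.7), 'x4': 0.7*exp(0.7)}
```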

<!-- !split -->
<h2 id="chain-rule" class="anchor">Chain rule </h2>
@@ -1224,7 +1229,55 @@ <h2 id="optimizing-the-parameters" class="anchor">Optimizing the parameters </h2


<!-- !split -->
<h2 id="first-back-propagation-equation" class="anchor">First back propagation equation </h2>
<h2 id="adding-a-hidden-layer" class="anchor">Adding a hidden layer </h2>

<p>We change our simple model to a network with one hidden layer (see
the graph in the whiteboard notes), but still with scalar variables only.
</p>

<p>Our output variable is now \( a_2 \), while \( a_1 \) is the output from the hidden node.
We then have
</p>
$$
z_1 = w_1x+b_1, \qquad a_1 = \sigma_1(z_1),
$$

$$
z_2 = w_2a_1+b_2, \qquad a_2 = \sigma_2(z_2),
$$

<p>and the cost function</p>
$$
C(x;\boldsymbol{\Theta})=\frac{1}{2}(a_2-y)^2,
$$

<p>with \( \boldsymbol{\Theta}=[w_1,w_2,b_1,b_2] \).</p>
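As a small sketch of the model just defined (the choice of sigmoids for \( \sigma_1 \) and \( \sigma_2 \) and the numerical values are assumptions made only for this example), the forward pass and the cost can be written directly:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, theta):
    """Scalar network with one hidden node; theta = [w1, w2, b1, b2]."""
    w1, w2, b1, b2 = theta
    z1 = w1 * x + b1
    a1 = sigmoid(z1)        # sigma_1(z1)
    z2 = w2 * a1 + b2
    a2 = sigmoid(z2)        # sigma_2(z2)
    return z1, a1, z2, a2

def cost(x, y, theta):
    *_, a2 = forward(x, theta)
    return 0.5 * (a2 - y) ** 2

theta = np.array([0.3, -0.5, 0.1, 0.2])   # arbitrary example parameters
print(cost(x=1.0, y=0.4, theta=theta))
```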

<!-- !split -->
<h2 id="the-derivatives" class="anchor">The derivatives </h2>

<p>The derivatives are now, using the chain rule again</p>

$$
\frac{\partial C}{\partial w_2}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial w_2}=(a_2-y)\sigma_2'a_1=\delta_2a_1,
$$

$$
\frac{\partial C}{\partial b_2}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial b_2}=(a_2-y)\sigma_2'=\delta_2,
$$

$$
\frac{\partial C}{\partial w_1}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial w_1}=(a_2-y)\sigma_2'w_2\sigma_1'x=\delta_2w_2\sigma_1'x,
$$

$$
\frac{\partial C}{\partial b_1}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial b_1}=(a_2-y)\sigma_2'w_2\sigma_1'=\delta_2w_2\sigma_1'.
$$

<p>Can you generalize this to more than one hidden layer?</p>
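As a quick check of the four expressions above, here is a sketch that evaluates them for sigmoid activations and compares with a central finite-difference approximation of the gradient (the activation choice and the numbers are assumptions for the example):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(x, y, theta):
    w1, w2, b1, b2 = theta
    a1 = sigmoid(w1 * x + b1)
    a2 = sigmoid(w2 * a1 + b2)
    return 0.5 * (a2 - y) ** 2

def gradients(x, y, theta):
    """Analytical derivatives of C for the scalar one-hidden-layer network."""
    w1, w2, b1, b2 = theta
    a1 = sigmoid(w1 * x + b1)
    a2 = sigmoid(w2 * a1 + b2)
    s1p = a1 * (1.0 - a1)            # sigma_1'(z1) for the sigmoid
    s2p = a2 * (1.0 - a2)            # sigma_2'(z2)
    delta2 = (a2 - y) * s2p
    dw2 = delta2 * a1
    db2 = delta2
    dw1 = delta2 * w2 * s1p * x      # the factor w2 comes from dz2/da1
    db1 = delta2 * w2 * s1p
    return np.array([dw1, dw2, db1, db2])   # ordered as theta = [w1, w2, b1, b2]

theta = np.array([0.3, -0.5, 0.1, 0.2])
x, y, eps = 1.0, 0.4, 1.0e-6
numerical = np.array([(cost(x, y, theta + eps * e) - cost(x, y, theta - eps * e)) / (2 * eps)
                      for e in np.eye(4)])
print(gradients(x, y, theta))
print(numerical)                     # the two gradients should agree closely
```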

<!-- !split -->
<h2 id="getting-serious-the-back-propagation-equations-for-a-neural-network" class="anchor">Getting serious, the back propagation equations for a neural network </h2>

<p>We have thus</p>
$$
70 changes: 67 additions & 3 deletions doc/pub/week3/html/week3-reveal.html
@@ -819,7 +819,7 @@ <h2 id="forward-and-reverse-modes">Forward and reverse modes </h2>
since we start by evaluating the derivatives at the end point and then
propagate backwards. This is the standard way of evaluating
derivatives (gradients) when optimizing the parameters of a neural
network). In the context of deep learning this is computationally
network. In the context of deep learning this is computationally
more efficient since the output of a neural network consists of either
one or only a few output variables.
</p>
@@ -1028,7 +1028,7 @@ <h2 id="automatic-differentiation">Automatic differentiation </h2>
define the elementary functions \( g_i(x_{Pa(x_i)}) \) where \( x_{Pa(x_i)} \) are the parent nodes of the variable \( x_i \).
</p>

<p>In our case, we have for example for \( x_3=g_3(x_{Pa(x_i)})=\exp{a} \) that \( g_3=\exp{()} \) and \( x_{Pa(x_3)}=a \).</p>
<p>In our case, we have for example for \( x_3=g_3(x_{Pa(x_3)})=\exp{a} \), that \( g_3=\exp{()} \) and \( x_{Pa(x_3)}=a \).</p>
</section>

<section>
@@ -1121,7 +1121,71 @@ <h2 id="optimizing-the-parameters">Optimizing the parameters </h2>
</section>

<section>
<h2 id="first-back-propagation-equation">First back propagation equation </h2>
<h2 id="adding-a-hidden-layer">Adding a hidden layer </h2>

<p>We change our simple model to a network with one hidden layer (see
the graph in the whiteboard notes), but still with scalar variables only.
</p>

<p>Our output variable is now \( a_2 \), while \( a_1 \) is the output from the hidden node.
We then have
</p>
<p>&nbsp;<br>
$$
z_1 = w_1x+b_1, \qquad a_1 = \sigma_1(z_1),
$$
<p>&nbsp;<br>

<p>&nbsp;<br>
$$
z_2 = w_2a_1+b_2, \qquad a_2 = \sigma_2(z_2),
$$
<p>&nbsp;<br>

<p>and the cost function</p>
<p>&nbsp;<br>
$$
C(x;\boldsymbol{\Theta})=\frac{1}{2}(a_2-y)^2,
$$
<p>&nbsp;<br>

<p>with \( \boldsymbol{\Theta}=[w_1,w_2,b_1,b_2] \).</p>
</section>

<section>
<h2 id="the-derivatives">The derivatives </h2>

<p>The derivatives are now, using the chain rule again</p>

<p>&nbsp;<br>
$$
\frac{\partial C}{\partial w_2}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial w_2}=(a_2-y)\sigma_2'a_1=\delta_2a_1,
$$
<p>&nbsp;<br>

<p>&nbsp;<br>
$$
\frac{\partial C}{\partial b_2}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial b_2}=(a_2-y)\sigma_2'=\delta_2,
$$
<p>&nbsp;<br>

<p>&nbsp;<br>
$$
\frac{\partial C}{\partial w_1}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial w_1}=(a_2-y)\sigma_2'w_2\sigma_1'x=\delta_2w_2\sigma_1'x,
$$
<p>&nbsp;<br>

<p>&nbsp;<br>
$$
\frac{\partial C}{\partial b_1}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial b_1}=(a_2-y)\sigma_2'w_2\sigma_1'=\delta_2w_2\sigma_1'.
$$
<p>&nbsp;<br>

<p>Can you generalize this to more than one hidden layer?</p>
</section>

<section>
<h2 id="getting-serious-the-back-propagation-equations-for-a-neural-network">Getting serious, the back propagation equations for a neural network </h2>

<p>We have thus</p>
<p>&nbsp;<br>
61 changes: 56 additions & 5 deletions doc/pub/week3/html/week3-solarized.html
@@ -193,10 +193,13 @@
2,
None,
'optimizing-the-parameters'),
('First back propagation equation',
('Adding a hidden layer', 2, None, 'adding-a-hidden-layer'),
('The derivatives', 2, None, 'the-derivatives'),
('Getting serious, the back propagation equations for a neural '
'network',
2,
None,
'first-back-propagation-equation'),
'getting-serious-the-back-propagation-equations-for-a-neural-network'),
('Analyzing the last results',
2,
None,
@@ -890,7 +893,7 @@ <h2 id="forward-and-reverse-modes">Forward and reverse modes </h2>
since we start by evaluating the derivatives at the end point and then
propagate backwards. This is the standard way of evaluating
derivatives (gradients) when optimizing the parameters of a neural
network). In the context of deep learning this is computationally
network. In the context of deep learning this is computationally
more efficient since the output of a neural network consists of either
one or only a few output variables.
</p>
@@ -1060,7 +1063,7 @@ <h2 id="automatic-differentiation">Automatic differentiation </h2>
define the elementary functions \( g_i(x_{Pa(x_i)}) \) where \( x_{Pa(x_i)} \) are the parent nodes of the variable \( x_i \).
</p>

<p>In our case, we have for example for \( x_3=g_3(x_{Pa(x_i)})=\exp{a} \) that \( g_3=\exp{()} \) and \( x_{Pa(x_3)}=a \).</p>
<p>In our case, we have for example for \( x_3=g_3(x_{Pa(x_3)})=\exp{a} \), that \( g_3=\exp{()} \) and \( x_{Pa(x_3)}=a \).</p>

<!-- !split --><br><br><br><br><br><br><br><br><br><br>
<h2 id="chain-rule">Chain rule </h2>
@@ -1135,7 +1138,55 @@ <h2 id="optimizing-the-parameters">Optimizing the parameters </h2>


<!-- !split --><br><br><br><br><br><br><br><br><br><br>
<h2 id="first-back-propagation-equation">First back propagation equation </h2>
<h2 id="adding-a-hidden-layer">Adding a hidden layer </h2>

<p>We change our simple model to a network with one hidden layer (see
the graph in the whiteboard notes), but still with scalar variables only.
</p>

<p>Our output variable is now \( a_2 \), while \( a_1 \) is the output from the hidden node.
We then have
</p>
$$
z_1 = w_1x+b_1, \qquad a_1 = \sigma_1(z_1),
$$

$$
z_2 = w_2a_1+b_2, \qquad a_2 = \sigma_2(z_2),
$$

<p>and the cost function</p>
$$
C(x;\boldsymbol{\Theta})=\frac{1}{2}(a_2-y)^2,
$$

<p>with \( \boldsymbol{\Theta}=[w_1,w_2,b_1,b_2] \).</p>

<!-- !split --><br><br><br><br><br><br><br><br><br><br>
<h2 id="the-derivatives">The derivatives </h2>

<p>The derivatives are now, using the chain rule again</p>

$$
\frac{\partial C}{\partial w_2}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial w_2}=(a_2-y)\sigma_2'a_1=\delta_2a_1,
$$

$$
\frac{\partial C}{\partial b_2}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial b_2}=(a_2-y)\sigma_2'=\delta_2,
$$

$$
\frac{\partial C}{\partial w_1}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial w_1}=(a_2-y)\sigma_2'w_2\sigma_1'x=\delta_2w_2\sigma_1'x,
$$

$$
\frac{\partial C}{\partial b_1}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial b_1}=(a_2-y)\sigma_2'w_2\sigma_1'=\delta_2w_2\sigma_1'.
$$

<p>Can you generalize this to more than one hidden layer?</p>

<!-- !split --><br><br><br><br><br><br><br><br><br><br>
<h2 id="getting-serious-the-back-propagation-equations-for-a-neural-network">Getting serious, the back propagation equations for a neural network </h2>

<p>We have thus</p>
$$
61 changes: 56 additions & 5 deletions doc/pub/week3/html/week3.html
@@ -270,10 +270,13 @@
2,
None,
'optimizing-the-parameters'),
('First back propagation equation',
('Adding a hidden layer', 2, None, 'adding-a-hidden-layer'),
('The derivatives', 2, None, 'the-derivatives'),
('Getting serious, the back propagation equations for a neural '
'network',
2,
None,
'first-back-propagation-equation'),
'getting-serious-the-back-propagation-equations-for-a-neural-network'),
('Analyzing the last results',
2,
None,
@@ -967,7 +970,7 @@ <h2 id="forward-and-reverse-modes">Forward and reverse modes </h2>
since we start by evaluating the derivatives at the end point and then
propagate backwards. This is the standard way of evaluating
derivatives (gradients) when optimizing the parameters of a neural
network). In the context of deep learning this is computationally
network. In the context of deep learning this is computationally
more efficient since the output of a neural network consists of either
one or only a few output variables.
</p>
@@ -1137,7 +1140,7 @@ <h2 id="automatic-differentiation">Automatic differentiation </h2>
define the elementary functions \( g_i(x_{Pa(x_i)}) \) where \( x_{Pa(x_i)} \) are the parent nodes of the variable \( x_i \).
</p>

<p>In our case, we have for example for \( x_3=g_3(x_{Pa(x_i)})=\exp{a} \) that \( g_3=\exp{()} \) and \( x_{Pa(x_3)}=a \).</p>
<p>In our case, we have for example for \( x_3=g_3(x_{Pa(x_3)})=\exp{a} \), that \( g_3=\exp{()} \) and \( x_{Pa(x_3)}=a \).</p>

<!-- !split --><br><br><br><br><br><br><br><br><br><br>
<h2 id="chain-rule">Chain rule </h2>
@@ -1212,7 +1215,55 @@ <h2 id="optimizing-the-parameters">Optimizing the parameters </h2>


<!-- !split --><br><br><br><br><br><br><br><br><br><br>
<h2 id="first-back-propagation-equation">First back propagation equation </h2>
<h2 id="adding-a-hidden-layer">Adding a hidden layer </h2>

<p>We change our simple model to a network with one hidden layer (see
the graph in the whiteboard notes), but still with scalar variables only.
</p>

<p>Our output variable is now \( a_2 \), while \( a_1 \) is the output from the hidden node.
We then have
</p>
$$
z_1 = w_1x+b_1, \qquad a_1 = \sigma_1(z_1),
$$

$$
z_2 = w_2a_1+b_2, \qquad a_2 = \sigma_2(z_2),
$$

<p>and the cost function</p>
$$
C(x;\boldsymbol{\Theta})=\frac{1}{2}(a_2-y)^2,
$$

<p>with \( \boldsymbol{\Theta}=[w_1,w_2,b_1,b_2] \).</p>

<!-- !split --><br><br><br><br><br><br><br><br><br><br>
<h2 id="the-derivatives">The derivatives </h2>

<p>The derivatives are now, using the chain rule again</p>

$$
\frac{\partial C}{\partial w_2}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial w_2}=(a_2-y)\sigma_2'a_1=\delta_2a_1,
$$

$$
\frac{\partial C}{\partial b_2}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial b_2}=(a_2-y)\sigma_2'=\delta_2,
$$

$$
\frac{\partial C}{\partial w_1}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial w_1}=(a_2-y)\sigma_2'w_2\sigma_1'x=\delta_2w_2\sigma_1'x,
$$

$$
\frac{\partial C}{\partial b_1}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial b_1}=(a_2-y)\sigma_2'w_2\sigma_1'=\delta_2w_2\sigma_1'.
$$

<p>Can you generalize this to more than one hidden layer?</p>

<!-- !split --><br><br><br><br><br><br><br><br><br><br>
<h2 id="getting-serious-the-back-propagation-equations-for-a-neural-network">Getting serious, the back propagation equations for a neural network </h2>

<p>We have thus</p>
$$
Binary file modified doc/pub/week3/ipynb/ipynb-week3-src.tar.gz
Binary file not shown.
