update
mhjensen committed Feb 2, 2024
1 parent 10136f2 commit a0aa01c
Showing 10 changed files with 5,461 additions and 323 deletions.
134 changes: 129 additions & 5 deletions doc/pub/week3/html/week3-bs.html
@@ -153,6 +153,10 @@
None,
'new-expression-for-the-derivative'),
('Final derivatives', 2, None, 'final-derivatives'),
('In general not this simple',
2,
None,
'in-general-not-this-simple'),
('Automatic differentiation',
2,
None,
@@ -170,6 +174,22 @@
('The derivatives', 2, None, 'the-derivatives'),
('Important observations', 2, None, 'important-observations'),
('The training', 2, None, 'the-training'),
('Code examples for the simple models',
2,
None,
'code-examples-for-the-simple-models'),
('Simple neural network and the back propagation equations',
2,
None,
'simple-neural-network-and-the-back-propagation-equations'),
('The output layer', 2, None, 'the-ouput-layer'),
('Compact expressions', 2, None, 'compact-expressions'),
('For the output layer', 2, None, 'for-the-output-layer'),
('Explicit derivatives', 2, None, 'explicit-derivatives'),
('Setting up the equations for the optimization',
2,
None,
'setting-up-the-equations-for-the-optimization'),
('Getting serious, the back propagation equations for a neural '
'network',
2,
@@ -358,6 +378,7 @@
<!-- navigation toc: --> <li><a href="#defining-intermediate-operations" style="font-size: 80%;">Defining intermediate operations</a></li>
<!-- navigation toc: --> <li><a href="#new-expression-for-the-derivative" style="font-size: 80%;">New expression for the derivative</a></li>
<!-- navigation toc: --> <li><a href="#final-derivatives" style="font-size: 80%;">Final derivatives</a></li>
<!-- navigation toc: --> <li><a href="#in-general-not-this-simple" style="font-size: 80%;">In general not this simple</a></li>
<!-- navigation toc: --> <li><a href="#automatic-differentiation" style="font-size: 80%;">Automatic differentiation</a></li>
<!-- navigation toc: --> <li><a href="#chain-rule" style="font-size: 80%;">Chain rule</a></li>
<!-- navigation toc: --> <li><a href="#first-network-example-simple-percepetron-with-one-input" style="font-size: 80%;">First network example, simple percepetron with one input</a></li>
@@ -366,6 +387,13 @@
<!-- navigation toc: --> <li><a href="#the-derivatives" style="font-size: 80%;">The derivatives</a></li>
<!-- navigation toc: --> <li><a href="#important-observations" style="font-size: 80%;">Important observations</a></li>
<!-- navigation toc: --> <li><a href="#the-training" style="font-size: 80%;">The training</a></li>
<!-- navigation toc: --> <li><a href="#code-examples-for-the-simple-models" style="font-size: 80%;">Code examples for the simple models</a></li>
<!-- navigation toc: --> <li><a href="#simple-neural-network-and-the-back-propagation-equations" style="font-size: 80%;">Simple neural network and the back propagation equations</a></li>
<!-- navigation toc: --> <li><a href="#the-ouput-layer" style="font-size: 80%;">The ouput layer</a></li>
<!-- navigation toc: --> <li><a href="#compact-expressions" style="font-size: 80%;">Compact expressions</a></li>
<!-- navigation toc: --> <li><a href="#for-the-output-layer" style="font-size: 80%;">For the output layer</a></li>
<!-- navigation toc: --> <li><a href="#explicit-derivatives" style="font-size: 80%;">Explicit derivatives</a></li>
<!-- navigation toc: --> <li><a href="#setting-up-the-equations-for-the-optimization" style="font-size: 80%;">Setting up the equations for the optimization</a></li>
<!-- navigation toc: --> <li><a href="#getting-serious-the-back-propagation-equations-for-a-neural-network" style="font-size: 80%;">Getting serious, the back propagation equations for a neural network</a></li>
<!-- navigation toc: --> <li><a href="#analyzing-the-last-results" style="font-size: 80%;">Analyzing the last results</a></li>
<!-- navigation toc: --> <li><a href="#more-considerations" style="font-size: 80%;">More considerations</a></li>
@@ -1141,12 +1169,21 @@ <h2 id="final-derivatives" class="anchor">Final derivatives </h2>

<p>and requires only three operations if we can reuse all intermediate variables.</p>

<!-- !split -->
<h2 id="in-general-not-this-simple" class="anchor">In general not this simple </h2>

<p>In general (see the generalization below), unless we can obtain
simple analytical expressions which we can simplify further, the final
implementation of automatic differentiation involves repeated
calculations (and thereby operations) of the derivatives of elementary
functions.
</p>
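
<p>To make this concrete, the short sketch below implements
forward-mode automatic differentiation with so-called dual numbers in
plain Python: every elementary operation carries its own derivative
rule, which has to be re-evaluated at every step. The <code>Dual</code>
class and the test function \( f(x)=\exp(x^2) \) are chosen here only
for illustration.
</p>

<pre><code>
import math

class Dual:
    """Pair (value, derivative); each elementary operation propagates both."""
    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule, applied at the level of the elementary operation
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def exp(x):
    # chain rule for the elementary function exp
    return Dual(math.exp(x.value), math.exp(x.value) * x.deriv)

# derivative of f(x) = exp(x*x) at x = 2, seeded with derivative 1
x = Dual(2.0, 1.0)
f = exp(x * x)
print(f.value, f.deriv)   # f.deriv equals 2*x*exp(x**2) = 4*exp(4)
</code></pre>

<p>Reverse-mode automatic differentiation, which is what back
propagation amounts to, organizes the very same elementary derivatives
in the opposite order.
</p>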

<!-- !split -->
<h2 id="automatic-differentiation" class="anchor">Automatic differentiation </h2>

<p>We can make this example more formal. Automatic differentiation is a
formalization of the previous example (see graph).
</p>

<p>We define \( \boldsymbol{x}\in x_1,\dots, x_l \) input variables to a given function \( f(\boldsymbol{x}) \) and \( x_{l+1},\dots, x_L \) intermediate variables.</p>
@@ -1188,7 +1225,12 @@ <h2 id="chain-rule" class="anchor">Chain rule </h2>
<!-- !split -->
<h2 id="first-network-example-simple-percepetron-with-one-input" class="anchor">First network example, simple percepetron with one input </h2>

<p>As yet another example we now define a simple perceptron model with
all quantities given by scalars. We consider only one input variable
\( x \) and one target value \( y \). We define an activation function
\( \sigma_1 \) which takes as input
</p>

$$
z_1 = w_1x+b_1,
$$
@@ -1237,7 +1279,7 @@ <h2 id="optimizing-the-parameters" class="anchor">Optimizing the parameters </h2>
<!-- !split -->
<h2 id="adding-a-hidden-layer" class="anchor">Adding a hidden layer </h2>

<p>We change our simple model to a network (see graph) with just one
hidden layer but with scalar variables only.
</p>

@@ -1304,7 +1346,7 @@ <h2 id="the-training" class="anchor">The training </h2>
<p>The training of the parameters is done through various gradient descent approximations with</p>

$$
w_{i}\leftarrow w_{i}- \eta \delta_i a_{i-1},
$$

<p>and</p>
@@ -1318,6 +1360,88 @@ <h2 id="the-training" class="anchor">The training </h2>

<p>For the first hidden layer \( a_{i-1}=a_0=x \) for this simple model.</p>

<!-- !split -->
<h2 id="code-examples-for-the-simple-models" class="anchor">Code examples for the simple models </h2>

<!-- !split -->
<h2 id="simple-neural-network-and-the-back-propagation-equations" class="anchor">Simple neural network and the back propagation equations </h2>

<p>Let us now increase our level of ambition and attempt to set
up the equations for a neural network with two input nodes, one hidden
layer with two hidden nodes and one output layer.
</p>

<p>We need to define the following parameters and variables with the input layer (layer \( (0) \))
where we label the nodes \( x_0 \) and \( x_1 \)
</p>
$$
x_0 = a_0^{(0)} \wedge x_1 = a_1^{(0)}.
$$

<p>The hidden layer (layer \( (1) \)) has nodes which yield the outputs \( a_0^{(1)} \) and \( a_1^{(1)} \), with weight \( \boldsymbol{w} \) and bias \( \boldsymbol{b} \) parameters</p>
$$
w_{ij}^{(1)}=\left\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)}\right\} \wedge b^{(1)}=\left\{b_0^{(1)},b_1^{(1)}\right\}.
$$


<!-- !split -->
<h2 id="the-ouput-layer" class="anchor">The ouput layer </h2>

<p>Finally, we have the output layer given by layer label \( (2) \) with output \( a^{(2)} \) and weights and biases to be determined </p>
$$
w_{i}^{(2)}=\left\{w_{0}^{(2)},w_{1}^{(2)}\right\} \wedge b^{(2)}.
$$

<p>Our output is \( \tilde{y}=a^{(2)} \) and we define a generic cost function \( C(a^{(2)},y;\boldsymbol{\Theta}) \) where \( y \) is the target value (a scalar here).
The parameters we need to optimize are given by
</p>
$$
\boldsymbol{\Theta}=\left\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)},w_{0}^{(2)},w_{1}^{(2)},b_0^{(1)},b_1^{(1)},b^{(2)}\right\}.
$$


<!-- !split -->
<h2 id="compact-expressions" class="anchor">Compact expressions </h2>

<p>We can define the inputs to the activation functions for the various layers in terms of matrix-vector multiplications and vector additions.
The inputs to the first hidden layer are
</p>
$$
\begin{bmatrix}z_0^{(1)} \\ z_1^{(1)} \end{bmatrix}=\begin{bmatrix}w_{00}^{(1)} & w_{01}^{(1)}\\ w_{10}^{(1)} &w_{11}^{(1)} \end{bmatrix}\begin{bmatrix}a_0^{(0)} \\ a_1^{(0)} \end{bmatrix}+\begin{bmatrix}b_0^{(1)} \\ b_1^{(1)} \end{bmatrix},
$$

<p>with outputs</p>
$$
\begin{bmatrix}a_0^{(1)} \\ a_1^{(1)} \end{bmatrix}=\begin{bmatrix}\sigma^{(1)}(z_0^{(1)}) \\ \sigma^{(1)}(z_1^{(1)}) \end{bmatrix}.
$$


<!-- !split -->
<h2 id="for-the-output-layer" class="anchor">For the output layer </h2>

<p>For the final output layer we have the inputs to the final activation function </p>
$$
z^{(2)} = w_{0}^{(2)}a_0^{(1)} +w_{1}^{(2)}a_1^{(1)}+b^{(2)},
$$

<p>resulting in the output</p>
$$
a^{(2)}=\sigma^{(2)}(z^{(2)}).
$$
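
<p>The forward pass defined by the expressions above can be written
directly as matrix-vector operations. The numerical values for the
inputs, weights and biases in the short sketch below are arbitrary and
serve only to show the dimensions involved.
</p>

<pre><code>
import numpy as np

def sigma(z):
    # same activation for both layers in this sketch
    return 1.0 / (1.0 + np.exp(-z))

# layer (0): the two inputs x_0 and x_1
a0 = np.array([0.3, 0.7])

# layer (1): weight matrix w^(1) and bias vector b^(1)
W1 = np.array([[0.1, 0.2],
               [0.3, 0.4]])
b1 = np.array([0.05, 0.05])

# layer (2): weight vector w^(2) and scalar bias b^(2)
w2 = np.array([0.5, 0.6])
b2 = 0.1

z1 = W1 @ a0 + b1   # inputs to the hidden-layer activations
a1 = sigma(z1)      # outputs a_0^(1) and a_1^(1)
z2 = w2 @ a1 + b2   # input to the output activation
a2 = sigma(z2)      # network output a^(2)
print(a2)
</code></pre>

<p>Writing the parameters in this form is also what allows the back
propagation equations in the following sections to be expressed as
matrix-vector products.
</p>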


<!-- !split -->
<h2 id="explicit-derivatives" class="anchor">Explicit derivatives </h2>

<p>In total we have nine parameters which we need to train.
Using the chain rule (or just the back propagation algorithm) we can find all derivatives. For the output layer we then have
</p>
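
$$
\frac{\partial C}{\partial w_{i}^{(2)}} = \frac{\partial C}{\partial a^{(2)}}\frac{\partial a^{(2)}}{\partial z^{(2)}}\frac{\partial z^{(2)}}{\partial w_{i}^{(2)}}=\frac{\partial C}{\partial a^{(2)}}\frac{\partial \sigma^{(2)}}{\partial z^{(2)}}a_i^{(1)}, \quad i=0,1,
$$

<p>and for the bias</p>
$$
\frac{\partial C}{\partial b^{(2)}} = \frac{\partial C}{\partial a^{(2)}}\frac{\partial \sigma^{(2)}}{\partial z^{(2)}}.
$$

<p>Only the factor \( \partial C/\partial a^{(2)} \) depends on the explicit choice of the cost function; for the squared error \( C=\frac{1}{2}(a^{(2)}-y)^2 \) it is simply \( a^{(2)}-y \).</p>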

<!-- !split -->
<h2 id="setting-up-the-equations-for-the-optimization" class="anchor">Setting up the equations for the optimization </h2>

<p>For th</p>

<!-- !split -->
<h2 id="getting-serious-the-back-propagation-equations-for-a-neural-network" class="anchor">Getting serious, the back propagation equations for a neural network </h2>

126 changes: 121 additions & 5 deletions doc/pub/week3/html/week3-reveal.html
@@ -1009,12 +1009,22 @@ <h2 id="final-derivatives">Final derivatives </h2>
<p>and requires only three operations if we can reuse all intermediate variables.</p>
</section>

<section>
<h2 id="in-general-not-this-simple">In general not this simple </h2>

<p>In general (see the generalization below), unless we can obtain
simple analytical expressions which we can simplify further, the final
implementation of automatic differentiation involves repeated
calculations (and thereby operations) of the derivatives of elementary
functions.
</p>
</section>

<section>
<h2 id="automatic-differentiation">Automatic differentiation </h2>

<p>We can make this example more formal. Automatic differentiation is a
formalization of the previous example (see graph).
</p>

<p>We define \( \boldsymbol{x}\in x_1,\dots, x_l \) input variables to a given function \( f(\boldsymbol{x}) \) and \( x_{l+1},\dots, x_L \) intermediate variables.</p>
@@ -1064,7 +1074,12 @@ <h2 id="chain-rule">Chain rule </h2>
<section>
<h2 id="first-network-example-simple-percepetron-with-one-input">First network example, simple percepetron with one input </h2>

<p>As yet another example we now define a simple perceptron model with
all quantities given by scalars. We consider only one input variable
\( x \) and one target value \( y \). We define an activation function
\( \sigma_1 \) which takes as input
</p>

<p>&nbsp;<br>
$$
z_1 = w_1x+b_1,
@@ -1125,7 +1140,7 @@ <h2 id="optimizing-the-parameters">Optimizing the parameters </h2>
<section>
<h2 id="adding-a-hidden-layer">Adding a hidden layer </h2>

<p>We change our simple model to a network (see graph) with just one
hidden layer but with scalar variables only.
</p>

@@ -1208,7 +1223,7 @@ <h2 id="the-training">The training </h2>

<p>&nbsp;<br>
$$
w_{i}\leftarrow w_{i}- \eta \delta_i a_{i-1},
$$
<p>&nbsp;<br>

@@ -1226,6 +1241,107 @@ <h2 id="the-training">The training </h2>
<p>For the first hidden layer \( a_{i-1}=a_0=x \) for this simple model.</p>
</section>

<section>
<h2 id="code-examples-for-the-simple-models">Code examples for the simple models </h2>
</section>

<section>
<h2 id="simple-neural-network-and-the-back-propagation-equations">Simple neural network and the back propagation equations </h2>

<p>Let us now increase our level of ambition and attempt to set
up the equations for a neural network with two input nodes, one hidden
layer with two hidden nodes and one output layer.
</p>

<p>We need to define the following parameters and variables with the input layer (layer \( (0) \))
where we label the nodes \( x_0 \) and \( x_1 \)
</p>
<p>&nbsp;<br>
$$
x_0 = a_0^{(0)} \wedge x_1 = a_1^{(0)}.
$$
<p>&nbsp;<br>

<p>The hidden layer (layer \( (1) \)) has nodes which yield the outputs \( a_0^{(1)} \) and \( a_1^{(1)} \), with weight \( \boldsymbol{w} \) and bias \( \boldsymbol{b} \) parameters</p>
<p>&nbsp;<br>
$$
w_{ij}^{(1)}=\left\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)}\right\} \wedge b^{(1)}=\left\{b_0^{(1)},b_1^{(1)}\right\}.
$$
<p>&nbsp;<br>
</section>

<section>
<h2 id="the-ouput-layer">The ouput layer </h2>

<p>Finally, we have the output layer given by layer label \( (2) \) with output \( a^{(2)} \) and weights and biases to be determined </p>
<p>&nbsp;<br>
$$
w_{i}^{(2)}=\left\{w_{0}^{(2)},w_{1}^{(2)}\right\} \wedge b^{(2)}.
$$
<p>&nbsp;<br>

<p>Our output is \( \tilde{y}=a^{(2)} \) and we define a generic cost function \( C(a^{(2)},y;\boldsymbol{\Theta}) \) where \( y \) is the target value (a scalar here).
The parameters we need to optimize are given by
</p>
<p>&nbsp;<br>
$$
\boldsymbol{\Theta}=\left\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)},w_{0}^{(2)},w_{1}^{(2)},b_0^{(1)},b_1^{(1)},b^{(2)}\right\}.
$$
<p>&nbsp;<br>
</section>

<section>
<h2 id="compact-expressions">Compact expressions </h2>

<p>We can define the inputs to the activation functions for the various layers in terms of matrix-vector multiplications and vector additions.
The inputs to the first hidden layer are
</p>
<p>&nbsp;<br>
$$
\begin{bmatrix}z_0^{(1)} \\ z_1^{(1)} \end{bmatrix}=\begin{bmatrix}w_{00}^{(1)} & w_{01}^{(1)}\\ w_{10}^{(1)} &w_{11}^{(1)} \end{bmatrix}\begin{bmatrix}a_0^{(0)} \\ a_1^{(0)} \end{bmatrix}+\begin{bmatrix}b_0^{(1)} \\ b_1^{(1)} \end{bmatrix},
$$
<p>&nbsp;<br>

<p>with outputs</p>
<p>&nbsp;<br>
$$
\begin{bmatrix}a_0^{(1)} \\ a_1^{(1)} \end{bmatrix}=\begin{bmatrix}\sigma^{(1)}(z_0^{(1)}) \\ \sigma^{(1)}(z_1^{(1)}) \end{bmatrix}.
$$
<p>&nbsp;<br>
</section>

<section>
<h2 id="for-the-output-layer">For the output layer </h2>

<p>For the final output layer we have the inputs to the final activation function </p>
<p>&nbsp;<br>
$$
z^{(2)} = w_{0}^{(2)}a_0^{(1)} +w_{1}^{(2)}a_1^{(1)}+b^{(2)},
$$
<p>&nbsp;<br>

<p>resulting in the output</p>
<p>&nbsp;<br>
$$
a^{(2)}=\sigma^{(2)}(z^{(2)}).
$$
<p>&nbsp;<br>
</section>

<section>
<h2 id="explicit-derivatives">Explicit derivatives </h2>

<p>In total we have nine parameters which we need to train.
Using the chain rule (or just the back propagation algorithm) we can find all derivatives. For the output layer we then have
</p>
</section>

<section>
<h2 id="setting-up-the-equations-for-the-optimization">Setting up the equations for the optimization </h2>

<p>For th</p>
</section>

<section>
<h2 id="getting-serious-the-back-propagation-equations-for-a-neural-network">Getting serious, the back propagation equations for a neural network </h2>
