update
mhjensen committed Feb 2, 2024
1 parent 10136f2 commit a0aa01c
Showing 10 changed files with 5,461 additions and 323 deletions.
134 changes: 129 additions & 5 deletions doc/pub/week3/html/week3-bs.html
@@ -153,6 +153,10 @@
None,
'new-expression-for-the-derivative'),
('Final derivatives', 2, None, 'final-derivatives'),
('In general not this simple',
2,
None,
'in-general-not-this-simple'),
('Automatic differentiation',
2,
None,
@@ -170,6 +174,22 @@
('The derivatives', 2, None, 'the-derivatives'),
('Important observations', 2, None, 'important-observations'),
('The training', 2, None, 'the-training'),
('Code examples for the simple models',
2,
None,
'code-examples-for-the-simple-models'),
('Simple neural network and the back propagation equations',
2,
None,
'simple-neural-network-and-the-back-propagation-equations'),
('The output layer', 2, None, 'the-ouput-layer'),
('Compact expressions', 2, None, 'compact-expressions'),
('For the output layer', 2, None, 'for-the-output-layer'),
('Explicit derivatives', 2, None, 'explicit-derivatives'),
('Setting up the equations for the optimization',
2,
None,
'setting-up-the-equations-for-the-optimization'),
('Getting serious, the back propagation equations for a neural '
'network',
2,
@@ -358,6 +378,7 @@
<!-- navigation toc: --> <li><a href="#defining-intermediate-operations" style="font-size: 80%;">Defining intermediate operations</a></li>
<!-- navigation toc: --> <li><a href="#new-expression-for-the-derivative" style="font-size: 80%;">New expression for the derivative</a></li>
<!-- navigation toc: --> <li><a href="#final-derivatives" style="font-size: 80%;">Final derivatives</a></li>
<!-- navigation toc: --> <li><a href="#in-general-not-this-simple" style="font-size: 80%;">In general not this simple</a></li>
<!-- navigation toc: --> <li><a href="#automatic-differentiation" style="font-size: 80%;">Automatic differentiation</a></li>
<!-- navigation toc: --> <li><a href="#chain-rule" style="font-size: 80%;">Chain rule</a></li>
<!-- navigation toc: --> <li><a href="#first-network-example-simple-percepetron-with-one-input" style="font-size: 80%;">First network example, simple percepetron with one input</a></li>
@@ -366,6 +387,13 @@
<!-- navigation toc: --> <li><a href="#the-derivatives" style="font-size: 80%;">The derivatives</a></li>
<!-- navigation toc: --> <li><a href="#important-observations" style="font-size: 80%;">Important observations</a></li>
<!-- navigation toc: --> <li><a href="#the-training" style="font-size: 80%;">The training</a></li>
<!-- navigation toc: --> <li><a href="#code-examples-for-the-simple-models" style="font-size: 80%;">Code examples for the simple models</a></li>
<!-- navigation toc: --> <li><a href="#simple-neural-network-and-the-back-propagation-equations" style="font-size: 80%;">Simple neural network and the back propagation equations</a></li>
<!-- navigation toc: --> <li><a href="#the-ouput-layer" style="font-size: 80%;">The ouput layer</a></li>
<!-- navigation toc: --> <li><a href="#compact-expressions" style="font-size: 80%;">Compact expressions</a></li>
<!-- navigation toc: --> <li><a href="#for-the-output-layer" style="font-size: 80%;">For the output layer</a></li>
<!-- navigation toc: --> <li><a href="#explicit-derivatives" style="font-size: 80%;">Explicit derivatives</a></li>
<!-- navigation toc: --> <li><a href="#setting-up-the-equations-for-the-optimization" style="font-size: 80%;">Setting up the equations for the optimization</a></li>
<!-- navigation toc: --> <li><a href="#getting-serious-the-back-propagation-equations-for-a-neural-network" style="font-size: 80%;">Getting serious, the back propagation equations for a neural network</a></li>
<!-- navigation toc: --> <li><a href="#analyzing-the-last-results" style="font-size: 80%;">Analyzing the last results</a></li>
<!-- navigation toc: --> <li><a href="#more-considerations" style="font-size: 80%;">More considerations</a></li>
@@ -1141,12 +1169,21 @@ <h2 id="final-derivatives" class="anchor">Final derivatives </h2>

<p>and requires only three operations if we can reuse all intermediate variables.</p>

<!-- !split -->
<h2 id="in-general-not-this-simple" class="anchor">In general not this simple </h2>

<p>In general (see the generalization below), unless we can obtain
simple analytical expressions which we can simplify further, the final
implementation of automatic differentiation involves repeated
calculations (and thereby operations) of the derivatives of elementary
functions.
</p>
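
<p>To make this concrete, the short sketch below implements
forward-mode automatic differentiation with so-called dual numbers in
plain Python: every elementary operation carries its own derivative
rule, which has to be re-evaluated at every step. The <code>Dual</code>
class and the test function \( f(x)=\exp(x^2) \) are chosen here only
for illustration.
</p>

<pre><code>
import math

class Dual:
    """Pair (value, derivative); each elementary operation propagates both."""
    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule, applied at the level of the elementary operation
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def exp(x):
    # chain rule for the elementary function exp
    return Dual(math.exp(x.value), math.exp(x.value) * x.deriv)

# derivative of f(x) = exp(x*x) at x = 2, seeded with derivative 1
x = Dual(2.0, 1.0)
f = exp(x * x)
print(f.value, f.deriv)   # f.deriv equals 2*x*exp(x**2) = 4*exp(4)
</code></pre>

<p>Reverse-mode automatic differentiation, which is what back
propagation amounts to, organizes the very same elementary derivatives
in the opposite order.
</p>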

<!-- !split -->
<h2 id="automatic-differentiation" class="anchor">Automatic differentiation </h2>

<p>We can make this example more formal. Automatic differentiation is a
formalization of the previous example (see graph).
</p>

<p>We define \( \boldsymbol{x}\in x_1,\dots, x_l \) input variables to a given function \( f(\boldsymbol{x}) \) and \( x_{l+1},\dots, x_L \) intermediate variables.</p>
@@ -1188,7 +1225,12 @@ <h2 id="chain-rule" class="anchor">Chain rule </h2>
<!-- !split -->
<h2 id="first-network-example-simple-percepetron-with-one-input" class="anchor">First network example, simple percepetron with one input </h2>

<p>As yet another example we now define a simple perceptron model with
all quantities given by scalars. We consider only one input variable
\( x \) and one target value \( y \). We define an activation function
\( \sigma_1 \) which takes as input
</p>

$$
z_1 = w_1x+b_1,
$$
@@ -1237,7 +1279,7 @@ <h2 id="optimizing-the-parameters" class="anchor">Optimizing the parameters </h2>
<!-- !split -->
<h2 id="adding-a-hidden-layer" class="anchor">Adding a hidden layer </h2>

<p>We change our simple model to a network (see graph) with just one
hidden layer but with scalar variables only.
</p>

@@ -1304,7 +1346,7 @@ <h2 id="the-training" class="anchor">The training </h2>
<p>The training of the parameters is done through various gradient descent approximations with</p>

$$
w_{i}\leftarrow w_{i}- \eta \delta_i a_{i-1},
$$

<p>and</p>
@@ -1318,6 +1360,88 @@ <h2 id="the-training" class="anchor">The training </h2>

<p>For the first hidden layer \( a_{i-1}=a_0=x \) for this simple model.</p>

<!-- !split -->
<h2 id="code-examples-for-the-simple-models" class="anchor">Code examples for the simple models </h2>

<!-- !split -->
<h2 id="simple-neural-network-and-the-back-propagation-equations" class="anchor">Simple neural network and the back propagation equations </h2>

<p>Let us now increase our level of ambition and attempt to set
up the equations for a neural network with two input nodes, one hidden
layer with two hidden nodes and one output layer.
</p>

<p>We need to define the following parameters and variables with the input layer (layer \( (0) \))
where we label the nodes \( x_0 \) and \( x_1 \)
</p>
$$
x_0 = a_0^{(0)} \wedge x_1 = a_1^{(0)}.
$$

<p>The hidden layer (layer \( (1) \)) has nodes which yield the outputs \( a_0^{(1)} \) and \( a_1^{(1)} \), with weight \( \boldsymbol{w} \) and bias \( \boldsymbol{b} \) parameters</p>
$$
w_{ij}^{(1)}=\left\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)}\right\} \wedge b^{(1)}=\left\{b_0^{(1)},b_1^{(1)}\right\}.
$$


<!-- !split -->
<h2 id="the-ouput-layer" class="anchor">The ouput layer </h2>

<p>Finally, we have the output layer given by layer label \( (2) \) with output \( a^{(2)} \) and weights and biases to be determined </p>
$$
w_{i}^{(2)}=\left\{w_{0}^{(2)},w_{1}^{(2)}\right\} \wedge b^{(2)}.
$$

<p>Our output is \( \tilde{y}=a^{(2)} \) and we define a generic cost function \( C(a^{(2)},y;\boldsymbol{\Theta}) \) where \( y \) is the target value (a scalar here).
The parameters we need to optimize are given by
</p>
$$
\boldsymbol{\Theta}=\left\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)},w_{0}^{(2)},w_{1}^{(2)},b_0^{(1)},b_1^{(1)},b^{(2)}\right\}.
$$


<!-- !split -->
<h2 id="compact-expressions" class="anchor">Compact expressions </h2>

<p>We can define the inputs to the activation functions for the various layers in terms of matrix-vector multiplications and vector additions.
The inputs to the first hidden layer are
</p>
$$
\begin{bmatrix}z_0^{(1)} \\ z_1^{(1)} \end{bmatrix}=\begin{bmatrix}w_{00}^{(1)} & w_{01}^{(1)}\\ w_{10}^{(1)} &w_{11}^{(1)} \end{bmatrix}\begin{bmatrix}a_0^{(0)} \\ a_1^{(0)} \end{bmatrix}+\begin{bmatrix}b_0^{(1)} \\ b_1^{(1)} \end{bmatrix},
$$

<p>with outputs</p>
$$
\begin{bmatrix}a_0^{(1)} \\ a_1^{(1)} \end{bmatrix}=\begin{bmatrix}\sigma^{(1)}(z_0^{(1)}) \\ \sigma^{(1)}(z_1^{(1)}) \end{bmatrix}.
$$


<!-- !split -->
<h2 id="for-the-output-layer" class="anchor">For the output layer </h2>

<p>For the final output layer we have the inputs to the final activation function </p>
$$
z^{(2)} = w_{0}^{(2)}a_0^{(1)} +w_{1}^{(2)}a_1^{(1)}+b^{(2)},
$$

<p>resulting in the output</p>
$$
a^{(2)}=\sigma^{(2)}(z^{(2)}).
$$
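
<p>The forward pass defined by the expressions above can be written
directly as matrix-vector operations. The numerical values for the
inputs, weights and biases in the short sketch below are arbitrary and
serve only to show the dimensions involved.
</p>

<pre><code>
import numpy as np

def sigma(z):
    # same activation for both layers in this sketch
    return 1.0 / (1.0 + np.exp(-z))

# layer (0): the two inputs x_0 and x_1
a0 = np.array([0.3, 0.7])

# layer (1): weight matrix w^(1) and bias vector b^(1)
W1 = np.array([[0.1, 0.2],
               [0.3, 0.4]])
b1 = np.array([0.05, 0.05])

# layer (2): weight vector w^(2) and scalar bias b^(2)
w2 = np.array([0.5, 0.6])
b2 = 0.1

z1 = W1 @ a0 + b1   # inputs to the hidden-layer activations
a1 = sigma(z1)      # outputs a_0^(1) and a_1^(1)
z2 = w2 @ a1 + b2   # input to the output activation
a2 = sigma(z2)      # network output a^(2)
print(a2)
</code></pre>

<p>Writing the parameters in this form is also what allows the back
propagation equations in the following sections to be expressed as
matrix-vector products.
</p>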


<!-- !split -->
<h2 id="explicit-derivatives" class="anchor">Explicit derivatives </h2>

<p>In total we have nine parameters which we need to train.
Using the chain rule (or just the back propagation algorithm) we can find all derivatives. For the output layer we then have
</p>
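
$$
\frac{\partial C}{\partial w_{i}^{(2)}} = \frac{\partial C}{\partial a^{(2)}}\frac{\partial a^{(2)}}{\partial z^{(2)}}\frac{\partial z^{(2)}}{\partial w_{i}^{(2)}}=\frac{\partial C}{\partial a^{(2)}}\frac{\partial \sigma^{(2)}}{\partial z^{(2)}}a_i^{(1)}, \quad i=0,1,
$$

<p>and for the bias</p>
$$
\frac{\partial C}{\partial b^{(2)}} = \frac{\partial C}{\partial a^{(2)}}\frac{\partial \sigma^{(2)}}{\partial z^{(2)}}.
$$

<p>Only the factor \( \partial C/\partial a^{(2)} \) depends on the explicit choice of the cost function; for the squared error \( C=\frac{1}{2}(a^{(2)}-y)^2 \) it is simply \( a^{(2)}-y \).</p>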

<!-- !split -->
<h2 id="setting-up-the-equations-for-the-optimization" class="anchor">Setting up the equations for the optimization </h2>

<p>For th</p>

<!-- !split -->
<h2 id="getting-serious-the-back-propagation-equations-for-a-neural-network" class="anchor">Getting serious, the back propagation equations for a neural network </h2>

126 changes: 121 additions & 5 deletions doc/pub/week3/html/week3-reveal.html
@@ -1009,12 +1009,22 @@ <h2 id="final-derivatives">Final derivatives </h2>
<p>and requires only three operations if we can reuse all intermediate variables.</p>
</section>

<section>
<h2 id="in-general-not-this-simple">In general not this simple </h2>

<p>In general (see the generalization below), unless we can obtain
simple analytical expressions which we can simplify further, the final
implementation of automatic differentiation involves repeated
calculations (and thereby operations) of the derivatives of elementary
functions.
</p>
</section>

<section>
<h2 id="automatic-differentiation">Automatic differentiation </h2>

<p>We can make this example more formal. Automatic differentiation is a
formalization of the previous example (see graph).
</p>

<p>We define \( \boldsymbol{x}\in x_1,\dots, x_l \) input variables to a given function \( f(\boldsymbol{x}) \) and \( x_{l+1},\dots, x_L \) intermediate variables.</p>
@@ -1064,7 +1074,12 @@ <h2 id="chain-rule">Chain rule </h2>
<section>
<h2 id="first-network-example-simple-percepetron-with-one-input">First network example, simple percepetron with one input </h2>

<p>As yet another example we now define a simple perceptron model with
all quantities given by scalars. We consider only one input variable
\( x \) and one target value \( y \). We define an activation function
\( \sigma_1 \) which takes as input
</p>

<p>&nbsp;<br>
$$
z_1 = w_1x+b_1,
@@ -1125,7 +1140,7 @@ <h2 id="optimizing-the-parameters">Optimizing the parameters </h2>
<section>
<h2 id="adding-a-hidden-layer">Adding a hidden layer </h2>

<p>We change our simple model to a network (see graph) with just one
hidden layer but with scalar variables only.
</p>

@@ -1208,7 +1223,7 @@ <h2 id="the-training">The training </h2>

<p>&nbsp;<br>
$$
w_{i}\leftarrow w_{i}- \eta \delta_i a_{i-1},
$$
<p>&nbsp;<br>

@@ -1226,6 +1241,107 @@ <h2 id="the-training">The training </h2>
<p>For the first hidden layer \( a_{i-1}=a_0=x \) for this simple model.</p>
</section>

<section>
<h2 id="code-examples-for-the-simple-models">Code examples for the simple models </h2>
</section>

<section>
<h2 id="simple-neural-network-and-the-back-propagation-equations">Simple neural network and the back propagation equations </h2>

<p>Let us now increase our level of ambition and attempt to set
up the equations for a neural network with two input nodes, one hidden
layer with two hidden nodes and one output layer.
</p>

<p>We need to define the following parameters and variables with the input layer (layer \( (0) \))
where we label the nodes \( x_0 \) and \( x_1 \)
</p>
<p>&nbsp;<br>
$$
x_0 = a_0^{(0)} \wedge x_1 = a_1^{(0)}.
$$
<p>&nbsp;<br>

<p>The hidden layer (layer \( (1) \)) has nodes which yield the outputs \( a_0^{(1)} \) and \( a_1^{(1)} \), with weight \( \boldsymbol{w} \) and bias \( \boldsymbol{b} \) parameters</p>
<p>&nbsp;<br>
$$
w_{ij}^{(1)}=\left\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)}\right\} \wedge b^{(1)}=\left\{b_0^{(1)},b_1^{(1)}\right\}.
$$
<p>&nbsp;<br>
</section>

<section>
<h2 id="the-ouput-layer">The ouput layer </h2>

<p>Finally, we have the output layer given by layer label \( (2) \) with output \( a^{(2)} \) and weights and biases to be determined </p>
<p>&nbsp;<br>
$$
w_{i}^{(2)}=\left\{w_{0}^{(2)},w_{1}^{(2)}\right\} \wedge b^{(2)}.
$$
<p>&nbsp;<br>

<p>Our output is \( \tilde{y}=a^{(2)} \) and we define a generic cost function \( C(a^{(2)},y;\boldsymbol{\Theta}) \) where \( y \) is the target value (a scalar here).
The parameters we need to optimize are given by
</p>
<p>&nbsp;<br>
$$
\boldsymbol{\Theta}=\left\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)},w_{0}^{(2)},w_{1}^{(2)},b_0^{(1)},b_1^{(1)},b^{(2)}\right\}.
$$
<p>&nbsp;<br>
</section>

<section>
<h2 id="compact-expressions">Compact expressions </h2>

<p>We can define the inputs to the activation functions for the various layers in terms of matrix-vector multiplications and vector additions.
The inputs to the first hidden layer are
</p>
<p>&nbsp;<br>
$$
\begin{bmatrix}z_0^{(1)} \\ z_1^{(1)} \end{bmatrix}=\begin{bmatrix}w_{00}^{(1)} & w_{01}^{(1)}\\ w_{10}^{(1)} &w_{11}^{(1)} \end{bmatrix}\begin{bmatrix}a_0^{(0)} \\ a_1^{(0)} \end{bmatrix}+\begin{bmatrix}b_0^{(1)} \\ b_1^{(1)} \end{bmatrix},
$$
<p>&nbsp;<br>

<p>with outputs</p>
<p>&nbsp;<br>
$$
\begin{bmatrix}a_0^{(1)} \\ a_1^{(1)} \end{bmatrix}=\begin{bmatrix}\sigma^{(1)}(z_0^{(1)}) \\ \sigma^{(1)}(z_1^{(1)}) \end{bmatrix}.
$$
<p>&nbsp;<br>
</section>

<section>
<h2 id="for-the-output-layer">For the output layer </h2>

<p>For the final output layer we have the inputs to the final activation function </p>
<p>&nbsp;<br>
$$
z^{(2)} = w_{0}^{(2)}a_0^{(1)} +w_{1}^{(2)}a_1^{(1)}+b^{(2)},
$$
<p>&nbsp;<br>

<p>resulting in the output</p>
<p>&nbsp;<br>
$$
a^{(2)}=\sigma^{(2)}(z^{(2)}).
$$
<p>&nbsp;<br>
</section>

<section>
<h2 id="explicit-derivatives">Explicit derivatives </h2>

<p>In total we have nine parameters which we need to train.
Using the chain rule (or just the back propagation algorithm) we can find all derivatives. For the output layer we then have
</p>
</section>

<section>
<h2 id="setting-up-the-equations-for-the-optimization">Setting up the equations for the optimization </h2>

<p>For th</p>
</section>

<section>
<h2 id="getting-serious-the-back-propagation-equations-for-a-neural-network">Getting serious, the back propagation equations for a neural network </h2>
