
Commit

update
mhjensen committed Jan 30, 2024
1 parent 880245c commit d44582d
Showing 8 changed files with 705 additions and 290 deletions.
65 changes: 59 additions & 6 deletions doc/pub/week3/html/week3-bs.html
@@ -166,10 +166,13 @@
2,
None,
'optimizing-the-parameters'),
('First back propagation equation',
('Adding a hidden layer', 2, None, 'adding-a-hidden-layer'),
('The derivatives', 2, None, 'the-derivatives'),
('Getting serious, the back propagation equations for a neural '
'network',
2,
None,
'first-back-propagation-equation'),
'getting-serious-the-back-propagation-equations-for-a-neural-network'),
('Analyzing the last results',
2,
None,
@@ -357,7 +360,9 @@
<!-- navigation toc: --> <li><a href="#chain-rule" style="font-size: 80%;">Chain rule</a></li>
<!-- navigation toc: --> <li><a href="#first-network-example-simple-percepetron-with-one-input" style="font-size: 80%;">First network example, simple percepetron with one input</a></li>
<!-- navigation toc: --> <li><a href="#optimizing-the-parameters" style="font-size: 80%;">Optimizing the parameters</a></li>
<!-- navigation toc: --> <li><a href="#first-back-propagation-equation" style="font-size: 80%;">First back propagation equation</a></li>
<!-- navigation toc: --> <li><a href="#adding-a-hidden-layer" style="font-size: 80%;">Adding a hidden layer</a></li>
<!-- navigation toc: --> <li><a href="#the-derivatives" style="font-size: 80%;">The derivatives</a></li>
<!-- navigation toc: --> <li><a href="#getting-serious-the-back-propagation-equations-for-a-neural-network" style="font-size: 80%;">Getting serious, the back propagation equations for a neural network</a></li>
<!-- navigation toc: --> <li><a href="#analyzing-the-last-results" style="font-size: 80%;">Analyzing the last results</a></li>
<!-- navigation toc: --> <li><a href="#more-considerations" style="font-size: 80%;">More considerations</a></li>
<!-- navigation toc: --> <li><a href="#derivatives-in-terms-of-z-j-l" style="font-size: 80%;">Derivatives in terms of \( z_j^L \)</a></li>
@@ -979,7 +984,7 @@ <h2 id="forward-and-reverse-modes" class="anchor">Forward and reverse modes </h2
since we start by evaluating the derivatives at the end point and then
propagate backwards. This is the standard way of evaluating
derivatives (gradients) when optimizing the parameters of a neural
network). In the context of deep learning this is computationally
network. In the context of deep learning this is computationally
more efficient since the output of a neural network consists of either
one or only a few output variables.
</p>
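To make the reverse sweep concrete, here is a minimal Python sketch (the toy function, the variable names and the numbers are invented for the illustration, not taken from the notes): a single backward pass starting from the output returns all partial derivatives of a scalar-valued function at once.

```python
import numpy as np

# Toy function f(x0, x1) = exp(x0) * sin(x0 * x1), chosen only for illustration.
def forward(x0, x1):
    # Forward sweep: evaluate and store the intermediate variables.
    v1 = x0 * x1          # product node
    v2 = np.sin(v1)       # sine node
    v3 = np.exp(x0)       # exponential node
    f = v3 * v2           # output node
    return f, (v1, v2, v3)

def reverse_gradient(x0, x1):
    # Reverse sweep: start from df/df = 1 at the output and propagate backwards.
    f, (v1, v2, v3) = forward(x0, x1)
    df_dv3 = v2                          # f = v3 * v2
    df_dv2 = v3
    df_dv1 = df_dv2 * np.cos(v1)         # v2 = sin(v1)
    df_dx0 = df_dv3 * v3 + df_dv1 * x1   # v3 = exp(x0) and v1 = x0*x1 both depend on x0
    df_dx1 = df_dv1 * x0
    return f, np.array([df_dx0, df_dx1])

print(reverse_gradient(0.5, 1.2))  # value and full gradient from one backward pass
```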
@@ -1149,7 +1154,7 @@ <h2 id="automatic-differentiation" class="anchor">Automatic differentiation </h2
define the elementary functions \( g_i(x_{Pa(x_i)}) \) where \( x_{Pa(x_i)} \) are the parent nodes of the variable \( x_i \).
</p>

<p>In our case, we have for example for \( x_3=g_3(x_{Pa(x_i)})=\exp{a} \) that \( g_3=\exp{()} \) and \( x_{Pa(x_3)}=a \).</p>
<p>In our case, we have for example for \( x_3=g_3(x_{Pa(x_3)})=\exp{a} \), that \( g_3=\exp{()} \) and \( x_{Pa(x_3)}=a \).</p>
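The parent-node bookkeeping can also be made explicit in a few lines of Python. This is a small sketch (the graph layout and the extra node are invented for the example) where each node stores its elementary function \( g_i \) and the names of its parents \( Pa(x_i) \); evaluating the nodes in topological order reproduces \( x_3=\exp{a} \).

```python
import numpy as np

# Each node records its elementary function g_i and its parent nodes Pa(x_i).
graph = {
    "x3": {"g": np.exp, "parents": ["a"]},                      # x3 = exp(a)
    "x4": {"g": lambda a, x3: a * x3, "parents": ["a", "x3"]},  # invented extra node with two parents
}

values = {"a": 0.7}                     # leaf variable
for name in ["x3", "x4"]:               # topological order: parents before children
    node = graph[name]
    args = (values[p] for p in node["parents"])
    values[name] = node["g"](*args)

print(values)                           # {'a': 0.7, 'x3': exp(0.7), 'x4': 0.7*exp(0.7)}
```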

<!-- !split -->
<h2 id="chain-rule" class="anchor">Chain rule </h2>
@@ -1224,7 +1229,55 @@ <h2 id="optimizing-the-parameters" class="anchor">Optimizing the parameters </h2


<!-- !split -->
<h2 id="first-back-propagation-equation" class="anchor">First back propagation equation </h2>
<h2 id="adding-a-hidden-layer" class="anchor">Adding a hidden layer </h2>

<p>We change our simple model to a network with one hidden layer (see
the graph in the whiteboard notes), but still with scalar variables only.
</p>

<p>Our output variable is now \( a_2 \), while \( a_1 \) is the output from the hidden node.
We then have
</p>
$$
z_1 = w_1x+b_1, \qquad a_1 = \sigma_1(z_1),
$$

$$
z_2 = w_2a_1+b_2, \qquad a_2 = \sigma_2(z_2),
$$

<p>and the cost function</p>
$$
C(x;\boldsymbol{\Theta})=\frac{1}{2}(a_2-y)^2,
$$

<p>with \( \boldsymbol{\Theta}=[w_1,w_2,b_1,b_2] \).</p>
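As a small sketch of the model just defined (the choice of sigmoids for \( \sigma_1 \) and \( \sigma_2 \) and the numerical values are assumptions made only for this example), the forward pass and the cost can be written directly:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, theta):
    """Scalar network with one hidden node; theta = [w1, w2, b1, b2]."""
    w1, w2, b1, b2 = theta
    z1 = w1 * x + b1
    a1 = sigmoid(z1)        # sigma_1(z1)
    z2 = w2 * a1 + b2
    a2 = sigmoid(z2)        # sigma_2(z2)
    return z1, a1, z2, a2

def cost(x, y, theta):
    *_, a2 = forward(x, theta)
    return 0.5 * (a2 - y) ** 2

theta = np.array([0.3, -0.5, 0.1, 0.2])   # arbitrary example parameters
print(cost(x=1.0, y=0.4, theta=theta))
```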

<!-- !split -->
<h2 id="the-derivatives" class="anchor">The derivatives </h2>

<p>The derivatives are now, using the chain rule again</p>

$$
\frac{\partial C}{\partial w_2}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial w_2}=(a_2-y)\sigma_2'a_1=\delta_2a_1,
$$

$$
\frac{\partial C}{\partial b_2}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial b_2}=(a_2-y)\sigma_2'=\delta_2,
$$

$$
\frac{\partial C}{\partial w_1}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial w_1}=(a_2-y)\sigma_2'w_2\sigma_1'x=\delta_2w_2\sigma_1'x,
$$

$$
\frac{\partial C}{\partial b_1}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial b_1}=(a_2-y)\sigma_2'w_2\sigma_1'=\delta_2w_2\sigma_1'.
$$

<p>Can you generalize this to more than one hidden layer?</p>
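As a quick check of the four expressions above, here is a sketch that evaluates them for sigmoid activations and compares with a central finite-difference approximation of the gradient (the activation choice and the numbers are assumptions for the example):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(x, y, theta):
    w1, w2, b1, b2 = theta
    a1 = sigmoid(w1 * x + b1)
    a2 = sigmoid(w2 * a1 + b2)
    return 0.5 * (a2 - y) ** 2

def gradients(x, y, theta):
    """Analytical derivatives of C for the scalar one-hidden-layer network."""
    w1, w2, b1, b2 = theta
    a1 = sigmoid(w1 * x + b1)
    a2 = sigmoid(w2 * a1 + b2)
    s1p = a1 * (1.0 - a1)            # sigma_1'(z1) for the sigmoid
    s2p = a2 * (1.0 - a2)            # sigma_2'(z2)
    delta2 = (a2 - y) * s2p
    dw2 = delta2 * a1
    db2 = delta2
    dw1 = delta2 * w2 * s1p * x      # the factor w2 comes from dz2/da1
    db1 = delta2 * w2 * s1p
    return np.array([dw1, dw2, db1, db2])   # ordered as theta = [w1, w2, b1, b2]

theta = np.array([0.3, -0.5, 0.1, 0.2])
x, y, eps = 1.0, 0.4, 1.0e-6
numerical = np.array([(cost(x, y, theta + eps * e) - cost(x, y, theta - eps * e)) / (2 * eps)
                      for e in np.eye(4)])
print(gradients(x, y, theta))
print(numerical)                     # the two gradients should agree closely
```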

<!-- !split -->
<h2 id="getting-serious-the-back-propagation-equations-for-a-neural-network" class="anchor">Getting serious, the back propagation equations for a neural network </h2>

<p>We have thus</p>
$$
70 changes: 67 additions & 3 deletions doc/pub/week3/html/week3-reveal.html
@@ -819,7 +819,7 @@ <h2 id="forward-and-reverse-modes">Forward and reverse modes </h2>
since we start by evaluating the derivatives at the end point and then
propagate backwards. This is the standard way of evaluating
derivatives (gradients) when optimizing the parameters of a neural
network). In the context of deep learning this is computationally
network. In the context of deep learning this is computationally
more efficient since the output of a neural network consists of either
one or only a few output variables.
</p>
@@ -1028,7 +1028,7 @@ <h2 id="automatic-differentiation">Automatic differentiation </h2>
define the elementary functions \( g_i(x_{Pa(x_i)}) \) where \( x_{Pa(x_i)} \) are the parent nodes of the variable \( x_i \).
</p>

<p>In our case, we have for example for \( x_3=g_3(x_{Pa(x_i)})=\exp{a} \) that \( g_3=\exp{()} \) and \( x_{Pa(x_3)}=a \).</p>
<p>In our case, we have for example for \( x_3=g_3(x_{Pa(x_3)})=\exp{a} \), that \( g_3=\exp{()} \) and \( x_{Pa(x_3)}=a \).</p>
</section>

<section>
@@ -1121,7 +1121,71 @@ <h2 id="optimizing-the-parameters">Optimizing the parameters </h2>
</section>

<section>
<h2 id="first-back-propagation-equation">First back propagation equation </h2>
<h2 id="adding-a-hidden-layer">Adding a hidden layer </h2>

<p>We change our simple model to a network with one hidden layer (see
the graph in the whiteboard notes), but still with scalar variables only.
</p>

<p>Our output variable is now \( a_2 \), while \( a_1 \) is the output from the hidden node.
We then have
</p>
<p>&nbsp;<br>
$$
z_1 = w_1x+b_1, \qquad a_1 = \sigma_1(z_1),
$$
<p>&nbsp;<br>

<p>&nbsp;<br>
$$
z_2 = w_2a_1+b_2, \qquad a_2 = \sigma_2(z_2),
$$
<p>&nbsp;<br>

<p>and the cost function</p>
<p>&nbsp;<br>
$$
C(x;\boldsymbol{\Theta})=\frac{1}{2}(a_2-y)^2,
$$
<p>&nbsp;<br>

<p>with \( \boldsymbol{\Theta}=[w_1,w_2,b_1,b_2] \).</p>
</section>

<section>
<h2 id="the-derivatives">The derivatives </h2>

<p>The derivatives are now, using the chain rule again</p>

<p>&nbsp;<br>
$$
\frac{\partial C}{\partial w_2}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial w_2}=(a_2-y)\sigma_2'a_1=\delta_2a_1,
$$
<p>&nbsp;<br>

<p>&nbsp;<br>
$$
\frac{\partial C}{\partial b_2}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial b_2}=(a_2-y)\sigma_2'=\delta_2,
$$
<p>&nbsp;<br>

<p>&nbsp;<br>
$$
\frac{\partial C}{\partial w_1}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial w_1}=(a_2-y)\sigma_2'w_2\sigma_1'x=\delta_2w_2\sigma_1'x,
$$
<p>&nbsp;<br>

<p>&nbsp;<br>
$$
\frac{\partial C}{\partial b_1}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial b_1}=(a_2-y)\sigma_2'w_2\sigma_1'=\delta_2w_2\sigma_1'.
$$
<p>&nbsp;<br>

<p>Can you generalize this to more than one hidden layer?</p>
</section>

<section>
<h2 id="getting-serious-the-back-propagation-equations-for-a-neural-network">Getting serious, the back propagation equations for a neural network </h2>

<p>We have thus</p>
<p>&nbsp;<br>
61 changes: 56 additions & 5 deletions doc/pub/week3/html/week3-solarized.html
@@ -193,10 +193,13 @@
2,
None,
'optimizing-the-parameters'),
('First back propagation equation',
('Adding a hidden layer', 2, None, 'adding-a-hidden-layer'),
('The derivatives', 2, None, 'the-derivatives'),
('Getting serious, the back propagation equations for a neural '
'network',
2,
None,
'first-back-propagation-equation'),
'getting-serious-the-back-propagation-equations-for-a-neural-network'),
('Analyzing the last results',
2,
None,
@@ -890,7 +893,7 @@ <h2 id="forward-and-reverse-modes">Forward and reverse modes </h2>
since we start by evaluating the derivatives at the end point and then
propagate backwards. This is the standard way of evaluating
derivatives (gradients) when optimizing the parameters of a neural
network). In the context of deep learning this is computationally
network. In the context of deep learning this is computationally
more efficient since the output of a neural network consists of either
one or only a few output variables.
</p>
@@ -1060,7 +1063,7 @@ <h2 id="automatic-differentiation">Automatic differentiation </h2>
define the elementary functions \( g_i(x_{Pa(x_i)}) \) where \( x_{Pa(x_i)} \) are the parent nodes of the variable \( x_i \).
</p>

<p>In our case, we have for example for \( x_3=g_3(x_{Pa(x_i)})=\exp{a} \) that \( g_3=\exp{()} \) and \( x_{Pa(x_3)}=a \).</p>
<p>In our case, we have for example for \( x_3=g_3(x_{Pa(x_3)})=\exp{a} \), that \( g_3=\exp{()} \) and \( x_{Pa(x_3)}=a \).</p>

<!-- !split --><br><br><br><br><br><br><br><br><br><br>
<h2 id="chain-rule">Chain rule </h2>
@@ -1135,7 +1138,55 @@ <h2 id="optimizing-the-parameters">Optimizing the parameters </h2>


<!-- !split --><br><br><br><br><br><br><br><br><br><br>
<h2 id="first-back-propagation-equation">First back propagation equation </h2>
<h2 id="adding-a-hidden-layer">Adding a hidden layer </h2>

<p>We change our simple model to a network with one hidden layer (see
the graph in the whiteboard notes), but still with scalar variables only.
</p>

<p>Our output variable is now \( a_2 \), while \( a_1 \) is the output from the hidden node.
We then have
</p>
$$
z_1 = w_1x+b_1, \qquad a_1 = \sigma_1(z_1),
$$

$$
z_2 = w_2a_1+b_2, \qquad a_2 = \sigma_2(z_2),
$$

<p>and the cost function</p>
$$
C(x;\boldsymbol{\Theta})=\frac{1}{2}(a_2-y)^2,
$$

<p>with \( \boldsymbol{\Theta}=[w_1,w_2,b_1,b_2] \).</p>

<!-- !split --><br><br><br><br><br><br><br><br><br><br>
<h2 id="the-derivatives">The derivatives </h2>

<p>The derivatives are now, using the chain rule again</p>

$$
\frac{\partial C}{\partial w_2}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial w_2}=(a_2-y)\sigma_2'a_1=\delta_2a_1,
$$

$$
\frac{\partial C}{\partial b_2}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial b_2}=(a_2-y)\sigma_2'=\delta_2,
$$

$$
\frac{\partial C}{\partial w_1}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial w_1}=(a_2-y)\sigma_2'w_2\sigma_1'x=\delta_2w_2\sigma_1'x,
$$

$$
\frac{\partial C}{\partial b_1}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial b_1}=(a_2-y)\sigma_2'w_2\sigma_1'=\delta_2w_2\sigma_1'.
$$

<p>Can you generalize this to more than one hidden layer?</p>

<!-- !split --><br><br><br><br><br><br><br><br><br><br>
<h2 id="getting-serious-the-back-propagation-equations-for-a-neural-network">Getting serious, the back propagation equations for a neural network </h2>

<p>We have thus</p>
$$
61 changes: 56 additions & 5 deletions doc/pub/week3/html/week3.html
@@ -270,10 +270,13 @@
2,
None,
'optimizing-the-parameters'),
('First back propagation equation',
('Adding a hidden layer', 2, None, 'adding-a-hidden-layer'),
('The derivatives', 2, None, 'the-derivatives'),
('Getting serious, the back propagation equations for a neural '
'network',
2,
None,
'first-back-propagation-equation'),
'getting-serious-the-back-propagation-equations-for-a-neural-network'),
('Analyzing the last results',
2,
None,
@@ -967,7 +970,7 @@ <h2 id="forward-and-reverse-modes">Forward and reverse modes </h2>
since we start by evaluating the derivatives at the end point and then
propagate backwards. This is the standard way of evaluating
derivatives (gradients) when optimizing the parameters of a neural
network). In the context of deep learning this is computationally
network. In the context of deep learning this is computationally
more efficient since the output of a neural network consists of either
one or only a few output variables.
</p>
@@ -1137,7 +1140,7 @@ <h2 id="automatic-differentiation">Automatic differentiation </h2>
define the elementary functions \( g_i(x_{Pa(x_i)}) \) where \( x_{Pa(x_i)} \) are the parent nodes of the variable \( x_i \).
</p>

<p>In our case, we have for example for \( x_3=g_3(x_{Pa(x_i)})=\exp{a} \) that \( g_3=\exp{()} \) and \( x_{Pa(x_3)}=a \).</p>
<p>In our case, we have for example for \( x_3=g_3(x_{Pa(x_3)})=\exp{a} \), that \( g_3=\exp{()} \) and \( x_{Pa(x_3)}=a \).</p>

<!-- !split --><br><br><br><br><br><br><br><br><br><br>
<h2 id="chain-rule">Chain rule </h2>
@@ -1212,7 +1215,55 @@ <h2 id="optimizing-the-parameters">Optimizing the parameters </h2>


<!-- !split --><br><br><br><br><br><br><br><br><br><br>
<h2 id="first-back-propagation-equation">First back propagation equation </h2>
<h2 id="adding-a-hidden-layer">Adding a hidden layer </h2>

<p>We change our simple model to a network with one hidden layer (see
the graph in the whiteboard notes), but still with scalar variables only.
</p>

<p>Our output variable is now \( a_2 \), while \( a_1 \) is the output from the hidden node.
We then have
</p>
$$
z_1 = w_1x+b_1, \qquad a_1 = \sigma_1(z_1),
$$

$$
z_2 = w_2a_1+b_2, \qquad a_2 = \sigma_2(z_2),
$$

<p>and the cost function</p>
$$
C(x;\boldsymbol{\Theta})=\frac{1}{2}(a_2-y)^2,
$$

<p>with \( \boldsymbol{\Theta}=[w_1,w_2,b_1,b_2] \).</p>

<!-- !split --><br><br><br><br><br><br><br><br><br><br>
<h2 id="the-derivatives">The derivatives </h2>

<p>The derivatives are now, using the chain rule again</p>

$$
\frac{\partial C}{\partial w_2}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial w_2}=(a_2-y)\sigma_2'a_1=\delta_2a_1,
$$

$$
\frac{\partial C}{\partial b_2}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial b_2}=(a_2-y)\sigma_2'=\delta_2,
$$

$$
\frac{\partial C}{\partial w_1}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial w_1}=(a_2-y)\sigma_2'w_2\sigma_1'x=\delta_2w_2\sigma_1'x,
$$

$$
\frac{\partial C}{\partial b_1}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial b_1}=(a_2-y)\sigma_2'w_2\sigma_1'=\delta_2w_2\sigma_1'.
$$

<p>Can you generalize this to more than one hidden layer?</p>

<!-- !split --><br><br><br><br><br><br><br><br><br><br>
<h2 id="getting-serious-the-back-propagation-equations-for-a-neural-network">Getting serious, the back propagation equations for a neural network </h2>

<p>We have thus</p>
$$
Binary file modified doc/pub/week3/ipynb/ipynb-week3-src.tar.gz
Binary file not shown.
