
Commit

update
mhjensen committed Mar 5, 2024
1 parent 36ddf93 commit 01d893a
Showing 11 changed files with 1,025 additions and 750 deletions.
127 changes: 124 additions & 3 deletions doc/pub/week8/html/week8-bs.html
@@ -137,9 +137,16 @@
None,
'gating-mechanism-long-short-term-memory-lstm'),
('Implementing a memory cell in a neural network',
3,
2,
None,
'implementing-a-memory-cell-in-a-neural-network'),
('LSTM details', 2, None, 'lstm-details'),
('Basic layout', 2, None, 'basic-layout'),
('More LSTM details', 2, None, 'more-lstm-details'),
('The forget gate', 2, None, 'the-forget-gate'),
('Input gate', 2, None, 'input-gate'),
('Forget and input', 2, None, 'forget-and-input'),
('Output gate', 2, None, 'output-gate'),
('An extrapolation example', 2, None, 'an-extrapolation-example'),
('Formatting the Data', 2, None, 'formatting-the-data'),
('Predicting New Points With A Trained Recurrent Neural Network',
@@ -292,7 +299,14 @@
<!-- navigation toc: --> <li><a href="#summary-of-a-typical-rnn" style="font-size: 80%;"><b>Summary of a typical RNN</b></a></li>
<!-- navigation toc: --> <li><a href="#four-effective-ways-to-learn-an-rnn-and-preparing-for-next-week" style="font-size: 80%;"><b>Four effective ways to learn an RNN and preparing for next week</b></a></li>
<!-- navigation toc: --> <li><a href="#gating-mechanism-long-short-term-memory-lstm" style="font-size: 80%;"><b>Gating mechanism: Long Short Term Memory (LSTM)</b></a></li>
<!-- navigation toc: --> <li><a href="#implementing-a-memory-cell-in-a-neural-network" style="font-size: 80%;">&nbsp;&nbsp;&nbsp;Implementing a memory cell in a neural network</a></li>
<!-- navigation toc: --> <li><a href="#implementing-a-memory-cell-in-a-neural-network" style="font-size: 80%;"><b>Implementing a memory cell in a neural network</b></a></li>
<!-- navigation toc: --> <li><a href="#lstm-details" style="font-size: 80%;"><b>LSTM details</b></a></li>
<!-- navigation toc: --> <li><a href="#basic-layout" style="font-size: 80%;"><b>Basic layout</b></a></li>
<!-- navigation toc: --> <li><a href="#more-lstm-details" style="font-size: 80%;"><b>More LSTM details</b></a></li>
<!-- navigation toc: --> <li><a href="#the-forget-gate" style="font-size: 80%;"><b>The forget gate</b></a></li>
<!-- navigation toc: --> <li><a href="#input-gate" style="font-size: 80%;"><b>Input gate</b></a></li>
<!-- navigation toc: --> <li><a href="#forget-and-input" style="font-size: 80%;"><b>Forget and input</b></a></li>
<!-- navigation toc: --> <li><a href="#output-gate" style="font-size: 80%;"><b>Output gate</b></a></li>
<!-- navigation toc: --> <li><a href="#an-extrapolation-example" style="font-size: 80%;"><b>An extrapolation example</b></a></li>
<!-- navigation toc: --> <li><a href="#formatting-the-data" style="font-size: 80%;"><b>Formatting the Data</b></a></li>
<!-- navigation toc: --> <li><a href="#predicting-new-points-with-a-trained-recurrent-neural-network" style="font-size: 80%;"><b>Predicting New Points With A Trained Recurrent Neural Network</b></a></li>
@@ -804,7 +818,8 @@ <h2 id="gating-mechanism-long-short-term-memory-lstm" class="anchor">Gating mech
<li> The information stays in the cell so long as its <b>keep</b> gate is on.</li>
<li> Information can be read from the cell by turning on its <b>read</b> gate.</li>
</ol>
<h3 id="implementing-a-memory-cell-in-a-neural-network" class="anchor">Implementing a memory cell in a neural network </h3>
<!-- !split -->
<h2 id="implementing-a-memory-cell-in-a-neural-network" class="anchor">Implementing a memory cell in a neural network </h2>

<p>To preserve information for a long time in
the activities of an RNN, we use a circuit
@@ -817,6 +832,112 @@ <h3 id="implementing-a-memory-cell-in-a-neural-network" class="anchor">Implement
<li> Information is retrieved by activating the read gate.</li>
<li> We can backpropagate through this circuit because the logistic (sigmoid) gates have nice derivatives.</li>
</ol>
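<p>As an illustration of such a gated memory cell (not part of the original notes), a minimal NumPy sketch could look as follows; the function name <code>memory_cell_step</code>, the gate signals and the stored values are assumptions chosen only for illustration.</p>
<pre><code>
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def memory_cell_step(cell, candidate, write_gate, keep_gate, read_gate):
    """One step of a gated memory cell: write, keep and read are logistic gates."""
    write = sigmoid(write_gate)             # close to 1: store the candidate value
    keep = sigmoid(keep_gate)               # close to 1: preserve what is already stored
    read = sigmoid(read_gate)               # close to 1: expose the stored value
    cell = keep * cell + write * candidate  # information enters and stays in the cell
    output = read * cell                    # information is retrieved from the cell
    return cell, output

# toy usage: write 1.0 into the cell, keep it, and read it out
cell, out = memory_cell_step(cell=0.0, candidate=1.0,
                             write_gate=5.0, keep_gate=5.0, read_gate=5.0)
</code></pre>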
<!-- !split -->
<h2 id="lstm-details" class="anchor">LSTM details </h2>

<p>The LSTM is a unit cell that is made of three gates:</p>
<ol>
<li> the input gate,</li>
<li> the forget gate,</li>
<li> and the output gate.</li>
</ol>
<p>It also introduces a cell state \( c \), which can be thought of as the
long-term memory, and a hidden state \( h \) which can be thought of as
the short-term memory.
</p>

<!-- !split -->
<h2 id="basic-layout" class="anchor">Basic layout </h2>

<br/><br/>
<center>
<p><img src="figslides/lstm.png" width="700" align="bottom"></p>
</center>
<br/><br/>

<!-- !split -->
<h2 id="more-lstm-details" class="anchor">More LSTM details </h2>

<p>The first stage is called the forget gate: we combine the input
at time \( t \) with the hidden state from time \( t-1 \), pass the result
through the Sigmoid activation function, and then perform an
element-wise multiplication, denoted by \( \otimes \), with the previous cell state.
</p>

<p>It follows </p>
$$
\mathbf{f}^{(t)} = \sigma(W_f\mathbf{x}^{(t)} + U_f\mathbf{h}^{(t-1)} + \mathbf{b}_f)
$$

<p>where \( W_f \) and \( U_f \) are the weight matrices acting on the input and on the previous hidden state, respectively, and \( \mathbf{b}_f \) is the bias.</p>
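<p>As a sketch (with assumed dimensions and random parameters, not taken from the lecture code), the forget gate can be written in NumPy as</p>
<pre><code>
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_in, n_hidden = 3, 2                             # illustrative sizes
rng = np.random.default_rng(0)
W_f = rng.standard_normal((n_hidden, n_in))       # weights on the input x^(t)
U_f = rng.standard_normal((n_hidden, n_hidden))   # weights on the hidden state h^(t-1)
b_f = np.zeros(n_hidden)                          # bias of the forget gate

x_t = rng.standard_normal(n_in)                   # input at time t
h_prev = np.zeros(n_hidden)                       # hidden state at time t-1

f_t = sigmoid(W_f @ x_t + U_f @ h_prev + b_f)     # forget-gate output, entries in (0, 1)
</code></pre>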

<!-- !split -->
<h2 id="the-forget-gate" class="anchor">The forget gate </h2>

<p>This is called the forget gate since the Sigmoid activation function's
output is very close to \( 0 \) if its argument is very
negative, and very close to \( 1 \) if the argument is very positive. Hence we can
control how much of the long-term
memory we keep.
</p>
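<p>A quick numerical check of this saturation behaviour (illustrative values only):</p>
<pre><code>
import numpy as np
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
print(sigmoid(-10.0))   # about 4.5e-05: essentially forget everything
print(sigmoid(10.0))    # about 0.99995: essentially keep everything
</code></pre>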

<!-- !split -->
<h2 id="input-gate" class="anchor">Input gate </h2>

<p>The next stage is the input gate, which consists of a Sigmoid
function, which decides what fraction of the new input will
be stored in the long-term memory, and a \( \tanh \) function, which
produces the candidate memory that could be stored in the long-term
memory. These two results are multiplied together element-wise, and the
product is added to the cell state, that is, to the long-term memory; the
addition is denoted by \( \oplus \).
</p>

<p>We have</p>
$$
\mathbf{i}^{(t)} = \sigma_g(W_i\mathbf{x}^{(t)} + U_i\mathbf{h}^{(t-1)} + \mathbf{b}_i),
$$

<p>and</p>
$$
\mathbf{\tilde{c}}^{(t)} = \tanh(W_c\mathbf{x}^{(t)} + U_c\mathbf{h}^{(t-1)} + \mathbf{b}_c),
$$

<p>where again the \( W \) and \( U \) matrices are the weights acting on the input and on the previous hidden state, and the \( \mathbf{b} \) vectors are the biases.</p>
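<p>A corresponding sketch of the input gate and the candidate memory, with the same kind of assumed dimensions and illustrative random parameters as before:</p>
<pre><code>
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_in, n_hidden = 3, 2
rng = np.random.default_rng(1)
W_i = rng.standard_normal((n_hidden, n_in))
U_i = rng.standard_normal((n_hidden, n_hidden))
b_i = np.zeros(n_hidden)
W_c = rng.standard_normal((n_hidden, n_in))
U_c = rng.standard_normal((n_hidden, n_hidden))
b_c = np.zeros(n_hidden)

x_t = rng.standard_normal(n_in)
h_prev = np.zeros(n_hidden)

i_t = sigmoid(W_i @ x_t + U_i @ h_prev + b_i)      # fraction of new information to store
c_tilde = np.tanh(W_c @ x_t + U_c @ h_prev + b_c)  # candidate memory, entries in (-1, 1)
</code></pre>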

<!-- !split -->
<h2 id="forget-and-input" class="anchor">Forget and input </h2>

<p>The forget gate and the input gate together also update the cell state with the following equation, </p>
$$
\mathbf{c}^{(t)} = \mathbf{f}^{(t)} \otimes \mathbf{c}^{(t-1)} + \mathbf{i}^{(t)} \otimes \mathbf{\tilde{c}}^{(t)},
$$

<p>where \( f^{(t)} \) and \( i^{(t)} \) are the outputs of the forget gate and the input gate, respectively.</p>
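<p>With toy numbers (chosen only to illustrate the element-wise operations), the update reads</p>
<pre><code>
import numpy as np
f_t = np.array([0.9, 0.1])        # forget gate: keep most of entry 1, drop most of entry 2
c_prev = np.array([2.0, -3.0])    # previous cell state (long-term memory)
i_t = np.array([0.5, 0.8])        # input gate
c_tilde = np.array([1.0, -1.0])   # candidate memory
c_t = f_t * c_prev + i_t * c_tilde
print(c_t)                        # [ 2.3 -1.1]
</code></pre>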

<!-- !split -->
<h2 id="output-gate" class="anchor">Output gate </h2>

<p>The final stage of the LSTM is the output gate, and its purpose is to
update the short-term memory. To achieve this, we take the newly
updated long-term memory and pass it through a hyperbolic tangent
(\( \tanh \)) function, creating a candidate new short-term memory. We then
multiply this candidate by the output of the Sigmoid function of the
output gate. This multiplication yields the new hidden state
\( \mathbf{h}^{(t)} \), which is both the output of the cell and the
short-term memory passed on to the next time step.
</p>

<p>We have </p>
$$
\begin{aligned}
\mathbf{o}^{(t)} &= \sigma_g(W_o\mathbf{x}^{(t)} + U_o\mathbf{h}^{(t-1)} + \mathbf{b}_o), \\
\mathbf{h}^{(t)} &= \mathbf{o}^{(t)} \otimes \sigma_h(\mathbf{c}^{(t)}). \\
\end{aligned}
$$

<p>where \( W_o \) and \( U_o \) are the weights of the output gate, \( \mathbf{b}_o \) is its bias, and \( \sigma_h \) denotes the hyperbolic tangent applied to the cell state.</p>
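<p>Putting the four equations together, a self-contained NumPy sketch of one LSTM cell step could look as follows. This is an illustration under assumed shapes, an assumed ordering of the parameter blocks, and randomly initialized parameters; it is not the implementation used in the extrapolation example below.</p>
<pre><code>
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following the equations above."""
    W_f, U_f, b_f, W_i, U_i, b_i, W_c, U_c, b_c, W_o, U_o, b_o = params
    f_t = sigmoid(W_f @ x_t + U_f @ h_prev + b_f)      # forget gate
    i_t = sigmoid(W_i @ x_t + U_i @ h_prev + b_i)      # input gate
    c_tilde = np.tanh(W_c @ x_t + U_c @ h_prev + b_c)  # candidate memory
    c_t = f_t * c_prev + i_t * c_tilde                 # new cell state (long-term memory)
    o_t = sigmoid(W_o @ x_t + U_o @ h_prev + b_o)      # output gate
    h_t = o_t * np.tanh(c_t)                           # new hidden state (short-term memory)
    return h_t, c_t

# illustrative sizes and random parameters
n_in, n_hidden = 3, 2
rng = np.random.default_rng(42)
params = []
for _ in range(4):                                     # f, i, c and o blocks
    params += [rng.standard_normal((n_hidden, n_in)),
               rng.standard_normal((n_hidden, n_hidden)),
               np.zeros(n_hidden)]

# run the cell over a short random sequence
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for t in range(5):
    h, c = lstm_step(rng.standard_normal(n_in), h, c, params)
</code></pre>
<p>Here the Sigmoid plays the role of \( \sigma_g \) and \( \tanh \) the role of \( \sigma_h \) in the equations above.</p>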

<!-- !split -->
<h2 id="an-extrapolation-example" class="anchor">An extrapolation example </h2>

130 changes: 128 additions & 2 deletions doc/pub/week8/html/week8-reveal.html
@@ -686,8 +686,10 @@ <h2 id="gating-mechanism-long-short-term-memory-lstm">Gating mechanism: Long Sho
<p><li> The information stays in the cell so long as its <b>keep</b> gate is on.</li>
<p><li> Information can be read from the cell by turning on its <b>read</b> gate.</li>
</ol>
<p>
<h3 id="implementing-a-memory-cell-in-a-neural-network">Implementing a memory cell in a neural network </h3>
</section>

<section>
<h2 id="implementing-a-memory-cell-in-a-neural-network">Implementing a memory cell in a neural network </h2>

<p>To preserve information for a long time in
the activities of an RNN, we use a circuit
@@ -702,6 +704,130 @@ <h3 id="implementing-a-memory-cell-in-a-neural-network">Implementing a memory ce
</ol>
</section>

<section>
<h2 id="lstm-details">LSTM details </h2>

<p>The LSTM is a unit cell that is made of three gates:</p>
<ol>
<p><li> the input gate,</li>
<p><li> the forget gate,</li>
<p><li> and the output gate.</li>
</ol>
<p>
<p>It also introduces a cell state \( c \), which can be thought of as the
long-term memory, and a hidden state \( h \) which can be thought of as
the short-term memory.
</p>
</section>

<section>
<h2 id="basic-layout">Basic layout </h2>

<br/><br/>
<center>
<p><img src="figslides/lstm.png" width="700" align="bottom"></p>
</center>
<br/><br/>
</section>

<section>
<h2 id="more-lstm-details">More LSTM details </h2>

<p>The first stage is called the forget gate: we combine the input
at time \( t \) with the hidden state from time \( t-1 \), pass the result
through the Sigmoid activation function, and then perform an
element-wise multiplication, denoted by \( \otimes \), with the previous cell state.
</p>

<p>It follows </p>
<p>&nbsp;<br>
$$
\mathbf{f}^{(t)} = \sigma(W_f\mathbf{x}^{(t)} + U_f\mathbf{h}^{(t-1)} + \mathbf{b}_f)
$$
<p>&nbsp;<br>

<p>where \( W_f \) and \( U_f \) are the weight matrices acting on the input and on the previous hidden state, respectively, and \( \mathbf{b}_f \) is the bias.</p>
</section>

<section>
<h2 id="the-forget-gate">The forget gate </h2>

<p>This is called the forget gate since the Sigmoid activation function's
output is very close to \( 0 \) if its argument is very
negative, and very close to \( 1 \) if the argument is very positive. Hence we can
control how much of the long-term
memory we keep.
</p>
</section>

<section>
<h2 id="input-gate">Input gate </h2>

<p>The next stage is the input gate, which consists of a Sigmoid
function, which decides what fraction of the new input will
be stored in the long-term memory, and a \( \tanh \) function, which
produces the candidate memory that could be stored in the long-term
memory. These two results are multiplied together element-wise, and the
product is added to the cell state, that is, to the long-term memory; the
addition is denoted by \( \oplus \).
</p>

<p>We have</p>
<p>&nbsp;<br>
$$
\mathbf{i}^{(t)} = \sigma_g(W_i\mathbf{x}^{(t)} + U_i\mathbf{h}^{(t-1)} + \mathbf{b}_i),
$$
<p>&nbsp;<br>

<p>and</p>
<p>&nbsp;<br>
$$
\mathbf{\tilde{c}}^{(t)} = \tanh(W_c\mathbf{x}^{(t)} + U_c\mathbf{h}^{(t-1)} + \mathbf{b}_c),
$$
<p>&nbsp;<br>

<p>where again the \( W \) and \( U \) matrices are the weights acting on the input and on the previous hidden state, and the \( \mathbf{b} \) vectors are the biases.</p>
</section>

<section>
<h2 id="forget-and-input">Forget and input </h2>

<p>The forget gate and the input gate together also update the cell state with the following equation, </p>
<p>&nbsp;<br>
$$
\mathbf{c}^{(t)} = \mathbf{f}^{(t)} \otimes \mathbf{c}^{(t-1)} + \mathbf{i}^{(t)} \otimes \mathbf{\tilde{c}}^{(t)},
$$
<p>&nbsp;<br>

<p>where \( f^{(t)} \) and \( i^{(t)} \) are the outputs of the forget gate and the input gate, respectively.</p>
</section>

<section>
<h2 id="output-gate">Output gate </h2>

<p>The final stage of the LSTM is the output gate, and its purpose is to
update the short-term memory. To achieve this, we take the newly
updated long-term memory and pass it through a hyperbolic tangent
(\( \tanh \)) function, creating a candidate new short-term memory. We then
multiply this candidate by the output of the Sigmoid function of the
output gate. This multiplication yields the new hidden state
\( \mathbf{h}^{(t)} \), which is both the output of the cell and the
short-term memory passed on to the next time step.
</p>

<p>We have </p>
<p>&nbsp;<br>
$$
\begin{aligned}
\mathbf{o}^{(t)} &= \sigma_g(W_o\mathbf{x}^{(t)} + U_o\mathbf{h}^{(t-1)} + \mathbf{b}_o), \\
\mathbf{h}^{(t)} &= \mathbf{o}^{(t)} \otimes \sigma_h(\mathbf{c}^{(t)}). \\
\end{aligned}
$$
<p>&nbsp;<br>

<p>where \( W_o \) and \( U_o \) are the weights of the output gate, \( \mathbf{b}_o \) is its bias, and \( \sigma_h \) denotes the hyperbolic tangent applied to the cell state.</p>
</section>

<section>
<h2 id="an-extrapolation-example">An extrapolation example </h2>
