
Commit

more update
mhjensen committed Jan 15, 2024
1 parent ebbb7bc commit 7ef953b
Showing 146 changed files with 987 additions and 33,535 deletions.
173 changes: 146 additions & 27 deletions doc/pub/week1/html/week1-bs.html
@@ -236,10 +236,28 @@
2,
None,
'other-important-mathematical-operations'),
('Further mathematical notations',
2,
None,
'further-mathematical-notations'),
('Setting up the basic equations for neural networks',
2,
None,
'setting-up-the-basic-equations-for-neural-networks'),
('Overarching view of a neural network',
2,
None,
'overarching-view-of-a-neural-network'),
('The optimization problem', 2, None, 'the-optimzation-problem'),
('Other ingredients of a neural network',
2,
None,
'other-ingredients-of-a-neural-network'),
('Other parameters', 2, None, 'other-parameters'),
('Setting up the equations for a neural network',
2,
None,
'setting-up-the-equations-for-a-neural-network')]}
end of tocinfo -->

<body>
@@ -335,7 +353,13 @@
<!-- navigation toc: --> <li><a href="#vector-matrix-and-matrix-matrix-multiplication" style="font-size: 80%;">Vector-matrix and Matrix-matrix multiplication</a></li>
<!-- navigation toc: --> <li><a href="#important-mathematical-operations" style="font-size: 80%;">Important Mathematical Operations</a></li>
<!-- navigation toc: --> <li><a href="#other-important-mathematical-operations" style="font-size: 80%;">Other important mathematical operations</a></li>
<!-- navigation toc: --> <li><a href="#further-mathematical-notations" style="font-size: 80%;">Further mathematical notations</a></li>
<!-- navigation toc: --> <li><a href="#setting-up-the-basic-equations-for-neural-networks" style="font-size: 80%;">Setting up the basic equations for neural networks</a></li>
<!-- navigation toc: --> <li><a href="#overarching-view-of-a-neural-network" style="font-size: 80%;">Overarching view of a neural network</a></li>
<!-- navigation toc: --> <li><a href="#the-optimzation-problem" style="font-size: 80%;">The optimization problem</a></li>
<!-- navigation toc: --> <li><a href="#other-ingredients-of-a-neural-network" style="font-size: 80%;">Other ingredients of a neural network</a></li>
<!-- navigation toc: --> <li><a href="#other-parameters" style="font-size: 80%;">Other parameters</a></li>
<!-- navigation toc: --> <li><a href="#setting-up-the-equations-for-a-neural-network" style="font-size: 80%;">Setting up the equations for a neural network</a></li>

</ul>
</li>
@@ -1058,21 +1082,20 @@ <h2 id="representing-the-wave-function" class="anchor">Representing the wave function </h2>
</p>

$$
P_{rbm}(\mathbf{x},\mathbf{h}) = \frac{1}{Z} e^{-\frac{1}{T_0}E(\mathbf{x},\mathbf{h})}.
$$

<p>To find the marginal distribution of \( \boldsymbol{x} \) we set:</p>

$$
P_{rbm}(\mathbf{x}) =\frac{1}{Z}\sum_\mathbf{h} e^{-E(\mathbf{x}, \mathbf{h})}.
$$

<p>Now this is what we use to represent the wave function, calling it a neural-network quantum state (NQS)</p>
$$
\vert\Psi (\mathbf{X})\vert^2 = P_{rbm}(\mathbf{x}).
$$
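<p>As a rough numerical illustration of the last two equations, the marginal \( P_{rbm}(\mathbf{x}) \) can be evaluated by brute force for a tiny RBM with binary visible and hidden units. The energy form \( E(\mathbf{x},\mathbf{h})=-\mathbf{a}^T\mathbf{x}-\mathbf{b}^T\mathbf{h}-\mathbf{x}^T\mathbf{W}\mathbf{h} \), the choice \( T_0=1 \) and all names below are assumptions made only for this sketch.</p>

<pre><code class="language-python">
import numpy as np
from itertools import product

def energy(x, h, a, b, W):
    # Assumed binary-binary RBM energy: E(x,h) = -a.x - b.h - x^T W h
    return -(a @ x) - (b @ h) - (x @ W @ h)

def unnormalized_marginal(x, a, b, W):
    # Sum exp(-E(x,h)) over all binary hidden configurations h
    return sum(np.exp(-energy(x, np.array(h), a, b, W))
               for h in product([0, 1], repeat=len(b)))

rng = np.random.default_rng(0)
a, b = rng.normal(size=3), rng.normal(size=2)   # 3 visible, 2 hidden units
W = rng.normal(size=(3, 2))

# Normalizing over all visible configurations gives P_rbm(x) = |Psi(x)|^2
xs = [np.array(x) for x in product([0, 1], repeat=3)]
Z = sum(unnormalized_marginal(x, a, b, W) for x in xs)
probs = [unnormalized_marginal(x, a, b, W) / Z for x in xs]
print(sum(probs))   # should print 1.0
</code></pre>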

<p>or we could square the wave function.</p>

<!-- !split -->
<h2 id="define-the-cost-function" class="anchor">Define the cost function </h2>
@@ -1232,20 +1255,21 @@ <h2 id="basic-matrix-features" class="anchor">Basic Matrix Features </h2>

<p>or in terms of its column vectors \( \boldsymbol{a}_i \) as</p>
$$
\boldsymbol{A} =
\begin{bmatrix}\boldsymbol{a}_{0} & \boldsymbol{a}_{1} & \boldsymbol{a}_{2} & \dots & \dots & \boldsymbol{a}_{n-2} & \boldsymbol{a}_{n-1}\end{bmatrix}.
$$

<p>We can think of a matrix as a rectangular array with, in general, \( n \) rows and \( m \) columns. In the example here we have a square matrix.</p>

<!-- !split -->
<h2 id="the-inverse-of-a-matrix" class="anchor">The inverse of a matrix </h2>
<div class="panel panel-default">
<div class="panel-body">
<!-- subsequent paragraphs come in larger fonts, so start with a paragraph -->
<p>The inverse of a square matrix (if it exists) is defined by</p>

$$
\boldsymbol{A}^{-1} \cdot \boldsymbol{A} = I,
$$

<p>where \( \boldsymbol{I} \) is the unit matrix.</p>
@@ -1300,15 +1324,15 @@ <h2 id="matrix-features" class="anchor">Matrix Features </h2>
<div class="panel panel-default">
<div class="panel-body">
<!-- subsequent paragraphs come in larger fonts, so start with a paragraph -->
<p>For an \( n\times n \) matrix \( \boldsymbol{A} \) the following properties are all equivalent</p>

<ul>
<li> If the inverse of \( \boldsymbol{A} \) exists, \( \boldsymbol{A} \) is nonsingular.</li>
<li> The equation \( \boldsymbol{Ax}=0 \) implies \( \boldsymbol{x}=0 \).</li>
<li> The rows of \( \boldsymbol{A} \) form a basis of \( \mathbb{R}^n \).</li>
<li> The columns of \( \boldsymbol{A} \) form a basis of \( \mathbb{R}^n \).</li>
<li> \( \boldsymbol{A} \) is a product of elementary matrices.</li>
<li> \( 0 \) is not an eigenvalue of \( \boldsymbol{A} \).</li>
</ul>
</div>
</div>
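<p>For a concrete nonsingular matrix these properties can be checked numerically; the matrix below is an arbitrary example chosen only for illustration.</p>

<pre><code class="language-python">
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])               # an arbitrary nonsingular matrix

print(np.linalg.inv(A) @ A)              # A^{-1} A is (numerically) the identity
print(np.linalg.eigvals(A))              # no eigenvalue equals 0
print(np.linalg.solve(A, np.zeros(2)))   # Ax = 0 has only the solution x = 0
</code></pre>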
@@ -1320,13 +1344,13 @@ <h2 id="important-mathematical-operations" class="anchor">Important Mathematical Operations </h2>
<p>The basic matrix operations that we will deal with are addition and subtraction</p>

$$
\boldsymbol{A}= \boldsymbol{B}\pm\boldsymbol{C} \Longrightarrow a_{ij} = b_{ij}\pm c_{ij},
$$

<p>and scalar-matrix multiplication</p>

$$
\boldsymbol{A}= \gamma\boldsymbol{B} \Longrightarrow a_{ij} = \gamma b_{ij}.
$$
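<p>These element-wise definitions map one-to-one onto NumPy operations; the arrays below are arbitrary examples.</p>

<pre><code class="language-python">
import numpy as np

B = np.array([[1.0, 2.0], [3.0, 4.0]])
C = np.array([[0.5, 0.5], [1.0, 1.0]])
gamma = 2.0

A_plus   = B + C        # a_ij = b_ij + c_ij
A_minus  = B - C        # a_ij = b_ij - c_ij
A_scaled = gamma * B    # a_ij = gamma * b_ij
</code></pre>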


@@ -1335,19 +1359,19 @@ <h2 id="vector-matrix-and-matrix-matrix-multiplication" class="anchor">Vector-matrix and Matrix-matrix multiplication </h2>

<p>We also have vector-matrix multiplications</p>
$$
\boldsymbol{y}=\boldsymbol{Ax} \Longrightarrow y_{i} = \sum_{j=1}^{n} a_{ij}x_j,
$$

<p>and matrix-matrix multiplications</p>

$$
\boldsymbol{A}=\boldsymbol{BC} \Longrightarrow a_{ij} = \sum_{k=1}^{n} b_{ik}c_{kj},
$$

<p>and transpositions of a matrix</p>

$$
\boldsymbol{A}=\boldsymbol{B}^T \Longrightarrow a_{ij} = b_{ji}.
$$
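<p>In NumPy these operations read as follows (example arrays chosen arbitrarily, with <code>@</code> denoting the matrix product):</p>

<pre><code class="language-python">
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
x = np.array([1.0, -1.0])

y  = A @ x     # y_i  = sum_j a_ij x_j      (matrix acting on a vector)
C  = B @ A     # c_ij = sum_k b_ik a_kj     (matrix-matrix product)
AT = A.T       # (A^T)_ij = a_ji            (transpose)
</code></pre>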


@@ -1357,46 +1381,141 @@ <h2 id="important-mathematical-operations" class="anchor">Important Mathematical Operations </h2>
<p>Similarly, important vector operations that we will deal with are addition and subtraction</p>

$$
\boldsymbol{x}= \boldsymbol{y}\pm\boldsymbol{z} \Longrightarrow x_{i} = y_{i}\pm z_{i},
$$

<p>scalar-vector multiplication</p>

$$
\boldsymbol{x}= \gamma\boldsymbol{y} \Longrightarrow x_{i} = \gamma y_{i},
$$


<!-- !split -->
<h2 id="other-important-mathematical-operations" class="anchor">Other important mathematical operations </h2>
<p>and vector-vector multiplication (called Hadamard multiplication)</p>
$$
\boldsymbol{x}=\boldsymbol{yz} \Longrightarrow x_{i} = y_{i}z_i.
$$

<p>Finally, as already mentioned, we have the inner or so-called dot product, resulting in a scalar</p>

$$
x=\boldsymbol{y}^T\boldsymbol{z} \Longrightarrow x = \sum_{j=1}^{n} y_{j}z_{j},
$$

<p>and the outer product, which yields a matrix,</p>

$$
\boldsymbol{A}= \boldsymbol{y}\boldsymbol{z}^T \Longrightarrow a_{ij} = y_{i}z_{j}.
$$
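<p>The corresponding NumPy one-liners (with arbitrary example vectors) are:</p>

<pre><code class="language-python">
import numpy as np

y = np.array([1.0, 2.0, 3.0])
z = np.array([4.0, 5.0, 6.0])

x_sum      = y + z           # x_i = y_i + z_i
x_hadamard = y * z           # x_i = y_i z_i       (element-wise/Hadamard product)
x_dot      = y @ z           # x   = sum_j y_j z_j (inner/dot product, a scalar)
A_outer    = np.outer(y, z)  # a_ij = y_i z_j      (outer product, a matrix)
</code></pre>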


<!-- !split -->
<h2 id="further-mathematical-notations" class="anchor">Further mathematical notations </h2>
<ol>
<li> For all/any \( \forall \)</li>
<li> Implies \( \implies \)</li>
<li> Equivalent \( \equiv \)</li>
<li> Real variable \( \mathbb{R} \)</li>
<li> Integer variable \( \mathbb{I} \)</li>
<li> Complex variable \( \mathbb{C} \)</li>
</ol>
<!-- !split -->
<h2 id="setting-up-the-basic-equations-for-neural-networks" class="anchor">Setting up the basic equations for neural networks </h2>

<p>Neural networks, in their so-called feed-forward form, where each
iteration contains a feed-forward stage and a back-propagation
stage, consist of a series of affine matrix-matrix and matrix-vector
multiplications. The unknown parameters (the so-called biases and
weights which determine the architecture of a neural network) are
updated iteratively using the so-called back-propagation algorithm.
This algorithm corresponds to the so-called reverse mode of the
automatic differentiation algorithm. These algorithms will be discussed
in more detail below.
</p>
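<p>Schematically, one feed-forward step is exactly such an affine matrix-vector multiplication followed by an element-wise activation; a minimal sketch with made-up shapes and names (the precise definitions are given below):</p>

<pre><code class="language-python">
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)         # input to the layer
W = rng.normal(size=(3, 4))    # weight matrix of the layer
b = rng.normal(size=3)         # bias vector of the layer

z = W @ x + b                  # affine transformation
a = 1.0 / (1.0 + np.exp(-z))   # activation, here a sigmoid
</code></pre>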

<p>We start, however, with the definitions of the various variables which make up a neural network.</p>

<!-- !split -->
<h2 id="overarching-view-of-a-neural-network" class="anchor">Overarching view of a neural network </h2>

<p>The architecture of a neural network defines our model. This model
aims at describing some function \( f(\boldsymbol{x}) \) which represents
some final result (outputs or target values \( \boldsymbol{y} \)) given a specific input
\( \boldsymbol{x} \). Note that here \( \boldsymbol{y} \) and \( \boldsymbol{x} \) are not limited to be
vectors.
</p>

<p>The architecture consists of</p>
<ol>
<li> An input and an output layer where the input layer is defined by the inputs \( \boldsymbol{x} \). The output layer produces the model output \( \boldsymbol{\tilde{y}} \) which is compared with the target value \( \boldsymbol{y} \).</li>
<li> A given number of hidden layers and neurons/nodes/units for each layer (this may vary)</li>
<li> A given activation function \( \sigma(\boldsymbol{z}) \) with arguments \( \boldsymbol{z} \) to be defined below. The activation functions may differ from layer to layer.</li>
<li> The last layer, normally called the <b>output</b> layer, has an activation function tailored to the specific problem.</li>
<li> Finally we define a so-called cost or loss function which is used to gauge the quality of our model.</li>
</ol>
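<p>As a concrete illustration of the ingredients listed above, the following sketch pushes an input through two hidden layers and an output layer with a problem-specific activation (a softmax, as one might use for classification). All layer sizes, names and activation choices are assumptions made only for this example.</p>

<pre><code class="language-python">
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(1)
layer_sizes = [4, 8, 8, 3]   # input, two hidden layers, output

# Weights and biases for each layer (the parameters Theta)
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases  = [rng.normal(size=n_out) for n_out in layer_sizes[1:]]

x = rng.normal(size=layer_sizes[0])              # the input x
a = x
for W, b in zip(weights[:-1], biases[:-1]):
    a = sigmoid(W @ a + b)                       # hidden layers
y_tilde = softmax(weights[-1] @ a + biases[-1])  # output layer

print(y_tilde, y_tilde.sum())                    # model output, sums to 1
</code></pre>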
<!-- !split -->
<h2 id="the-optimzation-problem" class="anchor">The optimization problem </h2>

<p>The cost function is a function of the unknown parameters
\( \boldsymbol{\Theta} \) where the latter is a container for all possible
parameters needed to define a neural network.
</p>

<p>If we are dealing with a regression task, a typical cost/loss function
is the mean squared error
</p>
$$
C(\boldsymbol{\Theta})=\frac{1}{n}\left\{\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right)^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right)\right\}.
$$

<p>This function represents one of many possible ways to define
the so-called cost function.
</p>
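<p>Written in terms of the model output \( \boldsymbol{\tilde{y}} \) (for the linear-regression example above, \( \boldsymbol{\tilde{y}}=\boldsymbol{X}\boldsymbol{\theta} \)), the mean squared error is a one-liner; the numbers below are made up for illustration.</p>

<pre><code class="language-python">
import numpy as np

y       = np.array([1.0, 2.0, 3.0])    # targets
y_tilde = np.array([1.1, 1.9, 3.2])    # model outputs

residual = y - y_tilde
cost = (residual @ residual) / len(y)  # C = (1/n)(y - y~)^T (y - y~)
print(cost)
</code></pre>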

<p>For neural networks the parameters
\( \boldsymbol{\Theta} \) are given by the so-called weights and biases (to be
defined below).
</p>

<p>The weights are given by matrix elements \( w_{ij}^{(l)} \) where the
superscript indicates the layer number. The biases are typically given
by vector elements representing each single node of a given layer,
that is \( b_j^{(l)} \).
</p>

<!-- !split -->
<h2 id="other-ingredients-of-a-neural-network" class="anchor">Other ingredients of a neural network </h2>

<p>Having defined the architecture of a neural network, the optimization
of the cost function with respect to the parameters \( \boldsymbol{\Theta} \)
involves the calculation of gradients, that is, the derivatives of the
cost function with respect to a possibly large set of parameters. The
resulting minimization problem is normally tackled with gradient-based
methods, including
</p>
<ol>
<li> various quasi-Newton methods,</li>
<li> plain gradient descent (GD) with a constant learning rate \( \eta \),</li>
<li> GD with momentum and other approximations to the learning rates such as</li>
<ul>
<li> Adaptive gradient (AdaGrad)</li>
<li> Root mean-square propagation (RMSprop)</li>
<li> Adaptive moment estimation (ADAM) and many others</li>
</ul>
<li> Stochastic gradient descent and various families of learning rate approximations</li>
</ol>
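<p>As one simple representative from the list above, plain gradient descent with a constant learning rate \( \eta \), optionally with a momentum term, can be sketched as follows. The quadratic cost and all numbers are placeholders chosen only for illustration.</p>

<pre><code class="language-python">
import numpy as np

# Illustrative quadratic cost C(theta) = 0.5 theta^T H theta - p^T theta
H = np.array([[3.0, 0.5], [0.5, 1.0]])
p = np.array([1.0, -2.0])

def gradient(theta):
    return H @ theta - p

eta, gamma = 0.1, 0.9           # learning rate and momentum parameter
theta = np.zeros(2)
velocity = np.zeros(2)

for _ in range(200):
    velocity = gamma * velocity + eta * gradient(theta)
    theta = theta - velocity    # plain GD corresponds to gamma = 0

print(theta, np.linalg.solve(H, p))   # compare with the exact minimizer
</code></pre>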
<!-- !split -->
<h2 id="other-parameters" class="anchor">Other parameters </h2>

<p>In addition to the above, there are often additional hyperparameters
which are included in the setup of a neural network. These will be
discussed below.
</p>

<!-- !split -->
<h2 id="setting-up-the-equations-for-a-neural-network" class="anchor">Setting up the equations for a neural network </h2>

<!-- ------------------- end of main content --------------- -->
</div> <!-- end container -->
<!-- include javascript, jQuery *first* -->
