
Commit

more update
mhjensen committed Jan 15, 2024
1 parent ebbb7bc commit 7ef953b
Showing 146 changed files with 987 additions and 33,535 deletions.
173 changes: 146 additions & 27 deletions doc/pub/week1/html/week1-bs.html
@@ -236,10 +236,28 @@
2,
None,
'other-important-mathematical-operations'),
('Further mathematical notations',
2,
None,
'further-mathematical-notations'),
('Setting up the basic equations for neural networks',
2,
None,
'setting-up-the-basic-equations-for-neural-networks'),
('Overarching view of a neural network',
2,
None,
'overarching-view-of-a-neural-network'),
('The optimization problem', 2, None, 'the-optimzation-problem'),
('Other ingredients of a neural network',
2,
None,
'other-ingredients-of-a-neural-network'),
('Other parameters', 2, None, 'other-parameters'),
('Setting up the equations for a neural network',
2,
None,
'setting-up-the-equations-for-a-neural-network')]}
end of tocinfo -->

<body>
@@ -335,7 +353,13 @@
<!-- navigation toc: --> <li><a href="#vector-matrix-and-matrix-matrix-multiplication" style="font-size: 80%;">Vector-matrix and Matrix-matrix multiplication</a></li>
<!-- navigation toc: --> <li><a href="#important-mathematical-operations" style="font-size: 80%;">Important Mathematical Operations</a></li>
<!-- navigation toc: --> <li><a href="#other-important-mathematical-operations" style="font-size: 80%;">Other important mathematical operations</a></li>
<!-- navigation toc: --> <li><a href="#further-mathematical-notations" style="font-size: 80%;">Further mathematical notations</a></li>
<!-- navigation toc: --> <li><a href="#setting-up-the-basic-equations-for-neural-networks" style="font-size: 80%;">Setting up the basic equations for neural networks</a></li>
<!-- navigation toc: --> <li><a href="#overarching-view-of-a-neural-network" style="font-size: 80%;">Overarching view of a neural network</a></li>
<!-- navigation toc: --> <li><a href="#the-optimzation-problem" style="font-size: 80%;">The optimization problem</a></li>
<!-- navigation toc: --> <li><a href="#other-ingredients-of-a-neural-network" style="font-size: 80%;">Other ingredients of a neural network</a></li>
<!-- navigation toc: --> <li><a href="#other-parameters" style="font-size: 80%;">Other parameters</a></li>
<!-- navigation toc: --> <li><a href="#setting-up-the-equations-for-a-neural-network" style="font-size: 80%;">Setting up the equations for a neural network</a></li>

</ul>
</li>
@@ -1058,21 +1082,20 @@ <h2 id="representing-the-wave-function" class="anchor">Representing the wave function </h2>
</p>

$$
P_{rbm}(\mathbf{x},\mathbf{h}) = \frac{1}{Z} e^{-\frac{1}{T_0}E(\mathbf{x},\mathbf{h})}.
$$

<p>To find the marginal distribution of \( \boldsymbol{x} \) we set:</p>

$$
P_{rbm}(\mathbf{x}) =\frac{1}{Z}\sum_\mathbf{h} e^{-E(\mathbf{x}, \mathbf{h})}.
$$

<p>Now this is what we use to represent the wave function, calling it a neural-network quantum state (NQS)</p>
$$
\vert\Psi (\mathbf{X})\vert^2 = P_{rbm}(\mathbf{x}).
$$
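<p>As a rough numerical illustration of the last two equations, the marginal \( P_{rbm}(\mathbf{x}) \) can be evaluated by brute force for a tiny RBM with binary visible and hidden units. The energy form \( E(\mathbf{x},\mathbf{h})=-\mathbf{a}^T\mathbf{x}-\mathbf{b}^T\mathbf{h}-\mathbf{x}^T\mathbf{W}\mathbf{h} \), the choice \( T_0=1 \) and all names below are assumptions made only for this sketch.</p>

<pre><code class="language-python">
import numpy as np
from itertools import product

def energy(x, h, a, b, W):
    # Assumed binary-binary RBM energy: E(x,h) = -a.x - b.h - x^T W h
    return -(a @ x) - (b @ h) - (x @ W @ h)

def unnormalized_marginal(x, a, b, W):
    # Sum exp(-E(x,h)) over all binary hidden configurations h
    return sum(np.exp(-energy(x, np.array(h), a, b, W))
               for h in product([0, 1], repeat=len(b)))

rng = np.random.default_rng(0)
a, b = rng.normal(size=3), rng.normal(size=2)   # 3 visible, 2 hidden units
W = rng.normal(size=(3, 2))

# Normalizing over all visible configurations gives P_rbm(x) = |Psi(x)|^2
xs = [np.array(x) for x in product([0, 1], repeat=3)]
Z = sum(unnormalized_marginal(x, a, b, W) for x in xs)
probs = [unnormalized_marginal(x, a, b, W) / Z for x in xs]
print(sum(probs))   # should print 1.0
</code></pre>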

<p>or we could square the wave function.</p>

<!-- !split -->
<h2 id="define-the-cost-function" class="anchor">Define the cost function </h2>
@@ -1232,20 +1255,21 @@ <h2 id="basic-matrix-features" class="anchor">Basic Matrix Features </h2>

<p>or in terms of its column vectors \( \boldsymbol{a}_i \) as</p>
$$
\boldsymbol{A} =
\begin{bmatrix}\boldsymbol{a}_{0} & \boldsymbol{a}_{1} & \boldsymbol{a}_{2} & \dots & \dots & \boldsymbol{a}_{n-2} & \boldsymbol{a}_{n-1}\end{bmatrix}.
$$

<p>We can think of a matrix as a rectangular array with, in general, \( n \) rows and \( m \) columns. In the example here we have a square matrix.</p>

<!-- !split -->
<h2 id="the-inverse-of-a-matrix" class="anchor">The inverse of a matrix </h2>
<div class="panel panel-default">
<div class="panel-body">
<!-- subsequent paragraphs come in larger fonts, so start with a paragraph -->
<p>The inverse of a square matrix (if it exists) is defined by</p>

$$
\boldsymbol{A}^{-1} \cdot \boldsymbol{A} = I,
$$

<p>where \( \boldsymbol{I} \) is the unit matrix.</p>
@@ -1300,15 +1324,15 @@ <h2 id="matrix-features" class="anchor">Matrix Features </h2>
<div class="panel panel-default">
<div class="panel-body">
<!-- subsequent paragraphs come in larger fonts, so start with a paragraph -->
<p>For an \( n\times n \) matrix \( \boldsymbol{A} \) the following properties are all equivalent</p>

<ul>
<li> If the inverse of \( \boldsymbol{A} \) exists, \( \boldsymbol{A} \) is nonsingular.</li>
<li> The equation \( \boldsymbol{Ax}=0 \) implies \( \boldsymbol{x}=0 \).</li>
<li> The rows of \( \boldsymbol{A} \) form a basis of \( \mathbb{R}^n \).</li>
<li> The columns of \( \boldsymbol{A} \) form a basis of \( \mathbb{R}^n \).</li>
<li> \( \boldsymbol{A} \) is a product of elementary matrices.</li>
<li> \( 0 \) is not an eigenvalue of \( \boldsymbol{A} \).</li>
</ul>
</div>
</div>
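<p>For a concrete nonsingular matrix these properties can be checked numerically; the matrix below is an arbitrary example chosen only for illustration.</p>

<pre><code class="language-python">
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])               # an arbitrary nonsingular matrix

print(np.linalg.inv(A) @ A)              # A^{-1} A is (numerically) the identity
print(np.linalg.eigvals(A))              # no eigenvalue equals 0
print(np.linalg.solve(A, np.zeros(2)))   # Ax = 0 has only the solution x = 0
</code></pre>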
@@ -1320,13 +1344,13 @@ <h2 id="important-mathematical-operations" class="anchor">Important Mathematical Operations </h2>
<p>The basic matrix operations that we will deal with are addition and subtraction</p>

$$
\boldsymbol{A}= \boldsymbol{B}\pm\boldsymbol{C} \Longrightarrow a_{ij} = b_{ij}\pm c_{ij},
$$

<p>and scalar-matrix multiplication</p>

$$
\boldsymbol{A}= \gamma\boldsymbol{B} \Longrightarrow a_{ij} = \gamma b_{ij}.
$$
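<p>These element-wise definitions map one-to-one onto NumPy operations; the arrays below are arbitrary examples.</p>

<pre><code class="language-python">
import numpy as np

B = np.array([[1.0, 2.0], [3.0, 4.0]])
C = np.array([[0.5, 0.5], [1.0, 1.0]])
gamma = 2.0

A_plus   = B + C        # a_ij = b_ij + c_ij
A_minus  = B - C        # a_ij = b_ij - c_ij
A_scaled = gamma * B    # a_ij = gamma * b_ij
</code></pre>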


@@ -1335,19 +1359,19 @@ <h2 id="vector-matrix-and-matrix-matrix-multiplication" class="anchor">Vector-matrix and Matrix-matrix multiplication </h2>

<p>We also have vector-matrix multiplications</p>
$$
\boldsymbol{y}=\boldsymbol{Ax} \Longrightarrow y_{i} = \sum_{j=1}^{n} a_{ij}x_j,
$$

<p>and matrix-matrix multiplications</p>

$$
\boldsymbol{A}=\boldsymbol{BC} \Longrightarrow a_{ij} = \sum_{k=1}^{n} b_{ik}c_{kj},
$$

<p>and transpositions of a matrix</p>

$$
\boldsymbol{A}=\boldsymbol{B}^T \Longrightarrow a_{ij} = b_{ji}.
$$
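<p>In NumPy these operations read as follows (example arrays chosen arbitrarily, with <code>@</code> denoting the matrix product):</p>

<pre><code class="language-python">
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
x = np.array([1.0, -1.0])

y  = A @ x     # y_i  = sum_j a_ij x_j      (matrix acting on a vector)
C  = B @ A     # c_ij = sum_k b_ik a_kj     (matrix-matrix product)
AT = A.T       # (A^T)_ij = a_ji            (transpose)
</code></pre>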


@@ -1357,46 +1381,141 @@ <h2 id="important-mathematical-operations" class="anchor">Important Mathematical Operations </h2>
<p>Similarly, important vector operations that we will deal with are addition and subtraction</p>

$$
\boldsymbol{x}= \boldsymbol{y}\pm\boldsymbol{z} \Longrightarrow x_{i} = y_{i}\pm z_{i},
$$

<p>scalar-vector multiplication</p>

$$
\boldsymbol{x}= \gamma\boldsymbol{y} \Longrightarrow x_{i} = \gamma y_{i},
$$


<!-- !split -->
<h2 id="other-important-mathematical-operations" class="anchor">Other important mathematical operations </h2>
<p>and vector-vector multiplication (called Hadamard multiplication)</p>
$$
\boldsymbol{x}=\boldsymbol{yz} \Longrightarrow x_{i} = y_{i}z_i.
$$

<p>Finally, as already mentioned, we have the inner or so-called dot product, resulting in a scalar</p>

$$
x=\boldsymbol{y}^T\boldsymbol{z} \Longrightarrow x = \sum_{j=1}^{n} y_{j}z_{j},
$$

<p>and the outer product, which yields a matrix,</p>

$$
\boldsymbol{A}= \boldsymbol{y}\boldsymbol{z}^T \Longrightarrow a_{ij} = y_{i}z_{j}.
$$
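<p>The corresponding NumPy one-liners (with arbitrary example vectors) are:</p>

<pre><code class="language-python">
import numpy as np

y = np.array([1.0, 2.0, 3.0])
z = np.array([4.0, 5.0, 6.0])

x_sum      = y + z           # x_i = y_i + z_i
x_hadamard = y * z           # x_i = y_i z_i       (element-wise/Hadamard product)
x_dot      = y @ z           # x   = sum_j y_j z_j (inner/dot product, a scalar)
A_outer    = np.outer(y, z)  # a_ij = y_i z_j      (outer product, a matrix)
</code></pre>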


<!-- !split -->
<h2 id="further-mathematical-notations" class="anchor">Further mathematical notations </h2>
<ol>
<li> For all/any \( \forall \)</li>
<li> Implies \( \implies \)</li>
<li> Equivalent \( \equiv \)</li>
<li> Real variable \( \mathbb{R} \)</li>
<li> Integer variable \( \mathbb{I} \)</li>
<li> Complex variable \( \mathbb{C} \)</li>
</ol>
<!-- !split -->
<h2 id="setting-up-the-basic-equations-for-neural-networks" class="anchor">Setting up the basic equations for neural networks </h2>

<p>Neural networks, in their so-called feed-forward form, where each
iteration contains a feed-forward stage and a back-propagation
stage, consist of a series of affine matrix-matrix and matrix-vector
multiplications. The unknown parameters (the so-called biases and
weights which determine the architecture of a neural network) are
updated iteratively using the so-called back-propagation algorithm.
This algorithm corresponds to the so-called reverse mode of the
automatic differentiation algorithm. These algorithms will be discussed
in more detail below.
</p>
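<p>Schematically, one feed-forward step is exactly such an affine matrix-vector multiplication followed by an element-wise activation; a minimal sketch with made-up shapes and names (the precise definitions are given below):</p>

<pre><code class="language-python">
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)         # input to the layer
W = rng.normal(size=(3, 4))    # weight matrix of the layer
b = rng.normal(size=3)         # bias vector of the layer

z = W @ x + b                  # affine transformation
a = 1.0 / (1.0 + np.exp(-z))   # activation, here a sigmoid
</code></pre>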

<p>We start, however, with the definitions of the various variables which make up a neural network.</p>

<!-- !split -->
<h2 id="overarching-view-of-a-neural-network" class="anchor">Overarching view of a neural network </h2>

<p>The architecture of a neural network defines our model. This model
aims at describing some function \( f(\boldsymbol{x}) \) which represents
some final result (outputs or target values \( \boldsymbol{y} \)) given a specific input
\( \boldsymbol{x} \). Note that here \( \boldsymbol{y} \) and \( \boldsymbol{x} \) are not limited to be
vectors.
</p>

<p>The architecture consists of</p>
<ol>
<li> An input and an output layer where the input layer is defined by the inputs \( \boldsymbol{x} \). The output layer produces the model output \( \boldsymbol{\tilde{y}} \) which is compared with the target value \( \boldsymbol{y} \).</li>
<li> A given number of hidden layers and neurons/nodes/units for each layer (this may vary)</li>
<li> A given activation function \( \sigma(\boldsymbol{z}) \) with arguments \( \boldsymbol{z} \) to be defined below. The activation functions may differ from layer to layer.</li>
<li> The last layer, normally called the <b>output</b> layer, has an activation function tailored to the specific problem.</li>
<li> Finally we define a so-called cost or loss function which is used to gauge the quality of our model.</li>
</ol>
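<p>As a concrete illustration of the ingredients listed above, the following sketch pushes an input through two hidden layers and an output layer with a problem-specific activation (a softmax, as one might use for classification). All layer sizes, names and activation choices are assumptions made only for this example.</p>

<pre><code class="language-python">
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(1)
layer_sizes = [4, 8, 8, 3]   # input, two hidden layers, output

# Weights and biases for each layer (the parameters Theta)
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases  = [rng.normal(size=n_out) for n_out in layer_sizes[1:]]

x = rng.normal(size=layer_sizes[0])              # the input x
a = x
for W, b in zip(weights[:-1], biases[:-1]):
    a = sigmoid(W @ a + b)                       # hidden layers
y_tilde = softmax(weights[-1] @ a + biases[-1])  # output layer

print(y_tilde, y_tilde.sum())                    # model output, sums to 1
</code></pre>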
<!-- !split -->
<h2 id="the-optimzation-problem" class="anchor">The optimization problem </h2>

<p>The cost function is a function of the unknown parameters
\( \boldsymbol{\Theta} \) where the latter is a container for all possible
parameters needed to define a neural network.
</p>

<p>If we are dealing with a regression task, a typical cost/loss function
is the mean squared error
</p>
$$
C(\boldsymbol{\Theta})=\frac{1}{n}\left\{\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right)^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right)\right\}.
$$

<p>This function represents one of many possible ways to define
the so-called cost function.
</p>
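<p>Written in terms of the model output \( \boldsymbol{\tilde{y}} \) (for the linear-regression example above, \( \boldsymbol{\tilde{y}}=\boldsymbol{X}\boldsymbol{\theta} \)), the mean squared error is a one-liner; the numbers below are made up for illustration.</p>

<pre><code class="language-python">
import numpy as np

y       = np.array([1.0, 2.0, 3.0])    # targets
y_tilde = np.array([1.1, 1.9, 3.2])    # model outputs

residual = y - y_tilde
cost = (residual @ residual) / len(y)  # C = (1/n)(y - y~)^T (y - y~)
print(cost)
</code></pre>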

<p>For neural networks the parameters
\( \boldsymbol{\Theta} \) are given by the so-called weights and biases (to be
defined below).
</p>

<p>The weights are given by matrix elements \( w_{ij}^{(l)} \) where the
superscript indicates the layer number. The biases are typically given
by vector elements representing each single node of a given layer,
that is \( b_j^{(l)} \).
</p>

<!-- !split -->
<h2 id="other-ingredients-of-a-neural-network" class="anchor">Other ingredients of a neural network </h2>

<p>Having defined the architecture of a neural network, the optimization
of the cost function with respect to the parameters \( \boldsymbol{\Theta} \)
involves the calculation of gradients, that is, the derivatives of the
cost function with respect to a possibly large set of parameters. The
resulting minimization problem is normally tackled with gradient-based
methods, including
</p>
<ol>
<li> various quasi-Newton methods,</li>
<li> plain gradient descent (GD) with a constant learning rate \( \eta \),</li>
<li> GD with momentum and other approximations to the learning rates such as</li>
<ul>
<li> Adaptive gradient (AdaGrad)</li>
<li> Root mean-square propagation (RMSprop)</li>
<li> Adaptive moment estimation (ADAM) and many others</li>
</ul>
<li> Stochastic gradient descent and various families of learning rate approximations</li>
</ol>
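<p>As one simple representative from the list above, plain gradient descent with a constant learning rate \( \eta \), optionally with a momentum term, can be sketched as follows. The quadratic cost and all numbers are placeholders chosen only for illustration.</p>

<pre><code class="language-python">
import numpy as np

# Illustrative quadratic cost C(theta) = 0.5 theta^T H theta - p^T theta
H = np.array([[3.0, 0.5], [0.5, 1.0]])
p = np.array([1.0, -2.0])

def gradient(theta):
    return H @ theta - p

eta, gamma = 0.1, 0.9           # learning rate and momentum parameter
theta = np.zeros(2)
velocity = np.zeros(2)

for _ in range(200):
    velocity = gamma * velocity + eta * gradient(theta)
    theta = theta - velocity    # plain GD corresponds to gamma = 0

print(theta, np.linalg.solve(H, p))   # compare with the exact minimizer
</code></pre>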
<!-- !split -->
<h2 id="other-parameters" class="anchor">Other parameters </h2>

<p>In addition to the above, there are often additional hyperparameters
which are included in the setup of a neural network. These will be
discussed below.
</p>

<!-- !split -->
<h2 id="setting-up-the-equations-for-a-neural-network" class="anchor">Setting up the equations for a neural network </h2>

<!-- ------------------- end of main content --------------- -->
</div> <!-- end container -->
<!-- include javascript, jQuery *first* -->
