diff --git a/doc/pub/catania/html/catania-bs.html b/doc/pub/catania/html/catania-bs.html index 838734e..fc57f03 100644 --- a/doc/pub/catania/html/catania-bs.html +++ b/doc/pub/catania/html/catania-bs.html @@ -54,6 +54,28 @@ 2, None, 'the-plethora-of-machine-learning-algorithms-methods'), + ('Extrapolations and model interpretability', + 2, + None, + 'extrapolations-and-model-interpretability'), + ('Generative and discriminative models', + 2, + None, + 'generative-and-discriminative-models'), + ('"Dilute neutron star matter from neural-network quantum states ' + 'by Fore et al, Physical Review Research 5, 033062 ' + '(2023)":"https://journals.aps.org/prresearch/pdf/10.1103/PhysRevResearch.5.033062" ' + 'at density $\\rho=0.04$ fm$^{-3}$', + 2, + None, + 'dilute-neutron-star-matter-from-neural-network-quantum-states-by-fore-et-al-physical-review-research-5-033062-2023-https-journals-aps-org-prresearch-pdf-10-1103-physrevresearch-5-033062-at-density-rho-0-04-fm-3'), + ('The electron gas in three dimensions with $N=14$ electrons ' + '(Wigner-Seitz radius $r_s=2$ a.u.), "Gabriel Pescia, Jane Kim ' + 'et al. ' + 'arXiv.2305.07240,":"https://doi.org/10.48550/arXiv.2305.07240"', + 2, + None, + 'the-electron-gas-in-three-dimensions-with-n-14-electrons-wigner-seitz-radius-r-s-2-a-u-gabriel-pescia-jane-kim-et-al-arxiv-2305-07240-https-doi-org-10-48550-arxiv-2305-07240'), ('What Is Generative Modeling?', 2, None, @@ -421,6 +443,10 @@
  • Machine learning. A simple perspective on the interface between ML and Physics
  • ML in Nuclear Physics (or any field in physics)
  • The plethora of machine learning algorithms/methods
  • +
  • Extrapolations and model interpretability
  • +
  • Generative and discriminative models
  • +
  • "Dilute neutron star matter from neural-network quantum states by Fore et al, Physical Review Research 5, 033062 (2023)":"https://journals.aps.org/prresearch/pdf/10.1103/PhysRevResearch.5.033062" at density \( \rho=0.04 \) fm$^{-3}$
  • +
  • The electron gas in three dimensions with \( N=14 \) electrons (Wigner-Seitz radius \( r_s=2 \) a.u.), "Gabriel Pescia, Jane Kim et al. arXiv.2305.07240,":"https://doi.org/10.48550/arXiv.2305.07240"
  • What Is Generative Modeling?
  • Example of generative modeling, "taken from Generative Deep Learning by David Foster":"https://www.oreilly.com/library/view/generative-deep-learning/9781098134174/ch01.html"
  • Generative Modeling
  • @@ -648,6 +674,71 @@

    The
  • Linear and logistic regression, Kernel methods, support vector machines and more
  • Reinforcement Learning; Transfer Learning and more
  • + +

    Extrapolations and model interpretability

    + +

    When you hear phrases like predictions and estimations and +correlations and causations, what do you think of? +

    + +

    Maybe you think +of the difference between classifying new data points and generating +new data points. +

    + +

    Or perhaps you consider that correlations represent a kind of symmetric statement: +if \( A \) is correlated with \( B \), then \( B \) is correlated with +\( A \). Causation, on the other hand, is directional: if \( A \) causes \( B \), \( B \) does not +necessarily cause \( A \). +

    + + +

    Generative and discriminative models

    + +
    +
    + +
      +
    1. Balance between tractability and flexibility
    2. +
    3. We want to extract information about correlations, to make predictions, quantify uncertainties and express causality
    4. +
    5. How do we reliably represent our effective degrees of freedom?
    6. +
    +
    +
    + + +

    A teaser first, see next slides.

    + + +

    Dilute neutron star matter from neural-network quantum states by Fore et al, Physical Review Research 5, 033062 (2023) at density \( \rho=0.04 \) fm$^{-3}$

    + +
    +
    + +

    +
    +

    +
    +

    +
    +
    + + + +

    The electron gas in three dimensions with \( N=14 \) electrons (Wigner-Seitz radius \( r_s=2 \) a.u.), Gabriel Pescia, Jane Kim et al. arXiv.2305.07240,

    + +
    +
    + +

    +
    +

    +
    +

    +
    +
    + +

    What Is Generative Modeling?

    @@ -2650,20 +2741,10 @@

    Diffusion learning

    Mathematics of diffusion models

    -

    Let us go back our discussions of the variational autoencoders from -last week, see -https://github.com/CompPhysics/AdvancedMachineLearning/blob/main/doc/pub/week15/ipynb/week15.ipynb. As -a first attempt at understanding diffusion models, we can think of -these as stacked VAEs, or better, recursive VAEs. -

    - -

    Let us try to see why. As an intermediate step, we consider so-called -hierarchical VAEs, which can be seen as a generalization of VAEs that -include multiple hierarchies of latent spaces. -

    -

    Note: Many of the derivations and figures here are inspired and borrowed from the excellent exposition of diffusion models by Calvin Luo at https://arxiv.org/abs/2208.11970.

    +

    But first VAEs as an intermediate step.

    +

    Chains of VAEs

    diff --git a/doc/pub/catania/html/catania-reveal.html b/doc/pub/catania/html/catania-reveal.html index 1f4721f..13f9c16 100644 --- a/doc/pub/catania/html/catania-reveal.html +++ b/doc/pub/catania/html/catania-reveal.html @@ -266,6 +266,69 @@

    The plethora of ma +
    +

    Extrapolations and model interpretability

    + +

    When you hear phrases like predictions and estimations and +correlations and causations, what do you think of? +

    + +

    Maybe you think +of the difference between classifying new data points and generating +new data points. +

    + +

    Or perhaps you consider that correlations represent a kind of symmetric statement: +if \( A \) is correlated with \( B \), then \( B \) is correlated with +\( A \). Causation, on the other hand, is directional: if \( A \) causes \( B \), \( B \) does not +necessarily cause \( A \). +

    +
    + +
    +

    Generative and discriminative models

    + +
    + +

    +

      +

    1. Balance between tractability and flexibility
    2. +

    3. We want to extract information about correlations, to make predictions, quantify uncertainties and express causality
    4. +

    5. How do we reliably represent our effective degrees of freedom?
    6. +
    +
    + +

    A teaser first, see next slides.

    +
    + +
    +

    Dilute neutron star matter from neural-network quantum states by Fore et al, Physical Review Research 5, 033062 (2023) at density \( \rho=0.04 \) fm$^{-3}$

    + +
    + +

    +

    +

    +

    +
    +

    +
    +
    + +
    +

    The electron gas in three dimensions with \( N=14 \) electrons (Wigner-Seitz radius \( r_s=2 \) a.u.), Gabriel Pescia, Jane Kim et al. arXiv.2305.07240,

    + +
    + +

    +

    +

    +

    +
    +

    +
    +
    +

    What Is Generative Modeling?

    @@ -2570,19 +2633,9 @@

    Diffusion learning

    Mathematics of diffusion models

    -

    Let us go back our discussions of the variational autoencoders from -last week, see -https://github.com/CompPhysics/AdvancedMachineLearning/blob/main/doc/pub/week15/ipynb/week15.ipynb. As -a first attempt at understanding diffusion models, we can think of -these as stacked VAEs, or better, recursive VAEs. -

    - -

    Let us try to see why. As an intermediate step, we consider so-called -hierarchical VAEs, which can be seen as a generalization of VAEs that -include multiple hierarchies of latent spaces. -

    -

    Note: Many of the derivations and figures here are inspired and borrowed from the excellent exposition of diffusion models by Calvin Luo at https://arxiv.org/abs/2208.11970.

    + +

    But first VAEs as an intermediate step.

    diff --git a/doc/pub/catania/html/catania-solarized.html b/doc/pub/catania/html/catania-solarized.html index b642804..23f861d 100644 --- a/doc/pub/catania/html/catania-solarized.html +++ b/doc/pub/catania/html/catania-solarized.html @@ -81,6 +81,28 @@ 2, None, 'the-plethora-of-machine-learning-algorithms-methods'), + ('Extrapolations and model interpretability', + 2, + None, + 'extrapolations-and-model-interpretability'), + ('Generative and discriminative models', + 2, + None, + 'generative-and-discriminative-models'), + ('"Dilute neutron star matter from neural-network quantum states ' + 'by Fore et al, Physical Review Research 5, 033062 ' + '(2023)":"https://journals.aps.org/prresearch/pdf/10.1103/PhysRevResearch.5.033062" ' + 'at density $\\rho=0.04$ fm$^{-3}$', + 2, + None, + 'dilute-neutron-star-matter-from-neural-network-quantum-states-by-fore-et-al-physical-review-research-5-033062-2023-https-journals-aps-org-prresearch-pdf-10-1103-physrevresearch-5-033062-at-density-rho-0-04-fm-3'), + ('The electron gas in three dimensions with $N=14$ electrons ' + '(Wigner-Seitz radius $r_s=2$ a.u.), "Gabriel Pescia, Jane Kim ' + 'et al. ' + 'arXiv.2305.07240,":"https://doi.org/10.48550/arXiv.2305.07240"', + 2, + None, + 'the-electron-gas-in-three-dimensions-with-n-14-electrons-wigner-seitz-radius-r-s-2-a-u-gabriel-pescia-jane-kim-et-al-arxiv-2305-07240-https-doi-org-10-48550-arxiv-2305-07240'), ('What Is Generative Modeling?', 2, None, @@ -515,6 +537,68 @@

    The plethora of ma
  • Linear and logistic regression, Kernel methods, support vector machines and more
  • Reinforcement Learning; Transfer Learning and more
  • +









    +

    Extrapolations and model interpretability

    + +

    When you hear phrases like predictions and estimations and +correlations and causations, what do you think of? +

    + +

    Maybe you think +of the difference between classifying new data points and generating +new data points. +

    + +

    Or perhaps you consider that correlations represent a kind of symmetric statement: +if \( A \) is correlated with \( B \), then \( B \) is correlated with +\( A \). Causation, on the other hand, is directional: if \( A \) causes \( B \), \( B \) does not +necessarily cause \( A \). +

    + +









    +

    Generative and discriminative models

    + +
    + +

    +

      +
    1. Balance between tractability and flexibility
    2. +
    3. We want to extract information about correlations, to make predictions, quantify uncertainties and express causality
    4. +
    5. How do we reliably represent our effective degrees of freedom?
    6. +
    +
    + + +

    A teaser first, see next slides.

    + +









    +

    Dilute neutron star matter from neural-network quantum states by Fore et al, Physical Review Research 5, 033062 (2023) at density \( \rho=0.04 \) fm$^{-3}$

    + +
    + +

    +

    +

    +

    +
    +

    +
    + + +









    +

    The electron gas in three dimensions with \( N=14 \) electrons (Wigner-Seitz radius \( r_s=2 \) a.u.), Gabriel Pescia, Jane Kim et al. arXiv.2305.07240,

    + +
    + +

    +

    +

    +

    +
    +

    +
    + +









    What Is Generative Modeling?

    @@ -2507,20 +2591,10 @@

    Diffusion learning











    Mathematics of diffusion models

    -

    Let us go back our discussions of the variational autoencoders from -last week, see -https://github.com/CompPhysics/AdvancedMachineLearning/blob/main/doc/pub/week15/ipynb/week15.ipynb. As -a first attempt at understanding diffusion models, we can think of -these as stacked VAEs, or better, recursive VAEs. -

    - -

    Let us try to see why. As an intermediate step, we consider so-called -hierarchical VAEs, which can be seen as a generalization of VAEs that -include multiple hierarchies of latent spaces. -

    -

    Note: Many of the derivations and figures here are inspired and borrowed from the excellent exposition of diffusion models by Calvin Luo at https://arxiv.org/abs/2208.11970.

    +

    But first VAEs as an intermediate step.

    +









    Chains of VAEs

    diff --git a/doc/pub/catania/html/catania.html b/doc/pub/catania/html/catania.html index f7bf2e5..02bfd7b 100644 --- a/doc/pub/catania/html/catania.html +++ b/doc/pub/catania/html/catania.html @@ -158,6 +158,28 @@ 2, None, 'the-plethora-of-machine-learning-algorithms-methods'), + ('Extrapolations and model interpretability', + 2, + None, + 'extrapolations-and-model-interpretability'), + ('Generative and discriminative models', + 2, + None, + 'generative-and-discriminative-models'), + ('"Dilute neutron star matter from neural-network quantum states ' + 'by Fore et al, Physical Review Research 5, 033062 ' + '(2023)":"https://journals.aps.org/prresearch/pdf/10.1103/PhysRevResearch.5.033062" ' + 'at density $\\rho=0.04$ fm$^{-3}$', + 2, + None, + 'dilute-neutron-star-matter-from-neural-network-quantum-states-by-fore-et-al-physical-review-research-5-033062-2023-https-journals-aps-org-prresearch-pdf-10-1103-physrevresearch-5-033062-at-density-rho-0-04-fm-3'), + ('The electron gas in three dimensions with $N=14$ electrons ' + '(Wigner-Seitz radius $r_s=2$ a.u.), "Gabriel Pescia, Jane Kim ' + 'et al. ' + 'arXiv.2305.07240,":"https://doi.org/10.48550/arXiv.2305.07240"', + 2, + None, + 'the-electron-gas-in-three-dimensions-with-n-14-electrons-wigner-seitz-radius-r-s-2-a-u-gabriel-pescia-jane-kim-et-al-arxiv-2305-07240-https-doi-org-10-48550-arxiv-2305-07240'), ('What Is Generative Modeling?', 2, None, @@ -592,6 +614,68 @@

    The plethora of ma
  • Linear and logistic regression, Kernel methods, support vector machines and more
  • Reinforcement Learning; Transfer Learning and more
  • +









    +

    Extrapolations and model interpretability

    + +

    When you hear phrases like predictions and estimations and +correlations and causations, what do you think of? +

    + +

    Maybe you think +of the difference between classifying new data points and generating +new data points. +

    + +

    Or perhaps you consider that correlations represent a kind of symmetric statement: +if \( A \) is correlated with \( B \), then \( B \) is correlated with +\( A \). Causation, on the other hand, is directional: if \( A \) causes \( B \), \( B \) does not +necessarily cause \( A \). +

    + +









    +

    Generative and discriminative models

    + +
    + +

    +

      +
    1. Balance between tractability and flexibility
    2. +
    3. We want to extract information about correlations, to make predictions, quantify uncertainties and express causality
    4. +
    5. How do we reliably represent our effective degrees of freedom?
    6. +
    +
    + + +

    A teaser first, see next slides.

    + +









    +

    Dilute neutron star matter from neural-network quantum states by Fore et al, Physical Review Research 5, 033062 (2023) at density \( \rho=0.04 \) fm$^{-3}$

    + +
    + +

    +

    +

    +

    +
    +

    +
    + + +









    +

    The electron gas in three dimensions with \( N=14 \) electrons (Wigner-Seitz radius \( r_s=2 \) a.u.), Gabriel Pescia, Jane Kim et al. arXiv.2305.07240,

    + +
    + +

    +

    +

    +

    +
    +

    +
    + +









    What Is Generative Modeling?

    @@ -2584,20 +2668,10 @@

    Diffusion learning











    Mathematics of diffusion models

    -

    Let us go back our discussions of the variational autoencoders from -last week, see -https://github.com/CompPhysics/AdvancedMachineLearning/blob/main/doc/pub/week15/ipynb/week15.ipynb. As -a first attempt at understanding diffusion models, we can think of -these as stacked VAEs, or better, recursive VAEs. -

    - -

    Let us try to see why. As an intermediate step, we consider so-called -hierarchical VAEs, which can be seen as a generalization of VAEs that -include multiple hierarchies of latent spaces. -

    -

    Note: Many of the derivations and figures here are inspired and borrowed from the excellent exposition of diffusion models by Calvin Luo at https://arxiv.org/abs/2208.11970.

    +

    But first VAEs as an intermediate step.

    +









    Chains of VAEs

    diff --git a/doc/pub/catania/ipynb/.ipynb_checkpoints/catania-checkpoint.ipynb b/doc/pub/catania/ipynb/.ipynb_checkpoints/catania-checkpoint.ipynb new file mode 100644 index 0000000..5cb06bc --- /dev/null +++ b/doc/pub/catania/ipynb/.ipynb_checkpoints/catania-checkpoint.ipynb @@ -0,0 +1,4178 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "153acda9", + "metadata": {}, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "98222977", + "metadata": {}, + "source": [ + "# Mathematics of discriminative and generative deep learning, from deep neural networks to diffusion models\n", + "**Morten Hjorth-Jensen**, Department of Physics and Center for Computing in Science Education, University of Oslo, Norway and Department of Physics and Astronomy and Facility for Rare Isotope Beams, Michigan State University, East Lansing, Michigan, USA\n", + "\n", + "Date: **Dipartimento di Fisica e Astronomia, 14 giugno, Uni-Catania, 2024**" + ] + }, + { + "cell_type": "markdown", + "id": "ea720441", + "metadata": {}, + "source": [ + "## Types of machine learning\n", + "\n", + "The approaches to machine learning are many, but are often split into two main categories. \n", + "In *supervised learning* we know the answer to a problem,\n", + "and let the computer deduce the logic behind it. On the other hand, *unsupervised learning*\n", + "is a method for finding patterns and relationships in data sets without any prior knowledge of the system.\n", + "\n", + "An emerging third category is *reinforcement learning*. This is a paradigm \n", + "of learning inspired by behavioural psychology, where learning is achieved by trial-and-error, \n", + "solely from rewards and punishment." 
+ ] + }, + { + "cell_type": "markdown", + "id": "be52cf0a", + "metadata": {}, + "source": [ + "## Main categories\n", + "Another way to categorize machine learning tasks is to consider the desired output of a system.\n", + "Some of the most common tasks are:\n", + "\n", + " * Classification: Outputs are divided into two or more classes. The goal is to produce a model that assigns inputs into one of these classes. An example is to identify digits based on pictures of hand-written ones. Classification is typically supervised learning.\n", + "\n", + " * Regression: Finding a functional relationship between an input data set and a reference data set. The goal is to construct a function that maps input data to continuous output values.\n", + "\n", + " * Clustering: Data are divided into groups with certain common traits, without knowing the different groups beforehand. It is thus a form of unsupervised learning." + ] + }, + { + "cell_type": "markdown", + "id": "8ec51b16", + "metadata": {}, + "source": [ + "## Machine learning. A simple perspective on the interface between ML and Physics\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1:

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "9e9be934", + "metadata": {}, + "source": [ + "## ML in Nuclear Physics (or any field in physics)\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1:

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "e95c9d02", + "metadata": {}, + "source": [ + "## The plethora of machine learning algorithms/methods\n", + "\n", + "1. Deep learning: Neural Networks (NN), Convolutional NN, Recurrent NN, Boltzmann machines, autoencoders and variational autoencoders and generative adversarial networks, stable diffusion and many more generative models\n", + "\n", + "2. Bayesian statistics and Bayesian Machine Learning, Bayesian experimental design, Bayesian Regression models, Bayesian neural networks, Gaussian processes and much more\n", + "\n", + "3. Dimensionality reduction (Principal component analysis), Clustering Methods and more\n", + "\n", + "4. Ensemble Methods, Random forests, bagging and voting methods, gradient boosting approaches \n", + "\n", + "5. Linear and logistic regression, Kernel methods, support vector machines and more\n", + "\n", + "6. Reinforcement Learning; Transfer Learning and more" + ] + }, + { + "cell_type": "markdown", + "id": "98ad0ecb", + "metadata": {}, + "source": [ + "## What Is Generative Modeling?\n", + "\n", + "Generative modeling can be broadly defined as follows:\n", + "\n", + "Generative modeling is a branch of machine learning that involves\n", + "training a model to produce new data that is similar to a given\n", + "dataset.\n", + "\n", + "What does this mean in practice? Suppose we have a dataset containing\n", + "photos of horses. We can train a generative model on this dataset to\n", + "capture the rules that govern the complex relationships between pixels\n", + "in images of horses. Then we can sample from this model to create\n", + "novel, realistic images of horses that did not exist in the original\n", + "dataset." 
+ ] + }, + { + "cell_type": "markdown", + "id": "6fbf4339", + "metadata": {}, + "source": [ + "## Example of generative modeling, [taken from Generative Deep Learning by David Foster](https://www.oreilly.com/library/view/generative-deep-learning/9781098134174/ch01.html)\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1:

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "152639af", + "metadata": {}, + "source": [ + "## Generative Modeling\n", + "\n", + "In order to build a generative model, we require a dataset consisting\n", + "of many examples of the entity we are trying to generate. This is\n", + "known as the training data, and one such data point is called an\n", + "observation.\n", + "\n", + "Each observation consists of many features. For an image generation\n", + "problem, the features are usually the individual pixel values; for a\n", + "text generation problem, the features could be individual words or\n", + "groups of letters. It is our goal to build a model that can generate\n", + "new sets of features that look as if they have been created using the\n", + "same rules as the original data. Conceptually, for image generation\n", + "this is an incredibly difficult task, considering the vast number of\n", + "ways that individual pixel values can be assigned and the relatively\n", + "tiny number of such arrangements that constitute an image of the\n", + "entity we are trying to generate." + ] + }, + { + "cell_type": "markdown", + "id": "cbbea082", + "metadata": {}, + "source": [ + "## Generative Versus Discriminative Modeling\n", + "\n", + "In order to truly understand what generative modeling aims to achieve\n", + "and why this is important, it is useful to compare it to its\n", + "counterpart, discriminative modeling. If you have studied machine\n", + "learning, most problems you have faced have most likely been\n", + "discriminative in nature." + ] + }, + { + "cell_type": "markdown", + "id": "b8696a2c", + "metadata": {}, + "source": [ + "## Example of discriminative modeling, [taken from Generative Deep Learning by David Foster](https://www.oreilly.com/library/view/generative-deep-learning/9781098134174/ch01.html)\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1:

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "6ac65ba4", + "metadata": {}, + "source": [ + "## Discriminative Modeling\n", + "\n", + "When performing discriminative modeling, each observation in the\n", + "training data has a label. For a binary classification problem,\n", + "our data could be labeled as ones and zeros. Our model then learns how to\n", + "discriminate between these two groups and outputs the probability that\n", + "a new observation has label 1 or 0.\n", + "\n", + "In contrast, generative modeling doesn’t require the dataset to be\n", + "labeled because it concerns itself with generating entirely new\n", + "data (for example an image), rather than trying to predict a label for, say, a given image." + ] + }, + { + "cell_type": "markdown", + "id": "2253c0c3", + "metadata": {}, + "source": [ + "## Taxonomy of generative deep learning, [taken from Generative Deep Learning by David Foster](https://www.oreilly.com/library/view/generative-deep-learning/9781098134174/ch01.html)\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1:

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "42a82617", + "metadata": {}, + "source": [ + "## Good books with hands-on material and codes\n", + "* [Sebastian Raschka et al, Machine learning with Scikit-Learn and PyTorch](https://sebastianraschka.com/blog/2022/ml-pytorch-book.html)\n", + "\n", + "* [David Foster, Generative Deep Learning with TensorFlow](https://www.oreilly.com/library/view/generative-deep-learning/9781098134174/ch01.html)\n", + "\n", + "* [Babcock and Gavras, Generative AI with Python and TensorFlow 2](https://github.com/PacktPublishing/Hands-On-Generative-AI-with-Python-and-TensorFlow-2)\n", + "\n", + "All three books have GitHub sites from which one can download all the code. A good and more general text (2016)\n", + "is Goodfellow, Bengio and Courville, [Deep Learning](https://www.deeplearningbook.org/)" + ] + }, + { + "cell_type": "markdown", + "id": "6decfc58", + "metadata": {}, + "source": [ + "## More references\n", + "\n", + "**Reading on diffusion models.**\n", + "\n", + "1. A central paper is the one by Sohl-Dickstein et al, Deep Unsupervised Learning using Nonequilibrium Thermodynamics, \n", + "\n", + "2. See also Diederik P. Kingma, Tim Salimans, Ben Poole, Jonathan Ho, Variational Diffusion Models, \n", + "\n", + " \n", + "\n", + "**and VAEs.**\n", + "\n", + "1. Calvin Luo \n", + "\n", + "2. 
An Introduction to Variational Autoencoders, by Kingma and Welling, see " + ] + }, + { + "cell_type": "markdown", + "id": "a61b4604", + "metadata": {}, + "source": [ + "## What are the basic Machine Learning ingredients?\n", + "Almost every problem in ML and data science starts with the same ingredients:\n", + "* The dataset $\\boldsymbol{x}$ (could be some observable quantity of the system we are studying)\n", + "\n", + "* A model which is a function of a set of parameters $\\boldsymbol{\\alpha}$ that relates to the dataset, say a likelihood function $p(\\boldsymbol{x}\\vert \\boldsymbol{\\alpha})$ or just a simple model $f(\\boldsymbol{\\alpha})$\n", + "\n", + "* A so-called **loss/cost/risk** function $\\mathcal{C} (\\boldsymbol{x}, f(\\boldsymbol{\\alpha}))$ which allows us to decide how well our model represents the dataset. \n", + "\n", + "We seek to minimize the function $\\mathcal{C} (\\boldsymbol{x}, f(\\boldsymbol{\\alpha}))$ by finding the parameter values which minimize $\\mathcal{C}$. This leads to various minimization algorithms. It may surprise many, but at the heart of all machine learning algorithms there is an optimization problem." 
+ ] + }, + { + "cell_type": "markdown", + "id": "71c70af5", + "metadata": {}, + "source": [ + "## Low-level machine learning, the family of ordinary least squares methods\n", + "\n", + "The data to which we want to apply a machine learning method consist\n", + "of a set of inputs $\\boldsymbol{x}^T=[x_0,x_1,x_2,\\dots,x_{n-1}]$ and the\n", + "outputs we want to model $\\boldsymbol{y}^T=[y_0,y_1,y_2,\\dots,y_{n-1}]$.\n", + "We assume that the output data can be represented (for a regression case) by a continuous function $f$\n", + "through" + ] + }, + { + "cell_type": "markdown", + "id": "5f3d8268", + "metadata": {}, + "source": [ + "$$\n", + "\\boldsymbol{y}=f(\\boldsymbol{x})+\\boldsymbol{\\epsilon}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "18a2531c", + "metadata": {}, + "source": [ + "## Setting up the equations\n", + "\n", + "In linear regression we approximate the unknown function with another\n", + "continuous function $\\tilde{\\boldsymbol{y}}(\\boldsymbol{x})$ which depends linearly on\n", + "some unknown parameters\n", + "$\\boldsymbol{\\theta}^T=[\\theta_0,\\theta_1,\\theta_2,\\dots,\\theta_{p-1}]$.\n", + "\n", + "The input data can be organized in terms of a so-called design matrix $\\boldsymbol{X}$ \n", + "with an approximating function $\\boldsymbol{\\tilde{y}}$" + ] + }, + { + "cell_type": "markdown", + "id": "5627463f", + "metadata": {}, + "source": [ + "$$\n", + "\\boldsymbol{\\tilde{y}}= \\boldsymbol{X}\\boldsymbol{\\theta},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7cf74dba", + "metadata": {}, + "source": [ + "## The objective/cost/loss function\n", + "\n", + "The simplest approach is the mean squared error" + ] + }, + { + "cell_type": "markdown", + "id": "54bf9b00", + "metadata": {}, + "source": [ + "$$\n", + 
"C(\\boldsymbol{\\theta})=\\frac{1}{n}\\sum_{i=0}^{n-1}\\left(y_i-\\tilde{y}_i\\right)^2=\\frac{1}{n}\\left\\{\\left(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}}\\right)^T\\left(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}}\\right)\\right\\},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b6994b98", + "metadata": {}, + "source": [ + "or, using the design matrix $\\boldsymbol{X}$, in a more compact matrix-vector notation as" + ] + }, + { + "cell_type": "markdown", + "id": "92286814", + "metadata": {}, + "source": [ + "$$\n", + "C(\\boldsymbol{\\theta})=\\frac{1}{n}\\left\\{\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right)^T\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right)\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dc75b709", + "metadata": {}, + "source": [ + "This function represents one of many possible ways to define the so-called cost function." + ] + }, + { + "cell_type": "markdown", + "id": "7f7f686e", + "metadata": {}, + "source": [ + "## Training solution\n", + "\n", + "Optimizing with respect to the unknown parameters $\\theta_j$ we get" + ] + }, + { + "cell_type": "markdown", + "id": "5c1e66d4", + "metadata": {}, + "source": [ + "$$\n", + "\\boldsymbol{X}^T\\boldsymbol{y} = \\boldsymbol{X}^T\\boldsymbol{X}\\boldsymbol{\\theta},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "10827c34", + "metadata": {}, + "source": [ + "and if the matrix $\\boldsymbol{X}^T\\boldsymbol{X}$ is invertible we have the optimal values" + ] + }, + { + "cell_type": "markdown", + "id": "0fc58145", + "metadata": {}, + "source": [ + "$$\n", + "\\hat{\\boldsymbol{\\theta}} =\\left(\\boldsymbol{X}^T\\boldsymbol{X}\\right)^{-1}\\boldsymbol{X}^T\\boldsymbol{y}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "19f43080", + "metadata": {}, + "source": [ + "We say we 'learn' the unknown parameters $\\boldsymbol{\\theta}$ from the last equation." 
+ ] + }, + { + "cell_type": "markdown", + "id": "0b2486f6", + "metadata": {}, + "source": [ + "## Ridge and LASSO Regression\n", + "\n", + "Our optimization problem is" + ] + }, + { + "cell_type": "markdown", + "id": "ec1496ee", + "metadata": {}, + "source": [ + "$$\n", + "{\\displaystyle \\min_{\\boldsymbol{\\theta}\\in {\\mathbb{R}}^{p}}}\\frac{1}{n}\\left\\{\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right)^T\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right)\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "965d89d3", + "metadata": {}, + "source": [ + "or we can state it as" + ] + }, + { + "cell_type": "markdown", + "id": "3ad09b32", + "metadata": {}, + "source": [ + "$$\n", + "{\\displaystyle \\min_{\\boldsymbol{\\theta}\\in\n", + "{\\mathbb{R}}^{p}}}\\frac{1}{n}\\sum_{i=0}^{n-1}\\left(y_i-\\tilde{y}_i\\right)^2=\\frac{1}{n}\\vert\\vert \\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\vert\\vert_2^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "373ea912", + "metadata": {}, + "source": [ + "where we have used the definition of a norm-2 vector, that is" + ] + }, + { + "cell_type": "markdown", + "id": "d33132ee", + "metadata": {}, + "source": [ + "$$\n", + "\\vert\\vert \\boldsymbol{x}\\vert\\vert_2 = \\sqrt{\\sum_i x_i^2}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b7b2eeb0", + "metadata": {}, + "source": [ + "## From OLS to Ridge and Lasso\n", + "\n", + "By minimizing the above equation with respect to the parameters\n", + "$\\boldsymbol{\\theta}$ we could then obtain an analytical expression for the\n", + "parameters $\\boldsymbol{\\theta}$. 
We can add a regularization parameter $\\lambda$ by\n", + "defining a new cost function to be optimized, that is" + ] + }, + { + "cell_type": "markdown", + "id": "f15109cc", + "metadata": {}, + "source": [ + "$$\n", + "{\\displaystyle \\min_{\\boldsymbol{\\theta}\\in\n", + "{\\mathbb{R}}^{p}}}\\frac{1}{n}\\vert\\vert \\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\vert\\vert_2^2+\\lambda\\vert\\vert \\boldsymbol{\\theta}\\vert\\vert_2^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e28f51b3", + "metadata": {}, + "source": [ + "which leads to the Ridge regression minimization problem where we\n", + "require that $\\vert\\vert \\boldsymbol{\\theta}\\vert\\vert_2^2\\le t$, where $t$ is\n", + "a finite number larger than zero. We do not include such a constraint in the discussion here." + ] + }, + { + "cell_type": "markdown", + "id": "aa47497d", + "metadata": {}, + "source": [ + "## Lasso regression\n", + "\n", + "Defining" + ] + }, + { + "cell_type": "markdown", + "id": "479b3cf8", + "metadata": {}, + "source": [ + "$$\n", + "C(\\boldsymbol{X},\\boldsymbol{\\theta})=\\frac{1}{n}\\vert\\vert \\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\vert\\vert_2^2+\\lambda\\vert\\vert \\boldsymbol{\\theta}\\vert\\vert_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cbfc4e96", + "metadata": {}, + "source": [ + "we have a new optimization equation" + ] + }, + { + "cell_type": "markdown", + "id": "1804d870", + "metadata": {}, + "source": [ + "$$\n", + "{\\displaystyle \\min_{\\boldsymbol{\\theta}\\in\n", + "{\\mathbb{R}}^{p}}}\\frac{1}{n}\\vert\\vert \\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\vert\\vert_2^2+\\lambda\\vert\\vert \\boldsymbol{\\theta}\\vert\\vert_1\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8665f556", + "metadata": {}, + "source": [ + "which leads to Lasso regression. Lasso stands for least absolute shrinkage and selection operator. 
\n", + "Here we have defined the norm-1 as" + ] + }, + { + "cell_type": "markdown", + "id": "2617c164", + "metadata": {}, + "source": [ + "$$\n", + "\\vert\\vert \\boldsymbol{x}\\vert\\vert_1 = \\sum_i \\vert x_i\\vert.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1b4e7fc9", + "metadata": {}, + "source": [ + "## Selected references\n", + "* [Mehta et al.](https://arxiv.org/abs/1803.08823) and [Physics Reports (2019)](https://www.sciencedirect.com/science/article/pii/S0370157319300766?via%3Dihub).\n", + "\n", + "* [Machine Learning and the Physical Sciences by Carleo et al](https://link.aps.org/doi/10.1103/RevModPhys.91.045002)\n", + "\n", + "* [Artificial Intelligence and Machine Learning in Nuclear Physics, Amber Boehnlein et al., Reviews Modern of Physics 94, 031003 (2022)](https://journals.aps.org/rmp/abstract/10.1103/RevModPhys.94.031003) \n", + "\n", + "* [Dilute neutron star matter from neural-network quantum states by Fore et al, Physical Review Research 5, 033062 (2023)](https://journals.aps.org/prresearch/pdf/10.1103/PhysRevResearch.5.033062)\n", + "\n", + "* Neural-network quantum states for ultra-cold Fermi gases, Jane Kim et al, Nature Physics Communcication, in press, see \n", + "\n", + "* [Message-Passing Neural Quantum States for the Homogeneous Electron Gas, Gabriel Pescia, Jane Kim et al. arXiv.2305.07240,](https://doi.org/10.48550/arXiv.2305.07240)\n", + "\n", + "* [Particle Data Group summary on ML methods](https://pdg.lbl.gov/2021/reviews/rpp2021-rev-machine-learning.pdf)" + ] + }, + { + "cell_type": "markdown", + "id": "1e8c7fc1", + "metadata": {}, + "source": [ + "## Setting up the basic equations for neural networks\n", + "\n", + "Neural networks, in its so-called feed-forward form, where each\n", + "iterations contains a feed-forward stage and a back-propgagation\n", + "stage, consist of series of affine matrix-matrix and matrix-vector\n", + "multiplications. 
The unknown parameters (the so-called biases and\n", + "weights which determine the architecture of a neural network) are\n", + "updated iteratively using the so-called back-propagation algorithm.\n", + "This algorithm corresponds to the so-called reverse mode of the\n", + "automatic differentiation algorithm. These algorithms will be discussed\n", + "in more detail below.\n", + "\n", + "We start, however, with the definitions of the various variables which make up a neural network." + ] + }, + { + "cell_type": "markdown", + "id": "c9a064e3", + "metadata": {}, + "source": [ + "## Overarching view of a neural network\n", + "\n", + "The architecture of a neural network defines our model. This model\n", + "aims at describing some function $f(\\boldsymbol{x})$ which yields\n", + "some final result (outputs or target values) given a specific input\n", + "$\\boldsymbol{x}$. Note that here $\\boldsymbol{y}$ and $\\boldsymbol{x}$ are not limited to being\n", + "vectors.\n", + "\n", + "The architecture consists of\n", + "1. An input and an output layer where the input layer is defined by the inputs $\\boldsymbol{x}$. The output layer produces the model output $\\boldsymbol{\\tilde{y}}$ which is compared with the target value $\\boldsymbol{y}$\n", + "\n", + "2. A given number of hidden layers and neurons/nodes/units for each layer (this may vary)\n", + "\n", + "3. A given activation function $\\sigma(\\boldsymbol{z})$ with arguments $\\boldsymbol{z}$ to be defined below. The activation functions may differ from layer to layer.\n", + "\n", + "4. The last layer, normally called the **output** layer, usually has an activation function tailored to the specific problem\n", + "\n", + "5. Finally we define a so-called cost or loss function which is used to gauge the quality of our model."
+ ] + }, + { + "cell_type": "markdown", + "id": "a6030cc8", + "metadata": {}, + "source": [ + "## Illustration of a single perceptron model and a multilayer FFNN\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1:

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "b07f2150", + "metadata": {}, + "source": [ + "## The optimization problem\n", + "\n", + "The cost function is a function of the unknown parameters\n", + "$\\boldsymbol{\\Theta}$ where the latter is a container for all possible\n", + "parameters needed to define a neural network\n", + "\n", + "If we are dealing with a regression task a typical cost/loss function\n", + "is the mean squared error" + ] + }, + { + "cell_type": "markdown", + "id": "aaaef430", + "metadata": {}, + "source": [ + "$$\n", + "C(\\boldsymbol{\\Theta})=\\frac{1}{n}\\left\\{\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right)^T\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right)\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d5058a10", + "metadata": {}, + "source": [ + "This function represents one of many possible ways to define\n", + "the so-called cost function." + ] + }, + { + "cell_type": "markdown", + "id": "1cec6c48", + "metadata": {}, + "source": [ + "## Weights and biases\n", + "\n", + "For neural networks the parameters\n", + "$\\boldsymbol{\\Theta}$ are given by the so-called weights and biases (to be\n", + "defined below).\n", + "\n", + "The weights are given by matrix elements $w_{ij}^{(l)}$ where the\n", + "superscript indicates the layer number. The biases are typically given\n", + "by vector elements representing each single node of a given layer,\n", + "that is $b_j^{(l)}$." + ] + }, + { + "cell_type": "markdown", + "id": "6922ddc2", + "metadata": {}, + "source": [ + "## Other ingredients of a neural network\n", + "\n", + "Having defined the architecture of a neural network, the optimization\n", + "of the cost function with respect to the parameters $\\boldsymbol{\\Theta}$,\n", + "involves the calculations of gradients and their optimization. 
The\n", + "gradients represent the derivatives of a multidimensional object and\n", + "are often approximated by various gradient methods, including\n", + "1. various quasi-Newton methods,\n", + "\n", + "2. plain gradient descent (GD) with a constant learning rate $\\eta$,\n", + "\n", + "3. GD with momentum and other approximations to the learning rates such as\n", + "\n", + " * Adaptive gradient (ADAgrad)\n", + "\n", + " * Root mean-square propagation (RMSprop)\n", + "\n", + " * Adaptive moment estimation (ADAM) and many others\n", + "\n", + "4. Stochastic gradient descent and various families of learning rate approximations" + ] + }, + { + "cell_type": "markdown", + "id": "468855ba", + "metadata": {}, + "source": [ + "## Other parameters\n", + "\n", + "In addition to the above, there are often additional hyperparameters\n", + "which are included in the setup of a neural network. These will be\n", + "discussed below." + ] + }, + { + "cell_type": "markdown", + "id": "5741b4de", + "metadata": {}, + "source": [ + "## Why Feed Forward Neural Networks (FFNN)?\n", + "\n", + "According to the *Universal approximation theorem*, a feed-forward\n", + "neural network with just a single hidden layer containing a finite\n", + "number of neurons can approximate a continuous multidimensional\n", + "function to arbitrary accuracy, assuming the activation function for\n", + "the hidden layer is a **non-constant, bounded and\n", + "monotonically-increasing continuous function**." + ] + }, + { + "cell_type": "markdown", + "id": "8dbcb2ac", + "metadata": {}, + "source": [ + "## Universal approximation theorem\n", + "\n", + "The universal approximation theorem plays a central role in deep\n", + "learning. 
[Cybenko (1989)](https://link.springer.com/article/10.1007/BF02551274) showed\n", + "the following:\n", + "\n", + "Let $\\sigma$ be any continuous sigmoidal function such that" + ] + }, + { + "cell_type": "markdown", + "id": "92ef2acd", + "metadata": {}, + "source": [ + "$$\n", + "\\sigma(z) = \\left\\{\\begin{array}{cc} 1 & z\\rightarrow \\infty\\\\ 0 & z \\rightarrow -\\infty \\end{array}\\right.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c3144610", + "metadata": {}, + "source": [ + "Given a continuous and deterministic function $F(\\boldsymbol{x})$ on the unit\n", + "cube in $d$-dimensions $F\\in [0,1]^d$, $x\\in [0,1]^d$ and a parameter\n", + "$\\epsilon >0$, there is a one-layer (hidden) neural network\n", + "$f(\\boldsymbol{x};\\boldsymbol{\\Theta})$ with $\\boldsymbol{\\Theta}=(\\boldsymbol{W},\\boldsymbol{b})$ and $\\boldsymbol{W}\\in\n", + "\\mathbb{R}^{m\\times n}$ and $\\boldsymbol{b}\\in \\mathbb{R}^{n}$, for which" + ] + }, + { + "cell_type": "markdown", + "id": "737eecdd", + "metadata": {}, + "source": [ + "$$\n", + "\\vert F(\\boldsymbol{x})-f(\\boldsymbol{x};\\boldsymbol{\\Theta})\\vert < \\epsilon \\hspace{0.1cm} \\forall \\boldsymbol{x}\\in[0,1]^d.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3ea34391", + "metadata": {}, + "source": [ + "## The approximation theorem in words\n", + "\n", + "**Any continuous function $y=F(\\boldsymbol{x})$ supported on the unit cube in\n", + "$d$-dimensions can be approximated by a one-layer sigmoidal network to\n", + "arbitrary accuracy.**\n", + "\n", + "[Hornik (1991)](https://www.sciencedirect.com/science/article/abs/pii/089360809190009T) extended the theorem by letting any non-constant, bounded activation function to be included using that the expectation value" + ] + }, + { + "cell_type": "markdown", + "id": "083c5bb4", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbb{E}[\\vert F(\\boldsymbol{x})\\vert^2] =\\int_{\\boldsymbol{x}\\in D} \\vert 
F(\\boldsymbol{x})\\vert^2p(\\boldsymbol{x})d\\boldsymbol{x} < \\infty.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f8120236", + "metadata": {}, + "source": [ + "Then we have" + ] + }, + { + "cell_type": "markdown", + "id": "25901a92", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbb{E}[\\vert F(\\boldsymbol{x})-f(\\boldsymbol{x};\\boldsymbol{\\Theta})\\vert^2] =\\int_{\\boldsymbol{x}\\in D} \\vert F(\\boldsymbol{x})-f(\\boldsymbol{x};\\boldsymbol{\\Theta})\\vert^2p(\\boldsymbol{x})d\\boldsymbol{x} < \\epsilon.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c8d0a8c9", + "metadata": {}, + "source": [ + "## More on the general approximation theorem\n", + "\n", + "None of the proofs give any insight into the relation between the\n", + "number of hidden layers and nodes and the approximation error\n", + "$\\epsilon$, nor the magnitudes of $\\boldsymbol{W}$ and $\\boldsymbol{b}$.\n", + "\n", + "Neural networks (NNs) have what we may call a kind of universality no matter what function we want to compute.\n", + "\n", + "It does not mean that an NN can be used to exactly compute any function. Rather, we get an approximation that is as good as we want." + ] + }, + { + "cell_type": "markdown", + "id": "0dcf9e9c", + "metadata": {}, + "source": [ + "## Class of functions we can approximate\n", + "\n", + "The functions that can be approximated are the continuous ones.\n", + "If the function $F(\\boldsymbol{x})$ is discontinuous, it won't in general be possible to approximate it to arbitrary accuracy. However, an NN may still give an approximation even if it fails at some points."
+ ] + }, + { + "cell_type": "markdown", + "id": "649cfac1", + "metadata": {}, + "source": [ + "## Simple example, fitting nuclear masses\n", + "\n", + "See example at , and scroll down to nuclear masses.\n", + "\n", + "And the recent article " + ] + }, + { + "cell_type": "markdown", + "id": "c521b370", + "metadata": {}, + "source": [ + "## First network example, simple perceptron with one input\n", + "\n", + "As yet another example we now define a simple perceptron model with\n", + "all quantities given by scalars. We consider only one input variable\n", + "$x$ and one target value $y$. We define an activation function\n", + "$\\sigma_1$ which takes as input" + ] + }, + { + "cell_type": "markdown", + "id": "44f11300", + "metadata": {}, + "source": [ + "$$\n", + "z_1 = w_1x+b_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ffb62a75", + "metadata": {}, + "source": [ + "where $w_1$ is the weight and $b_1$ is the bias. These are the\n", + "parameters we want to optimize. The output is $a_1=\\sigma_1(z_1)$ (see\n", + "graph from whiteboard notes). 
This output is then fed into the\n", + "**cost/loss** function, which we here for the sake of simplicity just\n", + "define as the squared error" + ] + }, + { + "cell_type": "markdown", + "id": "318c7a3c", + "metadata": {}, + "source": [ + "$$\n", + "C(x;w_1,b_1)=\\frac{1}{2}(a_1-y)^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "eb6d94c6", + "metadata": {}, + "source": [ + "## Optimizing the parameters\n", + "\n", + "In setting up the feed forward and back propagation parts of the\n", + "algorithm, we need now the derivative of the various variables we want\n", + "to train.\n", + "\n", + "We need" + ] + }, + { + "cell_type": "markdown", + "id": "0c0b217c", + "metadata": {}, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1} \\hspace{0.1cm}\\mathrm{and}\\hspace{0.1cm}\\frac{\\partial C}{\\partial b_1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "05e66932", + "metadata": {}, + "source": [ + "Using the chain rule we find" + ] + }, + { + "cell_type": "markdown", + "id": "c3862f3e", + "metadata": {}, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_1-y)\\sigma_1'x,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dfbb2efd", + "metadata": {}, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "548a0d3e", + "metadata": {}, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_1-y)\\sigma_1',\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e66ce860", + "metadata": {}, + "source": [ + "which we later will just define as" + ] + }, + { + "cell_type": "markdown", + "id": "cc72ed6b", + "metadata": {}, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}=\\delta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", 
+ "id": "42473f58", + "metadata": {}, + "source": [ + "## Implementing the simple perceptron model\n", + "\n", + "In the example code here we implement the above equations (with explict\n", + "expressions for the derivatives) with just one input variable $x$ and\n", + "one output variable. The target value $y=2x+1$ is a simple linear\n", + "function in $x$. Since this is a regression problem, we define the cost function to be proportional to the least squares error" + ] + }, + { + "cell_type": "markdown", + "id": "357226b9", + "metadata": {}, + "source": [ + "$$\n", + "C(y,w_1,b_1)=\\frac{1}{2}(a_1-y)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b379b0b0", + "metadata": {}, + "source": [ + "with $a_1$ the output from the network." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "6d3d7bc7", + "metadata": {}, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "# import necessary packages\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "def feed_forward(x):\n", + " # weighted sum of inputs to the output layer\n", + " z_1 = x*output_weights + output_bias\n", + " # Output from output node (one node only)\n", + " # Here the output is equal to the input\n", + " a_1 = z_1\n", + " return a_1\n", + "\n", + "def backpropagation(x, y):\n", + " a_1 = feed_forward(x)\n", + " # derivative of cost function\n", + " derivative_cost = a_1 - y\n", + " # the variable delta in the equations, note that output a_1 = z_1, its derivatives wrt z_o is thus 1\n", + " delta_1 = derivative_cost\n", + " # gradients for the output layer\n", + " output_weights_gradient = delta_1*x\n", + " output_bias_gradient = delta_1\n", + " # The cost function is 0.5*(a_1-y)^2. 
This gives a measure of the error for each iteration\n", + " return output_weights_gradient, output_bias_gradient\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "# Input variable\n", + "x = 4.0\n", + "# Target values\n", + "y = 2*x+1.0\n", + "\n", + "# Defining the neural network\n", + "n_inputs = 1\n", + "n_outputs = 1\n", + "# Initialize the network\n", + "# weights and bias in the output layer\n", + "output_weights = np.random.randn()\n", + "output_bias = np.random.randn()\n", + "\n", + "# implementing a simple gradient descent approach with fixed learning rate\n", + "eta = 0.01\n", + "for i in range(40):\n", + " # calculate gradients from back propagation\n", + " derivative_w1, derivative_b1 = backpropagation(x, y)\n", + " # update weights and biases\n", + " output_weights -= eta * derivative_w1\n", + " output_bias -= eta * derivative_b1\n", + "# our final prediction after training\n", + "ytilde = output_weights*x+output_bias\n", + "print(0.5*((ytilde-y)**2))" + ] + }, + { + "cell_type": "markdown", + "id": "488724fe", + "metadata": {}, + "source": [ + "Running this code gives us an acceptable results after some 40-50 iterations. Note that the results depend on the value of the learning rate." + ] + }, + { + "cell_type": "markdown", + "id": "3a9a55b4", + "metadata": {}, + "source": [ + "## Exercise 1: Extensions to the above code\n", + "\n", + "Feel free to add more input nodes and weights to the above\n", + "code. Furthermore, try to increase the amount of input and\n", + "target/output data. Try also to perform calculations for more values\n", + "of the learning rates. 
Feel free to add either hyperparameters with an\n", + "$l_1$ norm or an $l_2$ norm and discuss your results.\n", + "\n", + "You could also try to change the function $f(x)=y$ from a linear polynomial in $x$ to a higher-order polynomial.\n", + "Comment your results.\n", + "\n", + "**Hint**: Increasing the number of input variables and input nodes requires a rewrite of the input data in terms of a matrix. You need to figure out the correct dimensionalities." + ] + }, + { + "cell_type": "markdown", + "id": "ba544f79", + "metadata": {}, + "source": [ + "## Adding a hidden layer\n", + "\n", + "We change our simple model to (see graph below)\n", + "a network with just one hidden layer but with scalar variables only.\n", + "\n", + "Our output variable changes to $a_2$ and $a_1$ is now the output from the hidden node and $a_0=x$.\n", + "We have then" + ] + }, + { + "cell_type": "markdown", + "id": "6a57d7cb", + "metadata": {}, + "source": [ + "$$\n", + "z_1 = w_1a_0+b_1 \\hspace{0.1cm} \\wedge a_1 = \\sigma_1(z_1),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b86d6a5c", + "metadata": {}, + "source": [ + "$$\n", + "z_2 = w_2a_1+b_2 \\hspace{0.1cm} \\wedge a_2 = \\sigma_2(z_2),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9a69d392", + "metadata": {}, + "source": [ + "and the cost function" + ] + }, + { + "cell_type": "markdown", + "id": "b4ec9b75", + "metadata": {}, + "source": [ + "$$\n", + "C(x;\\boldsymbol{\\Theta})=\\frac{1}{2}(a_2-y)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9d2e3667", + "metadata": {}, + "source": [ + "with $\\boldsymbol{\\Theta}=[w_1,w_2,b_1,b_2]$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "8d3b6020", + "metadata": {}, + "source": [ + "## The derivatives\n", + "\n", + "The derivatives are now, using the chain rule again" + ] + }, + { + "cell_type": "markdown", + "id": "76ba0da7", + "metadata": {}, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial w_2}=(a_2-y)\\sigma_2'a_1=\\delta_2a_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "75627339", + "metadata": {}, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial b_2}=(a_2-y)\\sigma_2'=\\delta_2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "72c07ea7", + "metadata": {}, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_2-y)\\sigma_2'w_2\\sigma_1'a_0=\\delta_1a_0,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b7fcb9c9", + "metadata": {}, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_2-y)\\sigma_2'w_2\\sigma_1'=\\delta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b38f4012", + "metadata": {}, + "source": [ + "Can you generalize this to more than one hidden layer?" + ] + }, + { + "cell_type": "markdown", + "id": "6bc73d7d", + "metadata": {}, + "source": [ + "## Important observations\n", + "\n", + "From the above equations we see that the derivatives of the activation\n", + "functions play a central role. If they vanish, the training may\n", + "stop. 
This is called the vanishing gradient problem, see discussions below. If they become\n", + "large, the parameters $w_i$ and $b_i$ may simply go to infinity. This\n", + "is referred to as the exploding gradient problem." + ] + }, + { + "cell_type": "markdown", + "id": "5e908359", + "metadata": {}, + "source": [ + "## The training\n", + "\n", + "The training of the parameters is done through various gradient descent approximations with" + ] + }, + { + "cell_type": "markdown", + "id": "c0dcfd84", + "metadata": {}, + "source": [ + "$$\n", + "w_{i}\\leftarrow w_{i}- \\eta \\delta_i a_{i-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8ae30522", + "metadata": {}, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "6fcf91f5", + "metadata": {}, + "source": [ + "$$\n", + "b_i \\leftarrow b_i-\\eta \\delta_i,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "35b52b3e", + "metadata": {}, + "source": [ + "where $\\eta$ is the learning rate.\n", + "\n", + "One iteration consists of one feed forward step and one back-propagation step. Each back-propagation step does one update of the parameters $\\boldsymbol{\\Theta}$.\n", + "\n", + "For the first hidden layer we have $a_{i-1}=a_0=x$ in this simple model." + ] + }, + { + "cell_type": "markdown", + "id": "4ddb7234", + "metadata": {}, + "source": [ + "## Code example\n", + "\n", + "The code here implements the above model with one hidden layer and\n", + "scalar variables for the same function we studied in the previous\n", + "example. The code is however set up so that we can add multiple\n", + "inputs $x$ and target values $y$. Note also that we have the\n", + "possibility of defining a feature matrix $\\boldsymbol{X}$ with more than just\n", + "one column for the input values. This will prove useful in our next example. We have also defined matrices and vectors for all of our operations although it is not necessary here."
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "2e72d589", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "# We use the Sigmoid function as activation function\n", + "def sigmoid(z):\n", + " return 1.0/(1.0+np.exp(-z))\n", + "\n", + "def forwardpropagation(x):\n", + " # weighted sum of inputs to the hidden layer\n", + " z_1 = np.matmul(x, w_1) + b_1\n", + " # activation in the hidden layer\n", + " a_1 = sigmoid(z_1)\n", + " # weighted sum of inputs to the output layer\n", + " z_2 = np.matmul(a_1, w_2) + b_2\n", + " a_2 = z_2\n", + " return a_1, a_2\n", + "\n", + "def backpropagation(x, y):\n", + " a_1, a_2 = forwardpropagation(x)\n", + " # parameter delta for the output layer, note that a_2=z_2 and its derivative wrt z_2 is just 1\n", + " delta_2 = a_2 - y\n", + " print(0.5*((a_2-y)**2))\n", + " # delta for the hidden layer\n", + " delta_1 = np.matmul(delta_2, w_2.T) * a_1 * (1 - a_1)\n", + " # gradients for the output layer\n", + " output_weights_gradient = np.matmul(a_1.T, delta_2)\n", + " output_bias_gradient = np.sum(delta_2, axis=0)\n", + " # gradient for the hidden layer\n", + " hidden_weights_gradient = np.matmul(x.T, delta_1)\n", + " hidden_bias_gradient = np.sum(delta_1, axis=0)\n", + " return output_weights_gradient, output_bias_gradient, hidden_weights_gradient, hidden_bias_gradient\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "# Input variable\n", + "x = np.array([4.0],dtype=np.float64)\n", + "# Target values\n", + "y = 2*x+1.0 \n", + "\n", + "# Defining the neural network, only scalars here\n", + "n_inputs = x.shape\n", + "n_features = 1\n", + "n_hidden_neurons = 1\n", + "n_outputs = 1\n", + "\n", + "# Initialize the network\n", + "# weights and bias in the hidden layer\n", + "w_1 = np.random.randn(n_features, n_hidden_neurons)\n", + "b_1 = np.zeros(n_hidden_neurons) + 0.01\n", + "\n", + "# weights and bias in the output layer\n", + "w_2 = 
np.random.randn(n_hidden_neurons, n_outputs)\n", + "b_2 = np.zeros(n_outputs) + 0.01\n", + "\n", + "eta = 0.1\n", + "for i in range(50):\n", + " # calculate gradients\n", + " derivW2, derivB2, derivW1, derivB1 = backpropagation(x, y)\n", + " # update weights and biases\n", + " w_2 -= eta * derivW2\n", + " b_2 -= eta * derivB2\n", + " w_1 -= eta * derivW1\n", + " b_1 -= eta * derivB1" + ] + }, + { + "cell_type": "markdown", + "id": "32b998f9", + "metadata": {}, + "source": [ + "We see that after some few iterations (the results do depend on the learning rate however), we get an error which is rather small." + ] + }, + { + "cell_type": "markdown", + "id": "539faf33", + "metadata": {}, + "source": [ + "## Exercise 2: Including more data\n", + "\n", + "Try to increase the amount of input and\n", + "target/output data. Try also to perform calculations for more values\n", + "of the learning rates. Feel free to add either hyperparameters with an\n", + "$l_1$ norm or an $l_2$ norm and discuss your results.\n", + "Discuss your results as functions of the amount of training data and various learning rates.\n", + "\n", + "**Challenge:** Try to change the activation functions and replace the hard-coded analytical expressions with automatic derivation via either **autograd** or **JAX**." 
+ ] + }, + { + "cell_type": "markdown", + "id": "3d544ddd", + "metadata": {}, + "source": [ + "## Simple neural network and the back propagation equations\n", + "\n", + "Let us now try to increase our level of ambition and attempt to set\n", + "up the equations for a neural network with two input nodes, one hidden\n", + "layer with two hidden nodes and one output layer with one output node/neuron only (see graph).\n", + "\n", + "We need to define the following parameters and variables with the input layer (layer $(0)$) \n", + "where we label the nodes $x_0$ and $x_1$" + ] + }, + { + "cell_type": "markdown", + "id": "746b5284", + "metadata": {}, + "source": [ + "$$\n", + "x_0 = a_0^{(0)} \\wedge x_1 = a_1^{(0)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1696958f", + "metadata": {}, + "source": [ + "The hidden layer (layer $(1)$) has nodes which yield the outputs $a_0^{(1)}$ and $a_1^{(1)}$ with weight $\\boldsymbol{w}$ and bias $\\boldsymbol{b}$ parameters" + ] + }, + { + "cell_type": "markdown", + "id": "f2ba678d", + "metadata": {}, + "source": [ + "$$\n", + "w_{ij}^{(1)}=\\left\\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)}\\right\\} \\wedge b^{(1)}=\\left\\{b_0^{(1)},b_1^{(1)}\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1eea5ba5", + "metadata": {}, + "source": [ + "## The output layer\n", + "\n", + "Finally, we have the output layer given by layer label $(2)$ with output $a^{(2)}$ and weights and biases to be determined given by the variables" + ] + }, + { + "cell_type": "markdown", + "id": "0d307a0a", + "metadata": {}, + "source": [ + "$$\n", + "w_{i}^{(2)}=\\left\\{w_{0}^{(2)},w_{1}^{(2)}\\right\\} \\wedge b^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6e08427d", + "metadata": {}, + "source": [ + "Our output is $\\tilde{y}=a^{(2)}$ and we define a generic cost function $C(a^{(2)},y;\\boldsymbol{\\Theta})$ where $y$ is the target value (a scalar here).\n", + "The parameters we need to 
optimize are given by" + ] + }, + { + "cell_type": "markdown", + "id": "bc0d4e8c", + "metadata": {}, + "source": [ + "$$\n", + "\\boldsymbol{\\Theta}=\\left\\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)},w_{0}^{(2)},w_{1}^{(2)},b_0^{(1)},b_1^{(1)},b^{(2)}\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "80fe16aa", + "metadata": {}, + "source": [ + "## Compact expressions\n", + "\n", + "We can define the inputs to the activation functions for the various layers in terms of various matrix-vector multiplications and vector additions.\n", + "The inputs to the first hidden layer are" + ] + }, + { + "cell_type": "markdown", + "id": "8e3b41ed", + "metadata": {}, + "source": [ + "$$\n", + "\\begin{bmatrix}z_0^{(1)} \\\\ z_1^{(1)} \\end{bmatrix}=\\begin{bmatrix}w_{00}^{(1)} & w_{01}^{(1)}\\\\ w_{10}^{(1)} &w_{11}^{(1)} \\end{bmatrix}\\begin{bmatrix}a_0^{(0)} \\\\ a_1^{(0)} \\end{bmatrix}+\\begin{bmatrix}b_0^{(1)} \\\\ b_1^{(1)} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "24a47c29", + "metadata": {}, + "source": [ + "with outputs" + ] + }, + { + "cell_type": "markdown", + "id": "d6ecace5", + "metadata": {}, + "source": [ + "$$\n", + "\\begin{bmatrix}a_0^{(1)} \\\\ a_1^{(1)} \\end{bmatrix}=\\begin{bmatrix}\\sigma^{(1)}(z_0^{(1)}) \\\\ \\sigma^{(1)}(z_1^{(1)}) \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "78a2d508", + "metadata": {}, + "source": [ + "## Output layer\n", + "\n", + "For the final output layer we have the inputs to the final activation function" + ] + }, + { + "cell_type": "markdown", + "id": "8e573a81", + "metadata": {}, + "source": [ + "$$\n", + "z^{(2)} = w_{0}^{(2)}a_0^{(1)} +w_{1}^{(2)}a_1^{(1)}+b^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "89fed5d1", + "metadata": {}, + "source": [ + "resulting in the output" + ] + }, + { + "cell_type": "markdown", + "id": "509fe929", + "metadata": {}, + "source": [ + "$$\n", + 
"a^{(2)}=\\sigma^{(2)}(z^{(2)}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a5acbe91", + "metadata": {}, + "source": [ + "## Explicit derivatives\n", + "\n", + "In total we have nine parameters which we need to train. Using the\n", + "chain rule (or just the back-propagation algorithm) we can find all\n", + "derivatives. Since we will use automatic differentiation in reverse\n", + "mode, we start with the derivatives of the cost function with respect\n", + "to the parameters of the output layer, namely" + ] + }, + { + "cell_type": "markdown", + "id": "6f7aba23", + "metadata": {}, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{i}^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial w_{i}^{(2)}}=\\delta^{(2)}a_i^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5dabc790", + "metadata": {}, + "source": [ + "with" + ] + }, + { + "cell_type": "markdown", + "id": "8ec2c06b", + "metadata": {}, + "source": [ + "$$\n", + "\\delta^{(2)}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "463b5820", + "metadata": {}, + "source": [ + "and finally" + ] + }, + { + "cell_type": "markdown", + "id": "acb281ab", + "metadata": {}, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial b^{(2)}}=\\delta^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "09c11ccd", + "metadata": {}, + "source": [ + "## Derivatives of the hidden layer\n", + "\n", + "Using the chain rule we have the following expressions for say one of the weight parameters (it is easy to generalize to the other weight parameters)" + ] + }, + { + "cell_type": "markdown", + "id": "97e8699c", + "metadata": {}, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial 
w_{00}^{(1)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n", + "\\frac{\\partial z^{(2)}}{\\partial z_0^{(1)}}\\frac{\\partial z_0^{(1)}}{\\partial w_{00}^{(1)}}= \\delta^{(2)}\\frac{\\partial z^{(2)}}{\\partial z_0^{(1)}}\\frac{\\partial z_0^{(1)}}{\\partial w_{00}^{(1)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0fee4b8a", + "metadata": {}, + "source": [ + "which, noting that" + ] + }, + { + "cell_type": "markdown", + "id": "ee97de7e", + "metadata": {}, + "source": [ + "$$\n", + "z^{(2)} =w_0^{(2)}a_0^{(1)}+w_1^{(2)}a_1^{(1)}+b^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ddd92c77", + "metadata": {}, + "source": [ + "and that $z_0^{(1)}=w_{00}^{(1)}a_0^{(0)}+w_{01}^{(1)}a_1^{(0)}+b_0^{(1)}$, allows us to rewrite" + ] + }, + { + "cell_type": "markdown", + "id": "32b4c422", + "metadata": {}, + "source": [ + "$$\n", + "\\frac{\\partial z^{(2)}}{\\partial z_0^{(1)}}\\frac{\\partial z_0^{(1)}}{\\partial w_{00}^{(1)}}=w_0^{(2)}\\frac{\\partial a_0^{(1)}}{\\partial z_0^{(1)}}a_0^{(0)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f113b389", + "metadata": {}, + "source": [ + "## Final expression\n", + "Defining" + ] + }, + { + "cell_type": "markdown", + "id": "20997a37", + "metadata": {}, + "source": [ + "$$\n", + "\\delta_0^{(1)}=w_0^{(2)}\\frac{\\partial a_0^{(1)}}{\\partial z_0^{(1)}}\\delta^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4a24af39", + "metadata": {}, + "source": [ + "we have" + ] + }, + { + "cell_type": "markdown", + "id": "e80b80d7", + "metadata": {}, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{00}^{(1)}}=\\delta_0^{(1)}a_0^{(0)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "259979fa", + "metadata": {}, + "source": [ + "Similarly, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "3f1794a3", + "metadata": {}, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{01}^{(1)}}=\\delta_0^{(1)}a_1^{(0)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": 
"40b3ea46", + "metadata": {}, + "source": [ + "## Completing the list\n", + "\n", + "Similarly, we find" + ] + }, + { + "cell_type": "markdown", + "id": "8436dd52", + "metadata": {}, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{10}^{(1)}}=\\delta_1^{(1)}a_0^{(0)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "462e0f16", + "metadata": {}, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "e1ba411b", + "metadata": {}, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{11}^{(1)}}=\\delta_1^{(1)}a_1^{(0)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "91dc27cf", + "metadata": {}, + "source": [ + "where we have defined" + ] + }, + { + "cell_type": "markdown", + "id": "36be98a8", + "metadata": {}, + "source": [ + "$$\n", + "\\delta_1^{(1)}=w_1^{(2)}\\frac{\\partial a_1^{(1)}}{\\partial z_1^{(1)}}\\delta^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "73aa970e", + "metadata": {}, + "source": [ + "## Final expressions for the biases of the hidden layer\n", + "\n", + "For the sake of completeness, we list the derivatives of the biases, which are" + ] + }, + { + "cell_type": "markdown", + "id": "2f9fefd0", + "metadata": {}, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_{0}^{(1)}}=\\delta_0^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3daa9158", + "metadata": {}, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "a1477c87", + "metadata": {}, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_{1}^{(1)}}=\\delta_1^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "26998b29", + "metadata": {}, + "source": [ + "As we will see below, these expressions can be generalized in a more compact form." 
+ ] + }, + { + "cell_type": "markdown", + "id": "55ad7d17", + "metadata": {}, + "source": [ + "## Gradient expressions\n", + "\n", + "For this specific model, with just one output node and two hidden\n", + "nodes, the gradient descent equations take the following form for the output layer" + ] + }, + { + "cell_type": "markdown", + "id": "6fc2128d", + "metadata": {}, + "source": [ + "$$\n", + "w_{i}^{(2)}\\leftarrow w_{i}^{(2)}- \\eta \\delta^{(2)} a_{i}^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0e5d7a6f", + "metadata": {}, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "d3ba6619", + "metadata": {}, + "source": [ + "$$\n", + "b^{(2)} \\leftarrow b^{(2)}-\\eta \\delta^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5d7c4a09", + "metadata": {}, + "source": [ + "and for the hidden layer" + ] + }, + { + "cell_type": "markdown", + "id": "9a46bc52", + "metadata": {}, + "source": [ + "$$\n", + "w_{ij}^{(1)}\\leftarrow w_{ij}^{(1)}- \\eta \\delta_{i}^{(1)} a_{j}^{(0)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5824f12f", + "metadata": {}, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c9ab0962", + "metadata": {}, + "source": [ + "$$\n", + "b_{i}^{(1)} \\leftarrow b_{i}^{(1)}-\\eta \\delta_{i}^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "46580516", + "metadata": {}, + "source": [ + "where $\\eta$ is the learning rate." + ] + }, + { + "cell_type": "markdown", + "id": "80c6d25f", + "metadata": {}, + "source": [ + "## Exercise 3: Extended program\n", + "\n", + "We extend our simple code to a function which depends on two variables $x_0$ and $x_1$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "31af2e01", + "metadata": {}, + "source": [ + "$$\n", + "y=f(x_0,x_1)=x_0^2+3x_0x_1+x_1^2+5.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "911a4536", + "metadata": {}, + "source": [ + "We feed our network with $n=100$ entries $x_0$ and $x_1$. 
We have thus two features represented by these variables and an input matrix/design matrix $\\boldsymbol{X}\\in \\mathbf{R}^{n\\times 2}$" + ] + }, + { + "cell_type": "markdown", + "id": "15977f20", + "metadata": {}, + "source": [ + "$$\n", + "\\boldsymbol{X}=\\begin{bmatrix} x_{00} & x_{01} \\\\ x_{10} & x_{11} \\\\ x_{20} & x_{21} \\\\ \\dots & \\dots \\\\ \\dots & \\dots \\\\ x_{n-20} & x_{n-21} \\\\ x_{n-10} & x_{n-11} \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "03d90238", + "metadata": {}, + "source": [ + "Write a code, based on the previous code examples, which takes these data as input and fits the above function.\n", + "You can extend your code to include automatic differentiation.\n", + "\n", + "With these examples, we are now ready to embark upon the writing of a more general code for neural networks." + ] + }, + { + "cell_type": "markdown", + "id": "43326fe9", + "metadata": {}, + "source": [ + "## Getting serious, the back propagation equations for a neural network\n", + "\n", + "Now it is time to move away from one node in each layer only. 
Our inputs are also represented by several input values.\n", + "\n", + "With the sigmoid as activation function and the squared error as cost function, we have" + ] + }, + { + "cell_type": "markdown", + "id": "a3d38666", + "metadata": {}, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}(\\boldsymbol{\\Theta}^L)}{\\partial w_{jk}^L} = \\left(a_j^L - y_j\\right)a_j^L(1-a_j^L)a_k^{L-1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "855e549c", + "metadata": {}, + "source": [ + "Defining" + ] + }, + { + "cell_type": "markdown", + "id": "dc7c5791", + "metadata": {}, + "source": [ + "$$\n", + "\\delta_j^L = a_j^L(1-a_j^L)\\left(a_j^L - y_j\\right) = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "26863e76", + "metadata": {}, + "source": [ + "and using the Hadamard product of two vectors we can write this as" + ] + }, + { + "cell_type": "markdown", + "id": "49ea54a9", + "metadata": {}, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}^L = \\sigma'(\\hat{z}^L)\\circ\\frac{\\partial {\\cal C}}{\\partial (\\boldsymbol{a}^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8cefe41f", + "metadata": {}, + "source": [ + "## Analyzing the last results\n", + "\n", + "This is an important expression. The second term on the right-hand side\n", + "measures how fast the cost function is changing as a function of the $j$th\n", + "output activation. If, for example, the cost function doesn't depend\n", + "much on a particular output node $j$, then $\\delta_j^L$ will be small,\n", + "which is what we would expect. The first term on the right measures\n", + "how fast the activation function $\\sigma$ is changing at a given activation\n", + "value $z_j^L$." + ] + }, + { + "cell_type": "markdown", + "id": "b92b6a6b", + "metadata": {}, + "source": [ + "## More considerations\n", + "\n", + "Notice that everything in the above equations is easily computed. 
In\n", + "particular, we compute $z_j^L$ while computing the behaviour of the\n", + "network, and it is only a small additional overhead to compute\n", + "$\\sigma'(z^L_j)$. The exact form of the derivative with respect to the\n", + "output depends on the form of the cost function.\n", + "However, provided the cost function is known there should be little\n", + "trouble in calculating" + ] + }, + { + "cell_type": "markdown", + "id": "81d10955", + "metadata": {}, + "source": [ + "$$\n", + "\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "13f5caf9", + "metadata": {}, + "source": [ + "With the definition of $\\delta_j^L$ we have a more compact definition of the derivative of the cost function in terms of the weights, namely" + ] + }, + { + "cell_type": "markdown", + "id": "f1569d1c", + "metadata": {}, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}}{\\partial w_{jk}^L} = \\delta_j^La_k^{L-1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "df375f58", + "metadata": {}, + "source": [ + "## Derivatives in terms of $z_j^L$\n", + "\n", + "It is also easy to see that our previous equation can be written as" + ] + }, + { + "cell_type": "markdown", + "id": "fd8b7734", + "metadata": {}, + "source": [ + "$$\n", + "\\delta_j^L =\\frac{\\partial {\\cal C}}{\\partial z_j^L}= \\frac{\\partial {\\cal C}}{\\partial a_j^L}\\frac{\\partial a_j^L}{\\partial z_j^L},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0f5b6cf9", + "metadata": {}, + "source": [ + "which can also be interpreted as the partial derivative of the cost function with respect to the biases $b_j^L$, namely" + ] + }, + { + "cell_type": "markdown", + "id": "e8bc5adf", + "metadata": {}, + "source": [ + "$$\n", + "\\delta_j^L = \\frac{\\partial {\\cal C}}{\\partial b_j^L}\\frac{\\partial b_j^L}{\\partial z_j^L}=\\frac{\\partial {\\cal C}}{\\partial b_j^L},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d2355495", + 
"metadata": {}, + "source": [ + "That is, the error $\\delta_j^L$ is exactly equal to the rate of change of the cost function as a function of the bias." + ] + }, + { + "cell_type": "markdown", + "id": "9e1076d2", + "metadata": {}, + "source": [ + "## Bringing it together\n", + "\n", + "We have now three equations that are essential for the computations of the derivatives of the cost function at the output layer. These equations are needed to start the algorithm and they are" + ] + }, + { + "cell_type": "markdown", + "id": "3602282c", + "metadata": {}, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\frac{\\partial{\\cal C}(\\hat{W^L})}{\\partial w_{jk}^L} = \\delta_j^La_k^{L-1},\n", + "\\label{_auto1} \\tag{1}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c0c6c034", + "metadata": {}, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "72b23035", + "metadata": {}, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)},\n", + "\\label{_auto2} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "796fd960", + "metadata": {}, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "bfd68302", + "metadata": {}, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\delta_j^L = \\frac{\\partial {\\cal C}}{\\partial b_j^L},\n", + "\\label{_auto3} \\tag{3}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b89d4ce5", + "metadata": {}, + "source": [ + "## Final back propagating equation\n", + "\n", + "We have that (replacing $L$ with a general layer $l$)" + ] + }, + { + "cell_type": "markdown", + "id": "5b33e828", + "metadata": {}, + "source": [ + "$$\n", + "\\delta_j^l =\\frac{\\partial {\\cal C}}{\\partial z_j^l}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3fcc8d2b", + "metadata": {}, + "source": [ + "We want to express this in terms of the equations for layer $l+1$." + ] + }, + { + "cell_type": "markdown", + "id": "99477612", + "metadata": {}, + "source": [ + "## Using the chain rule and summing over all $k$ entries\n", + "\n", + "We obtain" + ] + }, + { + "cell_type": "markdown", + "id": "8621b33f", + "metadata": {}, + "source": [ + "$$\n", + "\\delta_j^l =\\sum_k \\frac{\\partial {\\cal C}}{\\partial z_k^{l+1}}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}}=\\sum_k \\delta_k^{l+1}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0a0b0036", + "metadata": {}, + "source": [ + "and recalling that" + ] + }, + { + "cell_type": "markdown", + "id": "76785d71", + "metadata": {}, + "source": [ + "$$\n", + "z_j^{l+1} = \\sum_{i=1}^{M_{l}}w_{ji}^{l+1}a_i^{l}+b_j^{l+1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "268624da", + "metadata": {}, + "source": [ + "with $M_l$ being the number of nodes in layer $l$, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "858189d8", + "metadata": {}, + "source": [ + "$$\n", + "\\delta_j^l =\\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8f7cbce7", + "metadata": {}, + "source": [ + "This is our final equation.\n", + "\n", + "We are now ready to set up the 
algorithm for back propagation and learning the weights and biases." + ] + }, + { + "cell_type": "markdown", + "id": "d5154dbc", + "metadata": {}, + "source": [ + "## Setting up the back propagation algorithm\n", + "\n", + "The four equations provide us with a way of computing the gradient of the cost function. Let us write this out in the form of an algorithm.\n", + "\n", + "**First**, we set up the input data $\\hat{x}$ and the activations\n", + "$\\hat{z}_1$ of the input layer and compute the activation function and\n", + "the pertinent outputs $\\hat{a}^1$.\n", + "\n", + "**Secondly**, we then perform the feed forward until we reach the output\n", + "layer, computing for each layer all $\\hat{z}_l$, the\n", + "activation function and the pertinent outputs $\\hat{a}^l$ for\n", + "$l=1,2,3,\\dots,L$.\n", + "\n", + "**Notation**: The first hidden layer has $l=1$ as label and the final output layer has $l=L$." + ] + }, + { + "cell_type": "markdown", + "id": "6eb82dfd", + "metadata": {}, + "source": [ + "## Setting up the back propagation algorithm, part 2\n", + "\n", + "Thereafter we compute the output error $\\hat{\\delta}^L$ by computing all" + ] + }, + { + "cell_type": "markdown", + "id": "b9d0d75b", + "metadata": {}, + "source": [ + "$$\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1ac45c13", + "metadata": {}, + "source": [ + "Then we compute the back-propagated error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "8d7c5421", + "metadata": {}, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "639a1d71", + "metadata": {}, + "source": [ + "## Setting up the Back propagation algorithm, part 3\n", + "\n", + "Finally, we update the weights and the biases using gradient descent\n", + "for each $l=L-1,L-2,\\dots,1$ and 
update the weights and biases\n", + "according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "747d9ce2", + "metadata": {}, + "source": [ + "$$\n", + "w_{jk}^l\\leftarrow w_{jk}^l- \\eta \\delta_j^la_k^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2c075399", + "metadata": {}, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e9e56bb2", + "metadata": {}, + "source": [ + "with $\\eta$ being the learning rate." + ] + }, + { + "cell_type": "markdown", + "id": "058c194a", + "metadata": {}, + "source": [ + "## Updating the gradients\n", + "\n", + "With the back-propagated error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "d9b83223", + "metadata": {}, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1ab0c539", + "metadata": {}, + "source": [ + "we update the weights and the biases using gradient descent for each $l=L-1,L-2,\\dots,1$ according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "3d05c18a", + "metadata": {}, + "source": [ + "$$\n", + "w_{jk}^l\\leftarrow w_{jk}^l- \\eta \\delta_j^la_k^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "455bb9c7", + "metadata": {}, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "19840651", + "metadata": {}, + "source": [ + "## NN code\n", + "\n", + "For an OO-code in Python for a feed-forward NN, see " + ] + }, + { + "cell_type": "markdown", + "id": "68f7526a", + "metadata": {}, + "source": [ + "## Essential elements of generative models\n", + "\n", + "The aim of generative methods is to train a probability distribution 
$p$. The methods we will focus on are:\n", + "1. Energy based models, with the family of Boltzmann distributions as a typical example\n", + "\n", + "2. Variational autoencoders, based on our discussions on autoencoders\n", + "\n", + "3. Diffusion models\n", + "\n", + "Not included here are\n", + "1. Generative adversarial networks (GANs)\n", + "\n", + "2. Autoregressive models\n", + "\n", + "3. Normalizing flow models" + ] + }, + { + "cell_type": "markdown", + "id": "d589304f", + "metadata": {}, + "source": [ + "## Probability model\n", + "\n", + "We define a probability" + ] + }, + { + "cell_type": "markdown", + "id": "fe9cc290", + "metadata": {}, + "source": [ + "$$\n", + "p(x_i,h_j;\\boldsymbol{\\Theta}) = \\frac{f(x_i,h_j;\\boldsymbol{\\Theta})}{Z(\\boldsymbol{\\Theta})},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d5be6be2", + "metadata": {}, + "source": [ + "where $f(x_i,h_j;\\boldsymbol{\\Theta})$ is a function which we assume is greater than or\n", + "equal to zero and obeys all properties required for a probability\n", + "distribution and $Z(\\boldsymbol{\\Theta})$ is a normalization constant. 
Inspired by\n", + "statistical mechanics, we often call it the partition function.\n", + "It is defined as (assuming that we have discrete probability distributions)" + ] + }, + { + "cell_type": "markdown", + "id": "b41c864e", + "metadata": {}, + "source": [ + "$$\n", + "Z(\\boldsymbol{\\Theta})=\\sum_{x_i\\in \\boldsymbol{X}}\\sum_{h_j\\in \\boldsymbol{H}} f(x_i,h_j;\\boldsymbol{\\Theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "87ab7583", + "metadata": {}, + "source": [ + "## Marginal and conditional probabilities\n", + "\n", + "We can in turn define the marginal probabilities" + ] + }, + { + "cell_type": "markdown", + "id": "4409ece8", + "metadata": {}, + "source": [ + "$$\n", + "p(x_i;\\boldsymbol{\\Theta}) = \\frac{\\sum_{h_j\\in \\boldsymbol{H}}f(x_i,h_j;\\boldsymbol{\\Theta})}{Z(\\boldsymbol{\\Theta})},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c7cb9fd6", + "metadata": {}, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "70f1f74e", + "metadata": {}, + "source": [ + "$$\n", + "p(h_j;\\boldsymbol{\\Theta}) = \\frac{\\sum_{x_i\\in \\boldsymbol{X}}f(x_i,h_j;\\boldsymbol{\\Theta})}{Z(\\boldsymbol{\\Theta})}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9962b72d", + "metadata": {}, + "source": [ + "## Change of notation\n", + "\n", + "**Note the change to a vector notation**. A variable like $\\boldsymbol{x}$\n", + "now represents a specific **configuration**. We can generate an infinity\n", + "of such configurations. 
The final partition function is then the sum\n", + "over all such possible configurations, that is" + ] + }, + { + "cell_type": "markdown", + "id": "fd30a147", + "metadata": {}, + "source": [ + "$$\n", + "Z(\\boldsymbol{\\Theta})=\\sum_{x_i\\in \\boldsymbol{X}}\\sum_{h_j\\in \\boldsymbol{H}} f(x_i,h_j;\\boldsymbol{\\Theta}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b83bdd29", + "metadata": {}, + "source": [ + "changes to" + ] + }, + { + "cell_type": "markdown", + "id": "ce67b5a2", + "metadata": {}, + "source": [ + "$$\n", + "Z(\\boldsymbol{\\Theta})=\\sum_{\\boldsymbol{x}}\\sum_{\\boldsymbol{h}} f(\\boldsymbol{x},\\boldsymbol{h};\\boldsymbol{\\Theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "87cedee1", + "metadata": {}, + "source": [ + "If we have binary variables $x_i$ and $h_j$, with $M$ variables $x_i$ and $N$ variables $h_j$, we have in total $2^M$ and $2^N$ possible $\\boldsymbol{x}$ and $\\boldsymbol{h}$ configurations, respectively.\n", + "\n", + "We see that even for the modest binary case, we can easily approach a\n", + "number of configurations which is not possible to deal with." + ] + }, + { + "cell_type": "markdown", + "id": "51b487cb", + "metadata": {}, + "source": [ + "## Optimization problem\n", + "\n", + "At the end, we are not interested in the probabilities of the hidden variables. 
The probability we thus want to optimize is" + ] + }, + { + "cell_type": "markdown", + "id": "3ba28bfc", + "metadata": {}, + "source": [ + "$$\n", + "p(\\boldsymbol{X};\\boldsymbol{\\Theta})=\\prod_{x_i\\in \\boldsymbol{X}}p(x_i;\\boldsymbol{\\Theta})=\\prod_{x_i\\in \\boldsymbol{X}}\\left(\\frac{\\sum_{h_j\\in \\boldsymbol{H}}f(x_i,h_j;\\boldsymbol{\\Theta})}{Z(\\boldsymbol{\\Theta})}\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "11ceae2d", + "metadata": {}, + "source": [ + "which we rewrite as" + ] + }, + { + "cell_type": "markdown", + "id": "ac8dc0bf", + "metadata": {}, + "source": [ + "$$\n", + "p(\\boldsymbol{X};\\boldsymbol{\\Theta})=\\frac{1}{Z(\\boldsymbol{\\Theta})}\\prod_{x_i\\in \\boldsymbol{X}}\\left(\\sum_{h_j\\in \\boldsymbol{H}}f(x_i,h_j;\\boldsymbol{\\Theta})\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "48bed24e", + "metadata": {}, + "source": [ + "## Further simplifications\n", + "\n", + "We simplify further by rewriting it as" + ] + }, + { + "cell_type": "markdown", + "id": "cc1da068", + "metadata": {}, + "source": [ + "$$\n", + "p(\\boldsymbol{X};\\boldsymbol{\\Theta})=\\frac{1}{Z(\\boldsymbol{\\Theta})}\\prod_{x_i\\in \\boldsymbol{X}}f(x_i;\\boldsymbol{\\Theta}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3c83e95b", + "metadata": {}, + "source": [ + "where we defined $f(x_i;\\boldsymbol{\\Theta}) = \\sum_{h_j\\in \\boldsymbol{H}}f(x_i,h_j;\\boldsymbol{\\Theta})$.\n", + "The optimization problem is then" + ] + }, + { + "cell_type": "markdown", + "id": "38f733fe", + "metadata": {}, + "source": [ + "$$\n", + "{\\displaystyle \\mathrm{arg} \\hspace{0.1cm}\\max_{\\boldsymbol{\\boldsymbol{\\Theta}}\\in {\\mathbb{R}}^{p}}} \\hspace{0.1cm}p(\\boldsymbol{X};\\boldsymbol{\\Theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2e7a9158", + "metadata": {}, + "source": [ + "## Optimizing the logarithm instead\n", + "\n", + "Computing the derivatives with respect to the parameters 
$\\boldsymbol{\\Theta}$ is\n", + "easier (and gives the same optimum) when taking the logarithm of the\n", + "probability. We will thus optimize" + ] + }, + { + "cell_type": "markdown", + "id": "1e30681b", + "metadata": {}, + "source": [ + "$$\n", + "{\\displaystyle \\mathrm{arg} \\hspace{0.1cm}\\max_{\\boldsymbol{\\boldsymbol{\\Theta}}\\in {\\mathbb{R}}^{p}}} \\hspace{0.1cm}\\log{p(\\boldsymbol{X};\\boldsymbol{\\Theta})},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a6b4d14c", + "metadata": {}, + "source": [ + "which leads to" + ] + }, + { + "cell_type": "markdown", + "id": "541f9c56", + "metadata": {}, + "source": [ + "$$\n", + "\\nabla_{\\boldsymbol{\\Theta}}\\log{p(\\boldsymbol{X};\\boldsymbol{\\Theta})}=0.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "17608bef", + "metadata": {}, + "source": [ + "## Expression for the gradients\n", + "\n", + "This leads to the following equation" + ] + }, + { + "cell_type": "markdown", + "id": "820bcc62", + "metadata": {}, + "source": [ + "$$\n", + "\\nabla_{\\boldsymbol{\\Theta}}\\log{p(\\boldsymbol{X};\\boldsymbol{\\Theta})}=\\nabla_{\\boldsymbol{\\Theta}}\\left(\\sum_{x_i\\in \\boldsymbol{X}}\\log{f(x_i;\\boldsymbol{\\Theta})}\\right)-\\nabla_{\\boldsymbol{\\Theta}}\\log{Z(\\boldsymbol{\\Theta})}=0.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "71fc4a0f", + "metadata": {}, + "source": [ + "The first term is called the positive phase and we assume that we have a model for the function $f$ from which we can sample values. Below we will develop an explicit model for this.\n", + "The second term is called the negative phase and is the one which leads to more difficulties." 
+ ] + }, + { + "cell_type": "markdown", + "id": "1c587edf", + "metadata": {}, + "source": [ + "## The derivative of the partition function\n", + "\n", + "The partition function, defined above as" + ] + }, + { + "cell_type": "markdown", + "id": "706a3b88", + "metadata": {}, + "source": [ + "$$\n", + "Z(\\boldsymbol{\\Theta})=\\sum_{x_i\\in \\boldsymbol{X}}\\sum_{h_j\\in \\boldsymbol{H}} f(x_i,h_j;\\boldsymbol{\\Theta}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "10d69f88", + "metadata": {}, + "source": [ + "is in general the most problematic term. In principle both $x$ and $h$ can span large degrees of freedom, if not even infinitely many ones, and computing the partition function itself is often not desirable or even feasible. The above derivative of the partition function can however be written in terms of an expectation value which is in turn evaluated using Monte Carlo sampling and the theory of Markov chains, popularly shortened to MCMC (or just MC$^2$)." + ] + }, + { + "cell_type": "markdown", + "id": "c984a8c9", + "metadata": {}, + "source": [ + "## Explicit expression for the derivative\n", + "We can rewrite" + ] + }, + { + "cell_type": "markdown", + "id": "1f3507da", + "metadata": {}, + "source": [ + "$$\n", + "\\nabla_{\\boldsymbol{\\Theta}}\\log{Z(\\boldsymbol{\\Theta})}=\\frac{\\nabla_{\\boldsymbol{\\Theta}}Z(\\boldsymbol{\\Theta})}{Z(\\boldsymbol{\\Theta})},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "31006b23", + "metadata": {}, + "source": [ + "which reads in more detail" + ] + }, + { + "cell_type": "markdown", + "id": "7f9c8c5c", + "metadata": {}, + "source": [ + "$$\n", + "\\nabla_{\\boldsymbol{\\Theta}}\\log{Z(\\boldsymbol{\\Theta})}=\\frac{\\nabla_{\\boldsymbol{\\Theta}} \\sum_{x_i\\in \\boldsymbol{X}}f(x_i;\\boldsymbol{\\Theta}) }{Z(\\boldsymbol{\\Theta})}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b242d816", + "metadata": {}, + "source": [ + "We can rewrite the function $f$ (we have assumed that 
it is greater than or\n", + "equal to zero) as $f=\\exp{\\log{f}}$. We can then rewrite the last\n", + "equation as" + ] + }, + { + "cell_type": "markdown", + "id": "97c7e276", + "metadata": {}, + "source": [ + "$$\n", + "\\nabla_{\\boldsymbol{\\Theta}}\\log{Z(\\boldsymbol{\\Theta})}=\\frac{ \\sum_{x_i\\in \\boldsymbol{X}} \\nabla_{\\boldsymbol{\\Theta}}\\exp{\\log{f(x_i;\\boldsymbol{\\Theta})}} }{Z(\\boldsymbol{\\Theta})}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "aa72c2c9", + "metadata": {}, + "source": [ + "## Final expression\n", + "\n", + "Taking the derivative gives us" + ] + }, + { + "cell_type": "markdown", + "id": "7efbb211", + "metadata": {}, + "source": [ + "$$\n", + "\\nabla_{\\boldsymbol{\\Theta}}\\log{Z(\\boldsymbol{\\Theta})}=\\frac{ \\sum_{x_i\\in \\boldsymbol{X}}f(x_i;\\boldsymbol{\\Theta}) \\nabla_{\\boldsymbol{\\Theta}}\\log{f(x_i;\\boldsymbol{\\Theta})} }{Z(\\boldsymbol{\\Theta})},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "25abbafc", + "metadata": {}, + "source": [ + "which is the expectation value of the gradient of $\\log{f}$" + ] + }, + { + "cell_type": "markdown", + "id": "0d8169cf", + "metadata": {}, + "source": [ + "$$\n", + "\\nabla_{\\boldsymbol{\\Theta}}\\log{Z(\\boldsymbol{\\Theta})}=\\sum_{x_i\\sim p}p(x_i;\\boldsymbol{\\Theta}) \\nabla_{\\boldsymbol{\\Theta}}\\log{f(x_i;\\boldsymbol{\\Theta})},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "109e23a9", + "metadata": {}, + "source": [ + "that is" + ] + }, + { + "cell_type": "markdown", + "id": "096e8335", + "metadata": {}, + "source": [ + "$$\n", + "\\nabla_{\\boldsymbol{\\Theta}}\\log{Z(\\boldsymbol{\\Theta})}=\\mathbb{E}_{x\\sim p}(\\nabla_{\\boldsymbol{\\Theta}}\\log{f(x_i;\\boldsymbol{\\Theta})}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ba431dc6", + "metadata": {}, + "source": [ + "This quantity is evaluated using Monte Carlo sampling, with Gibbs\n", + "sampling as the standard sampling rule." 
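+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5e0a83b4",
+ "metadata": {},
+ "source": [
+ "## A brute-force check of the expectation value\n",
+ "\n",
+ "The identity $\\nabla_{\\boldsymbol{\\Theta}}\\log{Z}=\\sum_{x}p(x;\\boldsymbol{\\Theta})\\nabla_{\\boldsymbol{\\Theta}}\\log{f(x;\\boldsymbol{\\Theta})}$ can be verified by direct enumeration when the number of configurations is small.\n",
+ "The cell below is a minimal sketch with an assumed toy model, $\\log{f(\\boldsymbol{x};\\boldsymbol{\\Theta})}=\\boldsymbol{\\Theta}^T\\boldsymbol{x}$ for three binary variables and no hidden units, so that $\\nabla_{\\boldsymbol{\\Theta}}\\log{f}=\\boldsymbol{x}$.\n",
+ "This model and the names `states` and `log_Z` are illustrative choices, not taken from the text.\n",
+ "For realistic models the sum is replaced by Monte Carlo samples."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9f3c6d12",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import itertools\n",
+ "import numpy as np\n",
+ "\n",
+ "theta = np.array([0.5, -1.0, 0.25])\n",
+ "# all 2^3 binary configurations, one per row\n",
+ "states = np.array(list(itertools.product([0, 1], repeat=3)), dtype=float)\n",
+ "\n",
+ "def log_Z(theta):\n",
+ "    # log of the partition function by explicit summation\n",
+ "    return np.log(np.sum(np.exp(states @ theta)))\n",
+ "\n",
+ "p = np.exp(states @ theta - log_Z(theta))   # p(x) = f(x)/Z\n",
+ "expectation = p @ states                    # sum over x of p(x) times grad log f(x)\n",
+ "\n",
+ "# central finite differences of log Z, one parameter at a time\n",
+ "eps = 1e-6\n",
+ "fd = np.array([(log_Z(theta + eps * e) - log_Z(theta - eps * e)) / (2 * eps) for e in np.eye(3)])\n",
+ "print(expectation)\n",
+ "print(fd)"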
+ ] + }, + { + "cell_type": "markdown", + "id": "47c475cc", + "metadata": {}, + "source": [ + "## Final expression for the gradients\n", + "\n", + "This leads to the following equation" + ] + }, + { + "cell_type": "markdown", + "id": "259b2990", + "metadata": {}, + "source": [ + "$$\n", + "\\nabla_{\\boldsymbol{\\Theta}}\\log{p(\\boldsymbol{X};\\boldsymbol{\\Theta})}=\\nabla_{\\boldsymbol{\\Theta}}\\left(\\sum_{x_i\\in \\boldsymbol{X}}\\log{f(x_i;\\boldsymbol{\\Theta})}\\right)-\\mathbb{E}_{x\\sim p}(\\nabla_{\\boldsymbol{\\Theta}}\\log{f(x_i;\\boldsymbol{\\Theta})})=0.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "22171ca6", + "metadata": {}, + "source": [ + "## Introducing the energy model\n", + "\n", + "As we will see below, a typical Boltzmann machine employs a probability distribution" + ] + }, + { + "cell_type": "markdown", + "id": "82ca77ee", + "metadata": {}, + "source": [ + "$$\n", + "p(\\boldsymbol{x},\\boldsymbol{h};\\boldsymbol{\\Theta}) = \\frac{f(\\boldsymbol{x},\\boldsymbol{h};\\boldsymbol{\\Theta})}{Z(\\boldsymbol{\\Theta})},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "40b8154c", + "metadata": {}, + "source": [ + "where $f(\\boldsymbol{x},\\boldsymbol{h};\\boldsymbol{\\Theta})$ is given by a so-called energy model. 
If we assume that the random variables $x_i$ and $h_j$ take binary values only, for example $x_i,h_j\\in\\{0,1\\}$, we have a so-called binary-binary model where" + ] + }, + { + "cell_type": "markdown", + "id": "a1e935a0", + "metadata": {}, + "source": [ + "$$\n", + "\\log{f(\\boldsymbol{x},\\boldsymbol{h};\\boldsymbol{\\Theta})}=-E(\\boldsymbol{x}, \\boldsymbol{h};\\boldsymbol{\\Theta}) = \\sum_{x_i\\in \\boldsymbol{X}} x_i a_i+\\sum_{h_j\\in \\boldsymbol{H}} b_j h_j + \\sum_{x_i\\in \\boldsymbol{X},h_j\\in\\boldsymbol{H}} x_i w_{ij} h_j,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "46485779", + "metadata": {}, + "source": [ + "where the set of parameters is given by the biases and weights $\\boldsymbol{\\Theta}=\\{\\boldsymbol{a},\\boldsymbol{b},\\boldsymbol{W}\\}$.\n", + "**Note the vector notation** instead of $x_i$ and $h_j$ for $f$. The vectors $\\boldsymbol{x}$ and $\\boldsymbol{h}$ represent a specific instance of stochastic variables $x_i$ and $h_j$. These arrangements of $\\boldsymbol{x}$ and $\\boldsymbol{h}$ lead to a specific energy configuration." + ] + }, + { + "cell_type": "markdown", + "id": "7502ef1e", + "metadata": {}, + "source": [ + "## More compact notation\n", + "\n", + "With the above definition we can write the probability as" + ] + }, + { + "cell_type": "markdown", + "id": "72dd1e56", + "metadata": {}, + "source": [ + "$$\n", + "p(\\boldsymbol{x},\\boldsymbol{h};\\boldsymbol{\\Theta}) = \\frac{\\exp{(\\boldsymbol{a}^T\\boldsymbol{x}+\\boldsymbol{b}^T\\boldsymbol{h}+\\boldsymbol{x}^T\\boldsymbol{W}\\boldsymbol{h})}}{Z(\\boldsymbol{\\Theta})},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "080b0628", + "metadata": {}, + "source": [ + "where the biases $\\boldsymbol{a}$ and $\\boldsymbol{b}$ and the weights defined by the matrix $\\boldsymbol{W}$ are the parameters we need to optimize." 
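+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c48b17e9",
+ "metadata": {},
+ "source": [
+ "## A small binary-binary example\n",
+ "\n",
+ "For a handful of units the joint probability can be evaluated by enumerating every configuration.\n",
+ "The cell below is a minimal sketch assuming $M=3$ visible and $N=2$ hidden binary units with randomly drawn parameters.\n",
+ "The sizes, the random parameters and the helper name `neg_energy` are illustrative assumptions only.\n",
+ "It computes the partition function, checks that the joint probabilities sum to one, and forms the marginal over the hidden units.\n",
+ "The enumeration scales as $2^{M+N}$ and is only feasible for very small models."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0d7e2a56",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import itertools\n",
+ "import numpy as np\n",
+ "\n",
+ "rng = np.random.default_rng(1)\n",
+ "M, N = 3, 2\n",
+ "a, b, W = rng.normal(size=M), rng.normal(size=N), rng.normal(size=(M, N))\n",
+ "\n",
+ "def neg_energy(x, h):\n",
+ "    # exponent of the unnormalized probability: a.x + b.h + x.W.h\n",
+ "    return a @ x + b @ h + x @ W @ h\n",
+ "\n",
+ "xs = [np.array(c, dtype=float) for c in itertools.product([0, 1], repeat=M)]\n",
+ "hs = [np.array(c, dtype=float) for c in itertools.product([0, 1], repeat=N)]\n",
+ "\n",
+ "f = np.array([[np.exp(neg_energy(x, h)) for h in hs] for x in xs])\n",
+ "Z = f.sum()                  # partition function by explicit summation\n",
+ "p_joint = f / Z              # joint probability of every (x, h) pair\n",
+ "p_x = p_joint.sum(axis=1)    # marginal over the hidden units\n",
+ "print(Z, p_joint.sum())\n",
+ "print(p_x)"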
+ ] + }, + { + "cell_type": "markdown", + "id": "1d32cc85", + "metadata": {}, + "source": [ + "## Examples of gradient expressions\n", + "\n", + "Since the binary-binary energy model is linear in the parameters $a_i$, $b_j$ and\n", + "$w_{ij}$, it is easy to see that the derivatives with respect to the\n", + "various optimization parameters yield expressions used in the\n", + "evaluation of gradients like" + ] + }, + { + "cell_type": "markdown", + "id": "e31fa04f", + "metadata": {}, + "source": [ + "$$\n", + "\\frac{\\partial E(\\boldsymbol{x}, \\boldsymbol{h};\\boldsymbol{\\Theta})}{\\partial w_{ij}}=-x_ih_j,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "908b8f90", + "metadata": {}, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "b077d73b", + "metadata": {}, + "source": [ + "$$\n", + "\\frac{\\partial E(\\boldsymbol{x}, \\boldsymbol{h};\\boldsymbol{\\Theta})}{\\partial a_i}=-x_i,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "31b867bd", + "metadata": {}, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "26c9578e", + "metadata": {}, + "source": [ + "$$\n", + "\\frac{\\partial E(\\boldsymbol{x}, \\boldsymbol{h};\\boldsymbol{\\Theta})}{\\partial b_j}=-h_j.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1e412a0f", + "metadata": {}, + "source": [ + "## Network Elements, the energy function\n", + "\n", + "The function $E(\\boldsymbol{x},\\boldsymbol{h},\\boldsymbol{\\Theta})$ gives the **energy** of a\n", + "configuration (pair of vectors) $(\\boldsymbol{x}, \\boldsymbol{h})$. The lower\n", + "the energy of a configuration, the higher the probability of it. This\n", + "function also depends on the parameters $\\boldsymbol{a}$, $\\boldsymbol{b}$ and\n", + "$W$. Thus, when we adjust them during the learning procedure, we are\n", + "adjusting the energy function to best fit our problem." 
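+ ] + }, + { + "cell_type": "markdown", + "id": "5e1b7a04", + "metadata": {}, + "source": [ + "To connect these derivatives to learning, here is a short worked step. It is a sketch that assumes the binary-binary energy above and the marginal likelihood $p(\\boldsymbol{x};\\boldsymbol{\\Theta})=\\sum_{\\boldsymbol{h}}p(\\boldsymbol{x},\\boldsymbol{h};\\boldsymbol{\\Theta})$ of a single visible vector. Differentiating $\\log p(\\boldsymbol{x};\\boldsymbol{\\Theta})$ with respect to a weight gives a difference of two expectations," + ] + }, + { + "cell_type": "markdown", + "id": "5e1b7a05", + "metadata": {}, + "source": [ + "$$\n", + "\\frac{\\partial \\log p(\\boldsymbol{x};\\boldsymbol{\\Theta})}{\\partial w_{ij}} = \\mathbb{E}_{\\boldsymbol{h}\\sim p(\\boldsymbol{h}|\\boldsymbol{x};\\boldsymbol{\\Theta})}\\left[x_i h_j\\right] - \\mathbb{E}_{(\\boldsymbol{x}',\\boldsymbol{h}')\\sim p(\\boldsymbol{x}',\\boldsymbol{h}';\\boldsymbol{\\Theta})}\\left[x'_i h'_j\\right],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5e1b7a06", + "metadata": {}, + "source": [ + "where the first expectation is taken with the visible units fixed to the data and the second over the full model distribution. For binary hidden units the conditional distribution factorizes, with" + ] + }, + { + "cell_type": "markdown", + "id": "5e1b7a07", + "metadata": {}, + "source": [ + "$$\n", + "p(h_j=1|\\boldsymbol{x};\\boldsymbol{\\Theta}) = \\frac{1}{1+\\exp{\\left(-b_j-\\sum_{i} x_i w_{ij}\\right)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5e1b7a08", + "metadata": {}, + "source": [ + "so the first term equals $x_i\\,p(h_j=1|\\boldsymbol{x};\\boldsymbol{\\Theta})$ in closed form, while the second one has to be estimated by sampling, for example with Gibbs steps."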
+ ] + }, + { + "cell_type": "markdown", + "id": "2c1a4259", + "metadata": {}, + "source": [ + "## Defining different types of RBMs\n", + "\n", + "There are different variants of RBMs, and the differences lie in the types of visible and hidden units we choose as well as in the implementation of the energy function $E(\\boldsymbol{x},\\boldsymbol{h},\\boldsymbol{\\Theta})$. The connection between the nodes in the two layers is given by the weights $w_{ij}$. \n", + "\n", + "**Binary-Binary RBM:**\n", + "\n", + "RBMs were first developed using binary units in both the visible and hidden layers. The corresponding energy function is defined as follows:" + ] + }, + { + "cell_type": "markdown", + "id": "d6647898", + "metadata": {}, + "source": [ + "$$\n", + "\\begin{align*}\n", + "\tE(\\boldsymbol{x}, \\boldsymbol{h},\\boldsymbol{\\Theta}) = - \\sum_i^M x_i a_i- \\sum_j^N b_j h_j - \\sum_{i,j}^{M,N} x_i w_{ij} h_j,\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ea36db9d", + "metadata": {}, + "source": [ + "where the binary values taken on by the nodes are most commonly 0 and 1." + ] + }, + { + "cell_type": "markdown", + "id": "8dbd3642", + "metadata": {}, + "source": [ + "## Gaussian-binary RBM\n", + "\n", + "Another variant is the RBM where the visible units are Gaussian while the hidden units remain binary:" + ] + }, + { + "cell_type": "markdown", + "id": "bcef15f3", + "metadata": {}, + "source": [ + "$$\n", + "\\begin{align*}\n", + "\tE(\\boldsymbol{x}, \\boldsymbol{h},\\boldsymbol{\\Theta}) = \\sum_i^M \\frac{(x_i - a_i)^2}{2\\sigma_i^2} - \\sum_j^N b_j h_j - \\sum_{i,j}^{M,N} \\frac{x_i w_{ij} h_j}{\\sigma_i^2}. \n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c7eb6325", + "metadata": {}, + "source": [ + "This type of RBM is useful when we model continuous data (i.e., we wish $\\boldsymbol{x}$ to be continuous). 
The parameter $\\sigma_i^2$ is meant to represent a variance and is often just set to one." + ] + }, + { + "cell_type": "markdown", + "id": "4087d974", + "metadata": {}, + "source": [ + "## Code for RBMs using PyTorch" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "4c915368", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import torch\n", + "import torch.utils.data\n", + "import torch.nn as nn\n", + "import torch.nn.functional as F\n", + "import torch.optim as optim\n", + "from torch.autograd import Variable\n", + "from torchvision import datasets, transforms\n", + "from torchvision.utils import make_grid , save_image\n", + "import matplotlib.pyplot as plt\n", + "\n", + "\n", + "batch_size = 64\n", + "# MNIST pixels are scaled to [0, 1] and later used as Bernoulli probabilities\n", + "train_loader = torch.utils.data.DataLoader(\n", + "    datasets.MNIST('./data',\n", + "                   train=True,\n", + "                   download=True,\n", + "                   transform=transforms.Compose(\n", + "                       [transforms.ToTensor()])\n", + "                   ),\n", + "    batch_size=batch_size\n", + ")\n", + "\n", + "test_loader = torch.utils.data.DataLoader(\n", + "    datasets.MNIST('./data',\n", + "                   train=False,\n", + "                   transform=transforms.Compose(\n", + "                       [transforms.ToTensor()])\n", + "                   ),\n", + "    batch_size=batch_size)\n", + "\n", + "\n", + "class RBM(nn.Module):\n", + "    def __init__(self,\n", + "                 n_vis=784,\n", + "                 n_hin=500,\n", + "                 k=5):\n", + "        super(RBM, self).__init__()\n", + "        self.W = nn.Parameter(torch.randn(n_hin,n_vis)*1e-2)\n", + "        self.v_bias = nn.Parameter(torch.zeros(n_vis))\n", + "        self.h_bias = nn.Parameter(torch.zeros(n_hin))\n", + "        self.k = k  # number of Gibbs steps\n", + "\n", + "    def sample_from_p(self,p):\n", + "        # a unit is set to 1 when its probability exceeds a uniform random number\n", + "        return F.relu(torch.sign(p - Variable(torch.rand(p.size()))))\n", + "\n", + "    def v_to_h(self,v):\n", + "        p_h = torch.sigmoid(F.linear(v,self.W,self.h_bias))\n", + "        sample_h = self.sample_from_p(p_h)\n", + "        return p_h,sample_h\n", + "\n", + "    def h_to_v(self,h):\n", + "        p_v = torch.sigmoid(F.linear(h,self.W.t(),self.v_bias))\n", + "        sample_v = 
self.sample_from_p(p_v)\n", + "        return p_v,sample_v\n", + "\n", + "    def forward(self,v):\n", + "        # k steps of alternating Gibbs sampling, starting from the data v\n", + "        pre_h1,h1 = self.v_to_h(v)\n", + "\n", + "        h_ = h1\n", + "        for _ in range(self.k):\n", + "            pre_v_,v_ = self.h_to_v(h_)\n", + "            pre_h_,h_ = self.v_to_h(v_)\n", + "\n", + "        return v,v_\n", + "\n", + "    def free_energy(self,v):\n", + "        # F(v) = -v.a - sum_j log(1 + exp(b_j + W_j.v)), averaged over the batch\n", + "        vbias_term = v.mv(self.v_bias)\n", + "        wx_b = F.linear(v,self.W,self.h_bias)\n", + "        hidden_term = wx_b.exp().add(1).log().sum(1)\n", + "        return (-hidden_term - vbias_term).mean()\n", + "\n", + "\n", + "\n", + "\n", + "rbm = RBM(k=1)\n", + "train_op = optim.SGD(rbm.parameters(),0.1)\n", + "\n", + "for epoch in range(10):\n", + "    loss_ = []\n", + "    for _, (data,target) in enumerate(train_loader):\n", + "        data = Variable(data.view(-1,784))\n", + "        sample_data = data.bernoulli()\n", + "\n", + "        v,v1 = rbm(sample_data)\n", + "        # free-energy difference between the data and the sampled configuration\n", + "        loss = rbm.free_energy(v) - rbm.free_energy(v1)\n", + "        loss_.append(loss.item())\n", + "        train_op.zero_grad()\n", + "        loss.backward()\n", + "        train_op.step()\n", + "\n", + "    print(\"Training loss for {} epoch: {}\".format(epoch, np.mean(loss_)))\n", + "\n", + "\n", + "def show_and_save(file_name,img):\n", + "    npimg = np.transpose(img.numpy(),(1,2,0))\n", + "    f = \"./%s.png\" % file_name\n", + "    plt.imshow(npimg)\n", + "    plt.imsave(f,npimg)\n", + "\n", + "# v and v1 hold the last training batch; -1 lets view() infer its size\n", + "show_and_save(\"real\",make_grid(v.view(-1,1,28,28).data))\n", + "show_and_save(\"generate\",make_grid(v1.view(-1,1,28,28).data))" + ] + }, + { + "cell_type": "markdown", + "id": "79da884b", + "metadata": {}, + "source": [ + "## Energy-based models and Langevin sampling\n", + "\n", + "See discussions in Foster, chapter 7 on energy-based models at \n", + "\n", + "That notebook is based on a recent article by Du and Mordatch, **Implicit generation and modeling with energy-based models**, see " + ] + }, + { + "cell_type": "markdown", + "id": "7e0850c9", + "metadata": {}, + "source": [ + "## Tensor-flow examples\n", + "\n", + "1. 
To create a Boltzmann machine using Keras, see Babcock and Bali, chapter 4, see \n", + "\n", + "2. See also Foster, chapter 7 on energy-based models at " + ] + }, + { + "cell_type": "markdown", + "id": "cfccb8dd", + "metadata": {}, + "source": [ + "## Kullback-Leibler divergence\n", + "\n", + "Before we continue, we need to remind ourselves about the\n", + "Kullback-Leibler divergence introduced earlier.\n", + "This measure is useful for quantifying the similarity between two probability distributions.\n", + "\n", + "The Kullback–Leibler (KL) divergence, labeled $D_{KL}$, measures how one probability distribution $p$ diverges from a second expected probability distribution $q$,\n", + "that is" + ] + }, + { + "cell_type": "markdown", + "id": "8c21dd0f", + "metadata": {}, + "source": [ + "$$\n", + "D_{KL}(p \\| q) = \\int_x p(x) \\log \\frac{p(x)}{q(x)} dx.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "edc57a74", + "metadata": {}, + "source": [ + "The KL divergence $D_{KL}$ attains its minimum value of zero when $p(x) = q(x)$ everywhere." + ] + }, + { + "cell_type": "markdown", + "id": "2719f23e", + "metadata": {}, + "source": [ + "## VAEs\n", + "\n", + "Mathematically, we can imagine the latent variables and the data we\n", + "observe as modeled by a joint distribution $p(\\boldsymbol{x}, \\boldsymbol{h};\\boldsymbol{\\Theta})$. Recall one\n", + "approach of generative modeling, termed likelihood-based, is to\n", + "learn a model to maximize the likelihood $p(\\boldsymbol{x};\\boldsymbol{\\Theta})$ of all observed\n", + "$\\boldsymbol{x}$. 
There are two ways we can manipulate this joint distribution\n", + "to recover the likelihood of purely our observed data $p(\\boldsymbol{x};\\boldsymbol{\\Theta})$; we can\n", + "explicitly marginalize\n", + "out the latent variable $\\boldsymbol{h}$" + ] + }, + { + "cell_type": "markdown", + "id": "3b430bdd", + "metadata": {}, + "source": [ + "$$\n", + "p(\\boldsymbol{x}) = \\int p(\\boldsymbol{x}, \\boldsymbol{h})d\\boldsymbol{h}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c26279f7", + "metadata": {}, + "source": [ + "or we can appeal to the chain rule of probability" + ] + }, + { + "cell_type": "markdown", + "id": "f1817c6e", + "metadata": {}, + "source": [ + "$$\n", + "p(\\boldsymbol{x}) = \\frac{p(\\boldsymbol{x}, \\boldsymbol{h})}{p(\\boldsymbol{h}|\\boldsymbol{x})}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ed4f07c8", + "metadata": {}, + "source": [ + "We suppress here the dependence on the optimization parameters $\\boldsymbol{\\Theta}$." + ] + }, + { + "cell_type": "markdown", + "id": "ca96383c", + "metadata": {}, + "source": [ + "## Introducing the encoder function\n", + "\n", + "Here, $q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})$ is a flexible approximate\n", + "variational distribution with parameters $\\boldsymbol{\\phi}$ that we seek to\n", + "optimize. Intuitively, it can be thought of as a parameterizable\n", + "model that is learned to estimate the true distribution over latent\n", + "variables for given observations $\\boldsymbol{x}$; in other words, it seeks to\n", + "approximate the true posterior $p(\\boldsymbol{h}|\\boldsymbol{x})$. As we saw last week when we\n", + "explored Variational Autoencoders, as we increase the lower bound\n", + "by tuning the parameters $\\boldsymbol{\\phi}$ to maximize the ELBO, we gain\n", + "access to components that can be used to model the true data\n", + "distribution and sample from it, thus learning a generative model."
+ ] + }, + { + "cell_type": "markdown", + "id": "3bca4a6c", + "metadata": {}, + "source": [ + "## ELBO\n", + "\n", + "To better understand the relationship between the evidence and the ELBO, let us perform another derivation, this time using the chain rule of probability" + ] + }, + { + "cell_type": "markdown", + "id": "430ada3d", + "metadata": {}, + "source": [ + "$$\n", + "\\begin{align*}\n", + "\\log p(\\boldsymbol{x}) & = \\log p(\\boldsymbol{x}) \\int q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})d\\boldsymbol{h} && \\text{(Multiply by $1 = \\int q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})d\\boldsymbol{h}$)}\\\\\n", + " & = \\int q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})(\\log p(\\boldsymbol{x}))d\\boldsymbol{h} && \\text{(Bring evidence into integral)}\\\\\n", + " & = \\mathbb{E}_{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})}\\left[\\log p(\\boldsymbol{x})\\right] && \\text{(Definition of Expectation)}\\\\\n", + " & = \\mathbb{E}_{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})}\\left[\\log\\frac{p(\\boldsymbol{x}, \\boldsymbol{h})}{p(\\boldsymbol{h}|\\boldsymbol{x})}\\right]&& \\text{(Chain Rule of Probability)}\\\\\n", + " & = \\mathbb{E}_{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})}\\left[\\log\\frac{p(\\boldsymbol{x}, \\boldsymbol{h})q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})}{p(\\boldsymbol{h}|\\boldsymbol{x})q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})}\\right]&& \\text{(Multiply by $1 = \\frac{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})}{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})}$)}\\\\\n", + " & = \\mathbb{E}_{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})}\\left[\\log\\frac{p(\\boldsymbol{x}, \\boldsymbol{h})}{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})}\\right] + \\mathbb{E}_{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})}\\left[\\log\\frac{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})}{p(\\boldsymbol{h}|\\boldsymbol{x})}\\right] && \\text{(Split 
the Expectation)}\\\\\n", + " & = \\mathbb{E}_{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})}\\left[\\log\\frac{p(\\boldsymbol{x}, \\boldsymbol{h})}{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})}\\right] +\n", + "\t D_{KL}(q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})\\vert\\vert p(\\boldsymbol{h}|\\boldsymbol{x})) && \\text{(Definition of KL Divergence)}\\\\\n", + " & \\geq \\mathbb{E}_{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})}\\left[\\log\\frac{p(\\boldsymbol{x}, \\boldsymbol{h})}{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})}\\right] && \\text{(KL Divergence always $\\geq 0$)}\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "eac31eef", + "metadata": {}, + "source": [ + "## The VAE\n", + "\n", + "In the default formulation of the VAE by Kingma and Welling (2013), we directly maximize the ELBO. This\n", + "approach is *variational*, because we optimize for the best\n", + "$q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})$ amongst a family of potential posterior\n", + "distributions parameterized by $\\boldsymbol{\\phi}$. It is called an\n", + "*autoencoder* because it is reminiscent of a traditional\n", + "autoencoder model, where input data is trained to predict itself after\n", + "undergoing an intermediate bottlenecking representation step."
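+ ] + }, + { + "cell_type": "markdown", + "id": "5e1b7a09", + "metadata": {}, + "source": [ + "As a short worked consequence of the ELBO derivation (a sketch in the same notation), rearranging the equality that precedes the final bound gives" + ] + }, + { + "cell_type": "markdown", + "id": "5e1b7a0a", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbb{E}_{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})}\\left[\\log\\frac{p(\\boldsymbol{x}, \\boldsymbol{h})}{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})}\\right] = \\log p(\\boldsymbol{x}) - D_{KL}(q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})\\vert\\vert p(\\boldsymbol{h}|\\boldsymbol{x})).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5e1b7a0b", + "metadata": {}, + "source": [ + "Since the evidence $\\log p(\\boldsymbol{x})$ does not depend on $\\boldsymbol{\\phi}$, maximizing the ELBO with respect to $\\boldsymbol{\\phi}$ is equivalent to minimizing the KL divergence between the approximate posterior $q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})$ and the true posterior $p(\\boldsymbol{h}|\\boldsymbol{x})$; the bound is tight when the two coincide."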
+ ] + }, + { + "cell_type": "markdown", + "id": "c106556c", + "metadata": {}, + "source": [ + "## Dissecting the equations\n", + "To make\n", + "this connection explicit, let us dissect the ELBO term further:" + ] + }, + { + "cell_type": "markdown", + "id": "70a8aaa8", + "metadata": {}, + "source": [ + "$$\n", + "\\begin{align*}\n", + "{\\mathbb{E}_{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})}\\left[\\log\\frac{p(\\boldsymbol{x}, \\boldsymbol{h})}{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})}\\right]}\n", + "&= {\\mathbb{E}_{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})}\\left[\\log\\frac{p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}|\\boldsymbol{h})p(\\boldsymbol{h})}{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})}\\right]} && {\\text{(Chain Rule of Probability)}}\\\\\n", + "&= {\\mathbb{E}_{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})}\\left[\\log p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}|\\boldsymbol{h})\\right] + \\mathbb{E}_{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})}\\left[\\log\\frac{p(\\boldsymbol{h})}{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})}\\right]} && {\\text{(Split the Expectation)}}\\\\\n", + "&= \\underbrace{{\\mathbb{E}_{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})}\\left[\\log p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}|\\boldsymbol{h})\\right]}}_\\text{reconstruction term} - \\underbrace{{D_{KL}(q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})}\\vert\\vert{p(\\boldsymbol{h}))}}_\\text{prior matching term} && {\\text{(Definition of KL Divergence)}}\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "df02f1d7", + "metadata": {}, + "source": [ + "## Bottlenecking distribution\n", + "\n", + "In this case, we learn an intermediate bottlenecking distribution\n", + "$q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})$ that can be treated as\n", + "an *encoder*; it transforms inputs into a distribution over\n", + 
"possible latents. Simultaneously, we learn a deterministic function\n", + "$p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}|\\boldsymbol{h})$ to convert a given latent vector\n", + "$\\boldsymbol{h}$ into an observation $\\boldsymbol{x}$, which can be interpreted as\n", + "a \\textit{decoder}." + ] + }, + { + "cell_type": "markdown", + "id": "99c787d5", + "metadata": {}, + "source": [ + "## Decoder and encoder\n", + "The two terms in the last equation each have intuitive descriptions: the first\n", + "term measures the reconstruction likelihood of the decoder from our\n", + "variational distribution; this ensures that the learned distribution\n", + "is modeling effective latents that the original data can be\n", + "regenerated from. The second term measures how similar the learned\n", + "variational distribution is to a prior belief held over latent\n", + "variables. Minimizing this term encourages the encoder to actually\n", + "learn a distribution rather than collapse into a Dirac delta function.\n", + "Maximizing the ELBO is thus equivalent to maximizing its first term\n", + "and minimizing its second term." + ] + }, + { + "cell_type": "markdown", + "id": "dddf965b", + "metadata": {}, + "source": [ + "## Defining feature of VAEs\n", + "\n", + "A defining feature of the VAE is how the ELBO is optimized jointly over parameters $\\boldsymbol{\\phi}$ and $\\boldsymbol{\\theta}$. 
The encoder of the VAE is commonly chosen to model a multivariate Gaussian with diagonal covariance, and the prior is often selected to be a standard multivariate Gaussian:" + ] + }, + { + "cell_type": "markdown", + "id": "43574fea", + "metadata": {}, + "source": [ + "$$\n", + "\\begin{align*}\n", + " q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x}) &= N(\\boldsymbol{h}; \\boldsymbol{\\mu}_{\\boldsymbol{\\phi}}(\\boldsymbol{x}), \\boldsymbol{\\sigma}_{\\boldsymbol{\\phi}}^2(\\boldsymbol{x})\\textbf{I})\\\\\n", + " p(\\boldsymbol{h}) &= N(\\boldsymbol{h}; \\boldsymbol{0}, \\textbf{I})\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e5b4e346", + "metadata": {}, + "source": [ + "## Analytical evaluation\n", + "\n", + "Then, the KL divergence term of the ELBO can be computed analytically, and the reconstruction term can be approximated using a Monte Carlo estimate. Our objective can then be rewritten as:" + ] + }, + { + "cell_type": "markdown", + "id": "4e864163", + "metadata": {}, + "source": [ + "$$\n", + "\\begin{align*}\n", + " \\mathrm{argmax}_{\\boldsymbol{\\phi}, \\boldsymbol{\\theta}} \\mathbb{E}_{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})}\\left[\\log p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}|\\boldsymbol{h})\\right] - D_{KL}(q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})\\vert\\vert p(\\boldsymbol{h})) \\approx \\mathrm{argmax}_{\\boldsymbol{\\phi}, \\boldsymbol{\\theta}} \\frac{1}{L}\\sum_{l=1}^{L}\\log p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}|\\boldsymbol{h}^{(l)}) - D_{KL}(q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})\\vert\\vert p(\\boldsymbol{h}))\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "30aed83c", + "metadata": {}, + "source": [ + "where latents $\\{\\boldsymbol{h}^{(l)}\\}_{l=1}^L$ are sampled from $q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})$, for every observation $\\boldsymbol{x}$ in the dataset."
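+ ] + }, + { + "cell_type": "markdown", + "id": "5e1b7a0c", + "metadata": {}, + "source": [ + "For the Gaussian choices above, the KL term has a well-known closed form. As a sketch, writing $d$ for the latent dimension and $\\mu_j$, $\\sigma_j$ for the components of the encoder mean and standard deviation (this index notation is introduced here for illustration only)," + ] + }, + { + "cell_type": "markdown", + "id": "5e1b7a0d", + "metadata": {}, + "source": [ + "$$\n", + "D_{KL}(q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})\\vert\\vert p(\\boldsymbol{h})) = \\frac{1}{2}\\sum_{j=1}^{d}\\left(\\mu_j^2 + \\sigma_j^2 - 1 - \\log\\sigma_j^2\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5e1b7a0e", + "metadata": {}, + "source": [ + "which vanishes exactly when $\\mu_j=0$ and $\\sigma_j=1$ for all $j$, that is, when the encoder output coincides with the prior."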
+ ] + }, + { + "cell_type": "markdown", + "id": "8fd080db", + "metadata": {}, + "source": [ + "## Reparameterization trick\n", + "\n", + "However, a problem arises in this default setup: each $\\boldsymbol{h}^{(l)}$\n", + "that our loss is computed on is generated by a stochastic sampling\n", + "procedure, which is generally non-differentiable. Fortunately, this\n", + "can be addressed via the *reparameterization trick* when\n", + "$q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}|\\boldsymbol{x})$ is designed to model certain\n", + "distributions, including the multivariate Gaussian." + ] + }, + { + "cell_type": "markdown", + "id": "45fb9f75", + "metadata": {}, + "source": [ + "## Actual implementation\n", + "\n", + "The reparameterization trick rewrites a random variable as a\n", + "deterministic function of a noise variable; this allows for the\n", + "optimization of the non-stochastic terms through gradient descent.\n", + "For example, samples from a normal distribution\n", + "$x \\sim N(x;\\mu, \\sigma^2)$ with arbitrary mean $\\mu$ and\n", + "variance $\\sigma^2$ can be rewritten as" + ] + }, + { + "cell_type": "markdown", + "id": "fbc69ab9", + "metadata": {}, + "source": [ + "$$\n", + "\\begin{align*}\n", + " x &= \\mu + \\sigma\\epsilon \\quad \\text{with } \\epsilon \\sim N(\\epsilon; 0, \\boldsymbol{I})\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ba1ca35d", + "metadata": {}, + "source": [ + "## Interpretation\n", + "Arbitrary Gaussian distributions can be interpreted as\n", + "standard Gaussians (of which $\\epsilon$ is a sample) that have their\n", + "mean shifted from zero to the target mean $\\mu$ by addition, and their\n", + "variance stretched by the target variance $\\sigma^2$. 
Therefore, by\n", + "the reparameterization trick, sampling from an arbitrary Gaussian\n", + "distribution can be performed by sampling from a standard Gaussian,\n", + "scaling the result by the target standard deviation, and shifting it\n", + "by the target mean." + ] + }, + { + "cell_type": "markdown", + "id": "cc3cf402", + "metadata": {}, + "source": [ + "## Deterministic function\n", + "\n", + "In a VAE, each $\\boldsymbol{h}$ is thus computed as a deterministic function of input $\\boldsymbol{x}$ and auxiliary noise variable $\\boldsymbol{\\epsilon}$:" + ] + }, + { + "cell_type": "markdown", + "id": "2f1a410d", + "metadata": {}, + "source": [ + "$$\n", + "\\begin{align*}\n", + " \\boldsymbol{h} &= \\boldsymbol{\\mu}_{\\boldsymbol{\\phi}}(\\boldsymbol{x}) + \\boldsymbol{\\sigma}_{\\boldsymbol{\\phi}}(\\boldsymbol{x})\\odot\\boldsymbol{\\epsilon} \\quad \\text{with } \\boldsymbol{\\epsilon} \\sim N(\\boldsymbol{\\epsilon};\\boldsymbol{0}, \\textbf{I})\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "80472a56", + "metadata": {}, + "source": [ + "where $\\odot$ represents an element-wise product. Under this\n", + "reparameterized version of $\\boldsymbol{h}$, gradients can then be computed\n", + "with respect to $\\boldsymbol{\\phi}$ as desired, to optimize\n", + "$\\boldsymbol{\\mu}_{\\boldsymbol{\\phi}}$ and $\\boldsymbol{\\sigma}_{\\boldsymbol{\\phi}}$. The VAE\n", + "therefore utilizes the reparameterization trick and Monte Carlo\n", + "estimates to optimize the ELBO jointly over $\\boldsymbol{\\phi}$ and\n", + "$\\boldsymbol{\\theta}$." + ] + }, + { + "cell_type": "markdown", + "id": "83418e47", + "metadata": {}, + "source": [ + "## After training\n", + "\n", + "After training a VAE, generating new data can be performed by sampling\n", + "directly from the latent space $p(\\boldsymbol{h})$ and then running it through\n", + "the decoder. 
Variational Autoencoders are particularly interesting\n", + "when the dimensionality of $\\boldsymbol{h}$ is less than that of input\n", + "$\\boldsymbol{x}$, as we might then be learning compact, useful\n", + "representations. Furthermore, when a semantically meaningful latent\n", + "space is learned, latent vectors can be edited before being passed to\n", + "the decoder to more precisely control the data generated." + ] + }, + { + "cell_type": "markdown", + "id": "cc4a0a79", + "metadata": {}, + "source": [ + "## Diffusion models, basics\n", + "\n", + "Diffusion models are inspired by non-equilibrium thermodynamics. They\n", + "define a Markov chain of diffusion steps to slowly add random noise to\n", + "data and then learn to reverse the diffusion process to construct\n", + "desired data samples from the noise. Unlike VAEs or flow models,\n", + "diffusion models are learned with a fixed procedure and the latent\n", + "variable has high dimensionality (same as the original data)." + ] + }, + { + "cell_type": "markdown", + "id": "b4b7191e", + "metadata": {}, + "source": [ + "## Problems with probabilistic models\n", + "\n", + "Historically, probabilistic models suffer from a tradeoff between two\n", + "conflicting objectives: *tractability* and\n", + "*flexibility*. Models that are *tractable* can be\n", + "analytically evaluated and easily fit to data (e.g. a Gaussian or\n", + "Laplace). However, these models are unable to aptly describe structure\n", + "in rich datasets. On the other hand, models that are *flexible*\n", + "can be molded to fit structure in arbitrary data. For example, we can\n", + "define models in terms of any (non-negative) function $\\phi(\\boldsymbol{x})$\n", + "yielding the flexible distribution $p\\left(\\boldsymbol{x}\\right) =\n", + "\\frac{\\phi\\left(\\boldsymbol{x} \\right)}{Z}$, where $Z$ is a normalization\n", + "constant. However, computing this normalization constant is generally\n", + "intractable. 
Evaluating, training, or drawing samples from such\n", + "flexible models typically requires a very expensive Monte Carlo\n", + "process." + ] + }, + { + "cell_type": "markdown", + "id": "f6465d64", + "metadata": {}, + "source": [ + "## Diffusion models\n", + "Diffusion models have several interesting features. They allow\n", + "\n", + "* extreme flexibility in model structure,\n", + "\n", + "* exact sampling,\n", + "\n", + "* easy multiplication with other distributions, e.g. in order to compute a posterior, and\n", + "\n", + "* the model log likelihood, and the probability of individual states, to be cheaply evaluated." + ] + }, + { + "cell_type": "markdown", + "id": "1db127cb", + "metadata": {}, + "source": [ + "## Original idea\n", + "\n", + "In the original formulation, one uses a Markov chain to gradually\n", + "convert one distribution into another, an idea used in non-equilibrium\n", + "statistical physics and sequential Monte Carlo. Diffusion models build\n", + "a generative Markov chain which converts a simple known distribution\n", + "(e.g. a Gaussian) into a target (data) distribution using a diffusion\n", + "process. Rather than use this Markov chain to approximately evaluate a\n", + "model which has been otherwise defined, one can explicitly define the\n", + "probabilistic model as the endpoint of the Markov chain. Since each\n", + "step in the diffusion chain has an analytically evaluable probability,\n", + "the full chain can also be analytically evaluated." + ] + }, + { + "cell_type": "markdown", + "id": "0bc54ea4", + "metadata": {}, + "source": [ + "## Diffusion learning\n", + "\n", + "Learning in this framework involves estimating small perturbations to\n", + "a diffusion process. Estimating small, analytically tractable,\n", + "perturbations is more tractable than explicitly describing the full\n", + "distribution with a single, non-analytically-normalizable, potential\n", + "function. 
Furthermore, since a diffusion process exists for any\n", + "smooth target distribution, this method can capture data distributions\n", + "of arbitrary form." + ] + }, + { + "cell_type": "markdown", + "id": "2be87c86", + "metadata": {}, + "source": [ + "## Mathematics of diffusion models\n", + "\n", + "Let us go back to our discussion of the variational autoencoders from\n", + "last week, see\n", + ". As\n", + "a first attempt at understanding diffusion models, we can think of\n", + "these as stacked VAEs, or better, recursive VAEs.\n", + "\n", + "Let us try to see why. As an intermediate step, we consider so-called\n", + "hierarchical VAEs, which can be seen as a generalization of VAEs that\n", + "includes multiple hierarchies of latent spaces.\n", + "\n", + "**Note**: Many of the derivations and figures here are inspired by and borrowed from the excellent exposition of diffusion models by Calvin Luo at ." + ] + }, + { + "cell_type": "markdown", + "id": "1549b4b2", + "metadata": {}, + "source": [ + "## Chains of VAEs\n", + "\n", + "Markovian\n", + "VAEs represent a generative process where we use a Markov chain to build a hierarchy of VAEs.\n", + "\n", + "Each transition down the hierarchy is Markovian, where we decode each\n", + "latent set of variables $\\boldsymbol{h}_t$ in terms of the previous latent variable $\\boldsymbol{h}_{t+1}$.\n", + "Intuitively, and visually, this can be seen as simply stacking VAEs on\n", + "top of each other (see figure next slide).\n", + "\n", + "One can think of such a model as a recursive VAE."
+ ] + }, + { + "cell_type": "markdown", + "id": "fcc12688", + "metadata": {}, + "source": [ + "## Mathematical representation\n", + "\n", + "Mathematically, we represent the joint distribution and the posterior\n", + "of a Markovian VAE as" + ] + }, + { + "cell_type": "markdown", + "id": "585f40c3", + "metadata": {}, + "source": [ + "$$\n", + "\\begin{align*}\n", + " p(\\boldsymbol{x}, \\boldsymbol{h}_{1:T}) &= p(\\boldsymbol{h}_T)p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}|\\boldsymbol{h}_1)\\prod_{t=2}^{T}p_{\\boldsymbol{\\theta}}(\\boldsymbol{h}_{t-1}|\\boldsymbol{h}_{t})\\\\\n", + " q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}_{1:T}|\\boldsymbol{x}) &= q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}_1|\\boldsymbol{x})\\prod_{t=2}^{T}q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}_{t}|\\boldsymbol{h}_{t-1})\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f3814fcc", + "metadata": {}, + "source": [ + "## Diffusion models for hierarchical VAE, from \n", + "\n", + "A Markovian hierarchical Variational Autoencoder with $T$ hierarchical\n", + "latents. The generative process is modeled as a Markov chain, where\n", + "each latent $\\boldsymbol{h}_t$ is generated only from the previous latent\n", + "$\\boldsymbol{h}_{t+1}$. Here $\\boldsymbol{z}$ is our latent variable $\\boldsymbol{h}$.\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: Markovian hierarchical VAE with $T$ hierarchical latents (figure not reproduced).

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "5c7e977b", + "metadata": {}, + "source": [ + "## Equation for the Markovian hierarchical VAE\n", + "\n", + "We obtain then" + ] + }, + { + "cell_type": "markdown", + "id": "d17646c2", + "metadata": {}, + "source": [ + "$$\n", + "\\begin{align*}\n", + "\\mathbb{E}_{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}_{1:T}|\\boldsymbol{x})}\\left[\\log \\frac{p(\\boldsymbol{x}, \\boldsymbol{h}_{1:T})}{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}_{1:T}|\\boldsymbol{x})}\\right]\n", + "&= \\mathbb{E}_{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}_{1:T}|\\boldsymbol{x})}\\left[\\log \\frac{p(\\boldsymbol{h}_T)p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}|\\boldsymbol{h}_1)\\prod_{t=2}^{T}p_{\\boldsymbol{\\theta}}(\\boldsymbol{h}_{t-1}|\\boldsymbol{h}_{t})}{q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}_1|\\boldsymbol{x})\\prod_{t=2}^{T}q_{\\boldsymbol{\\phi}}(\\boldsymbol{h}_{t}|\\boldsymbol{h}_{t-1})}\\right]\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "61a5ac10", + "metadata": {}, + "source": [ + "We will modify this equation when we discuss what are normally called Variational Diffusion Models." + ] + }, + { + "cell_type": "markdown", + "id": "6334546c", + "metadata": {}, + "source": [ + "## Variational Diffusion Models\n", + "\n", + "The easiest way to think of a Variational Diffusion Model (VDM) is as a Markovian Hierarchical Variational Autoencoder with three key restrictions:\n", + "\n", + "1. The latent dimension is exactly equal to the data dimension\n", + "\n", + "2. The structure of the latent encoder at each timestep is not learned; it is pre-defined as a linear Gaussian model. In other words, it is a Gaussian distribution centered around the output of the previous timestep\n", + "\n", + "3. 
The Gaussian parameters of the latent encoders vary over time in such a way that the distribution of the latent at the final timestep $T$ is a standard Gaussian\n", + "\n", + "The VDM posterior is" + ] + }, + { + "cell_type": "markdown", + "id": "722ec75b", + "metadata": {}, + "source": [ + "$$\n", + "\\begin{align*}\n", + " q(\\boldsymbol{x}_{1:T}|\\boldsymbol{x}_0) = \\prod_{t = 1}^{T}q(\\boldsymbol{x}_{t}|\\boldsymbol{x}_{t-1})\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "85378e6c", + "metadata": {}, + "source": [ + "## Second assumption\n", + "\n", + "The distribution of each latent variable in the encoder is a Gaussian centered around its previous hierarchical latent.\n", + "Here then, the structure of the encoder at each timestep $t$ is not learned; it\n", + "is fixed as a linear Gaussian model, where the mean and standard\n", + "deviation can be set beforehand as hyperparameters, or learned as\n", + "parameters." + ] + }, + { + "cell_type": "markdown", + "id": "f324b118", + "metadata": {}, + "source": [ + "## Parameterizing Gaussian encoder\n", + "\n", + "We parameterize the Gaussian encoder with mean $\\boldsymbol{\\mu}_t(\\boldsymbol{x}_t) =\n", + "\\sqrt{\\alpha_t} \\boldsymbol{x}_{t-1}$, and variance $\\boldsymbol{\\Sigma}_t(\\boldsymbol{x}_t) =\n", + "(1 - \\alpha_t) \\textbf{I}$, where the form of the coefficients is\n", + "chosen such that the variance of the latent variables stays at a\n", + "similar scale; in other words, the encoding process is\n", + "variance-preserving.\n", + "\n", + "Note that alternative Gaussian parameterizations\n", + "are allowed, and lead to similar derivations. The main takeaway is\n", + "that $\\alpha_t$ is a (potentially learnable) coefficient that can vary\n", + "with the hierarchical depth $t$, for flexibility." 
+ ] + }, + { + "cell_type": "markdown", + "id": "51754b7f", + "metadata": {}, + "source": [ + "## Encoder transitions\n", + "\n", + "Mathematically, the encoder transitions are defined as" + ] + }, + { + "cell_type": "markdown", + "id": "90323d28", + "metadata": {}, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{align*}\n", + " q(\\boldsymbol{x}_{t}|\\boldsymbol{x}_{t-1}) = \\mathcal{N}(\\boldsymbol{x}_{t} ; \\sqrt{\\alpha_t} \\boldsymbol{x}_{t-1}, (1 - \\alpha_t) \\textbf{I}) \\label{eq:27} \\tag{4}\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1eb91a8d", + "metadata": {}, + "source": [ + "## Third assumption\n", + "\n", + "From the third assumption, we know that $\\alpha_t$ evolves over time\n", + "according to a fixed or learnable schedule structured such that the\n", + "distribution of the final latent $p(\\boldsymbol{x}_T)$ is a standard Gaussian.\n", + "We can then update the joint distribution of a Markovian VAE to write\n", + "the joint distribution for a VDM as" + ] + }, + { + "cell_type": "markdown", + "id": "d3de5091", + "metadata": {}, + "source": [ + "$$\n", + "\\begin{align*}\n", + "p(\\boldsymbol{x}_{0:T}) &= p(\\boldsymbol{x}_T)\\prod_{t=1}^{T}p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}_{t-1}|\\boldsymbol{x}_t) \\\\\n", + "\\text{where,}&\\nonumber\\\\\n", + "p(\\boldsymbol{x}_T) &= \\mathcal{N}(\\boldsymbol{x}_T; \\boldsymbol{0}, \\textbf{I})\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "32fe1677", + "metadata": {}, + "source": [ + "## Noisification\n", + "\n", + "Collectively, what this set of assumptions describes is a steady\n", + "noisification of an image input over time. We progressively corrupt an\n", + "image by adding Gaussian noise until eventually it becomes completely\n", + "identical to pure Gaussian noise. See figure on next slide." + ] + }, + { + "cell_type": "markdown", + "id": "85af58a8", + "metadata": {}, + "source": [ + "## Diffusion models, from \n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1:

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "f071a8d9", + "metadata": {}, + "source": [ + "## Gaussian modeling\n", + "\n", + "Note that our encoder distributions $q(\\boldsymbol{x}_t|\\boldsymbol{x}_{t-1})$ are no\n", + "longer parameterized by $\\boldsymbol{\\phi}$, as they are completely modeled as\n", + "Gaussians with defined mean and variance parameters at each timestep.\n", + "Therefore, in a VDM, we are only interested in learning conditionals\n", + "$p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}_{t-1}|\\boldsymbol{x}_{t})$, so that we can simulate\n", + "new data. After optimizing the VDM, the sampling procedure is as\n", + "simple as sampling Gaussian noise from $p(\\boldsymbol{x}_T)$ and iteratively\n", + "running the denoising transitions\n", + "$p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}_{t-1}|\\boldsymbol{x}_{t})$ for $T$ steps to generate a\n", + "novel $\\boldsymbol{x}_0$." + ] + }, + { + "cell_type": "markdown", + "id": "e0768637", + "metadata": {}, + "source": [ + "## Optimizing the variational diffusion model" + ] + }, + { + "cell_type": "markdown", + "id": "fb3d9418", + "metadata": {}, + "source": [ + "$$\n", + "\\begin{align*}\n", + "\\log p(\\boldsymbol{x})\n", + "&= \\log \\int p(\\boldsymbol{x}_{0:T}) d\\boldsymbol{x}_{1:T}\\\\\n", + "&= \\log \\int \\frac{p(\\boldsymbol{x}_{0:T})q(\\boldsymbol{x}_{1:T}|\\boldsymbol{x}_0)}{q(\\boldsymbol{x}_{1:T}|\\boldsymbol{x}_0)} d\\boldsymbol{x}_{1:T}\\\\\n", + "&= \\log \\mathbb{E}_{q(\\boldsymbol{x}_{1:T}|\\boldsymbol{x}_0)}\\left[\\frac{p(\\boldsymbol{x}_{0:T})}{q(\\boldsymbol{x}_{1:T}|\\boldsymbol{x}_0)}\\right]\\\\\n", + "&\\geq {\\mathbb{E}_{q(\\boldsymbol{x}_{1:T}|\\boldsymbol{x}_0)}\\left[\\log \\frac{p(\\boldsymbol{x}_{0:T})}{q(\\boldsymbol{x}_{1:T}|\\boldsymbol{x}_0)}\\right]}\\\\\n", + "&= {\\mathbb{E}_{q(\\boldsymbol{x}_{1:T}|\\boldsymbol{x}_0)}\\left[\\log \\frac{p(\\boldsymbol{x}_T)\\prod_{t=1}^{T}p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}_{t-1}|\\boldsymbol{x}_t)}{\\prod_{t = 
1}^{T}q(\\boldsymbol{x}_{t}|\\boldsymbol{x}_{t-1})}\\right]}\\\\\n", + "&= {\\mathbb{E}_{q(\\boldsymbol{x}_{1:T}|\\boldsymbol{x}_0)}\\left[\\log \\frac{p(\\boldsymbol{x}_T)p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}_0|\\boldsymbol{x}_1)\\prod_{t=2}^{T}p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}_{t-1}|\\boldsymbol{x}_t)}{q(\\boldsymbol{x}_T|\\boldsymbol{x}_{T-1})\\prod_{t = 1}^{T-1}q(\\boldsymbol{x}_{t}|\\boldsymbol{x}_{t-1})}\\right]}\\\\\n", + "&= {\\mathbb{E}_{q(\\boldsymbol{x}_{1:T}|\\boldsymbol{x}_0)}\\left[\\log \\frac{p(\\boldsymbol{x}_T)p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}_0|\\boldsymbol{x}_1)\\prod_{t=1}^{T-1}p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}_{t}|\\boldsymbol{x}_{t+1})}{q(\\boldsymbol{x}_T|\\boldsymbol{x}_{T-1})\\prod_{t = 1}^{T-1}q(\\boldsymbol{x}_{t}|\\boldsymbol{x}_{t-1})}\\right]}\\\\\n", + "&= {\\mathbb{E}_{q(\\boldsymbol{x}_{1:T}|\\boldsymbol{x}_0)}\\left[\\log \\frac{p(\\boldsymbol{x}_T)p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}_0|\\boldsymbol{x}_1)}{q(\\boldsymbol{x}_T|\\boldsymbol{x}_{T-1})}\\right] + \\mathbb{E}_{q(\\boldsymbol{x}_{1:T}|\\boldsymbol{x}_0)}\\left[\\log \\prod_{t = 1}^{T-1}\\frac{p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}_{t}|\\boldsymbol{x}_{t+1})}{q(\\boldsymbol{x}_{t}|\\boldsymbol{x}_{t-1})}\\right]}\\\\\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1401be05", + "metadata": {}, + "source": [ + "## Continues" + ] + }, + { + "cell_type": "markdown", + "id": "d7c838e6", + "metadata": {}, + "source": [ + "$$\n", + "\\begin{align*}\n", + "\\log p(\\boldsymbol{x})\n", + "&\\geq {\\mathbb{E}_{q(\\boldsymbol{x}_{1:T}|\\boldsymbol{x}_0)}\\left[\\log \\frac{p(\\boldsymbol{x}_T)p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}_0|\\boldsymbol{x}_1)}{q(\\boldsymbol{x}_T|\\boldsymbol{x}_{T-1})}\\right] + \\mathbb{E}_{q(\\boldsymbol{x}_{1:T}|\\boldsymbol{x}_0)}\\left[\\log \\prod_{t = 
1}^{T-1}\\frac{p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}_{t}|\\boldsymbol{x}_{t+1})}{q(\\boldsymbol{x}_{t}|\\boldsymbol{x}_{t-1})}\\right]}\\\\\n", + "&= {\\mathbb{E}_{q(\\boldsymbol{x}_{1:T}|\\boldsymbol{x}_0)}\\left[\\log p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}_0|\\boldsymbol{x}_1)\\right] + \\mathbb{E}_{q(\\boldsymbol{x}_{1:T}|\\boldsymbol{x}_0)}\\left[\\log \\frac{p(\\boldsymbol{x}_T)}{q(\\boldsymbol{x}_T|\\boldsymbol{x}_{T-1})}\\right] + \\mathbb{E}_{q(\\boldsymbol{x}_{1:T}|\\boldsymbol{x}_0)}\\left[ \\sum_{t=1}^{T-1} \\log \\frac{p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}_{t}|\\boldsymbol{x}_{t+1})}{q(\\boldsymbol{x}_{t}|\\boldsymbol{x}_{t-1})}\\right]}\\\\\n", + "&= {\\mathbb{E}_{q(\\boldsymbol{x}_{1:T}|\\boldsymbol{x}_0)}\\left[\\log p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}_0|\\boldsymbol{x}_1)\\right] + \\mathbb{E}_{q(\\boldsymbol{x}_{1:T}|\\boldsymbol{x}_0)}\\left[\\log \\frac{p(\\boldsymbol{x}_T)}{q(\\boldsymbol{x}_T|\\boldsymbol{x}_{T-1})}\\right] + \\sum_{t=1}^{T-1}\\mathbb{E}_{q(\\boldsymbol{x}_{1:T}|\\boldsymbol{x}_0)}\\left[ \\log \\frac{p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}_{t}|\\boldsymbol{x}_{t+1})}{q(\\boldsymbol{x}_{t}|\\boldsymbol{x}_{t-1})}\\right]}\\\\\n", + "&= {\\mathbb{E}_{q(\\boldsymbol{x}_{1}|\\boldsymbol{x}_0)}\\left[\\log p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}_0|\\boldsymbol{x}_1)\\right] + \\mathbb{E}_{q(\\boldsymbol{x}_{T-1}, \\boldsymbol{x}_T|\\boldsymbol{x}_0)}\\left[\\log \\frac{p(\\boldsymbol{x}_T)}{q(\\boldsymbol{x}_T|\\boldsymbol{x}_{T-1})}\\right] + \\sum_{t=1}^{T-1}\\mathbb{E}_{q(\\boldsymbol{x}_{t-1}, \\boldsymbol{x}_t, \\boldsymbol{x}_{t+1}|\\boldsymbol{x}_0)}\\left[\\log \\frac{p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}_{t}|\\boldsymbol{x}_{t+1})}{q(\\boldsymbol{x}_{t}|\\boldsymbol{x}_{t-1})}\\right]}\\\\\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "802891da", + "metadata": {}, + "source": [ + "## Interpretations\n", + "\n", + "Writing the last two expectations as negative KL divergences, the three terms of this bound can be interpreted as follows:\n", + "\n", + 
"* $\\mathbb{E}_{q(\\boldsymbol{x}_{1}|\\boldsymbol{x}_0)}\\left[\\log p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}_0|\\boldsymbol{x}_1)\\right]$ is a **reconstruction term**, predicting the log probability of the original data sample given the first-step latent. This term also appears in a vanilla VAE, and can be trained similarly.\n", + "\n", + "* $\\mathbb{E}_{q(\\boldsymbol{x}_{T-1}|\\boldsymbol{x}_0)}\\left[D_{KL}(q(\\boldsymbol{x}_T|\\boldsymbol{x}_{T-1})\\vert\\vert p(\\boldsymbol{x}_T))\\right]$ is a **prior matching term**; it is minimized when the final latent distribution matches the Gaussian prior. This term requires no optimization, as it has no trainable parameters; furthermore, as we have assumed a large enough $T$ such that the final distribution is Gaussian, this term effectively becomes zero." + ] + }, + { + "cell_type": "markdown", + "id": "3f73f260", + "metadata": {}, + "source": [ + "## The last term\n", + "\n", + "* $\\mathbb{E}_{q(\\boldsymbol{x}_{t-1}, \\boldsymbol{x}_{t+1}|\\boldsymbol{x}_0)}\\left[D_{KL}(q(\\boldsymbol{x}_{t}|\\boldsymbol{x}_{t-1})\\vert\\vert p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}_{t}|\\boldsymbol{x}_{t+1}))\\right]$ is a **consistency term**; it endeavors to make the distribution at $\\boldsymbol{x}_t$ consistent, from both forward and backward processes. That is, a denoising step from a noisier image should match the corresponding noising step from a cleaner image, for every intermediate timestep; this is reflected mathematically by the KL divergence. This term is minimized when we train $p_{\\boldsymbol{\\theta}}(\\boldsymbol{x}_t|\\boldsymbol{x}_{t+1})$ to match the Gaussian distribution $q(\\boldsymbol{x}_t|\\boldsymbol{x}_{t-1})$." + ] + }, + { + "cell_type": "markdown", + "id": "e33bc0aa", + "metadata": {}, + "source": [ + "## Diffusion models, part 2, from \n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1:

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "2db1db42", + "metadata": {}, + "source": [ + "## Optimization cost\n", + "\n", + "The cost of optimizing a VDM is primarily dominated by the third term, since we must optimize over all timesteps $t$.\n", + "\n", + "Under this derivation, all three terms are computed as expectations,\n", + "and can therefore be approximated using Monte Carlo estimates.\n", + "However, actually optimizing the ELBO using the terms we just derived\n", + "might be suboptimal; because the consistency term is computed as an\n", + "expectation over two random variables $\\left\\{\\boldsymbol{x}_{t-1},\n", + "\\boldsymbol{x}_{t+1}\\right\\}$ for every timestep, the variance of its Monte\n", + "Carlo estimate could potentially be higher than a term that is\n", + "estimated using only one random variable per timestep. As it is\n", + "computed by summing up $T-1$ consistency terms, the final estimated\n", + "value may have high variance for large $T$ values." 
+ ] + }, + { + "cell_type": "markdown", + "id": "ae15d708", + "metadata": {}, + "source": [ + "## More details\n", + "\n", + "For more details and implementations, see Calvin Luo at \n", + "\n", + "" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.18" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/pub/catania/ipynb/catania.ipynb b/doc/pub/catania/ipynb/catania.ipynb index 6b8e25e..a32f77d 100644 --- a/doc/pub/catania/ipynb/catania.ipynb +++ b/doc/pub/catania/ipynb/catania.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "markdown", - "id": "153acda9", + "id": "0cd93842", "metadata": { "editable": true }, @@ -14,7 +14,7 @@ }, { "cell_type": "markdown", - "id": "98222977", + "id": "653c9ba3", "metadata": { "editable": true }, @@ -27,7 +27,7 @@ }, { "cell_type": "markdown", - "id": "ea720441", + "id": "edbeb108", "metadata": { "editable": true }, @@ -46,7 +46,7 @@ }, { "cell_type": "markdown", - "id": "be52cf0a", + "id": "08b3ff3a", "metadata": { "editable": true }, @@ -64,7 +64,7 @@ }, { "cell_type": "markdown", - "id": "8ec51b16", + "id": "acb8af9d", "metadata": { "editable": true }, @@ -80,7 +80,7 @@ }, { "cell_type": "markdown", - "id": "9e9be934", + "id": "df158234", "metadata": { "editable": true }, @@ -96,7 +96,7 @@ }, { "cell_type": "markdown", - "id": "e95c9d02", + "id": "2c394af4", "metadata": { "editable": true }, @@ -118,7 +118,79 @@ }, { "cell_type": "markdown", - "id": "98ad0ecb", + "id": "17f9e522", + "metadata": { + "editable": true + }, + "source": [ + "## Extrapolations and model interpretability\n", + "\n", + "When you hear phrases like **predictions and estimations** and\n", + "**correlations and 
causations**, what do you think of?\n", + "\n", + "Maybe you think\n", + "of the difference between classifying new data points and generating\n", + "new data points.\n", + "\n", + "Or perhaps you consider that correlations are symmetric statements:\n", + "if $A$ is correlated with $B$, then $B$ is correlated with\n", + "$A$. Causation, on the other hand, is directional: if $A$ causes $B$, $B$ does not\n", + "necessarily cause $A$." + ] + }, + { + "cell_type": "markdown", + "id": "5d68e865", + "metadata": { + "editable": true + }, + "source": [ + "## Generative and discriminative models\n", + "\n", + "1. Balance between tractability and flexibility\n", + "\n", + "2. We want to extract information about correlations, to make predictions, quantify uncertainties and express causality\n", + "\n", + "3. How do we reliably represent our effective degrees of freedom?\n", + "\n", + "A teaser first, see next slides." + ] + }, + { + "cell_type": "markdown", + "id": "d3540c12", + "metadata": { + "editable": true + }, + "source": [ + "## [Dilute neutron star matter from neural-network quantum states by Fore et al, Physical Review Research 5, 033062 (2023)](https://journals.aps.org/prresearch/pdf/10.1103/PhysRevResearch.5.033062) at density $\\rho=0.04$ fm$^{-3}$\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1:

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "08fa3db3", + "metadata": { + "editable": true + }, + "source": [ + "## The electron gas in three dimensions with $N=14$ electrons (Wigner-Seitz radius $r_s=2$ a.u.), [Gabriel Pescia, Jane Kim et al. arXiv.2305.07240,](https://doi.org/10.48550/arXiv.2305.07240)\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1:

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "cd28f343", "metadata": { "editable": true }, @@ -141,7 +213,7 @@ }, { "cell_type": "markdown", - "id": "6fbf4339", + "id": "58b00c74", "metadata": { "editable": true }, @@ -157,7 +229,7 @@ }, { "cell_type": "markdown", - "id": "152639af", + "id": "cd7ea407", "metadata": { "editable": true }, @@ -183,7 +255,7 @@ }, { "cell_type": "markdown", - "id": "cbbea082", + "id": "165edfcd", "metadata": { "editable": true }, @@ -199,7 +271,7 @@ }, { "cell_type": "markdown", - "id": "b8696a2c", + "id": "7b5105e2", "metadata": { "editable": true }, @@ -215,7 +287,7 @@ }, { "cell_type": "markdown", - "id": "6ac65ba4", + "id": "d4b748ef", "metadata": { "editable": true }, @@ -235,7 +307,7 @@ }, { "cell_type": "markdown", - "id": "2253c0c3", + "id": "6fe09ccb", "metadata": { "editable": true }, @@ -251,7 +323,7 @@ }, { "cell_type": "markdown", - "id": "42a82617", + "id": "81270aed", "metadata": { "editable": true }, @@ -269,7 +341,7 @@ }, { "cell_type": "markdown", - "id": "6decfc58", + "id": "bfe945b6", "metadata": { "editable": true }, @@ -293,7 +365,7 @@ }, { "cell_type": "markdown", - "id": "a61b4604", + "id": "ba7ef646", "metadata": { "editable": true }, @@ -311,7 +383,7 @@ }, { "cell_type": "markdown", - "id": "71c70af5", + "id": "b834cc04", "metadata": { "editable": true }, @@ -327,7 +399,7 @@ }, { "cell_type": "markdown", - "id": "5f3d8268", + "id": "da8627ae", "metadata": { "editable": true }, @@ -339,7 +411,7 @@ }, { "cell_type": "markdown", - "id": "18a2531c", + "id": "e849da7b", "metadata": { "editable": true }, @@ -357,7 +429,7 @@ }, { "cell_type": "markdown", - "id": "5627463f", + "id": "65a452bd", "metadata": { "editable": true }, @@ -369,7 +441,7 @@ }, { "cell_type": "markdown", - "id": "7cf74dba", + "id": "9deb7a37", "metadata": { "editable": true }, @@ -381,7 +453,7 @@ }, { "cell_type": "markdown", - "id": "54bf9b00", + "id": "38f213f1", "metadata": { "editable": true }, @@ -393,7 +465,7 @@ }, { 
"cell_type": "markdown", - "id": "b6994b98", + "id": "45ae1136", "metadata": { "editable": true }, @@ -403,7 +475,7 @@ }, { "cell_type": "markdown", - "id": "92286814", + "id": "f66c26e8", "metadata": { "editable": true }, @@ -415,7 +487,7 @@ }, { "cell_type": "markdown", - "id": "dc75b709", + "id": "dd3a5090", "metadata": { "editable": true }, @@ -425,7 +497,7 @@ }, { "cell_type": "markdown", - "id": "7f7f686e", + "id": "7872d11e", "metadata": { "editable": true }, @@ -437,7 +509,7 @@ }, { "cell_type": "markdown", - "id": "5c1e66d4", + "id": "ad0213af", "metadata": { "editable": true }, @@ -449,7 +521,7 @@ }, { "cell_type": "markdown", - "id": "10827c34", + "id": "d78419a1", "metadata": { "editable": true }, @@ -459,7 +531,7 @@ }, { "cell_type": "markdown", - "id": "0fc58145", + "id": "fb800d02", "metadata": { "editable": true }, @@ -471,7 +543,7 @@ }, { "cell_type": "markdown", - "id": "19f43080", + "id": "7f487392", "metadata": { "editable": true }, @@ -481,7 +553,7 @@ }, { "cell_type": "markdown", - "id": "0b2486f6", + "id": "e39f7c17", "metadata": { "editable": true }, @@ -493,7 +565,7 @@ }, { "cell_type": "markdown", - "id": "ec1496ee", + "id": "aca472b7", "metadata": { "editable": true }, @@ -505,7 +577,7 @@ }, { "cell_type": "markdown", - "id": "965d89d3", + "id": "b7987ee7", "metadata": { "editable": true }, @@ -515,7 +587,7 @@ }, { "cell_type": "markdown", - "id": "3ad09b32", + "id": "8ed718ee", "metadata": { "editable": true }, @@ -528,7 +600,7 @@ }, { "cell_type": "markdown", - "id": "373ea912", + "id": "15bbe9c3", "metadata": { "editable": true }, @@ -538,7 +610,7 @@ }, { "cell_type": "markdown", - "id": "d33132ee", + "id": "7827172f", "metadata": { "editable": true }, @@ -550,7 +622,7 @@ }, { "cell_type": "markdown", - "id": "b7b2eeb0", + "id": "52a8084f", "metadata": { "editable": true }, @@ -565,7 +637,7 @@ }, { "cell_type": "markdown", - "id": "f15109cc", + "id": "67569afe", "metadata": { "editable": true }, @@ -578,7 +650,7 @@ }, { "cell_type": 
"markdown", - "id": "e28f51b3", + "id": "be6c3913", "metadata": { "editable": true }, @@ -590,7 +662,7 @@ }, { "cell_type": "markdown", - "id": "aa47497d", + "id": "8b9c9420", "metadata": { "editable": true }, @@ -602,7 +674,7 @@ }, { "cell_type": "markdown", - "id": "479b3cf8", + "id": "884822e5", "metadata": { "editable": true }, @@ -614,7 +686,7 @@ }, { "cell_type": "markdown", - "id": "cbfc4e96", + "id": "20fb19fc", "metadata": { "editable": true }, @@ -624,7 +696,7 @@ }, { "cell_type": "markdown", - "id": "1804d870", + "id": "ba368a55", "metadata": { "editable": true }, @@ -637,7 +709,7 @@ }, { "cell_type": "markdown", - "id": "8665f556", + "id": "6de29ccc", "metadata": { "editable": true }, @@ -648,7 +720,7 @@ }, { "cell_type": "markdown", - "id": "2617c164", + "id": "e083172b", "metadata": { "editable": true }, @@ -660,7 +732,7 @@ }, { "cell_type": "markdown", - "id": "1b4e7fc9", + "id": "4dff529c", "metadata": { "editable": true }, @@ -683,7 +755,7 @@ }, { "cell_type": "markdown", - "id": "1e8c7fc1", + "id": "7facb953", "metadata": { "editable": true }, @@ -705,7 +777,7 @@ }, { "cell_type": "markdown", - "id": "c9a064e3", + "id": "803a20ac", "metadata": { "editable": true }, @@ -732,7 +804,7 @@ }, { "cell_type": "markdown", - "id": "a6030cc8", + "id": "63aa6478", "metadata": { "editable": true }, @@ -748,7 +820,7 @@ }, { "cell_type": "markdown", - "id": "b07f2150", + "id": "93b10615", "metadata": { "editable": true }, @@ -765,7 +837,7 @@ }, { "cell_type": "markdown", - "id": "aaaef430", + "id": "bcdc8f2e", "metadata": { "editable": true }, @@ -777,7 +849,7 @@ }, { "cell_type": "markdown", - "id": "d5058a10", + "id": "63764264", "metadata": { "editable": true }, @@ -788,7 +860,7 @@ }, { "cell_type": "markdown", - "id": "1cec6c48", + "id": "6ea4a585", "metadata": { "editable": true }, @@ -807,7 +879,7 @@ }, { "cell_type": "markdown", - "id": "6922ddc2", + "id": "f3d0b2f4", "metadata": { "editable": true }, @@ -836,7 +908,7 @@ }, { "cell_type": "markdown", - 
"id": "468855ba", + "id": "72e9a70e", "metadata": { "editable": true }, @@ -850,7 +922,7 @@ }, { "cell_type": "markdown", - "id": "5741b4de", + "id": "24166378", "metadata": { "editable": true }, @@ -867,7 +939,7 @@ }, { "cell_type": "markdown", - "id": "8dbcb2ac", + "id": "ad0e414e", "metadata": { "editable": true }, @@ -883,7 +955,7 @@ }, { "cell_type": "markdown", - "id": "92ef2acd", + "id": "4a7b41e1", "metadata": { "editable": true }, @@ -895,7 +967,7 @@ }, { "cell_type": "markdown", - "id": "c3144610", + "id": "c4acf546", "metadata": { "editable": true }, @@ -909,7 +981,7 @@ }, { "cell_type": "markdown", - "id": "737eecdd", + "id": "e353dc32", "metadata": { "editable": true }, @@ -921,7 +993,7 @@ }, { "cell_type": "markdown", - "id": "3ea34391", + "id": "5a87dccb", "metadata": { "editable": true }, @@ -937,7 +1009,7 @@ }, { "cell_type": "markdown", - "id": "083c5bb4", + "id": "35491e6c", "metadata": { "editable": true }, @@ -949,7 +1021,7 @@ }, { "cell_type": "markdown", - "id": "f8120236", + "id": "b809821f", "metadata": { "editable": true }, @@ -959,7 +1031,7 @@ }, { "cell_type": "markdown", - "id": "25901a92", + "id": "774f0764", "metadata": { "editable": true }, @@ -971,7 +1043,7 @@ }, { "cell_type": "markdown", - "id": "c8d0a8c9", + "id": "3d7088d0", "metadata": { "editable": true }, @@ -989,7 +1061,7 @@ }, { "cell_type": "markdown", - "id": "0dcf9e9c", + "id": "ec2718cd", "metadata": { "editable": true }, @@ -1002,7 +1074,7 @@ }, { "cell_type": "markdown", - "id": "649cfac1", + "id": "8ef486d5", "metadata": { "editable": true }, @@ -1016,7 +1088,7 @@ }, { "cell_type": "markdown", - "id": "c521b370", + "id": "855549f0", "metadata": { "editable": true }, @@ -1031,7 +1103,7 @@ }, { "cell_type": "markdown", - "id": "44f11300", + "id": "6467d797", "metadata": { "editable": true }, @@ -1043,7 +1115,7 @@ }, { "cell_type": "markdown", - "id": "ffb62a75", + "id": "c2b8c002", "metadata": { "editable": true }, @@ -1057,7 +1129,7 @@ }, { "cell_type": "markdown", - 
"id": "318c7a3c", + "id": "45b6c18c", "metadata": { "editable": true }, @@ -1069,7 +1141,7 @@ }, { "cell_type": "markdown", - "id": "eb6d94c6", + "id": "e6020883", "metadata": { "editable": true }, @@ -1085,7 +1157,7 @@ }, { "cell_type": "markdown", - "id": "0c0b217c", + "id": "87360dbd", "metadata": { "editable": true }, @@ -1097,7 +1169,7 @@ }, { "cell_type": "markdown", - "id": "05e66932", + "id": "392dda6a", "metadata": { "editable": true }, @@ -1107,7 +1179,7 @@ }, { "cell_type": "markdown", - "id": "c3862f3e", + "id": "e7ab5332", "metadata": { "editable": true }, @@ -1119,7 +1191,7 @@ }, { "cell_type": "markdown", - "id": "dfbb2efd", + "id": "f1f1bcc8", "metadata": { "editable": true }, @@ -1129,7 +1201,7 @@ }, { "cell_type": "markdown", - "id": "548a0d3e", + "id": "31c9897a", "metadata": { "editable": true }, @@ -1141,7 +1213,7 @@ }, { "cell_type": "markdown", - "id": "e66ce860", + "id": "cacef68b", "metadata": { "editable": true }, @@ -1151,7 +1223,7 @@ }, { "cell_type": "markdown", - "id": "cc72ed6b", + "id": "a310325f", "metadata": { "editable": true }, @@ -1163,7 +1235,7 @@ }, { "cell_type": "markdown", - "id": "42473f58", + "id": "e440e4eb", "metadata": { "editable": true }, @@ -1178,7 +1250,7 @@ }, { "cell_type": "markdown", - "id": "357226b9", + "id": "c6e7a1c8", "metadata": { "editable": true }, @@ -1190,7 +1262,7 @@ }, { "cell_type": "markdown", - "id": "b379b0b0", + "id": "445560fe", "metadata": { "editable": true }, @@ -1201,7 +1273,7 @@ { "cell_type": "code", "execution_count": 1, - "id": "6d3d7bc7", + "id": "79d0d646", "metadata": { "collapsed": false, "editable": true @@ -1264,7 +1336,7 @@ }, { "cell_type": "markdown", - "id": "488724fe", + "id": "42993345", "metadata": { "editable": true }, @@ -1274,7 +1346,7 @@ }, { "cell_type": "markdown", - "id": "3a9a55b4", + "id": "a9d5deec", "metadata": { "editable": true }, @@ -1295,7 +1367,7 @@ }, { "cell_type": "markdown", - "id": "ba544f79", + "id": "4d4929b1", "metadata": { "editable": true }, @@ 
-1311,7 +1383,7 @@ }, { "cell_type": "markdown", - "id": "6a57d7cb", + "id": "16680e62", "metadata": { "editable": true }, @@ -1323,7 +1395,7 @@ }, { "cell_type": "markdown", - "id": "b86d6a5c", + "id": "4e6d767a", "metadata": { "editable": true }, @@ -1335,7 +1407,7 @@ }, { "cell_type": "markdown", - "id": "9a69d392", + "id": "3263f740", "metadata": { "editable": true }, @@ -1345,7 +1417,7 @@ }, { "cell_type": "markdown", - "id": "b4ec9b75", + "id": "9c2a591a", "metadata": { "editable": true }, @@ -1357,7 +1429,7 @@ }, { "cell_type": "markdown", - "id": "9d2e3667", + "id": "f406da56", "metadata": { "editable": true }, @@ -1367,7 +1439,7 @@ }, { "cell_type": "markdown", - "id": "8d3b6020", + "id": "1793a059", "metadata": { "editable": true }, @@ -1379,7 +1451,7 @@ }, { "cell_type": "markdown", - "id": "76ba0da7", + "id": "f9194184", "metadata": { "editable": true }, @@ -1391,7 +1463,7 @@ }, { "cell_type": "markdown", - "id": "75627339", + "id": "1e573596", "metadata": { "editable": true }, @@ -1403,7 +1475,7 @@ }, { "cell_type": "markdown", - "id": "72c07ea7", + "id": "44cd323c", "metadata": { "editable": true }, @@ -1415,7 +1487,7 @@ }, { "cell_type": "markdown", - "id": "b7fcb9c9", + "id": "98b20680", "metadata": { "editable": true }, @@ -1427,7 +1499,7 @@ }, { "cell_type": "markdown", - "id": "b38f4012", + "id": "e0f06329", "metadata": { "editable": true }, @@ -1437,7 +1509,7 @@ }, { "cell_type": "markdown", - "id": "6bc73d7d", + "id": "43161030", "metadata": { "editable": true }, @@ -1453,7 +1525,7 @@ }, { "cell_type": "markdown", - "id": "5e908359", + "id": "2ef1c290", "metadata": { "editable": true }, @@ -1465,7 +1537,7 @@ }, { "cell_type": "markdown", - "id": "c0dcfd84", + "id": "bd16d507", "metadata": { "editable": true }, @@ -1477,7 +1549,7 @@ }, { "cell_type": "markdown", - "id": "8ae30522", + "id": "3e536774", "metadata": { "editable": true }, @@ -1487,7 +1559,7 @@ }, { "cell_type": "markdown", - "id": "6fcf91f5", + "id": "379286f4", "metadata": { 
"editable": true }, @@ -1499,7 +1571,7 @@ }, { "cell_type": "markdown", - "id": "35b52b3e", + "id": "de4009b3", "metadata": { "editable": true }, @@ -1513,7 +1585,7 @@ }, { "cell_type": "markdown", - "id": "4ddb7234", + "id": "7692a4d5", "metadata": { "editable": true }, @@ -1531,7 +1603,7 @@ { "cell_type": "code", "execution_count": 2, - "id": "2e72d589", + "id": "d4814e9c", "metadata": { "collapsed": false, "editable": true @@ -1604,7 +1676,7 @@ }, { "cell_type": "markdown", - "id": "32b998f9", + "id": "cd7ae3ed", "metadata": { "editable": true }, @@ -1614,7 +1686,7 @@ }, { "cell_type": "markdown", - "id": "539faf33", + "id": "6bb9a0e1", "metadata": { "editable": true }, @@ -1632,7 +1704,7 @@ }, { "cell_type": "markdown", - "id": "3d544ddd", + "id": "06563003", "metadata": { "editable": true }, @@ -1649,7 +1721,7 @@ }, { "cell_type": "markdown", - "id": "746b5284", + "id": "5a3fe09a", "metadata": { "editable": true }, @@ -1661,7 +1733,7 @@ }, { "cell_type": "markdown", - "id": "1696958f", + "id": "321b4a6d", "metadata": { "editable": true }, @@ -1671,7 +1743,7 @@ }, { "cell_type": "markdown", - "id": "f2ba678d", + "id": "3a8f136c", "metadata": { "editable": true }, @@ -1683,7 +1755,7 @@ }, { "cell_type": "markdown", - "id": "1eea5ba5", + "id": "bb33455e", "metadata": { "editable": true }, @@ -1695,7 +1767,7 @@ }, { "cell_type": "markdown", - "id": "0d307a0a", + "id": "4b97a8d6", "metadata": { "editable": true }, @@ -1707,7 +1779,7 @@ }, { "cell_type": "markdown", - "id": "6e08427d", + "id": "cfbbb52d", "metadata": { "editable": true }, @@ -1718,7 +1790,7 @@ }, { "cell_type": "markdown", - "id": "bc0d4e8c", + "id": "a3f8f887", "metadata": { "editable": true }, @@ -1730,7 +1802,7 @@ }, { "cell_type": "markdown", - "id": "80fe16aa", + "id": "a097c6e9", "metadata": { "editable": true }, @@ -1743,7 +1815,7 @@ }, { "cell_type": "markdown", - "id": "8e3b41ed", + "id": "9ec6339d", "metadata": { "editable": true }, @@ -1755,7 +1827,7 @@ }, { "cell_type": "markdown", - 
"id": "24a47c29", + "id": "b340d6f5", "metadata": { "editable": true }, @@ -1765,7 +1837,7 @@ }, { "cell_type": "markdown", - "id": "d6ecace5", + "id": "3bc67ced", "metadata": { "editable": true }, @@ -1777,7 +1849,7 @@ }, { "cell_type": "markdown", - "id": "78a2d508", + "id": "fa8e855e", "metadata": { "editable": true }, @@ -1789,7 +1861,7 @@ }, { "cell_type": "markdown", - "id": "8e573a81", + "id": "fec79799", "metadata": { "editable": true }, @@ -1801,7 +1873,7 @@ }, { "cell_type": "markdown", - "id": "89fed5d1", + "id": "e92e9b1f", "metadata": { "editable": true }, @@ -1811,7 +1883,7 @@ }, { "cell_type": "markdown", - "id": "509fe929", + "id": "4b4e69bf", "metadata": { "editable": true }, @@ -1823,7 +1895,7 @@ }, { "cell_type": "markdown", - "id": "a5acbe91", + "id": "a660f211", "metadata": { "editable": true }, @@ -1839,7 +1911,7 @@ }, { "cell_type": "markdown", - "id": "6f7aba23", + "id": "1989b4fc", "metadata": { "editable": true }, @@ -1851,7 +1923,7 @@ }, { "cell_type": "markdown", - "id": "5dabc790", + "id": "4a034c14", "metadata": { "editable": true }, @@ -1861,7 +1933,7 @@ }, { "cell_type": "markdown", - "id": "8ec2c06b", + "id": "274e1c4d", "metadata": { "editable": true }, @@ -1873,7 +1945,7 @@ }, { "cell_type": "markdown", - "id": "463b5820", + "id": "533ae9e7", "metadata": { "editable": true }, @@ -1883,7 +1955,7 @@ }, { "cell_type": "markdown", - "id": "acb281ab", + "id": "bea1c60a", "metadata": { "editable": true }, @@ -1895,7 +1967,7 @@ }, { "cell_type": "markdown", - "id": "09c11ccd", + "id": "5ce5b908", "metadata": { "editable": true }, @@ -1907,7 +1979,7 @@ }, { "cell_type": "markdown", - "id": "97e8699c", + "id": "324b5c28", "metadata": { "editable": true }, @@ -1920,7 +1992,7 @@ }, { "cell_type": "markdown", - "id": "0fee4b8a", + "id": "adff05cd", "metadata": { "editable": true }, @@ -1930,7 +2002,7 @@ }, { "cell_type": "markdown", - "id": "ee97de7e", + "id": "5270d5fa", "metadata": { "editable": true }, @@ -1942,7 +2014,7 @@ }, { 
"cell_type": "markdown", - "id": "ddd92c77", + "id": "c83a9ef4", "metadata": { "editable": true }, @@ -1952,7 +2024,7 @@ }, { "cell_type": "markdown", - "id": "32b4c422", + "id": "747da5ad", "metadata": { "editable": true }, @@ -1964,7 +2036,7 @@ }, { "cell_type": "markdown", - "id": "f113b389", + "id": "db9aa150", "metadata": { "editable": true }, @@ -1975,7 +2047,7 @@ }, { "cell_type": "markdown", - "id": "20997a37", + "id": "fe3ce041", "metadata": { "editable": true }, @@ -1987,7 +2059,7 @@ }, { "cell_type": "markdown", - "id": "4a24af39", + "id": "1d7cbe9d", "metadata": { "editable": true }, @@ -1997,7 +2069,7 @@ }, { "cell_type": "markdown", - "id": "e80b80d7", + "id": "9e16a856", "metadata": { "editable": true }, @@ -2009,7 +2081,7 @@ }, { "cell_type": "markdown", - "id": "259979fa", + "id": "095624ba", "metadata": { "editable": true }, @@ -2019,7 +2091,7 @@ }, { "cell_type": "markdown", - "id": "3f1794a3", + "id": "bd65ea09", "metadata": { "editable": true }, @@ -2031,7 +2103,7 @@ }, { "cell_type": "markdown", - "id": "40b3ea46", + "id": "008e4371", "metadata": { "editable": true }, @@ -2043,7 +2115,7 @@ }, { "cell_type": "markdown", - "id": "8436dd52", + "id": "f55b8eab", "metadata": { "editable": true }, @@ -2055,7 +2127,7 @@ }, { "cell_type": "markdown", - "id": "462e0f16", + "id": "5126ab8e", "metadata": { "editable": true }, @@ -2065,7 +2137,7 @@ }, { "cell_type": "markdown", - "id": "e1ba411b", + "id": "0f506e21", "metadata": { "editable": true }, @@ -2077,7 +2149,7 @@ }, { "cell_type": "markdown", - "id": "91dc27cf", + "id": "f09773bb", "metadata": { "editable": true }, @@ -2087,7 +2159,7 @@ }, { "cell_type": "markdown", - "id": "36be98a8", + "id": "13faa8b3", "metadata": { "editable": true }, @@ -2099,7 +2171,7 @@ }, { "cell_type": "markdown", - "id": "73aa970e", + "id": "a1b55a23", "metadata": { "editable": true }, @@ -2111,7 +2183,7 @@ }, { "cell_type": "markdown", - "id": "2f9fefd0", + "id": "f544649c", "metadata": { "editable": true }, @@ -2123,7 
+2195,7 @@ }, { "cell_type": "markdown", - "id": "3daa9158", + "id": "9406250d", "metadata": { "editable": true }, @@ -2133,7 +2205,7 @@ }, { "cell_type": "markdown", - "id": "a1477c87", + "id": "65f3005e", "metadata": { "editable": true }, @@ -2145,7 +2217,7 @@ }, { "cell_type": "markdown", - "id": "26998b29", + "id": "df7ed34c", "metadata": { "editable": true }, @@ -2155,7 +2227,7 @@ }, { "cell_type": "markdown", - "id": "55ad7d17", + "id": "d4c8367a", "metadata": { "editable": true }, @@ -2168,7 +2240,7 @@ }, { "cell_type": "markdown", - "id": "6fc2128d", + "id": "8153eaa3", "metadata": { "editable": true }, @@ -2180,7 +2252,7 @@ }, { "cell_type": "markdown", - "id": "0e5d7a6f", + "id": "a6e2ae3c", "metadata": { "editable": true }, @@ -2190,7 +2262,7 @@ }, { "cell_type": "markdown", - "id": "d3ba6619", + "id": "1d1a819f", "metadata": { "editable": true }, @@ -2202,7 +2274,7 @@ }, { "cell_type": "markdown", - "id": "5d7c4a09", + "id": "ae342d50", "metadata": { "editable": true }, @@ -2212,7 +2284,7 @@ }, { "cell_type": "markdown", - "id": "9a46bc52", + "id": "f10677c1", "metadata": { "editable": true }, @@ -2224,7 +2296,7 @@ }, { "cell_type": "markdown", - "id": "5824f12f", + "id": "dea5c92f", "metadata": { "editable": true }, @@ -2234,7 +2306,7 @@ }, { "cell_type": "markdown", - "id": "c9ab0962", + "id": "208ad938", "metadata": { "editable": true }, @@ -2246,7 +2318,7 @@ }, { "cell_type": "markdown", - "id": "46580516", + "id": "48f04f55", "metadata": { "editable": true }, @@ -2256,7 +2328,7 @@ }, { "cell_type": "markdown", - "id": "80c6d25f", + "id": "fc04866a", "metadata": { "editable": true }, @@ -2268,7 +2340,7 @@ }, { "cell_type": "markdown", - "id": "31af2e01", + "id": "d8ba650b", "metadata": { "editable": true }, @@ -2280,7 +2352,7 @@ }, { "cell_type": "markdown", - "id": "911a4536", + "id": "9fc12f59", "metadata": { "editable": true }, @@ -2290,7 +2362,7 @@ }, { "cell_type": "markdown", - "id": "15977f20", + "id": "53bffa05", "metadata": { "editable": 
true }, @@ -2302,7 +2374,7 @@ }, { "cell_type": "markdown", - "id": "03d90238", + "id": "6203af4d", "metadata": { "editable": true }, @@ -2315,7 +2387,7 @@ }, { "cell_type": "markdown", - "id": "43326fe9", + "id": "16eb0b7d", "metadata": { "editable": true }, @@ -2329,7 +2401,7 @@ }, { "cell_type": "markdown", - "id": "a3d38666", + "id": "f10f405a", "metadata": { "editable": true }, @@ -2341,7 +2413,7 @@ }, { "cell_type": "markdown", - "id": "855e549c", + "id": "36bce326", "metadata": { "editable": true }, @@ -2351,7 +2423,7 @@ }, { "cell_type": "markdown", - "id": "dc7c5791", + "id": "ca2b8428", "metadata": { "editable": true }, @@ -2363,7 +2435,7 @@ }, { "cell_type": "markdown", - "id": "26863e76", + "id": "ba48e670", "metadata": { "editable": true }, @@ -2373,7 +2445,7 @@ }, { "cell_type": "markdown", - "id": "49ea54a9", + "id": "6f4abc32", "metadata": { "editable": true }, @@ -2385,7 +2457,7 @@ }, { "cell_type": "markdown", - "id": "8cefe41f", + "id": "3e0c9116", "metadata": { "editable": true }, @@ -2403,7 +2475,7 @@ }, { "cell_type": "markdown", - "id": "b92b6a6b", + "id": "a8270c3a", "metadata": { "editable": true }, @@ -2421,7 +2493,7 @@ }, { "cell_type": "markdown", - "id": "81d10955", + "id": "36355d0d", "metadata": { "editable": true }, @@ -2433,7 +2505,7 @@ }, { "cell_type": "markdown", - "id": "13f5caf9", + "id": "40976099", "metadata": { "editable": true }, @@ -2443,7 +2515,7 @@ }, { "cell_type": "markdown", - "id": "f1569d1c", + "id": "17a9cb41", "metadata": { "editable": true }, @@ -2455,7 +2527,7 @@ }, { "cell_type": "markdown", - "id": "df375f58", + "id": "5857bfb0", "metadata": { "editable": true }, @@ -2467,7 +2539,7 @@ }, { "cell_type": "markdown", - "id": "fd8b7734", + "id": "e351d319", "metadata": { "editable": true }, @@ -2479,7 +2551,7 @@ }, { "cell_type": "markdown", - "id": "0f5b6cf9", + "id": "1d820b48", "metadata": { "editable": true }, @@ -2489,7 +2561,7 @@ }, { "cell_type": "markdown", - "id": "e8bc5adf", + "id": "4fb8bc9c", 
"metadata": { "editable": true }, @@ -2501,7 +2573,7 @@ }, { "cell_type": "markdown", - "id": "d2355495", + "id": "5dca3cda", "metadata": { "editable": true }, @@ -2511,7 +2583,7 @@ }, { "cell_type": "markdown", - "id": "9e1076d2", + "id": "5871156c", "metadata": { "editable": true }, @@ -2523,7 +2595,7 @@ }, { "cell_type": "markdown", - "id": "3602282c", + "id": "a7a60892", "metadata": { "editable": true }, @@ -2541,7 +2613,7 @@ }, { "cell_type": "markdown", - "id": "c0c6c034", + "id": "9b0bdf83", "metadata": { "editable": true }, @@ -2551,7 +2623,7 @@ }, { "cell_type": "markdown", - "id": "72b23035", + "id": "a7e03a9c", "metadata": { "editable": true }, @@ -2569,7 +2641,7 @@ }, { "cell_type": "markdown", - "id": "796fd960", + "id": "5624e6de", "metadata": { "editable": true }, @@ -2579,7 +2651,7 @@ }, { "cell_type": "markdown", - "id": "bfd68302", + "id": "a6a73788", "metadata": { "editable": true }, @@ -2597,7 +2669,7 @@ }, { "cell_type": "markdown", - "id": "b89d4ce5", + "id": "fa0c863b", "metadata": { "editable": true }, @@ -2609,7 +2681,7 @@ }, { "cell_type": "markdown", - "id": "5b33e828", + "id": "7b9fd625", "metadata": { "editable": true }, @@ -2621,7 +2693,7 @@ }, { "cell_type": "markdown", - "id": "3fcc8d2b", + "id": "2d6110ec", "metadata": { "editable": true }, @@ -2631,7 +2703,7 @@ }, { "cell_type": "markdown", - "id": "99477612", + "id": "98aee849", "metadata": { "editable": true }, @@ -2643,7 +2715,7 @@ }, { "cell_type": "markdown", - "id": "8621b33f", + "id": "124cb42a", "metadata": { "editable": true }, @@ -2655,7 +2727,7 @@ }, { "cell_type": "markdown", - "id": "0a0b0036", + "id": "4dedc423", "metadata": { "editable": true }, @@ -2665,7 +2737,7 @@ }, { "cell_type": "markdown", - "id": "76785d71", + "id": "fbead9d4", "metadata": { "editable": true }, @@ -2677,7 +2749,7 @@ }, { "cell_type": "markdown", - "id": "268624da", + "id": "89702b4f", "metadata": { "editable": true }, @@ -2687,7 +2759,7 @@ }, { "cell_type": "markdown", - "id": "858189d8", + 
"id": "2fb650ad", "metadata": { "editable": true }, @@ -2699,7 +2771,7 @@ }, { "cell_type": "markdown", - "id": "8f7cbce7", + "id": "b32a1fbe", "metadata": { "editable": true }, @@ -2711,7 +2783,7 @@ }, { "cell_type": "markdown", - "id": "d5154dbc", + "id": "5cac9518", "metadata": { "editable": true }, @@ -2734,7 +2806,7 @@ }, { "cell_type": "markdown", - "id": "6eb82dfd", + "id": "5c7f6367", "metadata": { "editable": true }, @@ -2746,7 +2818,7 @@ }, { "cell_type": "markdown", - "id": "b9d0d75b", + "id": "be6e74e5", "metadata": { "editable": true }, @@ -2758,7 +2830,7 @@ }, { "cell_type": "markdown", - "id": "1ac45c13", + "id": "cea6dc43", "metadata": { "editable": true }, @@ -2768,7 +2840,7 @@ }, { "cell_type": "markdown", - "id": "8d7c5421", + "id": "cd81fdd7", "metadata": { "editable": true }, @@ -2780,7 +2852,7 @@ }, { "cell_type": "markdown", - "id": "639a1d71", + "id": "a4e2076f", "metadata": { "editable": true }, @@ -2794,7 +2866,7 @@ }, { "cell_type": "markdown", - "id": "747d9ce2", + "id": "623d338d", "metadata": { "editable": true }, @@ -2806,7 +2878,7 @@ }, { "cell_type": "markdown", - "id": "2c075399", + "id": "9a3168d0", "metadata": { "editable": true }, @@ -2818,7 +2890,7 @@ }, { "cell_type": "markdown", - "id": "e9e56bb2", + "id": "02c62e59", "metadata": { "editable": true }, @@ -2828,7 +2900,7 @@ }, { "cell_type": "markdown", - "id": "058c194a", + "id": "b029455b", "metadata": { "editable": true }, @@ -2840,7 +2912,7 @@ }, { "cell_type": "markdown", - "id": "d9b83223", + "id": "ea5337b3", "metadata": { "editable": true }, @@ -2852,7 +2924,7 @@ }, { "cell_type": "markdown", - "id": "1ab0c539", + "id": "b475cb8d", "metadata": { "editable": true }, @@ -2862,7 +2934,7 @@ }, { "cell_type": "markdown", - "id": "3d05c18a", + "id": "462823b9", "metadata": { "editable": true }, @@ -2874,7 +2946,7 @@ }, { "cell_type": "markdown", - "id": "455bb9c7", + "id": "f23b093a", "metadata": { "editable": true }, @@ -2886,7 +2958,7 @@ }, { "cell_type": "markdown", - 
"id": "19840651", + "id": "8e75c1c2", "metadata": { "editable": true }, @@ -2898,7 +2970,7 @@ }, { "cell_type": "markdown", - "id": "68f7526a", + "id": "3505a76f", "metadata": { "editable": true }, @@ -2922,7 +2994,7 @@ }, { "cell_type": "markdown", - "id": "d589304f", + "id": "093e6c22", "metadata": { "editable": true }, @@ -2934,7 +3006,7 @@ }, { "cell_type": "markdown", - "id": "fe9cc290", + "id": "ea632867", "metadata": { "editable": true }, @@ -2946,7 +3018,7 @@ }, { "cell_type": "markdown", - "id": "d5be6be2", + "id": "706cd35f", "metadata": { "editable": true }, @@ -2960,7 +3032,7 @@ }, { "cell_type": "markdown", - "id": "b41c864e", + "id": "3dc14b73", "metadata": { "editable": true }, @@ -2972,7 +3044,7 @@ }, { "cell_type": "markdown", - "id": "87ab7583", + "id": "99e4989d", "metadata": { "editable": true }, @@ -2984,7 +3056,7 @@ }, { "cell_type": "markdown", - "id": "4409ece8", + "id": "6f6aca14", "metadata": { "editable": true }, @@ -2996,7 +3068,7 @@ }, { "cell_type": "markdown", - "id": "c7cb9fd6", + "id": "d976441f", "metadata": { "editable": true }, @@ -3006,7 +3078,7 @@ }, { "cell_type": "markdown", - "id": "70f1f74e", + "id": "219b5965", "metadata": { "editable": true }, @@ -3018,7 +3090,7 @@ }, { "cell_type": "markdown", - "id": "9962b72d", + "id": "d407496e", "metadata": { "editable": true }, @@ -3033,7 +3105,7 @@ }, { "cell_type": "markdown", - "id": "fd30a147", + "id": "09154c06", "metadata": { "editable": true }, @@ -3045,7 +3117,7 @@ }, { "cell_type": "markdown", - "id": "b83bdd29", + "id": "60583422", "metadata": { "editable": true }, @@ -3055,7 +3127,7 @@ }, { "cell_type": "markdown", - "id": "ce67b5a2", + "id": "09d0c1d3", "metadata": { "editable": true }, @@ -3067,7 +3139,7 @@ }, { "cell_type": "markdown", - "id": "87cedee1", + "id": "10e07dcb", "metadata": { "editable": true }, @@ -3080,7 +3152,7 @@ }, { "cell_type": "markdown", - "id": "51b487cb", + "id": "ec807692", "metadata": { "editable": true }, @@ -3092,7 +3164,7 @@ }, { 
"cell_type": "markdown", - "id": "3ba28bfc", + "id": "b5d52ab5", "metadata": { "editable": true }, @@ -3104,7 +3176,7 @@ }, { "cell_type": "markdown", - "id": "11ceae2d", + "id": "f1c09bbf", "metadata": { "editable": true }, @@ -3114,7 +3186,7 @@ }, { "cell_type": "markdown", - "id": "ac8dc0bf", + "id": "e8145abd", "metadata": { "editable": true }, @@ -3126,7 +3198,7 @@ }, { "cell_type": "markdown", - "id": "48bed24e", + "id": "d8a60c69", "metadata": { "editable": true }, @@ -3138,7 +3210,7 @@ }, { "cell_type": "markdown", - "id": "cc1da068", + "id": "f87035b3", "metadata": { "editable": true }, @@ -3150,7 +3222,7 @@ }, { "cell_type": "markdown", - "id": "3c83e95b", + "id": "63061a84", "metadata": { "editable": true }, @@ -3161,7 +3233,7 @@ }, { "cell_type": "markdown", - "id": "38f733fe", + "id": "e90629e1", "metadata": { "editable": true }, @@ -3173,7 +3245,7 @@ }, { "cell_type": "markdown", - "id": "2e7a9158", + "id": "4dfb4385", "metadata": { "editable": true }, @@ -3187,7 +3259,7 @@ }, { "cell_type": "markdown", - "id": "1e30681b", + "id": "eae9ef99", "metadata": { "editable": true }, @@ -3199,7 +3271,7 @@ }, { "cell_type": "markdown", - "id": "a6b4d14c", + "id": "79e853e3", "metadata": { "editable": true }, @@ -3209,7 +3281,7 @@ }, { "cell_type": "markdown", - "id": "541f9c56", + "id": "b5822deb", "metadata": { "editable": true }, @@ -3221,7 +3293,7 @@ }, { "cell_type": "markdown", - "id": "17608bef", + "id": "3da9d274", "metadata": { "editable": true }, @@ -3233,7 +3305,7 @@ }, { "cell_type": "markdown", - "id": "820bcc62", + "id": "0c362f5f", "metadata": { "editable": true }, @@ -3245,7 +3317,7 @@ }, { "cell_type": "markdown", - "id": "71fc4a0f", + "id": "e4fbadf8", "metadata": { "editable": true }, @@ -3256,7 +3328,7 @@ }, { "cell_type": "markdown", - "id": "1c587edf", + "id": "13aad1f3", "metadata": { "editable": true }, @@ -3268,7 +3340,7 @@ }, { "cell_type": "markdown", - "id": "706a3b88", + "id": "6cf15217", "metadata": { "editable": true }, @@ -3280,7 
+3352,7 @@ }, { "cell_type": "markdown", - "id": "10d69f88", + "id": "b7f2b57b", "metadata": { "editable": true }, @@ -3290,7 +3362,7 @@ }, { "cell_type": "markdown", - "id": "c984a8c9", + "id": "84697aa7", "metadata": { "editable": true }, @@ -3301,7 +3373,7 @@ }, { "cell_type": "markdown", - "id": "1f3507da", + "id": "cff18fb0", "metadata": { "editable": true }, @@ -3313,7 +3385,7 @@ }, { "cell_type": "markdown", - "id": "31006b23", + "id": "b200258b", "metadata": { "editable": true }, @@ -3323,7 +3395,7 @@ }, { "cell_type": "markdown", - "id": "7f9c8c5c", + "id": "78c3e330", "metadata": { "editable": true }, @@ -3335,7 +3407,7 @@ }, { "cell_type": "markdown", - "id": "b242d816", + "id": "5c039452", "metadata": { "editable": true }, @@ -3347,7 +3419,7 @@ }, { "cell_type": "markdown", - "id": "97c7e276", + "id": "b1450211", "metadata": { "editable": true }, @@ -3359,7 +3431,7 @@ }, { "cell_type": "markdown", - "id": "aa72c2c9", + "id": "dbf02775", "metadata": { "editable": true }, @@ -3371,7 +3443,7 @@ }, { "cell_type": "markdown", - "id": "7efbb211", + "id": "aee826e9", "metadata": { "editable": true }, @@ -3383,7 +3455,7 @@ }, { "cell_type": "markdown", - "id": "25abbafc", + "id": "7c21e99c", "metadata": { "editable": true }, @@ -3393,7 +3465,7 @@ }, { "cell_type": "markdown", - "id": "0d8169cf", + "id": "5614ea79", "metadata": { "editable": true }, @@ -3405,7 +3477,7 @@ }, { "cell_type": "markdown", - "id": "109e23a9", + "id": "9c011fbe", "metadata": { "editable": true }, @@ -3415,7 +3487,7 @@ }, { "cell_type": "markdown", - "id": "096e8335", + "id": "47c4916b", "metadata": { "editable": true }, @@ -3427,7 +3499,7 @@ }, { "cell_type": "markdown", - "id": "ba431dc6", + "id": "c6519f09", "metadata": { "editable": true }, @@ -3438,7 +3510,7 @@ }, { "cell_type": "markdown", - "id": "47c475cc", + "id": "e82bc48f", "metadata": { "editable": true }, @@ -3450,7 +3522,7 @@ }, { "cell_type": "markdown", - "id": "259b2990", + "id": "0a42012d", "metadata": { "editable": 
true }, @@ -3462,7 +3534,7 @@ }, { "cell_type": "markdown", - "id": "22171ca6", + "id": "b1129c95", "metadata": { "editable": true }, @@ -3474,7 +3546,7 @@ }, { "cell_type": "markdown", - "id": "82ca77ee", + "id": "4fb7028c", "metadata": { "editable": true }, @@ -3486,7 +3558,7 @@ }, { "cell_type": "markdown", - "id": "40b8154c", + "id": "4a25c045", "metadata": { "editable": true }, @@ -3496,7 +3568,7 @@ }, { "cell_type": "markdown", - "id": "a1e935a0", + "id": "64f27aa4", "metadata": { "editable": true }, @@ -3508,7 +3580,7 @@ }, { "cell_type": "markdown", - "id": "46485779", + "id": "04553f9f", "metadata": { "editable": true }, @@ -3519,7 +3591,7 @@ }, { "cell_type": "markdown", - "id": "7502ef1e", + "id": "15f0ce3a", "metadata": { "editable": true }, @@ -3531,7 +3603,7 @@ }, { "cell_type": "markdown", - "id": "72dd1e56", + "id": "b2a97150", "metadata": { "editable": true }, @@ -3543,7 +3615,7 @@ }, { "cell_type": "markdown", - "id": "080b0628", + "id": "9c794d16", "metadata": { "editable": true }, @@ -3553,7 +3625,7 @@ }, { "cell_type": "markdown", - "id": "1d32cc85", + "id": "a6986c04", "metadata": { "editable": true }, @@ -3568,7 +3640,7 @@ }, { "cell_type": "markdown", - "id": "e31fa04f", + "id": "de147884", "metadata": { "editable": true }, @@ -3580,7 +3652,7 @@ }, { "cell_type": "markdown", - "id": "908b8f90", + "id": "bd7c2f00", "metadata": { "editable": true }, @@ -3590,7 +3662,7 @@ }, { "cell_type": "markdown", - "id": "b077d73b", + "id": "5b2b56a2", "metadata": { "editable": true }, @@ -3602,7 +3674,7 @@ }, { "cell_type": "markdown", - "id": "31b867bd", + "id": "937d72bc", "metadata": { "editable": true }, @@ -3612,7 +3684,7 @@ }, { "cell_type": "markdown", - "id": "26c9578e", + "id": "cbec00b4", "metadata": { "editable": true }, @@ -3624,7 +3696,7 @@ }, { "cell_type": "markdown", - "id": "1e412a0f", + "id": "757dfd72", "metadata": { "editable": true }, @@ -3641,7 +3713,7 @@ }, { "cell_type": "markdown", - "id": "2c1a4259", + "id": "3ee3e71c", 
"metadata": { "editable": true }, @@ -3657,7 +3729,7 @@ }, { "cell_type": "markdown", - "id": "d6647898", + "id": "91c70e4b", "metadata": { "editable": true }, @@ -3671,7 +3743,7 @@ }, { "cell_type": "markdown", - "id": "ea36db9d", + "id": "a7073293", "metadata": { "editable": true }, @@ -3681,7 +3753,7 @@ }, { "cell_type": "markdown", - "id": "8dbd3642", + "id": "772217ef", "metadata": { "editable": true }, @@ -3693,7 +3765,7 @@ }, { "cell_type": "markdown", - "id": "bcef15f3", + "id": "a1488d82", "metadata": { "editable": true }, @@ -3707,7 +3779,7 @@ }, { "cell_type": "markdown", - "id": "c7eb6325", + "id": "78aa59cf", "metadata": { "editable": true }, @@ -3717,7 +3789,7 @@ }, { "cell_type": "markdown", - "id": "4087d974", + "id": "50b290c2", "metadata": { "editable": true }, @@ -3728,7 +3800,7 @@ { "cell_type": "code", "execution_count": 3, - "id": "4c915368", + "id": "c085169e", "metadata": { "collapsed": false, "editable": true @@ -3841,7 +3913,7 @@ }, { "cell_type": "markdown", - "id": "79da884b", + "id": "00df5c90", "metadata": { "editable": true }, @@ -3855,7 +3927,7 @@ }, { "cell_type": "markdown", - "id": "7e0850c9", + "id": "8d9b63c1", "metadata": { "editable": true }, @@ -3869,7 +3941,7 @@ }, { "cell_type": "markdown", - "id": "cfccb8dd", + "id": "e94b6669", "metadata": { "editable": true }, @@ -3886,7 +3958,7 @@ }, { "cell_type": "markdown", - "id": "8c21dd0f", + "id": "5f3aedf1", "metadata": { "editable": true }, @@ -3898,7 +3970,7 @@ }, { "cell_type": "markdown", - "id": "edc57a74", + "id": "02b822f4", "metadata": { "editable": true }, @@ -3908,7 +3980,7 @@ }, { "cell_type": "markdown", - "id": "2719f23e", + "id": "768c52c0", "metadata": { "editable": true }, @@ -3927,7 +3999,7 @@ }, { "cell_type": "markdown", - "id": "3b430bdd", + "id": "760347a3", "metadata": { "editable": true }, @@ -3939,7 +4011,7 @@ }, { "cell_type": "markdown", - "id": "c26279f7", + "id": "79d2eb57", "metadata": { "editable": true }, @@ -3949,7 +4021,7 @@ }, { "cell_type": 
"markdown", - "id": "f1817c6e", + "id": "0a0ed231", "metadata": { "editable": true }, @@ -3961,7 +4033,7 @@ }, { "cell_type": "markdown", - "id": "ed4f07c8", + "id": "fa3584c7", "metadata": { "editable": true }, @@ -3971,7 +4043,7 @@ }, { "cell_type": "markdown", - "id": "ca96383c", + "id": "21d47f8b", "metadata": { "editable": true }, @@ -3992,7 +4064,7 @@ }, { "cell_type": "markdown", - "id": "3bca4a6c", + "id": "db2fa210", "metadata": { "editable": true }, @@ -4004,7 +4076,7 @@ }, { "cell_type": "markdown", - "id": "430ada3d", + "id": "db635fe9", "metadata": { "editable": true }, @@ -4026,7 +4098,7 @@ }, { "cell_type": "markdown", - "id": "eac31eef", + "id": "3c265a28", "metadata": { "editable": true }, @@ -4044,7 +4116,7 @@ }, { "cell_type": "markdown", - "id": "c106556c", + "id": "d52cb01f", "metadata": { "editable": true }, @@ -4056,7 +4128,7 @@ }, { "cell_type": "markdown", - "id": "70a8aaa8", + "id": "be24ba23", "metadata": { "editable": true }, @@ -4073,7 +4145,7 @@ }, { "cell_type": "markdown", - "id": "df02f1d7", + "id": "091563ed", "metadata": { "editable": true }, @@ -4091,7 +4163,7 @@ }, { "cell_type": "markdown", - "id": "99c787d5", + "id": "758e86fd", "metadata": { "editable": true }, @@ -4111,7 +4183,7 @@ }, { "cell_type": "markdown", - "id": "dddf965b", + "id": "88073363", "metadata": { "editable": true }, @@ -4123,7 +4195,7 @@ }, { "cell_type": "markdown", - "id": "43574fea", + "id": "c1223aab", "metadata": { "editable": true }, @@ -4138,7 +4210,7 @@ }, { "cell_type": "markdown", - "id": "e5b4e346", + "id": "9ee0fee9", "metadata": { "editable": true }, @@ -4150,7 +4222,7 @@ }, { "cell_type": "markdown", - "id": "4e864163", + "id": "6bf8c57e", "metadata": { "editable": true }, @@ -4164,7 +4236,7 @@ }, { "cell_type": "markdown", - "id": "30aed83c", + "id": "c358063b", "metadata": { "editable": true }, @@ -4174,7 +4246,7 @@ }, { "cell_type": "markdown", - "id": "8fd080db", + "id": "6cbc958c", "metadata": { "editable": true }, @@ -4191,7 +4263,7 @@ 
}, { "cell_type": "markdown", - "id": "45fb9f75", + "id": "bcbd0cae", "metadata": { "editable": true }, @@ -4208,7 +4280,7 @@ }, { "cell_type": "markdown", - "id": "fbc69ab9", + "id": "27cf3c73", "metadata": { "editable": true }, @@ -4222,7 +4294,7 @@ }, { "cell_type": "markdown", - "id": "ba1ca35d", + "id": "c6e6c809", "metadata": { "editable": true }, @@ -4240,7 +4312,7 @@ }, { "cell_type": "markdown", - "id": "cc3cf402", + "id": "160f179e", "metadata": { "editable": true }, @@ -4252,7 +4324,7 @@ }, { "cell_type": "markdown", - "id": "2f1a410d", + "id": "438a8fe4", "metadata": { "editable": true }, @@ -4266,7 +4338,7 @@ }, { "cell_type": "markdown", - "id": "80472a56", + "id": "602f562a", "metadata": { "editable": true }, @@ -4282,7 +4354,7 @@ }, { "cell_type": "markdown", - "id": "83418e47", + "id": "12c01bc2", "metadata": { "editable": true }, @@ -4301,7 +4373,7 @@ }, { "cell_type": "markdown", - "id": "cc4a0a79", + "id": "8468374e", "metadata": { "editable": true }, @@ -4318,7 +4390,7 @@ }, { "cell_type": "markdown", - "id": "b4b7191e", + "id": "1ef22e88", "metadata": { "editable": true }, @@ -4343,7 +4415,7 @@ }, { "cell_type": "markdown", - "id": "f6465d64", + "id": "426a8f94", "metadata": { "editable": true }, @@ -4361,7 +4433,7 @@ }, { "cell_type": "markdown", - "id": "1db127cb", + "id": "ec419261", "metadata": { "editable": true }, @@ -4382,7 +4454,7 @@ }, { "cell_type": "markdown", - "id": "0bc54ea4", + "id": "9c189aa8", "metadata": { "editable": true }, @@ -4400,29 +4472,21 @@ }, { "cell_type": "markdown", - "id": "2be87c86", + "id": "ffb28301", "metadata": { "editable": true }, "source": [ "## Mathematics of diffusion models\n", "\n", - "Let us go back our discussions of the variational autoencoders from\n", - "last week, see\n", - ". As\n", - "a first attempt at understanding diffusion models, we can think of\n", - "these as stacked VAEs, or better, recursive VAEs.\n", - "\n", - "Let us try to see why. 
As an intermediate step, we consider so-called\n", - "hierarchical VAEs, which can be seen as a generalization of VAEs that\n", - "include multiple hierarchies of latent spaces.\n", + "**Note**: Many of the derivations and figures here are inspired and borrowed from the excellent exposition of diffusion models by Calvin Luo at . \n", "\n", - "**Note**: Many of the derivations and figures here are inspired and borrowed from the excellent exposition of diffusion models by Calvin Luo at ." + "But first VAEs as an intermediate step." ] }, { "cell_type": "markdown", - "id": "1549b4b2", + "id": "bb277cc9", "metadata": { "editable": true }, @@ -4442,7 +4506,7 @@ }, { "cell_type": "markdown", - "id": "fcc12688", + "id": "f7181b35", "metadata": { "editable": true }, @@ -4455,7 +4519,7 @@ }, { "cell_type": "markdown", - "id": "585f40c3", + "id": "7a49e79b", "metadata": { "editable": true }, @@ -4470,7 +4534,7 @@ }, { "cell_type": "markdown", - "id": "f3814fcc", + "id": "92a0b26e", "metadata": { "editable": true }, @@ -4491,7 +4555,7 @@ }, { "cell_type": "markdown", - "id": "5c7e977b", + "id": "f082e462", "metadata": { "editable": true }, @@ -4503,7 +4567,7 @@ }, { "cell_type": "markdown", - "id": "d17646c2", + "id": "a086a9e0", "metadata": { "editable": true }, @@ -4518,7 +4582,7 @@ }, { "cell_type": "markdown", - "id": "61a5ac10", + "id": "8819e64d", "metadata": { "editable": true }, @@ -4528,7 +4592,7 @@ }, { "cell_type": "markdown", - "id": "6334546c", + "id": "39b41989", "metadata": { "editable": true }, @@ -4548,7 +4612,7 @@ }, { "cell_type": "markdown", - "id": "722ec75b", + "id": "0073275c", "metadata": { "editable": true }, @@ -4562,7 +4626,7 @@ }, { "cell_type": "markdown", - "id": "85378e6c", + "id": "794774c3", "metadata": { "editable": true }, @@ -4578,7 +4642,7 @@ }, { "cell_type": "markdown", - "id": "f324b118", + "id": "76389ef8", "metadata": { "editable": true }, @@ -4600,7 +4664,7 @@ }, { "cell_type": "markdown", - "id": "51754b7f", + "id": "20cd18dc", 
"metadata": { "editable": true }, @@ -4612,7 +4676,7 @@ }, { "cell_type": "markdown", - "id": "90323d28", + "id": "b7bdd9ed", "metadata": { "editable": true }, @@ -4629,7 +4693,7 @@ }, { "cell_type": "markdown", - "id": "1eb91a8d", + "id": "c2db9314", "metadata": { "editable": true }, @@ -4645,7 +4709,7 @@ }, { "cell_type": "markdown", - "id": "d3de5091", + "id": "a77ec743", "metadata": { "editable": true }, @@ -4661,7 +4725,7 @@ }, { "cell_type": "markdown", - "id": "32fe1677", + "id": "4781cb88", "metadata": { "editable": true }, @@ -4676,7 +4740,7 @@ }, { "cell_type": "markdown", - "id": "85af58a8", + "id": "aa03ad89", "metadata": { "editable": true }, @@ -4692,7 +4756,7 @@ }, { "cell_type": "markdown", - "id": "f071a8d9", + "id": "a1b5ed7c", "metadata": { "editable": true }, @@ -4713,7 +4777,7 @@ }, { "cell_type": "markdown", - "id": "e0768637", + "id": "02f43f35", "metadata": { "editable": true }, @@ -4723,7 +4787,7 @@ }, { "cell_type": "markdown", - "id": "fb3d9418", + "id": "7fa0dc3b", "metadata": { "editable": true }, @@ -4745,7 +4809,7 @@ }, { "cell_type": "markdown", - "id": "1401be05", + "id": "f78bdd91", "metadata": { "editable": true }, @@ -4755,7 +4819,7 @@ }, { "cell_type": "markdown", - "id": "d7c838e6", + "id": "71d11858", "metadata": { "editable": true }, @@ -4773,7 +4837,7 @@ }, { "cell_type": "markdown", - "id": "802891da", + "id": "187c698c", "metadata": { "editable": true }, @@ -4789,7 +4853,7 @@ }, { "cell_type": "markdown", - "id": "3f73f260", + "id": "2a8aaad9", "metadata": { "editable": true }, @@ -4801,7 +4865,7 @@ }, { "cell_type": "markdown", - "id": "e33bc0aa", + "id": "b6beab9a", "metadata": { "editable": true }, @@ -4817,7 +4881,7 @@ }, { "cell_type": "markdown", - "id": "2db1db42", + "id": "e9ee7f5e", "metadata": { "editable": true }, @@ -4840,7 +4904,7 @@ }, { "cell_type": "markdown", - "id": "ae15d708", + "id": "4ed42219", "metadata": { "editable": true }, diff --git a/doc/pub/catania/ipynb/ipynb-catania-src.tar.gz 
b/doc/pub/catania/ipynb/ipynb-catania-src.tar.gz index e093dda..81f8535 100644 Binary files a/doc/pub/catania/ipynb/ipynb-catania-src.tar.gz and b/doc/pub/catania/ipynb/ipynb-catania-src.tar.gz differ diff --git a/doc/pub/catania/pdf/catania.pdf b/doc/pub/catania/pdf/catania.pdf index 1ae782c..b9515ae 100644 Binary files a/doc/pub/catania/pdf/catania.pdf and b/doc/pub/catania/pdf/catania.pdf differ diff --git a/doc/src/week17/catania.do.txt b/doc/src/week17/catania.do.txt index 698ef4b..fc0ccd1 100644 --- a/doc/src/week17/catania.do.txt +++ b/doc/src/week17/catania.do.txt @@ -56,6 +56,52 @@ o Linear and logistic regression, Kernel methods, support vector machines and mo o Reinforcement Learning; Transfer Learning and more +!split +===== Extrapolations and model interpretability ===== + +When you hear phrases like _predictions and estimations_ and +_correlations and causations_, what do you think of? + +Maybe you think +of the difference between classifying new data points and generating +new data points. + +Or perhaps you consider that correlations are symmetric statements: +if $A$ is correlated with $B$, then $B$ is correlated with +$A$. Causation, on the other hand, is directional: if $A$ causes $B$, $B$ does not +necessarily cause $A$. + +!split +===== Generative and discriminative models ===== + +!bblock +o Balance between tractability and flexibility +o We want to extract information about correlations, to make predictions, quantify uncertainties and express causality +o How do we reliably represent our effective degrees of freedom? +!eblock + +A teaser first, see next slides.
+ +!split +===== "Dilute neutron star matter from neural-network quantum states by Fore et al, Physical Review Research 5, 033062 (2023)":"https://journals.aps.org/prresearch/pdf/10.1103/PhysRevResearch.5.033062" at density $\rho=0.04$ fm$^{-3}$ ===== + +!bblock +FIGURE: [figures/nmatter.png, width=700 frac=0.9] +!eblock + + + +!split +===== The electron gas in three dimensions with $N=14$ electrons (Wigner-Seitz radius $r_s=2$ a.u.), "Gabriel Pescia, Jane Kim et al. arXiv.2305.07240,":"https://doi.org/10.48550/arXiv.2305.07240" ===== + +!bblock +FIGURE: [figures/elgasnew.png, width=700 frac=0.9] +!eblock + + + + + !split ===== What Is Generative Modeling? ===== @@ -2048,22 +2094,14 @@ function. Furthermore, since a diffusion process exists for any smooth target distribution, this method can capture data distributions of arbitrary form. - !split ===== Mathematics of diffusion models ===== -Let us go back our discussions of the variational autoencoders from -last week, see -URL:"https://github.com/CompPhysics/AdvancedMachineLearning/blob/main/doc/pub/week15/ipynb/week15.ipynb". As -a first attempt at understanding diffusion models, we can think of -these as stacked VAEs, or better, recursive VAEs. - -Let us try to see why. As an intermediate step, we consider so-called -hierarchical VAEs, which can be seen as a generalization of VAEs that -include multiple hierarchies of latent spaces. _Note_: Many of the derivations and figures here are inspired and borrowed from the excellent exposition of diffusion models by Calvin Luo at URL:"https://arxiv.org/abs/2208.11970". +But first VAEs as an intermediate step. 
+ !split ===== Chains of VAEs ===== diff --git a/doc/src/week17/figures/elgasnew.png b/doc/src/week17/figures/elgasnew.png new file mode 100644 index 0000000..560d082 Binary files /dev/null and b/doc/src/week17/figures/elgasnew.png differ diff --git a/doc/src/week17/figures/nmatter.png b/doc/src/week17/figures/nmatter.png new file mode 100644 index 0000000..8915987 Binary files /dev/null and b/doc/src/week17/figures/nmatter.png differ