Course given by Simon Leglaive at CentraleSupélec.
Bayesian modeling, inference and prediction techniques have become commonplace in machine learning. Bayesian models are used in data analysis to describe, through latent factors, the generative process of complex data (e.g., medical images, audio signals, text documents). Discovering these latent or hidden variables from observations relies on the posterior probability distribution, whose computation constitutes the Bayesian inference step.
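As a quick reminder (this formula is not spelled out in the syllabus itself), the posterior over a latent variable $z$ given observations $x$ follows from Bayes' theorem:

$$
p(z \mid x) = \frac{p(x \mid z)\, p(z)}{p(x)} = \frac{p(x \mid z)\, p(z)}{\int p(x \mid z)\, p(z)\, \mathrm{d}z},
$$

where $p(z)$ is the prior, $p(x \mid z)$ the likelihood, and $p(x)$ the marginal likelihood; these are exactly the quantities listed among the key concepts of the first lecture.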
The Bayesian machine learning approach has the advantage of being interpretable, and it makes it easy to include expert knowledge through the definition of priors on the latent variables of interest. In addition, it naturally provides uncertainty information about the predictions, which can be particularly important in application contexts such as medical diagnosis or autonomous driving.
At the end of the course, you are expected to:
- know when it is useful or necessary to use a Bayesian machine learning approach;
- have a view of the main approaches in Bayesian modeling and exact or approximate inference;
- know how to identify and derive a Bayesian inference algorithm from the definition of a model;
- be able to implement standard supervised or unsupervised Bayesian learning methods.
You are expected to be familiar with basic concepts of probability, statistics and machine learning. The 1st-year course "statistics and learning" at CentraleSupélec covers all these prerequisites.
We will have a session dedicated to the basics of statistical learning, so the most important thing is to revise probability if you feel you need to. To do so, you can read Chapter 6 "Probability and Distributions" of Mathematics for Machine Learning, by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong, published by Cambridge University Press.
Most of the concepts that we will see in this course are discussed in the machine learning reference book Pattern Recognition and Machine Learning, by Christopher M. Bishop, Springer, 2006, which is moreover freely available online.
Other useful references are:
- Mathematics for Machine Learning, by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong, Cambridge University Press, 2020. (freely available online)
- Machine Learning: A Probabilistic Perspective, by Kevin P. Murphy, MIT Press, 2012. (available at the library).
| Session | Type | Topic |
|---------|------|-------|
| Session 1 | Lecture | Fundamentals of Bayesian modeling and inference |
| Session 2 | Lecture | Fundamentals of machine learning |
| Session 3 | Lecture | Bayesian networks and inference in latent variable models |
| Session 4 | Practical | Gaussian mixture model |
| Session 5 | Lecture | Factor analysis |
| Session 6 | Lecture | Variational inference |
| Session 7 | Practical | Bayesian linear regression |
| Session 8 | Lecture | Markov chain Monte Carlo |
| Session 9 | Practical | Sparse Bayesian linear regression |
| Session 10 | Lecture | Deep generative models |
| Session 11 | Lecture | Revision and other activities |
Key concepts you should be familiar with at the end of this lecture:
- Latent and observed variables
- Bayesian modeling and inference
- Prior, likelihood, marginal likelihood and posterior
- Decision, posterior expected loss
- Prior predictive and posterior predictive distributions
- Gaussian model with latent mean or variance
- Conjugate, non-informative and hierarchical priors
Material:
Key concepts you should be familiar with at the end of this lecture:
- Supervised learning
- Empirical risk minimization
- Underfitting and overfitting
- Bias-variance trade-off
- Maximum likelihood, maximum a posteriori
- Multinomial logistic regression
Material:
Key concepts you should be familiar with at the end of this lecture:
- Bayesian network (or directed probabilistic graphical model)
- Conditional independence
- D-separation
- Markov blanket
- Generative model with latent variables
- Evidence lower-bound
- Expectation-maximization algorithm
Material:
This practical session is about the Gaussian mixture model, a generative model used to perform clustering in an unsupervised fashion.
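For intuition only, here is a minimal sketch of the kind of unsupervised clustering a GMM performs; it relies on scikit-learn's `GaussianMixture`, which is assumed here purely for illustration and is not part of the practical material:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy 2D data drawn from two well-separated Gaussian clusters
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[3.0, 3.0], scale=0.5, size=(100, 2)),
])

# Fit a 2-component GMM with the EM algorithm
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
labels = gmm.fit_predict(X)               # hard cluster assignments
responsibilities = gmm.predict_proba(X)   # posterior probabilities p(cluster | x)
print(gmm.means_)                         # estimated cluster means
```

In the practical you derive and implement these EM updates yourself rather than calling a library.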
Material:
Key concepts you should be familiar with at the end of this lecture:
- Factor analysis generative model
- Derivation of the posterior
- Derivation of the marginal likelihood
- Properties of the multivariate Gaussian distribution
- Derivation of an EM algorithm (with continuous latent variables, contrary to the previous session on GMMs) for parameter estimation
Material:
Key concepts you should be familiar with at the end of this lecture:
- The problem of intractable posterior
- Kullback-Leibler divergence
- Variational inference
- Mean-field approximation
Material:
- Slides
- Notebook - Variational inference for the Gaussian model with latent parameters
- Handwritten solution to exercises
We already discussed linear regression (polynomial regression) in the second lecture, and we saw that with a standard maximum likelihood approach, we have to carefully choose the degree of the polynomial model in order not to overfit the training data. In Bayesian linear regression, a prior distribution is placed on the weights, which acts as a regularizer and prevents overfitting. Moreover, this Bayesian approach to linear regression naturally provides a measure of uncertainty along with the prediction.
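As a minimal sketch of the closed-form computations involved, here is the standard Gaussian-prior setting described in Bishop's PRML (Section 3.3); the precision values `alpha` and `beta` and the toy data are illustrative assumptions, not taken from the practical material:

```python
import numpy as np

# Bayesian linear regression with a zero-mean Gaussian prior on the weights.
alpha, beta = 2.0, 25.0          # prior precision and noise precision (illustrative values)

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=20)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)

degree = 5
Phi = np.vander(x, degree + 1, increasing=True)   # polynomial design matrix

# Posterior over the weights: N(m_N, S_N)
S_N = np.linalg.inv(alpha * np.eye(degree + 1) + beta * Phi.T @ Phi)
m_N = beta * S_N @ Phi.T @ t

# Predictive mean and variance at new inputs
x_new = np.linspace(-1, 1, 5)
Phi_new = np.vander(x_new, degree + 1, increasing=True)
pred_mean = Phi_new @ m_N
pred_var = 1.0 / beta + np.einsum("ij,jk,ik->i", Phi_new, S_N, Phi_new)
print(pred_mean, np.sqrt(pred_var))
```

The posterior covariance `S_N` shrinks as more data arrive, and the predictive variance combines the observation noise `1/beta` with the remaining uncertainty about the weights.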
Material:
Key concepts you should be familiar with at the end of this lecture:
- The Monte Carlo method to approximate expectations
- Sampling methods (inverse transform sampling, change of variable, rejection sampling, importance sampling)
- Definition of Markov chains
- Markov chain Monte Carlo methods
Material:
This practical session is a follow-up to the previous one on Bayesian linear regression. We make the prior on the linear regression weights more complex, so that exact posterior inference becomes intractable and a variational approach has to be developed.
Material:
Key concepts you should be familiar with at the end of this lecture:
- The problem of (deep) generative modeling
- Generative model of the variational autoencoder (VAE), a non-linear generalization of factor analysis
- VAE inference model
- VAE training procedure
- Application of VAEs for MNIST image generation
Material:
Activity 1: Q&A session
Activity 2: You will find exercises here. You will also find exercises (that were left as homework) in the slides of the different lectures.
Activity 3: Reading about sequential data processing with latent-variable models.
"State-space models (SSM) provide a general and flexible methodology for sequential data modelling. They were first introduced in the 1960s, with the seminal work of Kalman and were soon used in the Apollo Project to estimate the trajectory of the spaceships that were bringing men to the moon. Since then, they have become a standard tool for time series analysis in many areas well beyond aerospace engineering. In the machine learning community in particular, they are used as generative models for sequential data, for predictive modelling, state inference and representation learning". Quote from Marco Fraccaro's Ph.D Thesis entitled "Deep Latent Variable Models for Sequential Data" and defended at Technical University of Denmark in 2018.
The Kalman filter and smoother are used to compute the posterior distribution of a sequence of latent vectors (called the states) given an observed sequence of measurements. In this video, a Kalman filter is used to track the latent position of multiple people over time. The latent state variable in this case is continuous.
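To make the filtering recursions concrete, here is a minimal sketch of the Kalman filter predict/update steps for a toy constant-velocity model; the matrices and noise levels are arbitrary illustrative choices, not taken from the course material:

```python
import numpy as np

# Minimal Kalman filter for a 1D position tracked from noisy measurements.
A = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition over (position, velocity)
C = np.array([[1.0, 0.0]])               # we only measure the position
Q = 0.01 * np.eye(2)                     # process noise covariance
R = np.array([[0.5]])                    # measurement noise covariance

mu = np.zeros(2)                         # filtering mean  E[z_t | x_{1:t}]
P = np.eye(2)                            # filtering covariance

rng = np.random.default_rng(0)
measurements = np.cumsum(rng.normal(1.0, 0.1, size=50)) + rng.normal(0, 0.7, size=50)

for x_t in measurements:
    # Predict step: propagate the posterior through the dynamics
    mu, P = A @ mu, A @ P @ A.T + Q
    # Update step: correct the prediction with the new measurement
    S = C @ P @ C.T + R                  # innovation covariance
    K = P @ C.T @ np.linalg.inv(S)       # Kalman gain
    mu = mu + K @ (x_t - C @ mu)
    P = (np.eye(2) - K @ C) @ P

print(mu)  # final posterior mean over (position, velocity)
```

The Kalman smoother would add a backward pass over the stored filtering quantities to obtain the posterior of each state given the whole measurement sequence.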
When the latent state variable is discrete, the state-space model is called a hidden Markov model (HMM). HMMs were very popular for automatic speech recognition before the deep learning era. In that context, the latent state corresponds to a phoneme (an elementary unit of speech sound that allows us to distinguish one word from another in a particular language), while the observations are acoustic speech features computed from the audio signal.
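For comparison with the continuous case above, here is a minimal sketch of the forward recursion for a toy two-state HMM; all probabilities are made-up illustrative values:

```python
import numpy as np

# Forward recursion of a toy 2-state HMM (the discrete analogue of Kalman filtering):
# alpha_t(k) = p(x_1, ..., x_t, z_t = k), computed recursively.
pi = np.array([0.6, 0.4])                      # initial state distribution
A = np.array([[0.9, 0.1], [0.2, 0.8]])         # transition matrix p(z_t | z_{t-1})
B = np.array([[0.7, 0.3], [0.1, 0.9]])         # emission matrix p(x_t | z_t), 2 symbols

obs = [0, 0, 1, 1, 1]                          # observed symbol sequence
alpha = pi * B[:, obs[0]]
for x_t in obs[1:]:
    alpha = (alpha @ A) * B[:, x_t]

print(alpha.sum())   # marginal likelihood p(x_1, ..., x_T)
```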
Chapter 3 of Marco Fraccaro's Ph.D. thesis, available here, gives a very nice introduction to state-space models and Kalman filtering. To go a bit further, Chapter 4 introduces deep latent variable models for sequential data processing, using the framework of variational autoencoders.
The slides are created using Remark, "A simple, in-browser, Markdown-driven slideshow tool". The template is modified from Marc Lelarge's template used in his (very nice) deep learning course.
I did my best to clearly acknowledge the authors of the resources that I used to build this course. If you find any missing reference, please contact me.
If you want to reuse some of the material in this repository, please also indicate where you took it from.
If you are not one of my students and you would like to have the solutions to the practical sessions, you can contact me.
Email address: [email protected]
GNU Affero General Public License (version 3), see LICENSE.txt