This repository contains the experiments of our group student project for the Bayesian Machine Learning class of the MVA master, 2023-2024.
Authors:
- Basile Terver
- Léa Khalil
- Jean Dimier
We study the paper *Bayesian Model Selection, the Marginal Likelihood, and Generalization* by Sanae Lotfi, Pavel Izmailov, Gregory Benton, Micah Goldblum, and Andrew Gordon Wilson, and reproduce and extend some of their experiments to other datasets.
In this paper, the authors discuss the marginal likelihood as a model comparison tool and fundamentally re-evaluate whether it is the right metric for predicting the generalization of trained models and for learning parameters.
- They discuss the strengths and weaknesses of the marginal likelihood for model selection, hypothesis testing, architecture search and hyperparameter tuning.
- They show that the marginal likelihood answers an entirely different question than the generalization question "how well will my model generalize on unseen data?", which is precisely the difference between hypothesis testing and predicting generalization.
- They show that optimizing the marginal likelihood can lead to overfitting and underfitting in the function space.
- They revisit the connection between the marginal likelihood and training efficiency, and show that models that train faster do not necessarily generalize better or have higher marginal likelihood.
- They demonstrate how the Laplace approximation of the marginal likelihood can fail in architecture search and hyperparameter tuning of deep neural networks.
- They study the conditional marginal likelihood and show that it provides a compelling alternative to the marginal likelihood for neural architecture comparison, deep kernel hyperparameter learning, and transfer learning.
In this repository we provide code for reproducing results in the paper.
We use the Laplace package for the Laplace experiments, which requires Python 3.8. It can be installed with `pip install laplace-torch`.
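For orientation, computing a Laplace log marginal likelihood with `laplace-torch` looks roughly like the following sketch (this is not code from our scripts; `model` and `train_loader` are placeholders for your own network and data):

```python
# Minimal sketch, assuming a trained classifier `model` and a `train_loader`:
# estimate the Laplace log marginal likelihood with laplace-torch.
from laplace import Laplace

la = Laplace(model, likelihood="classification",
             subset_of_weights="all", hessian_structure="kron")
la.fit(train_loader)                            # fit the Laplace approximation at the MAP
la.optimize_prior_precision(method="marglik")   # tune the prior precision
log_marglik = la.log_marginal_likelihood()      # Laplace estimate of log p(D | M)
print(log_marglik.item())
```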
You can reproduce the GP experiments by running the Jupyter notebooks in `./GP_experiments/`.
Careful: for code simplicity, we have hardcoded our data path at the beginning of all the scripts as `data_directory = "/Data/basile-terver/__data__"`; you should adapt this path so that the heavy dataset files are stored in a suitable location.
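Concretely, the line to edit at the top of each script is just the assignment below (the replacement path is of course up to you):

```python
# Adapt this hardcoded path to wherever you want the datasets stored/downloaded.
data_directory = "/path/to/your/data"   # replaces "/Data/basile-terver/__data__"
```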
To train ResNet and CNN models and compute their Laplace marginal likelihood for CIFAR-10 and CIFAR-100 as in Section 6 of the paper, navigate to `./Laplace_experiments/cifar` and run the following:
```
python logml_<dataset>_<models>.py --decay=<weight decay parameter> \
    --prior_structure=<the structure of the prior: scalar or layerwise> \
    --hessian_structure=<structure of the hessian approximation: full, kron, diag> \
    --base_lr=<optimization learning rate> \
    --use_sgdr=<use cosine lr scheduler> \
    --optimizehypers=<optimize hyperparameters using Laplace approximation> \
    --hypers_lr=<learning rate for hyperparameter learning> \
    --batchnorm=<use batchnorm instead of fixup> \
    --chk_path=<path to save the checkpoints> \
    --result_folder=<path to save the results>
```
We have set default values for the flags of these 4 scripts, so you can simply run `python logml_<dataset>_<models>.py` to train on the full dataset.
Then, you can rerun those scripts on only 80% of the training set by modifying the trainset and testset inside the script, and setting `--chk_path="checkpoint/cifar10/subset/cnns"` for CNNs and `--chk_path="checkpoint/cifar10/subset/resnets"` for ResNets.
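For example, such a subset run could be launched as follows (the script name and flag values here are illustrative only; check the defaults inside each script):

```
# Illustrative invocation; script name and flag values are examples, not our exact settings.
python logml_cifar10_resnets.py \
    --decay=5e-4 \
    --prior_structure=layerwise \
    --hessian_structure=kron \
    --base_lr=0.1 \
    --chk_path="checkpoint/cifar10/subset/resnets" \
    --result_folder="results/cifar10/resnets"
```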
Remark: we trained all these models on an NVIDIA RTX A5000 with 24 GB of GPU memory. Training this many models for 250 epochs each takes about a day.
Then, once all the models have been trained (on the full data or on 80% of it), you can compute the conditional marginal likelihood, MAP Test Accuracy, BMA Test Accuracy, MAP Test Log-Likelihood and BMA Test Log-Likelihood as follows:
```
python logcml_<dataset>_<models>.py --prior_prec_init=<weight decay parameter> \
    --prior_structure=<the structure of the prior: scalar or layerwise> \
    --hessian_structure=<structure of the hessian approximation: full, kron, diag> \
    --base_lr=<optimization learning rate> \
    --bma_nsamples=<number of posterior samples to average over> \
    --data_ratio=<ratio of the data to condition on> \
    --max_iters=<number of iterations to optimize the rescaling parameter of the hessian> \
    --partialtrain_chk_path=<path to checkpoints of models trained on a fraction of the data> \
    --fulltrain_chk_path=<path to checkpoints of models trained on the full data> \
    --result_folder=<path to save the results>
```
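As an illustration, a conditional-marginal-likelihood run could look like this (script name, flag values and the full-data checkpoint path are examples only; see each script for the defaults actually used):

```
# Illustrative invocation; values are placeholders, not our exact settings.
python logcml_cifar10_resnets.py \
    --prior_prec_init=5e-4 \
    --prior_structure=layerwise \
    --hessian_structure=kron \
    --bma_nsamples=20 \
    --data_ratio=0.8 \
    --max_iters=100 \
    --partialtrain_chk_path="checkpoint/cifar10/subset/resnets" \
    --fulltrain_chk_path="checkpoint/cifar10/resnets" \
    --result_folder="results/cifar10/logcml"
```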
Remark: we had to create `./Laplace_experiments/cifar/data/cifar100_subsets.npz` ourselves, as it was missing from the authors' repository, and we also fixed other bugs in their code.
Remark: running those scripts takes a comparable amount of time to the actual training with the `logml_<dataset>_<models>.py` scripts, since the computation is not parallelizable and mostly runs on CPUs.
Once you have run all these scripts, you can reproduce the plots of the original paper by running the `Laplace_experiments/plot_neural_arch_search.ipynb` notebook, where the last plot should be compared with the plots of Appendix H of the original paper. We only show the results for CIFAR-10, but the exact same functions can be used for the CIFAR-100 results by adapting the file paths.
Remark: the authors of the original paper and repository did not provide code to reproduce their plots from Appendix H; we provide such code with this notebook. We did not have time to train ResNets for other decay/prior values than
The figure below, taken from the original paper, summarizes all the figures of Appendix H:
We mainly worked with the Jupyter notebook `./GP_experiments/marginal_lik_gps_rq copy.ipynb`, which fits a Gaussian process defined by a Rational Quadratic kernel. The main idea is to change the parameters `true_lengthscale` and `true_noise` and the boolean `overestimate`, which triggers the main change in the behavior of the LML versus the test likelihood, and then run the cells of the notebook.
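As a rough, self-contained illustration of the kind of comparison the notebook makes (this is not the notebook's code; the data, kernel settings and variable names below are assumptions), one can fit a Rational Quadratic GP and compare its log marginal likelihood with the held-out log-likelihood:

```python
# Minimal sketch, not taken from the notebook: fit a Rational Quadratic GP
# and compare the log marginal likelihood (LML) with the test log-likelihood.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RationalQuadratic, WhiteKernel

rng = np.random.default_rng(0)
true_lengthscale, true_noise = 1.0, 0.1        # "ground truth" used to generate toy data
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X[:, 0] / true_lengthscale) + true_noise * rng.standard_normal(60)
X_train, y_train, X_test, y_test = X[:40], y[:40], X[40:], y[40:]

kernel = RationalQuadratic(length_scale=1.0, alpha=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel).fit(X_train, y_train)

lml = gp.log_marginal_likelihood_value_        # marginal likelihood on the training data
mean, std = gp.predict(X_test, return_std=True)
test_ll = np.mean(-0.5 * np.log(2 * np.pi * std**2)
                  - 0.5 * (y_test - mean) ** 2 / std**2)  # average test log-likelihood
print(f"LML = {lml:.2f}, average test log-likelihood = {test_ll:.2f}")
```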
In order to launch the deep kernel learning experiments, navigate to `./DKL_experiments/`.
We reused the previous deep kernel learning files to train a similar network on different datasets from the UCI database. To do so, we downloaded the data and path files directly from the Bayesian_benchmarks repository and added them to the folder.
To provide a simple framework for rerunning the experiments, we gathered all of the experiment code in the `DKL_experiments/DKL_exp.ipynb` notebook.
We first define the model and a `main` function, and in the last cell of the notebook we run the experiments with the desired datasets, numbers of training points and cut-off values (everything is modifiable). When running the experiments, carefully check that the models are indeed stored in the `/saved-output/` folder.
When running the `get_regression_data('desired dataset')` function, the data should be downloaded locally into a `./data_2/` subfolder of `/DKL_experiments/`.
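For reference, loading a UCI dataset through the Bayesian_benchmarks interface looks roughly like this (the dataset name and attribute usage below are assumptions based on that repository, not code from our notebook):

```python
# Rough sketch of loading a UCI regression dataset via bayesian_benchmarks;
# the dataset name 'boston' is just an example.
from bayesian_benchmarks.data import get_regression_data

data = get_regression_data('boston')
X_train, y_train = data.X_train, data.Y_train   # training split
X_test, y_test = data.X_test, data.Y_test       # held-out split
print(X_train.shape, X_test.shape)
```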
Plots can be reproduced by running the `DKL_plots.ipynb` notebook. For simplicity, we have hard-coded the values of `n_train`, `m` and `dataset` that we used in the previous notebook; these determine the naming convention of the saved models. If you wish to run the experiments with different parameters and produce the corresponding plots, make sure to also modify these parameters in the first cell of the `DKL_plots.ipynb` notebook.
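The parameters in question look roughly like this (the values below are placeholders, not the ones we actually used):

```python
# First cell of DKL_plots.ipynb (placeholder values): these must match the settings
# used in DKL_exp.ipynb, since they determine the saved model filenames.
n_train = 100          # number of training points used in DKL_exp.ipynb
m = 20                 # cut-off value used in DKL_exp.ipynb
dataset = "boston"     # UCI dataset name used in DKL_exp.ipynb
```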