This is the code repo for the NeurIPS 2023 paper "Implicit Variational Inference for High-Dimensional Posteriors" (arXiv).
In variational inference, the benefits of Bayesian models rely on accurately capturing the true posterior distribution. We propose using neural samplers that specify implicit distributions, which are well-suited for approximating complex multimodal and correlated posteriors in high-dimensional spaces. Our approach advances inference using implicit distributions by introducing novel bounds that come about by locally linearising the neural sampler. This is distinct from existing methods that rely on additional discriminator networks and unstable adversarial objectives. Furthermore, we present a new sampler architecture that, for the first time, enables implicit distributions over millions of latent variables, addressing computational concerns by using differentiable numerical approximations. Our empirical analysis indicates our method is capable of recovering correlations across layers in large Bayesian neural networks, a property that is crucial for a network's performance but notoriously challenging to achieve. To the best of our knowledge, no other method has been shown to accomplish this task for such large models. Through experiments in downstream tasks, we demonstrate that our expressive posteriors outperform state-of-the-art uncertainty quantification methods, validating the effectiveness of our training algorithm and the quality of the learned implicit approximation.
In one sentence: our approach trains Bayesian neural networks with implicit variational inference, leveraging GAN-style generators to produce the weights and biases of the network.
The approach uses a hypernetwork/generator/VAE-decoder-style network to parameterise a variational approximation and can also be used with other latent variable models such as VAEs.
We provide the code for the experiments in the paper. The code is written in Python 3.9 and PyTorch 1.12.1. Please refer to the `requirements.txt` file for the packages and versions.
The BNNs need to be built from reparameterisable layers. We provide a set of reparameterisable layers in `bnn_src/layers.py`. The layers are based on the reparameterisation trick: the generator produces a vector of parameters for the BNN layers, and these parameters, which are samples from the approximate posterior distribution parameterised by the generator, are used for calculating the losses.
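As a rough illustration of what such a layer looks like, here is a hedged sketch (a hypothetical class, not the actual implementation in `bnn_src/layers.py`): the layer owns no `nn.Parameter`s and instead consumes a slice of the generator's output at every forward pass.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReparamLinearSketch(nn.Module):
    """Hypothetical reparameterisable linear layer; the real layers live in bnn_src/layers.py."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        # Number of entries this layer needs from the generator's flat output
        self.num_params = in_features * out_features + out_features

    def forward(self, x, flat_params):
        # flat_params: 1-D slice of the parameter vector sampled by the generator
        n_w = self.in_features * self.out_features
        w = flat_params[:n_w].view(self.out_features, self.in_features)
        b = flat_params[n_w:]
        return F.linear(x, w, b)
```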
- Clone the repo.
- Decide on a generator architecture that takes a base variable and produces a vector of parameters for the BNN layers.
  - For smaller BNNs, MLP-style generators should suffice.
  - For any BNN over 50K parameters, we recommend using the Matrix Multiplication style generator. See `bnn_src/models.py` for an example.
- Compose the BNN using the reparameterisable layers provided in `bnn_src/layers.py`. The BNN requires a generator for initialisation.
- Decide on a base distribution for the base variable. We recommend using a Gaussian distribution.
- Decide on a prior distribution for the BNN parameters. We recommend using a Gaussian distribution.
- Compose the components from the previous steps (generator, BNN, base distribution, and prior) into the `ImplicitBNNs` class in `bnn_src/imp_bnn.py`, which also provides loss functions for training with the approximate ELBO (this requires specifying a likelihood for the BNN) and a prediction function for making Bayesian predictions.
- To see this flow in action, please refer to `implicit_train.py`.
Additionally, please check the `toy_posterior_testing.ipynb` notebook for a simple toy-data regression example that follows the same workflow as above. A minimal plain-PyTorch sketch of the flow is also given below.
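For orientation only, the toy below shows the likelihood part of this workflow in plain PyTorch. All names are hypothetical and this is not the repo's API (in particular, the entropy and prior terms of the approximate ELBO are handled by `ImplicitBNNs`); it just shows a generator sampling a flat parameter vector that the BNN then consumes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
in_dim, hidden, out_dim = 4, 8, 1
n_bnn_params = (in_dim * hidden + hidden) + (hidden * out_dim + out_dim)

# GAN-style generator: base noise -> flat vector of all BNN weights and biases
generator = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, n_bnn_params))

def bnn_forward(x, theta):
    # Consume the flat parameter vector slice by slice (cf. bnn_src/layers.py)
    i = in_dim * hidden
    w1, b1 = theta[:i].view(hidden, in_dim), theta[i:i + hidden]
    j = i + hidden
    w2 = theta[j:j + hidden * out_dim].view(out_dim, hidden)
    b2 = theta[j + hidden * out_dim:]
    h = torch.relu(nn.functional.linear(x, w1, b1))
    return nn.functional.linear(h, w2, b2)

x, y = torch.randn(32, in_dim), torch.randn(32, out_dim)
theta = generator(torch.randn(16))               # one posterior sample of all BNN parameters
nll = ((bnn_forward(x, theta) - y) ** 2).mean()  # Gaussian-likelihood term of the ELBO
nll.backward()                                   # gradients flow back into the generator
```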
Note: If you design your own generator architectures, they must be able to produce all the parameters of the BNN, as that is currently the only supported way of training BNNs with implicit posteriors.
The advice below is recommended if the BNN contains over 50K parameters and has been tested for up to 30M parameters.
Use the `CorreMMGenerator` class in `bnn_src/models.py` for large-scale BNNs. This class uses a matrix-multiplication generator to produce very high-dimensional vectors of BNN parameters. Please refer to the `gen_arch_config` folder to see the architectures of the generators used in the paper.
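To convey the idea behind a matrix-multiplication generator (and why it scales), here is a hedged conceptual sketch under the assumption that each subnetwork grows a small base-noise matrix into a large parameter block via learned left/right factors; the actual `CorreMMGenerator` may differ in its details.

```python
import torch
import torch.nn as nn

class MMSubnetSketch(nn.Module):
    """Hypothetical matrix-multiplication subnetwork, not the actual CorreMMGenerator."""

    def __init__(self, in_side=32, out_rows=512, out_cols=1024):
        super().__init__()
        # Learned left/right factors map an (in_side x in_side) noise matrix to (out_rows x out_cols).
        # Learned-parameter count: out_rows*in_side + in_side*out_cols, far below out_rows*out_cols.
        self.left = nn.Parameter(torch.randn(out_rows, in_side) / in_side ** 0.5)
        self.right = nn.Parameter(torch.randn(in_side, out_cols) / in_side ** 0.5)

    def forward(self, z):                  # z: (in_side, in_side) base-noise matrix
        return self.left @ z @ self.right  # (out_rows, out_cols) block of BNN parameters

block = MMSubnetSketch()(torch.randn(32, 32)).flatten()  # ~0.5M BNN parameters from a 32x32 input
```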
Please make sure to disable the `--accurate` flag, as done in `implicit_train.py`, which passes it on to the `ImplicitBNNs` `elbo` loss. This ensures the smallest-singular-value numerical approximation is used.
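For intuition on why such an approximation is needed: the bounds obtained by locally linearising the sampler involve singular values of the sampler's Jacobian. The toy snippet below (not the repo's implementation) computes them exactly for a tiny generator, which quickly becomes infeasible as the number of generated parameters grows into the millions.

```python
import torch
from torch.autograd.functional import jacobian

# Tiny sampler: 8-dimensional base variable -> 20 "BNN parameters"
g = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.Tanh(), torch.nn.Linear(32, 20))

z0 = torch.randn(8)              # one draw of the base variable
J = jacobian(g, z0)              # (20, 8) Jacobian of the sampler at z0
svals = torch.linalg.svdvals(J)  # exact singular values; only feasible at toy scale
print(svals.min())               # smallest singular value, approximated numerically at scale
```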
We provide all the generator architectures used in the paper in `gen_arch_config/` as `.yaml` files. These files contain the architecture descriptions for the novel Matrix Multiplication generator/hypernetwork proposed in the paper. Usage is described below:
```python
import yaml

# Load one of the provided generator configurations (replace config_file_name.yaml accordingly)
with open('gen_arch_config/config_file_name.yaml', 'r') as cnfg_f:
    config = yaml.full_load(cnfg_f)

input_size, output_size, hidden_units, num_hidden_layers = config
```
`input_size` is the size of the matrix sampled from the base distribution. `output_size` needs to be specified for each of the matrix multiplication subnetworks (in matrix dimensions); one can either specify it such that each matrix subnetwork towards the end generates the parameters for one layer of the BNN, or have them all generate the same-sized matrix. `hidden_units` are the dimensions of the matrix-multiplication hidden layers in the individual subnetworks, and `num_hidden_layers` specifies their number; 1 or 2 is usually fine, as a higher number results in longer runtimes.
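For concreteness, here is a purely hypothetical example of what the four unpacked entries might look like; consult the `.yaml` files in `gen_arch_config/` for the formats and values actually used in the paper.

```python
# Hypothetical values, for illustration only
example_config = [
    32,                       # input_size: size of the base matrix sampled from the base distribution
    [[128, 256], [256, 10]],  # output_size: target matrix dimensions for each MM subnetwork
    [64, 64],                 # hidden_units: hidden matrix dimensions inside each subnetwork
    1,                        # num_hidden_layers: 1 or 2 is usually enough
]
input_size, output_size, hidden_units, num_hidden_layers = example_config
```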
Note: During experimentation we noticed that down-weighting the gradients from the standard normal prior in the ELBO is helpful during training. Specifically, this allows a generator with fewer parameters to achieve good performance. The ELBO loss functions therefore take two different down-weighting arguments, namely `jacobi_down_weight` and `prob_down_weight`; the former is multiplied with the entropy term and the latter with the prior log probability.
Also note: The above down-weighting is not the same as cold posteriors! We do not scale down the entire KL term as many works do; rather, we keep the full gradient contribution from the entropy term so that a multimodal posterior can be captured, and we only down-weight the prior, which is often naively chosen in BDL.
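Schematically, and with hypothetical argument names and illustrative default values (the actual implementation is the `elbo` loss in `bnn_src/imp_bnn.py`), the two arguments act on the negative ELBO roughly as follows:

```python
# Hypothetical sketch; see the elbo loss in bnn_src/imp_bnn.py for the real implementation.
def down_weighted_neg_elbo(expected_log_lik, entropy_term, prior_log_prob,
                           jacobi_down_weight=1.0, prob_down_weight=0.1):
    # expected_log_lik: E_q[log p(D | w)]
    # entropy_term:     (approximate) entropy of the implicit q, kept at full weight
    # prior_log_prob:   E_q[log p(w)] under the standard normal prior, down-weighted
    return -(expected_log_lik
             + jacobi_down_weight * entropy_term
             + prob_down_weight * prior_log_prob)
```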
We provide the functions used for generating the results presented in the paper in `evaulate_models.py`, and the core functions in `bnn_src/metrics.py`. These experiments take a lot of inspiration from the Laplace Redux work by Erik Daxberger et al. (NeurIPS 2021) to standardise UQ testing across different methods.
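As a flavour of the kind of UQ metric that gets standardised there, here is a minimal expected-calibration-error sketch (not the repo's code; the actual metrics live in `bnn_src/metrics.py`):

```python
import torch

def expected_calibration_error(probs, labels, n_bins=15):
    """Minimal ECE sketch: probs is (N, C) softmax output, labels is (N,) class indices."""
    conf, preds = probs.max(dim=1)             # predictive confidence and predicted class
    correct = preds.eq(labels).float()
    bin_edges = torch.linspace(0, 1, n_bins + 1)
    ece = torch.zeros(())
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            # |accuracy - confidence| in this bin, weighted by the fraction of points in it
            ece += in_bin.float().mean() * (correct[in_bin].mean() - conf[in_bin].mean()).abs()
    return ece
```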
- Support for a DNN-to-BNN API similar to the bayesian-torch library, available here: https://github.com/IntelLabs/bayesian-torch
- Sub-stochastic BNN implementation with implicit approximations for the posterior. This would enable unconventional training of BNNs with priors over selected parameters of the BNN (e.g. last layer).