This repo provides the code accompanying two papers:
A. Goliński*, F. Wood, T. Rainforth*, Amortized Monte Carlo Integration, ICML, 2019
T. Rainforth*, A. Goliński*, F. Wood, S. Zaidi, Target–Aware Bayesian Inference: How to Beat Optimal Conventional Estimators, JMLR, 2020
The default config files included in this code (particularly the data set and batch size settings) require a GPU with ~8GB of memory for training the neural networks and generating the samples needed to reproduce the figures from the papers. The generated samples are stored on the hard drive, which requires about 5.1GB of disk space in total for all the experiments.
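Before starting, it may be worth checking that enough disk space is free. The snippet below is a minimal sketch using only the Python standard library; the helper names `free_disk_gb` and `has_enough_disk` are illustrative and not part of this repo, and the 5.1GB threshold is simply the figure quoted above.

```python
import shutil

REQUIRED_GB = 5.1  # figure quoted above for all experiments combined


def free_disk_gb(path="."):
    """Return the free space, in GB (10**9 bytes), on the filesystem holding `path`."""
    return shutil.disk_usage(path).free / 1e9


def has_enough_disk(path=".", required_gb=REQUIRED_GB):
    """Return True if the filesystem holding `path` has at least `required_gb` free."""
    return free_disk_gb(path) >= required_gb


if __name__ == "__main__":
    print(f"free: {free_disk_gb():.1f} GB, enough: {has_enough_disk()}")
```

Run it from the directory where the samples will be written, since the check applies to the filesystem that holds the given path.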
Make sure to set the appropriate version of cudatoolkit for your environment below:
conda create -n amci python=3.7
conda activate amci
conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=X.X -c pytorch
conda install --file requirements_conda.txt
pip install -r requirements_pip.txt
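Once the environment is created, a quick sanity check can confirm that PyTorch is importable and sees the GPU. This is a sketch rather than part of the repo: `torch_summary` is a hypothetical helper, and it returns `None` instead of failing when PyTorch is not installed.

```python
import importlib.util


def torch_summary():
    """Return version info for the installed PyTorch, or None if it cannot be imported."""
    if importlib.util.find_spec("torch") is None:
        return None
    try:
        import torch
    except (ImportError, OSError):
        return None
    return {
        "torch": torch.__version__,  # the install command above pins 1.4.0
        "cuda_build": torch.version.cuda,  # CUDA version of the build, None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(torch_summary())
```

If `cuda_available` is `False`, the cudatoolkit version chosen during installation probably does not match the local driver.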
cd into the repo, then run
cd src; export PYTHONPATH=$(pwd); cd amci
- Train q1 and q2 networks by running
python train.py with tail_integral/config_1d.yaml q1_or_q2=q1 --force
python train.py with tail_integral/config_1d.yaml q1_or_q2=q2 --force
python train.py with tail_integral/config_5d.yaml q1_or_q2=q1 --force
python train.py with tail_integral/config_5d.yaml q1_or_q2=q2 --force
python train.py with cancer/config.yaml q1_or_q2=q1 factor=c0 --force
python train.py with cancer/config.yaml q1_or_q2=q1 factor=eps --force
python train.py with cancer/config.yaml q1_or_q2=q2 factor=c0 --force
python train.py with cancer/config.yaml q1_or_q2=q2 factor=eps --force
After the training is finished, update the checkpoint_q1 and checkpoint_q2 fields in the respective config*.yaml files with the paths of the checkpoints generated in the logs directory. For the cancer example you also have to update the checkpoint_q1/q2_eps/c0 fields (aside from checkpoint_q1 and checkpoint_q2).
- To generate samples from the learned proposals that will be combined into estimates in the next step run
python generate_samples.py tail_integral/config_1d.yaml
python generate_samples.py tail_integral/config_5d.yaml
python generate_samples.py cancer/config.yaml
The script saves the generated samples to the hard drive and hence consumes a considerable amount of disk space, about 5.1GB in total for all the experiments.
- To reproduce the ReMSE figures from the paper run
python create_remse_figure_from_samples.py tail_integral/config_1d.yaml
python create_remse_figure_from_samples.py tail_integral/config_5d.yaml
python create_remse_figure_from_samples.py cancer/config.yaml
The pdf figures are saved in the respective checkpoint folders.
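The commands in the steps above follow a regular pattern, so they can be generated rather than typed one by one. The following is a sketch of such a driver; the function names and the `dry_run` flag are assumptions made for illustration and do not exist in the repo. Because the checkpoint paths have to be edited into the config files by hand after training, the sketch deliberately keeps the stages separate instead of chaining them.

```python
import subprocess

TAIL_CONFIGS = ["tail_integral/config_1d.yaml", "tail_integral/config_5d.yaml"]
CANCER_CONFIG = "cancer/config.yaml"


def train_commands():
    """Build the eight training commands listed in the training step."""
    cmds = []
    for cfg in TAIL_CONFIGS:
        for q in ("q1", "q2"):
            cmds.append(["python", "train.py", "with", cfg, f"q1_or_q2={q}", "--force"])
    for q in ("q1", "q2"):
        for factor in ("c0", "eps"):
            cmds.append(["python", "train.py", "with", CANCER_CONFIG,
                         f"q1_or_q2={q}", f"factor={factor}", "--force"])
    return cmds


def per_config_commands(script):
    """Build one command per experiment config for a script that takes a config path."""
    return [["python", script, cfg] for cfg in TAIL_CONFIGS + [CANCER_CONFIG]]


def run_all(cmds, dry_run=True):
    """Print each command; execute it too (stopping on the first failure) unless dry_run."""
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(train_commands())  # dry run: only prints the commands
```

After updating the configs, `run_all(per_config_commands("generate_samples.py"), dry_run=False)` would run the sample generation stage, and likewise for `create_remse_figure_from_samples.py`.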
The ground truth estimates included for the tail_integral_5d and cancer experiments were generated using 1e10 importance samples with the prior as the proposal.
If you wish to regenerate them, run
python ground_truth.py tail_integral/config_5d.yaml
python ground_truth.py cancer/config.yaml
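For readers unfamiliar with the idea, importance sampling with the prior as the proposal can be sketched in a few lines. The example below is not the code in ground_truth.py: the function `snis_prior_proposal` and the toy Gaussian model are assumptions chosen because the answer is known in closed form. When the proposal equals the prior, the unnormalized importance weight reduces to the likelihood.

```python
import math
import random


def snis_prior_proposal(f, sample_prior, likelihood, n, rng):
    """Self-normalized importance sampling estimate of E[f(x) | y].

    Samples x_i from the prior. Because the proposal equals the prior, the
    unnormalized weight p(x_i) p(y | x_i) / p(x_i) is just the likelihood p(y | x_i).
    """
    num = 0.0
    den = 0.0
    for _ in range(n):
        x = sample_prior(rng)
        w = likelihood(x)
        num += w * f(x)
        den += w
    return num / den


# Toy model (an assumption for illustration): x ~ N(0, 1), y | x ~ N(x, 1).
# The posterior is then N(y / 2, 1 / 2), so E[x | y] = y / 2.
y_obs = 1.0
rng = random.Random(0)
estimate = snis_prior_proposal(
    f=lambda x: x,
    sample_prior=lambda r: r.gauss(0.0, 1.0),
    likelihood=lambda x: math.exp(-0.5 * (y_obs - x) ** 2),
    n=100_000,
    rng=rng,
)
print(estimate)  # should be close to the exact value 0.5
```

The prior is a convenient but often inefficient proposal, which is why a very large number of samples is used for the ground truth.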
For the tail_integral_1d experiment there is no need to estimate the ground truth because it is obtained analytically.
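The ground truth values serve as the reference against which the ReMSE figures are computed. As a rough sketch, assuming ReMSE means the mean squared error of the estimates divided by the squared ground truth value, it could be computed as follows. The function `remse` is illustrative and is not taken from create_remse_figure_from_samples.py.

```python
def remse(estimates, ground_truth):
    """Relative mean squared error of `estimates` with respect to `ground_truth`.

    Assumed definition: mean((estimate - ground_truth)**2) / ground_truth**2.
    """
    if ground_truth == 0:
        raise ValueError("ReMSE is undefined for a ground truth of zero")
    if not estimates:
        raise ValueError("at least one estimate is required")
    mse = sum((e - ground_truth) ** 2 for e in estimates) / len(estimates)
    return mse / ground_truth ** 2


# Two estimates, each off by 1 from a ground truth of 2: MSE = 1, ReMSE = 1 / 4.
print(remse([1.0, 3.0], 2.0))  # 0.25
```

Normalizing by the squared ground truth makes errors comparable across targets of very different magnitude, such as small tail probabilities.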
Many thanks to the developers and contributors of github.com/ikostrikov/pytorch-flows and github.com/facebookresearch/higher.
If anything is unclear, doesn't work, or you just have questions, please create an issue and I'll try to get back to you.
@inproceedings{golinski2019amortized,
title = {{A}mortized {M}onte {C}arlo {I}ntegration},
author = {Goli{\'n}ski, Adam and Wood, Frank and Rainforth, Tom},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2019},
}
@article{rainforth2020target,
author = {Tom Rainforth and Adam Goli{\'n}ski and Frank Wood and Sheheryar Zaidi},
title = {{T}arget--{A}ware {B}ayesian {I}nference: {H}ow to Beat Optimal Conventional Estimators},
journal = {Journal of Machine Learning Research (JMLR)},
year = {2020},
volume = {21},
number = {88},
pages = {1-54},
}