Home
This FAQ is curated by Luigi Acerbi and is in constant expansion. This document was originally started for the MATLAB version of VBMC, but most of the answers also apply to the Python version (PyVBMC).
For a tutorial with detailed examples, see:
- vbmc_examples.m for MATLAB;
- The Jupyter notebook examples for Python.
If you have questions not covered here, please feel free to ask in the lab Discussions forum.
- General
- Installation
- Input arguments (target function: `fun`)
  - What is the target function?
  - I do not have a likelihood function, but another loss function. Can I still use VBMC?
  - Wait, do I need a prior?
  - How do I specify the prior to VBMC?
  - My target function requires additional data/inputs. How do I pass them to VBMC?
  - Does VBMC support inference with noisy (that is, stochastic) target functions?
- Input arguments (domain: `x0`, `LB`, `UB`, `PLB`, `PUB`)
  - How do I choose the starting point `X0`?
  - How do I choose `LB` and `UB`?
  - Can I have parameters bounded only on one side?
  - Can I set `LB = UB` for some variable to fix it to a given value?
  - How do I choose `PLB` and `PUB`?
  - How do I prevent VBMC from evaluating certain inputs or regions of input space?
  - Does VBMC support inference with integer parameters?
- Output arguments
- Noisy target function
  - What's a noisy target function?
  - Does VBMC automatically detect that the target function is noisy?
  - Does VBMC automatically infer the amount of noise in the target function?
  - Can I use any technique to estimate a noisy log-likelihood?
  - How do I estimate the standard deviation of the noisy log-likelihood?
  - How large can the noise be to perform successful inferences with VBMC?
  - Wait, can I just remove stochasticity by fixing the random seed at each function call?
- Display
- Troubleshooting
  - How can VBMC fail, how do I find out, and how do I fix it?
  - VBMC warned me that the `Returned variational solution may have not converged.` What should I do?
  - VBMC crashes saying that `The returned function value must be a finite real-valued scalar.` What do I do?
  - I have been running VBMC from the same starting point, but I get different results each time. Is something wrong?
- Miscellanea
-
I would recommend VBMC for problems in which:
- you can calculate an estimate of the target (log) likelihood function, which can be deterministic or noisy (see below);
- the target likelihood function is at least moderately expensive to compute (at least ~0.1 s or more per function evaluation);
- the gradient may be unavailable;
- the number of input parameters is up to about `D = 10` (maybe up to `20`, but not more);
- the target posterior is continuous and reasonably smooth, such that it can be approximated by a Gaussian process (GP) with a smooth kernel. This is generally difficult to know a priori, but you can look for telltale signs that the VBMC approximation is failing.
If your likelihood function is fully analytical (or, more generally, fast to evaluate), VBMC is most likely not the best tool for your problem (see below).
-
If the likelihood function is smooth and analytical (and fast to compute), you should consider estimating the posterior via Markov Chain Monte Carlo, for example using probabilistic programming languages such as Stan or PyMC3.
If the likelihood function is computationally expensive and non-smooth (or it produces a pathological posterior), then, well, tough luck. In this case, you may see if revising your model, for example by changing the parameterization, produces a posterior landscape which is more compatible with the approximations used by VBMC.
Alternatively, you may give up on full (approximate) Bayesian inference, and resort instead to maximum-likelihood (or maximum-a-posteriori) estimation, using an efficient optimization algorithm such as Bayesian Adaptive Direct Search (BADS).
-
Here.
-
At the moment, VBMC requires MATLAB's Optimization Toolbox™ and possibly the Statistics and Machine Learning Toolbox™. If I receive enough requests, I might remove the dependency on these toolboxes in future versions of VBMC.
-
VBMC has been tested successfully on MATLAB versions from R2014a onwards. It might be able to run on earlier versions (in which case, please do let me know).
-
Sure. The VBMC installation should be pretty straightforward, so write to me describing in detail the problem you are having.
-
**What is the target function?**
The target function in VBMC is the (unnormalized) log posterior or log joint probability density, usually specified via a function handle `fun`, the first input argument to `vbmc`.
For a typical model-fitting problem, `fun` is a function that computes the (unnormalized) log posterior of an input parameter vector `x`, for a given dataset and model. The log posterior is computed as log likelihood plus log prior.
If you are unsure, see the following questions for more information about the likelihood function and the prior.
-
**I do not have a likelihood function, but another loss function. Can I still use VBMC?**
For the results of VBMC to make sense, you really need a likelihood (and a prior; see below).
However, sometimes you might already almost have a likelihood, and you just do not know it. For example, suppose you are fitting a model via least squares, such that your target is a quadratic loss function defined as:
```matlab
loss = @(data_x,data_y,x) sum((f(data_x,x) - data_y).^2);
```
where `f` returns an array of model predictions for model parameters `x` and data `data_x`, and `data_y` is an array of observed responses. You can transform the above loss into a log likelihood as:
```matlab
function ll = log_likelihood(x,data_x,data_y)
    sigma2 = exp(2*x(end));      % extract sigma^2 from the log(sigma) parameterization
    N = size(data_y,1);          % number of trials (rows of data_y)
    x_orig = x(1:end-1);         % original parameter vector (without sigma)
    ll = -0.5*sum((f(data_x,x_orig) - data_y).^2)/sigma2 - 0.5*N*log(2*pi*sigma2);
end
```
where we have added `log(sigma)` as an additional model parameter at the end of the parameter vector `x` (the logarithmic representation is convenient to ensure that `sigma` is positive). `sigma` is the standard deviation of the observed values `data_y` with respect to the model predictions `f(data_x,x)` (that is, the standard deviation of the residuals). The last line defines the log likelihood of a series of observations corrupted by Gaussian noise. Note that maximizing `log_likelihood` with respect to `x` is equivalent to minimizing the quadratic loss defined above.
-
**Wait, do I need a prior?**
Yes, you do. For the marginal likelihood (model evidence) to make sense as a model comparison metric, you should use a proper prior over the parameters (that is, a prior that integrates to 1). Horrible things happen if you compute the marginal likelihood with an improper or unnormalized prior. Since VBMC computes the ELBO, a lower bound on the (log) marginal likelihood, you need a prior.
Also, a prior will come in handy to set plausible bounds for the parameters (see below).
A uniform bounded prior over parameters is okay (although not ideal), as long as you correctly normalize it.
-
**How do I specify the prior to VBMC?**
At the moment, VBMC does not have explicit knowledge of the prior. You pass to VBMC the target function `fun`, which is a black-box handle to the log joint distribution, that is, log likelihood plus log prior.
For this reason, VBMC does not automatically normalize the prior — that is up to you. For example, for a uniform prior over the box defined by `LB` and `UB` you would have:
```matlab
fun = @(x) log_likelihood(x) - sum(log(UB - LB));
```
where `log_likelihood` is the handle to the log likelihood function, and the second term represents the log prior (uniform over the range).
-
**My target function requires additional data/inputs. How do I pass them to VBMC?**
Suppose that your function takes two inputs, `fun(x,data)`.
The first solution consists of defining a new function handle
```matlab
funwdata = @(x) fun(x,data);
```
where `data` has been defined earlier in the code. You can now pass `funwdata`, which takes a single input, to VBMC. Note that a copy of `data` is embedded in `funwdata`. If you subsequently make any change to the original `data`, you will need to redefine the function handle for the changes to be reflected in the inference.
Alternatively, `vbmc` accepts additional inputs after `OPTIONS` (the seventh argument). Any additional input is passed to the target function. In this case, you would write
```matlab
vbmc(fun,X0,LB,UB,PLB,PUB,OPTIONS,data)
```
-
**Does VBMC support inference with noisy (that is, stochastic) target functions?**
Yes, as of v1.0 (June 2020). See this section for more information.
-
**How do I choose the starting point `X0`?**
First of all, keep in mind that you should run VBMC from different starting points, as a sanity check for your solutions.
A good starting point belongs to a region of high posterior probability density (even better, to a typical region). Generally, a point in the vicinity of the mode of the posterior is a reasonable starting point (this is not necessarily true in high dimension, but VBMC only operates in relatively low dimension). You can find such points by first running a few iterations of an optimizer.
Note that the starting point cannot lie on the bounds `LB` or `UB`. You can avoid this issue by constraining the preliminary optimization to the plausible box, that is, within `PLB` and `PUB` (see also below).
For example, you can use Bayesian Adaptive Direct Search (BADS) to perform the preliminary optimization inside the plausible box, as follows:
```matlab
% Pick starting point via a preliminary optimization with BADS
nvars = numel(PLB);
X0 = PLB + rand(1,nvars) .* (PUB - PLB);
opt_options = bads('defaults');
opt_options.MaxFunEvals = 50*nvars;   % quick optimization
X0 = bads(@(x) -fun(x),X0,PLB,PUB,[],[],[],opt_options);
```
-
**How do I choose `LB` and `UB`?**
`LB` and `UB` are the hard bounds of the parameter space of the model of interest. These are strict inequalities, in that for any coordinate dimension `i`, `LB(i) < X(i) < UB(i)`.
You can set the hard bounds to the support of the posterior probability density (that is, the region with nonzero probability density). For each coordinate dimension, the choice is between:
- unconstrained coordinates, specified with `LB(i) = -Inf` and `UB(i) = Inf`; or
- bounded coordinates, in which both `LB(i)` and `UB(i)` are finite (with `LB(i) < UB(i)`).
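As a minimal illustration (a hypothetical two-parameter model with an unconstrained location parameter and a probability parameter bounded between 0 and 1):
```matlab
% Hard bounds: parameter 1 unconstrained, parameter 2 strictly between 0 and 1.
LB = [-Inf, 0];
UB = [ Inf, 1];
```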
-
**Can I have parameters bounded only on one side?**
Half-bounded parameters are coordinate dimensions for which one bound is finite and the other is infinite. At the moment, VBMC does not support half-bounded parameters.
If you need a half-bounded constraint, fix the 'unbounded' half to some large but meaningful value (e.g., the 0.01th or 99.9th percentile of the prior for, respectively, an 'unbounded' lower or upper bound). Do not set the unbounded half to some impossibly large value with respect to the scale of your problem, otherwise VBMC is likely to fail.
-
**Can I set `LB = UB` for some variable to fix it to a given value?**
No — at the moment, `LB` and `UB` need to be distinct. This feature might be included in future versions of VBMC.
-
**How do I choose `PLB` and `PUB`?**
`PLB` and `PUB` are the plausible (or reasonable) bounds, which denote a region of high posterior probability mass. Plausible bounds are used by VBMC to draw the training points for the starting approximation of the posterior, and to initialize various hyperparameters of the algorithm, such as the typical scales of the parameters.
The plausible bounds need to be finite, and distinct from the hard bounds `LB` and `UB`.
In the absence of other information, I would recommend setting the plausible range to the ~68.26% percentile interval of the prior (that is, set `PLB` to the 15.87th percentile and `PUB` to the 84.13th percentile).
For example, for a Gaussian prior with a given mean `mu` and standard deviation `SD`, this would amount to
```matlab
PLB = norminv(0.1587,mu,SD);
PUB = norminv(0.8413,mu,SD);
```
which is approximately equivalent to
```matlab
PLB = mu - SD;
PUB = mu + SD;
```
Somewhat counter-intuitively, if in doubt about the location of the region of high posterior probability, it is better to choose narrower plausible bounds than wider ones. The rationale is that the Gaussian process approximation built by VBMC might fail if the algorithm ends up sampling from a region with extremely low posterior probability, and this is more likely to happen if the plausible bounds are arbitrarily large.
-
**How do I prevent VBMC from evaluating certain inputs or regions of input space?**
If these regions can be identified by coordinate-wise ranges, use `LB` and `UB`. Otherwise, use the prior over parameters to gradually and continuously reduce the probability of those regions (but never to zero), as sketched below.
Do not have `FUN(X)` return `Inf`, `NaN` or `0` for invalid inputs: VBMC would simply crash (see this question).
Absolutely do NOT have `FUN(X)` return an arbitrarily small number (such as `realmin`) to force VBMC to avoid certain regions. This strategy may seem innocent enough, but in fact it may cripple VBMC by making its model of the target function nonsensical.
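As a hypothetical sketch of the 'soft' approach (the penalty form, the example constraint `x(1) + x(2) > c`, and the helper functions `log_likelihood` and `log_prior` are illustrative placeholders, not part of VBMC):
```matlab
% Smoothly down-weight the region x(1) + x(2) > c via a soft log-penalty,
% instead of returning -Inf, NaN, or an arbitrarily small value.
c = 10;      % boundary of the soft constraint (example value)
tau = 0.5;   % softness scale of the penalty (example value)
soft_log_penalty = @(x) -log1p(exp((x(1) + x(2) - c)/tau));  % ~0 inside the allowed region, smoothly negative outside
fun = @(x) log_likelihood(x) + log_prior(x) + soft_log_penalty(x);
```
-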
**Does VBMC support inference with integer parameters?**
No, VBMC does not support integer parameters (that is, variables forced to take integer values or, more generally, constrained to discrete values). Be aware that simple workarounds might break the VBMC approximation, in particular the assumption that the underlying target function (the log posterior) is continuous and reasonably smooth.
-
`vp` is a MATLAB structure that represents the variational posterior — an approximation of the true posterior — computed by the VBMC inference algorithm.
You can query and manipulate the variational posterior with the following functions:
- `vbmc_rnd`: generate random samples from the variational posterior;
- `vbmc_pdf`: evaluate the probability density of the variational posterior at a given point;
- `vbmc_moments`: compute the mean and covariance matrix of the variational posterior;
- `vbmc_mode`: compute the mode of the variational posterior (that is, the maximum-a-posteriori or MAP estimate);
- `vbmc_kldiv`: compute the Kullback-Leibler divergence between two variational posteriors;
- `vbmc_mtv`: compute the marginal total variation distance between two variational posteriors;
- `vbmc_isavp`: check whether the input is a variational posterior struct from VBMC.

Also, if you have a struct or cell array which stores multiple variational posteriors (e.g., `vp{1}`, `vp{2}`, ...), you can call `vbmc_diagnostics` to run some convergence tests.
You can see several usage examples in the VBMC tutorial here.
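As a quick sketch of typical post-processing of a returned `vp` (check each function's documentation for the exact signatures and options; `cornerplot` is the plotting utility bundled with the VBMC toolbox):
```matlab
% Post-process the variational posterior returned by a VBMC run.
Xs = vbmc_rnd(vp, 1e5);                      % draw 10^5 samples from the variational posterior
[post_mean, post_cov] = vbmc_moments(vp);    % posterior mean and covariance matrix
post_mode = vbmc_mode(vp);                   % mode of the variational posterior (MAP estimate)
cornerplot(Xs);                              % visualize marginals and pairwise densities
```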
-
`elbo` is the variational Evidence Lower BOund (ELBO) estimated by VBMC, that is, an approximation of a lower bound on the log marginal likelihood (or log model evidence). Note that there are two levels of approximation:
- variational inference computes the ELBO, which is a lower bound on the log marginal likelihood;
- VBMC estimates the ELBO using a GP surrogate of the true posterior and Bayesian quadrature.

`elbo_sd` is the uncertainty (standard deviation) associated with the second level of approximation above. In variational inference, there is no direct way to know how far the ELBO is from the true log marginal likelihood (which would be the error of the first level).
You can use `elbo` as a metric for model selection, with `elbo_sd` providing one of several diagnostics of convergence. You should not trust solutions with a fairly large uncertainty (say, `elbo_sd` >> 0.1).
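As a rough sketch of using the ELBO for model comparison (here `elbo1`, `elbo_sd1`, `elbo2`, `elbo_sd2` are assumed to be the outputs of two separate VBMC runs on the competing models):
```matlab
% Compare two models via their (approximate) log model evidence.
delta_elbo = elbo1 - elbo2;                    % difference in ELBO between model 1 and model 2
delta_sd   = sqrt(elbo_sd1^2 + elbo_sd2^2);    % propagated uncertainty on the difference
fprintf('Delta ELBO = %.2f +/- %.2f\n', delta_elbo, delta_sd);
```
-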
This field of `OUTPUT` contains the fractional overhead, defined as (total running time / total function time - 1). Typically, you would expect `overhead` to be of order 1 or smaller for normal runs of VBMC. If the overhead is much larger than 1 (e.g., 10 or more), your problem affords fast evaluations and it might benefit from algorithms other than VBMC (see above).
For the VBMC test problems and examples, you will find that the reported overhead is astronomical, which is expected: for demonstration purposes we are using simple analytical functions which are extremely fast to evaluate.
-
`vbmc` currently has two undocumented outputs, `optimState` and `stats`. The former contains a large number of fields related to the internal state of the VBMC algorithm, whereas the latter contains detailed information and statistics about each iteration. As of now, these structs are returned for development and debugging purposes, and no explicit support is provided. Future versions of VBMC might change the interface or the internal structure of these outputs.
-
**What's a noisy target function?**
A target function is noisy (or stochastic) if it returns noisy observations of the true underlying log joint (unnormalized log posterior). Conversely, a non-noisy target function is deterministic. Log likelihoods are often noisy when evaluated through simulation (e.g., via Monte Carlo methods).
Starting from v1.0 (June 2020), VBMC supports noisy target functions as input, as explained below.
-
**Does VBMC automatically detect that the target function is noisy?**
No. In order to perform inference with a noisy target function, you need to:
- manually set `OPTIONS.SpecifyTargetNoise` to `true`;
- pass to VBMC a function `fun` that returns as second output an estimate of the standard deviation (SD) of the log-joint evaluation at `X`.

See the sketch below and Example 6 in the VBMC tutorial for further information.
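As a hypothetical sketch of the required setup (`estimate_log_likelihood` and `log_prior` are placeholder functions standing in for your own noisy estimator and prior; the target function would typically live in its own file):
```matlab
% In noisy_log_joint.m: the target returns the log-joint estimate and its SD.
function [lp, lp_sd] = noisy_log_joint(x, data)
    [ll, ll_sd] = estimate_log_likelihood(x, data);  % noisy log-likelihood estimate and its SD (e.g., from IBS)
    lp = ll + log_prior(x);   % adding the deterministic log prior does not change the SD
    lp_sd = ll_sd;
end
```
```matlab
% In your analysis script: tell VBMC that the target is noisy.
options = struct();
options.SpecifyTargetNoise = true;
[vp, elbo, elbo_sd] = vbmc(@(x) noisy_log_joint(x, data), x0, LB, UB, PLB, PUB, options);
```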
-
**Does VBMC automatically infer the amount of noise in the target function?**
No. See above.
-
**Can I use any technique to estimate a noisy log-likelihood?**
Kind of. Whatever estimation technique you use, VBMC expects the estimates of the log-likelihood (or log joint) to be approximately unbiased and normally distributed, with an available estimate of the standard deviation. Inverse binomial sampling (IBS) is a technique that checks all these boxes. The synthetic likelihood (SL) method could also work.
-
**How do I estimate the standard deviation of the noisy log-likelihood?**
If you use inverse binomial sampling (IBS), the algorithm returns the variability of the estimate as a second output. Just make sure that the variability is returned as a standard deviation (SD) and not as a variance (depending on the implementation, you may have to change some options of `ibslike`).
Otherwise, you can also estimate the SD via bootstrap or similar approaches.
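As a crude sketch of the brute-force approach (repeatedly evaluating a hypothetical noisy estimator `my_noisy_loglike` at a fixed parameter vector; in practice, prefer the variance estimate returned by your estimator when available, since this check is expensive):
```matlab
% Estimate the SD of a noisy log-likelihood at a fixed x by repeated evaluation.
K = 20;                          % number of repeated evaluations (example value)
ll_samples = zeros(K,1);
for k = 1:K
    ll_samples(k) = my_noisy_loglike(x, data);   % hypothetical noisy log-likelihood estimator
end
ll_sd = std(ll_samples);         % empirical SD of the log-likelihood estimate
```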
-
**How large can the noise be to perform successful inferences with VBMC?**
While VBMC has been shown to correctly handle up to a fairly large amount of observation noise (SD ~ 7 points in the modal region), ideally you want the SD of the noise in the log-likelihood to be around 1, and probably no larger than ~3, at least in the region covered by a meaningful fraction of the posterior probability mass.
-
**Wait, can I just remove stochasticity by fixing the random seed at each function call?**
No, that is a silly idea. This approach would freeze the noise, but it would not remove it (see here).
-
If `OPTIONS.Display` is set to `iter`, VBMC displays the traces of several quantities:
- the `iteration` number;
- the number of target function evaluations, `f-count`;
- the current estimate of the mean of the ELBO, `Mean[ELBO]`;
- the uncertainty (standard deviation) on the current estimate of the ELBO, `Std[ELBO]`;
- the change in the variational posterior from the previous iteration, measured as the symmetrized Kullback-Leibler (KL) divergence between the two variational posteriors, `sKL`;
- the number of components of the current variational mixture, `K`;
- a metric of `convergence` of the current solution (values less than 1 are necessary for convergence, but not sufficient);
- additional actions (such as the start and end of the warm-up stage).
-
**I noticed that the series of displayed `Mean[ELBO]` values is not monotonically increasing. Shouldn't the ELBO keep increasing during the variational optimization?**
Well spotted, but nothing to worry about. VBMC keeps updating its estimate of the ELBO, which means that this value may decrease across iterations, and sometimes (especially at the beginning) it will oscillate. This is part of the normal functioning of VBMC. However, if the ELBO keeps oscillating wildly even at later stages of the inference, you may be in trouble (see below).
-
**How can VBMC fail, how do I find out, and how do I fix it?**
So many questions. VBMC can fail in many ways, some of which are blatant and some of which are harder to detect — similarly to other inference methods such as Markov Chain Monte Carlo. For this reason, you should always check and validate the solutions found by VBMC.
We list here the major failure modes, diagnostics, and possible solutions:
- In the simplest case, the VBMC algorithm fails to converge within the allotted budget of target function evaluations.
  - This is easy to detect (look at `EXITFLAG` or at the `OUTPUT` structure). There may be several distinct reasons for the failure to converge (see below).
- VBMC converges, but the variational optimization has only found a local optimum.
  - You cannot detect this issue by looking at a single VBMC run. For this reason, I recommend running several VBMC runs from different starting points (at least 3-4) and comparing the solutions, for example via visual inspection of the posteriors and using the `vbmc_diagnostics` function (see also Example 4 in the VBMC tutorial and the sketch below).
- Multiple runs of VBMC converge to pretty much the same variational solution, which nonetheless fails to capture important aspects of the true posterior.
  - This problem has no obvious solution, in that it is intrinsic to the fact that we are using an approximation which may deviate from the true posterior. For example, due to how the variational objective is defined, variational posteriors tend to underestimate the true uncertainty of the posterior. You can use posterior predictive checks to gain confidence that the found solution makes sensible predictions, also in terms of calibration.
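As a minimal sketch of the multiple-runs check mentioned above (assuming `fun`, the bounds, and a matrix `x0_list` of starting points, one per row, are already defined; check `vbmc_diagnostics` for its exact interface):
```matlab
% Run VBMC several times from different starting points and compare the solutions.
Nruns = 3;
vp = cell(1, Nruns); elbo = zeros(1, Nruns); elbo_sd = zeros(1, Nruns);
for i = 1:Nruns
    [vp{i}, elbo(i), elbo_sd(i)] = vbmc(fun, x0_list(i,:), LB, UB, PLB, PUB);
end
vbmc_diagnostics(vp);   % convergence checks and comparison across runs
```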
-
**VBMC warned me that the `Returned variational solution may have not converged.` What should I do?**
Lack of convergence is reported when VBMC reaches the maximum number of function evaluations but the variational solution has not stabilized according to the reliability index. To understand better what is going on, and how you could solve the issue, have a look at the VBMC trace across iterations:
- On average, `Mean[ELBO]` has been increasing while `Std[ELBO]` has been decreasing over the last iterations, with some fluctuations. This suggests that VBMC is converging; it probably just needs more iterations. Restart VBMC using the returned solution as starting point, for a better initialization of the algorithm (see Example 3 in the VBMC tutorial).
- `Mean[ELBO]` and `Std[ELBO]` have been fluctuating wildly across iterations. This suggests that VBMC cannot build a reasonable Gaussian process (GP) approximation of the target posterior. There could be a number of reasons for that, but in many cases the GP approximation fails because VBMC has accidentally sampled one or a few points with extremely low posterior probability density. Because of these points, the GP approximation ends up with a very high output scale parameter (that is, the GP models a wildly fluctuating function). Sometimes, starting with a narrower plausible range, which only includes a region of (relatively) high posterior probability mass, is enough to prevent this kind of divergence.

In either case, you can try the "Automated Retry" option built into VBMC, as sketched below. To activate it, set `OPTIONS.RetryMaxFunEvals` to some budget of function evaluations greater than 0 (a typical choice is simply `OPTIONS.RetryMaxFunEvals = OPTIONS.MaxFunEvals`). If the first attempt did not converge, VBMC will perform a second run using the specified budget of additional function evaluations. For this second attempt, VBMC uses information gathered during the first run and modifies a number of hyperparameters of the algorithm to increase the chance of convergence.
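As a hypothetical sketch of enabling the automated retry (the specific budget is just an example; the exit-flag check assumes the convention that `EXITFLAG = 1` indicates convergence):
```matlab
% Enable the built-in "Automated Retry" option.
options = struct();
options.MaxFunEvals = 200;                         % example budget of function evaluations
options.RetryMaxFunEvals = options.MaxFunEvals;    % allow a second attempt with the same budget
[vp, elbo, elbo_sd, exitflag] = vbmc(fun, x0, LB, UB, PLB, PUB, options);
if exitflag < 1
    % Still not converged: consider restarting from the returned solution
    % (see Example 3 in the VBMC tutorial) or revisiting the plausible bounds.
end
```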
-
**VBMC crashes saying that `The returned function value must be a finite real-valued scalar.` What do I do?**
This error means that your target function has returned `Inf`, `NaN`, or a complex number. You should check your code and understand why it returned a non-finite or complex value.
`Inf`s and `NaN`s often arise because there are outcomes in your dataset (e.g., responses in a trial) to which the tested model assigns a probability of `0` (usually due to numerical truncation). `log(0)` yields `-Inf`, which is then propagated. In these cases, we recommend making the model more robust by forcing all outcomes (e.g., the likelihood associated with each trial) to have a minimum non-null probability, such as `sqrt(eps)` or some other small value, as sketched below. This should not be necessary if the model already includes a non-zero lapse rate.
Complex numbers usually arise because you are taking either a `log` or a `sqrt` of a negative number (of a quantity that should not be negative). You might be setting wrong bounds for your variables, or maybe there are indexing issues.
Note that some MATLAB optimizers, such as `fmincon`, are robust to `Inf`s and `NaN`s and just keep going, avoiding the problematic region in parameter space. For this reason, you might have been able to fit the same models before with an optimizer, without encountering errors. However, ignoring errors in the likelihood can be dangerous, as it might hide deeper issues with the model implementation.
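As a hypothetical sketch of the fix suggested above (`my_model_trial_probabilities` is a placeholder for your model's per-trial response probabilities):
```matlab
% Clamp per-trial probabilities away from zero before taking the log,
% so that the log-likelihood never returns -Inf.
p_trials = my_model_trial_probabilities(x, data);   % per-trial probabilities of the observed responses
p_trials = max(p_trials, sqrt(eps));                % enforce a minimum non-null probability
ll = sum(log(p_trials));                            % finite log-likelihood
```
-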
**I have been running VBMC from the same starting point, but I get different results each time. Is something wrong?**
Nothing is wrong per se. VBMC uses stochastic optimization, so results will differ between runs, even with the same initial condition `X0`. The question is whether the returned `elbo` or variational posterior `vp` vary substantially across runs from the same starting point. If so, it might be a sign that your posterior landscape is particularly difficult.
If you want reproducible results (and this advice applies beyond VBMC), I recommend fixing MATLAB's random seed to some known quantity via `rng` (e.g., setting it to the run number), as in the sketch below.
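A minimal sketch of a reproducible run (the run number is arbitrary):
```matlab
% Fix the random seed so that repeated executions give identical results.
run_id = 1;        % e.g., the run number
rng(run_id);       % fix MATLAB's random seed
[vp, elbo, elbo_sd] = vbmc(fun, x0, LB, UB, PLB, PUB);
```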
-
The interface of VBMC is very similar to that of BADS, and you may need only minor or even no changes to run VBMC on your problem. Note that:
- Unlike BADS, plausible bounds (`PLB` and `PUB`) in VBMC need to be distinct from the hard bounds `LB` and `UB` (see here).
  - Also, hard bounds in VBMC have a slightly different meaning and constraints.
- The target function `fun` of VBMC is a log posterior density — that is, you have to specify a prior.
- Beware of the sign! The target function in VBMC is the (positive) log joint, that is, log likelihood plus log prior — this differs from BADS, which minimizes the objective (for example, maximum-likelihood estimation in BADS would take as input the negative log-likelihood).
- You can use as starting point `x0` for VBMC the result of a preliminary optimization with BADS, to ensure that you are starting from a region of high posterior density.
- VBMC supports inference with noisy target functions, but with more requirements than BADS (see this section).
- Finally, VBMC does not offer special support for circular or periodic coordinates, such as angles — just treat them as normal bounded coordinates, setting hard bounds spanning the length of one period (e.g., `LB = 0` and `UB = 2*pi`).
-
**This is interesting, but why is it better than just using point estimates (e.g., maximum-likelihood or maximum-a-posteriori)?**
There are several reasons why having access to the full Bayesian posterior is desirable:
- knowing the uncertainty in the parameter estimates;
- making overall better model predictions (that is, more accurate and better calibrated) — for example, if the posterior is asymmetric or multi-modal, it is badly characterized by a single point estimate, and model predictions can be unreasonably confident (in other words, you may overfit);
- understanding features of the model, such as trade-offs between parameters, which may not be obvious.

Also, the log model evidence (as approximated by its lower bound, the ELBO) is a principled metric for Bayesian model comparison that takes into account both goodness of fit and actual model complexity. With point estimates, you can only compute simple metrics such as Akaike's Information Criterion (AIC) or the "Bayesian" Information Criterion (BIC), which know very little about the structure of your model.
-
Fair enough — just be aware that you may be missing important features of your model and data, and your estimates and predictions might be way overconfident.
If you only care about finding point estimates, you might want to have a look at our method for efficient Bayesian optimization, Bayesian Adaptive Direct Search (BADS).
-
VBMC is available in MATLAB here and in Python as the PyVBMC package. Currently, there is no plan to port the algorithm to other languages, since there are ways to run the Python version from several other languages (for example, our users have been able to run PyVBMC from Julia).