
VBMC: Frequently Asked Questions

This FAQ is curated by Luigi Acerbi and is in constant expansion. This document was originally started for the MATLAB version of VBMC, but most of the answers also apply to the Python version (PyVBMC).

For a tutorial with detailed examples, see the VBMC tutorial.

If you have questions not covered here, please feel free to ask in the lab Discussions forum.


General

  • Which kind of problems is VBMC suited for?

    I would recommend VBMC for problems in which:

    • you can calculate an estimate of the target (log) likelihood function, which can be deterministic or noisy (see below);
    • the target likelihood function is at least moderately expensive to compute (~0.1 s or more per function evaluation);
    • the gradient may be unavailable;
    • the number of input parameters is up to about D = 10 (maybe up to 20, but not more);
    • the target posterior is continuous and reasonably smooth, such that it can be approximated by a Gaussian process (GP) with a smooth kernel. This is generally difficult to know a priori, but you can look for telltale signs that the VBMC approximation is failing.

    If your likelihood function is fully analytical (or, more generally, fast to evaluate), VBMC is most likely not the best tool for your problem (see below).

  • What do I do if VBMC is not suited for my problem?

    If the likelihood function is smooth and analytical (and fast to compute), you should consider estimating the posterior via Markov Chain Monte Carlo, for example using probabilistic programming languages such as Stan or PyMC3.

    If the likelihood function is computationally expensive and non-smooth (or it produces a pathological posterior), then, well, tough luck. In this case, you may see if revising your model, for example by changing the parameterization, produces a posterior landscape which is more compatible with the approximations used by VBMC.

    Alternatively, you may give up on full (approximate) Bayesian inference, and resort instead to maximum-likelihood (or maximum-a-posteriori) estimation, using an efficient optimization algorithm such as Bayesian Adaptive Direct Search (BADS).

Installation

  • Where can I download VBMC?

    Here.

  • Which MATLAB toolboxes or external packages does VBMC require?

    At the moment, VBMC requires MATLAB's Optimization Toolbox™ and possibly the Statistics and Machine Learning Toolbox™. If I receive enough requests, I might remove the dependency on these toolboxes in future versions of VBMC.

  • Which version of MATLAB do I need?

    VBMC has been tested successfully on MATLAB versions from R2014a onwards. It might be able to run on earlier versions (in which case, please do let me know).

  • I am having trouble installing VBMC. Can you help?

    Sure. The VBMC installation should be pretty straightforward, so please write to me describing in detail the problem you are having.

Input arguments (target function: fun)

  • What is the target function?

    The target function in VBMC is the (unnormalized) log posterior or log joint probability density, usually specified via a function handle fun, the first input argument to vbmc.

    For a typical model-fitting problem, fun is a function that computes the (unnormalized) log posterior of an input parameter vector x, for a given dataset and model. The log posterior is computed as log likelihood plus log prior.

    If you are unsure, see the following questions for more information about the likelihood function and the prior.
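
    For example, here is a minimal sketch of how the target is typically assembled and passed to vbmc (log_likelihood, log_prior and data are hypothetical names standing for your own functions and dataset):

    fun = @(x) log_likelihood(x,data) + log_prior(x);   % log joint = log likelihood + log prior
    [vp,elbo,elbo_sd] = vbmc(fun,X0,LB,UB,PLB,PUB);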

  • I do not have a likelihood function, but another loss function. Can I still use VBMC?

    For the results of VBMC to make sense, you really need a likelihood (and a prior; see below).

    However, sometimes you might already almost have a likelihood, and you just do not know about it. For example, suppose you are fitting a model via least squares, such that your target is a quadratic loss function defined as:

    loss = @(data_x,data_y,x) sum((f(data_x,x) - data_y).^2);

    where f returns an array of model predictions for model parameters x and data data_x, and data_y is an array of observed responses. You can transform the above loss into a log likelihood as:

    function ll = log_likelihood(x,data_x,data_y) 
      sigma2 = exp(2*x(end));      % Extract sigma^2 from log sigma parameterization
      N = size(data_y,1);          % Number of trials (rows of y)
      x_orig = x(1:end-1);         % Original parameter vector (without sigma)
      ll = -0.5*sum((f(data_x,x_orig) - data_y).^2)/sigma2 - 0.5*N*log(2*pi*sigma2);
    end

    where we have added log(sigma) as an additional model parameter at the end of the parameter vector x (the logarithmic representation is convenient to ensure that sigma is positive). sigma is the standard deviation of the observed values data_y with respect to the model predictions f(data_x,x_orig) (that is, the standard deviation of the residuals). The last line defines the log likelihood of a series of observations corrupted by Gaussian noise. Note that maximizing log_likelihood with respect to x is equivalent to minimizing the quadratic loss defined above.

  • Wait, do I need a prior?

    Yes, you do. For the marginal likelihood (model evidence) to make sense as a model comparison metric, you should use a proper prior over the parameters (that is, a prior that integrates to 1). Horrible things happen if you compute the marginal likelihood with an improper or unnormalized prior. Since VBMC computes the ELBO, a lower bound to the (log) marginal likelihood, you need a prior.

    Also, a prior will come in handy to set plausible bounds for the parameters (see below).

    A uniform bounded prior over parameters is okay (although not ideal), as long as you correctly normalize it.

  • How do I specify the prior to VBMC?

    At the moment, VBMC does not have explicit knowledge of the prior. You pass to VBMC the target function fun, which is a black-box handle to the log joint distribution, that is log likelihood plus log prior.

    For this reason, VBMC does not automatically normalize the prior — that is up to you. For example, for a uniform prior you would have:

    fun = @(x) log_likelihood(x) - sum(log(UB - LB));

    where log_likelihood is the handle to the log likelihood function, and the second term represents the log prior (uniform over the range).
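
    For a non-uniform prior, you add the log prior density to the log likelihood. For example, here is a minimal sketch assuming independent Gaussian priors over the parameters, with hypothetical vectors of prior means MU and standard deviations SIGMA:

    log_prior = @(x) sum(-0.5*((x - MU)./SIGMA).^2 - log(SIGMA) - 0.5*log(2*pi));
    fun = @(x) log_likelihood(x) + log_prior(x);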

  • My target function requires additional data/inputs. How do I pass them to VBMC?

    Suppose that your function takes two inputs, fun(x,data).

    The first solution consists of defining a new function handle

    funwdata = @(x) fun(x,data);

    where data has been defined before in the code. Now you can pass to VBMC funwdata, which takes a single input. Note that a copy of data is embedded in funwdata. If you subsequently make any change to the original data, you will need to redefine the function handle if you want the changes to be reflected in the inference.

    Alternatively, VBMC can take additional inputs after OPTIONS (the seventh argument). Any additional input is passed to the target function. In this case, you would write

    vbmc(fun,X0,LB,UB,PLB,PUB,OPTIONS,data) 
  • Does VBMC support inference with noisy (that is, stochastic) target functions?

    Yes, as of v1.0 (June 2020). See this section for more information.

Input arguments (domain: x0,LB,UB,PLB,PUB)

  • How do I choose the starting point X0?

    First of all, keep in mind that you should run VBMC from different starting points, as a sanity check for your solutions.

    A good starting point belongs to regions of high posterior probability density (even better, to typical regions). Generally, a point in the vicinity of the mode of the posterior would represent a reasonable starting point (this is not necessarily true in high dimension, but VBMC only operates in relatively low dimension). You can find such points by running a few iterations of an optimizer first.

    Note that a starting point cannot be on the bounds LB or UB. You can avoid this issue by constraining the preliminary optimization to be inside the plausible box, that is within PLB and PUB (see also below).

    For example, you can use Bayesian Adaptive Direct Search (BADS) to perform the preliminary optimization inside the plausible box, as follows:

    % Pick starting point via a preliminary optimization with BADS
    nvars = numel(PLB);
    X0 = PLB + rand(1,nvars) .* (PUB - PLB);
    opt_options = bads('defaults'); 
    opt_options.MaxFunEvals = 50*nvars;  % Quick optimization
    X0 = bads(@(x) -fun(x),X0,PLB,PUB,[],[],[],opt_options);
  • How do I choose LB and UB?

    LB and UB are the hard bounds of the parameter space of the model of interest. These are strict inequalities, in that for any coordinate dimension i, LB(i) < X(i) < UB(i).

    You can set the hard bounds to the support of the posterior probability density (that is, the region with nonzero probability density). For each coordinate dimension, the choice is between:

    • unconstrained coordinates, specified with LB(i) = -Inf and UB(i) = Inf; or
    • bounded coordinates, in which both LB(i) and UB(i) are finite (with LB(i) < UB(i)).
  • Can I have parameters bounded only on one side?

    Half-bounded parameters are coordinate dimensions for which one bound is finite and the other is infinite. At the moment, VBMC does not support half-bounded parameters.

    If you need to set a half-bounded constraint, fix the 'unbounded' side to some large but meaningful value (e.g., the 0.01th percentile of the prior for an 'unbounded' lower bound, or the 99.9th percentile for an 'unbounded' upper bound). Do not set the unbounded side to some impossibly large value with respect to the scale of your problem, otherwise VBMC is likely to fail.
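
    As a hedged example, suppose a rate parameter has a natural lower bound at zero and (hypothetically) an exponential prior with mean 1; you could replace the missing upper bound with a high percentile of the prior:

    LB = 0;                    % natural (finite) lower bound
    UB = expinv(0.999,1);      % 99.9th percentile of the exponential prior (about 6.9)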

  • Can I set LB = UB for some variable to fix it to a given value?

    No — at the moment, LB and UB need to be distinct. This feature might be included in future versions of VBMC.

  • How do I choose PLB and PUB?

    PLB and PUB are the plausible (or reasonable) bounds, which denote a region of high posterior probability mass. Plausible bounds are used by VBMC to draw training points to construct the starting approximation of the posterior, and to initialize various hyperparameters of the algorithm, such as typical scales for the parameters.

    The plausible bounds need to be finite, and distinct from the hard bounds LB and UB.

    In the absence of other information, I would recommend setting the plausible range to the central ~68.26% interval of the prior (that is, set PLB to the 15.87th percentile and PUB to the 84.13th percentile).

    For example, for a Gaussian prior with a given mean mu and standard deviation SD, this would amount to

    PLB = norminv(0.1587,mu,SD);
    PUB = norminv(0.8413,mu,SD);

    which is about equivalent to

    PLB = mu - SD;
    PUB = mu + SD;

    Somewhat counter-intuitively, if in doubt about the location of a high posterior probability region, it is better to choose narrower plausible bounds than wide ones. The rationale is that the Gaussian process approximation built by VBMC might fail if the algorithm ends up sampling from a region with extremely low posterior probability, and this is more likely to happen if the plausible bounds are arbitrarily large.

  • How do I prevent VBMC from evaluating certain inputs or regions of input space?

    If these regions can be identified by coordinate-wise ranges, use LB and UB. Otherwise, use the prior over parameters to gradually and continuously reduce the probability for those regions (but never to zero).

    Do not have FUN(X) return Inf, NaN, or a placeholder value such as 0 for invalid inputs; in the first two cases, VBMC would simply crash (see this question).

    Absolutely do NOT have FUN(X) return an arbitrarily small number (such as realmin) to force VBMC to avoid certain regions. This strategy may seem innocent enough, but in fact it may cripple VBMC by making its model of the target function nonsensical.

  • Does VBMC support inference with integer parameters?

    No, VBMC does not support integer parameters (that is, variables forced to be integers or, more generally, constrained to discrete values). Be aware that simple workarounds might break the VBMC approximation, in particular the assumption that the underlying target function (the log posterior) is continuous and reasonably smooth.

Output arguments

  • What is vp and what do I do with it?

    vp is a MATLAB structure that represents the variational posterior — an approximation of the true posterior — computed by the VBMC inference algorithm.

    You can query and manipulate the variational posterior with the following functions:

    • vbmc_rnd: generate random samples from the variational posterior;
    • vbmc_pdf: evaluate the probability density of the variational posterior at a given point;
    • vbmc_moments: compute mean and covariance matrix of the variational posterior;
    • vbmc_mode: compute the mode of the variational posterior (that is, the maximum-a-posteriori or MAP estimate);
    • vbmc_kldiv: compute Kullback-Leibler divergence between two variational posteriors;
    • vbmc_mtv: compute marginal total variation distance between two variational posteriors;
    • vbmc_isavp: check whether the input is a variational posterior struct from VBMC.

    Also, if you have a struct or cell array which stores multiple variational posteriors (e.g., vp{1},vp{2},...), you can call vbmc_diagnostics to run some convergence tests.
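
    For example, a minimal sketch of some common queries using the functions above:

    Xs = vbmc_rnd(vp,1e5);              % draw 10^5 samples from the variational posterior
    [mu,Sigma] = vbmc_moments(vp);      % posterior mean and covariance matrix
    x_map = vbmc_mode(vp);              % mode of the variational posterior (MAP estimate)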

    You can see several usage examples in the VBMC tutorial here.

  • What are elbo and elbo_sd?

    elbo is the variational Evidence Lower BOund estimated by VBMC, that is an approximation of a lower bound on the log marginal likelihood (or log model evidence). Note that there are two levels of approximation:

    • variational inference computes the ELBO, which is a lower bound on the log marginal likelihood;
    • VBMC estimates the ELBO using a GP surrogate of the true posterior and Bayesian quadrature.

    elbo_sd is the uncertainty (standard deviation) on the second part of the above approximation. In variational inference, there is no direct way to know how far the ELBO is from the true log marginal likelihood (which would be the error of the first part).

    You can use elbo as a metric for model selection, with elbo_sd providing one of several diagnostics of convergence. You should not trust solutions with a fairly large uncertainty (say, elbo_sd >> 0.1).

  • How is overhead in OUTPUT defined?

    This field of OUTPUT contains the fractional overhead, defined as (total running time / total function time - 1). Typically, you would expect the overhead to be of order 1 or smaller for normal runs of VBMC. If the overhead is much larger than 1 (e.g., 10 or more), your problem affords fast evaluations and it might benefit from algorithms other than VBMC (see above).
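
    For example, if a run of VBMC takes 120 s in total, of which 100 s are spent evaluating the target function, the fractional overhead is 120/100 - 1 = 0.2.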

    For VBMC test problems and examples, you will find that the reported overhead is astronomical, which is expected since for demonstration purposes we are using simple analytical functions which are extremely fast to evaluate.

  • What's in the undocumented outputs of vbmc?

    vbmc currently has two undocumented outputs, optimState and stats. The former contains a large number of fields related to the internal state of the VBMC algorithm, whereas the latter contains detailed information and statistics about each iteration. As of now, these structs are returned for development of the algorithm and debugging purposes, and no explicit support is provided. Future versions of VBMC might change the interface or the internal structure of these outputs.

Noisy target function

  • What's a noisy target function?

    A target function is noisy (or stochastic) if it returns noisy observations of the true underlying log-joint (unnormalized log posterior). Conversely, a non-noisy target function is deterministic. Log likelihoods can be noisy when evaluated through simulation (e.g., via Monte Carlo methods).

    Starting from v1.0 (June 2020), VBMC offers support for noisy target functions as input, as explained below.

  • Does VBMC automatically detect that the target function is noisy?

    No. In order to perform inference with a noisy target function, you need to:

    • manually set OPTIONS.SpecifyTargetNoise to true;
    • pass to VBMC a function fun that returns as second output an estimate of the standard deviation (SD) of the log-joint evaluation at X. See below and Example 6 in the VBMC tutorial for further information.
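
    For example, here is a minimal sketch of a noisy target (estimate_log_likelihood, log_prior and data are hypothetical names; the first is assumed to return an unbiased estimate of the log likelihood and the SD of that estimate, e.g., via IBS):

    function [lp,lp_sd] = noisy_log_joint(x,data)
      [ll,ll_sd] = estimate_log_likelihood(x,data);  % noisy estimate and its SD
      lp = ll + log_prior(x);                        % add the (deterministic) log prior
      lp_sd = ll_sd;                                 % SD unchanged by the deterministic prior
    end

    OPTIONS.SpecifyTargetNoise = true;
    [vp,elbo,elbo_sd] = vbmc(@(x) noisy_log_joint(x,data),X0,LB,UB,PLB,PUB,OPTIONS);
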
  • Does VBMC automatically infer the amount of noise in the target function?

    No. See above.

  • Can I use any technique to estimate a noisy log-likelihood?

    Kind of. Whatever estimation technique you use, VBMC expects the estimates of the log-likelihood (or log-joint) to be approximately unbiased and normally-distributed, with an available estimate of the standard deviation. Inverse binomial sampling (IBS) is a technique that checks all these boxes. The synthetic likelihood (SL) method could also work.

  • How do I estimate the standard deviation of the noisy log-likelihood?

    If you use inverse binomial sampling (IBS), the algorithm returns the variability of the estimate as second output. Just ensure that the variability is returned as standard deviation (SD) and not as the variance (depending on the implementation, you may have to change some options of ibslike).

    Otherwise, you can also estimate the SD via bootstrap or similar approaches.

  • How large can the noise be to perform successful inferences with VBMC?

    While VBMC has been shown to correctly handle a fairly large amount of observation noise (SD ~ 7 points in the modal region), ideally you want the SD of the noise in the log-likelihood to be around 1, and probably not larger than ~3, at least in the area covered by a meaningful fraction of posterior probability mass.

  • Wait, can I just remove stochasticity by fixing the random seed at each function call?

    No, that is a silly idea. This approach might freeze the noise, but would not remove it (see here).

Display

  • What are the quantities displayed by VBMC during the variational optimization?

    If OPTIONS.Display is set to iter, VBMC displays the traces of several quantities:

    • the iteration number;
    • the number of target function evaluations f-count;
    • the current estimate of the mean of the ELBO, Mean[ELBO];
    • the uncertainty (standard deviation) on the current estimate of the ELBO, Std[ELBO];
    • the change in variational posterior from the previous iteration, measured in terms of symmetrized Kullback-Leibler (KL) divergence between the two variational posteriors, sKL;
    • the number of components of the current variational mixture K;
    • a metric of convergence of the current solution (values less than 1 are necessary for convergence, but not sufficient);
    • additional actions (such as starting and ending of the warm-up stage).
  • I noticed that the series of displayed Mean[ELBO] values is not monotonically increasing. Shouldn't the ELBO keep increasing during the variational optimization?

    Well spotted, but nothing to worry about. VBMC keeps updating its estimate of the ELBO, which means that this value may decrease across iterations, and sometimes (especially at the beginning) it will oscillate. This is part of the normal functioning of VBMC. However, if the ELBO keeps oscillating wildly even at later stages in the inference, you may be in trouble (see below).

Troubleshooting

  • How can VBMC fail, how do I find out, and how do I fix it?

    So many questions. VBMC can fail in many ways, some of which are blatant and some harder to detect, similarly to other inference methods such as Markov Chain Monte Carlo. For this reason, you should always check and validate the solutions found by VBMC.

    We list here the major failure modes, diagnostics, and possible solutions:

    • In the simplest case, the VBMC algorithm fails to converge within the allotted budget of target function evaluations.
      • This is easy to detect (look at EXITFLAG or at the OUTPUT structure). There may be several distinct reasons for failure of convergence (see below).
    • VBMC converges, but the variational optimization has only found a local optimum.
      • You cannot detect this issue by looking at a single VBMC run. For this reason, I recommend running several VBMC runs from different starting points (at least 3-4) and comparing the solutions, for example via visual inspection of the posteriors and using the vbmc_diagnostics function (see also Example 4 in the VBMC tutorial, and the sketch after this list).
    • Multiple runs of VBMC converge to pretty much the same variational solution, which fails to capture important aspects of the true posterior.
      • This problem has no obvious solution, in that it is intrinsic to the fact that we are using an approximation that may deviate from the true posterior. For example, due to how the variational objective is defined, variational posteriors tend to underestimate the true uncertainty of the posterior. You can use posterior predictive checks to gain confidence that the found solution makes sensible predictions, also in terms of calibration.
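
    For instance, here is a minimal sketch of the multiple-runs check mentioned above (X0_all is a hypothetical matrix of starting points, one per row):

    for i = 1:3
        [vp{i},elbo(i),elbo_sd(i)] = vbmc(fun,X0_all(i,:),LB,UB,PLB,PUB);
    end
    vbmc_diagnostics(vp);    % convergence checks across the runs
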
  • VBMC warned me that the Returned variational solution may have not converged. What should I do?

    Lack of convergence is reported when VBMC reaches the maximum number of function evaluations, but the variational solution has not stabilized according to the reliability index. To understand better what's going on, and how you could solve this issue, have a look at the VBMC trace across iterations:

    • On average, Mean[ELBO] has been increasing while Std[ELBO] has been decreasing over the last iterations, with some fluctuations. This suggests that VBMC is converging; it probably just needs more iterations. Restart VBMC using the returned solution as starting point, for a better initialization of the algorithm (see Example 3 in the VBMC tutorial).

    • Mean[ELBO] and Std[ELBO] have been fluctuating wildly across iterations. This suggests that VBMC cannot build a reasonable Gaussian process (GP) approximation of the target posterior. There could be a number of reasons for that, but in many cases, the GP approximation is failing because VBMC has accidentally sampled one or a few points with extremely low posterior probability density. Because of these points, the GP approximation ends up having a very high output scale parameter (that is, the GP models a wildly fluctuating function). Sometimes, starting with a narrower plausible range, which only includes a region of (relatively) high posterior probability mass, is enough to prevent this kind of divergence.

    In either case, you can try the "Automated Retry" option built into VBMC. To activate it, set OPTIONS.RetryMaxFunEvals to some budget of function evaluations greater than 0 (a typical choice is simply OPTIONS.RetryMaxFunEvals = OPTIONS.MaxFunEvals). In case the first attempt did not converge, VBMC will perform a second run using the specified budget of additional function evaluations. For this second attempt, VBMC will use information gathered during the first run, and it modifies several hyperparameters of the algorithm to increase the chance of convergence.
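
    For example, a minimal sketch (the specific budget below is just illustrative):

    OPTIONS = struct();
    OPTIONS.MaxFunEvals = 100*numel(PLB);              % illustrative budget
    OPTIONS.RetryMaxFunEvals = OPTIONS.MaxFunEvals;    % enable the automated retry
    [vp,elbo,elbo_sd,exitflag] = vbmc(fun,X0,LB,UB,PLB,PUB,OPTIONS);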

  • VBMC crashes saying that The returned function value must be a finite real-valued scalar. What do I do?

    This error means that your target function has returned Inf, NaN, or a complex number. You should check your code and understand why it returned a non-finite or complex value.

    Infs and NaNs often arise because there are outcomes in your dataset (e.g., responses in a trial) to which the tested model assigns a probability of 0 (usually due to numerical truncation). log(0) yields -Inf, which is then propagated. In these cases, we recommend making the model more robust by forcing all outcomes (e.g., the likelihood associated with each trial) to have a minimum non-null probability, such as sqrt(eps) or some other small value. This should not be necessary if the model already includes a non-zero lapse rate.

    Complex numbers usually arise because you are taking either a log or a sqrt of a negative number (of a quantity that should not be negative). You might be setting wrong bounds for your variables, or maybe there are indexing issues.

    Note that some MATLAB optimizers, such as fmincon, are robust to Infs and NaNs and just keep going, avoiding the problematic region in parameter space. For this reason, you might have been able to fit the same models before with an optimizer, without encountering errors. However, ignoring errors in the likelihood can be dangerous as it might hide deeper issues with the model implementation.

  • I have been running VBMC from the same starting point, but I get different results each time. Is something wrong?

    Nothing is wrong per se. VBMC uses stochastic optimization, so results will differ between different runs, even with the same initial condition X0. The question is whether the returned elbo or variational posterior vp vary substantially across runs from the same starting point. If so, it might be a sign that your posterior landscape is particularly difficult.

    If you want to have reproducible results (and this advice applies beyond VBMC), I recommend fixing MATLAB's random seed to some known quantity via rng (e.g., setting it to the run number).
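
    For example (run_id is a hypothetical run index):

    rng(run_id);                                   % fix MATLAB's random seed for this run
    [vp,elbo,elbo_sd] = vbmc(fun,X0,LB,UB,PLB,PUB);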

Miscellanea

  • I already run BADS on my problem. How do I run VBMC?

    The interface of VBMC is very similar to the one of BADS, and you may only need minor or even no changes to run VBMC on your problem. Note that:

    • Unlike BADS, plausible bounds (PLB and PUB) in VBMC need to be distinct from the hard bounds LB and UB (see here).
    • Also, hard bounds in VBMC have slightly different meaning and constraints.
    • The target function fun of VBMC is a log posterior density — that is, you have to specify a prior;
    • Beware of the sign! The target function in VBMC is the (positive) log-joint, that is log-likelihood plus log-prior; this differs from BADS, which minimizes the objective (for example, maximum-likelihood estimation in BADS would take as input the negative log-likelihood), as shown in the sketch after this list;
    • You can use as starting point x0 for VBMC the result of a preliminary optimization with BADS, to ensure you are starting from a region of high posterior density;
    • VBMC supports inference with noisy target functions, but it has more requirements than in BADS (see this section);
    • Finally, VBMC does not offer special support for circular or periodic coordinates, such as angles. Just treat them as normal bounded coordinates, setting hard bounds on the length of one period (e.g., LB = 0 and UB = 2*pi).
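
    For example, here is a minimal sketch of the sign flip mentioned above (negative_log_likelihood, log_prior and data are hypothetical names; the first is the objective you would minimize with BADS):

    neg_ll = @(x) negative_log_likelihood(x,data);   % objective minimized by BADS
    fun = @(x) -neg_ll(x) + log_prior(x);            % VBMC target: log likelihood + log prior
    [vp,elbo,elbo_sd] = vbmc(fun,X0,LB,UB,PLB,PUB);
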
  • This is interesting, but why is it better than just using point estimates? (e.g., maximum-likelihood or maximum-a-posteriori)

    There are several reasons for which having access to the full Bayesian posterior is desirable:

    • Knowing the uncertainty in the parameter estimates;
    • Making overall better model predictions (that is, more accurate and more calibrated) — for example, if the posterior is asymmetric or multi-modal, it is badly characterized by a single point estimate and model predictions can be unreasonably confident (in other words, you may overfit);
    • Understanding features of the model, such as trade-offs between parameters, which may not be obvious.

    Also, the log model evidence (as approximated by its lower bound, the ELBO) is a principled metric for Bayesian model comparison that takes into account goodness of fit and actual model complexity. With point estimates, you can only compute simple metrics such as Akaike's Information Criterion (AIC) or the "Bayesian" Information Criterion (BIC), which know very little about the structure of your model.

  • Thanks, but I think I will stick to point estimates for now.

    Fair enough — just be aware that you may be missing important features of your model and data, and your estimates and predictions might be way overconfident.

    If you only care about finding point estimates, you might want to have a look at our method for efficient Bayesian optimization, Bayesian Adaptive Direct Search (BADS).

  • Are you planning to port VBMC to other languages?

    VBMC is available in MATLAB here and in Python as the PyVBMC package. Currently, there is no plan to port the algorithm to other languages, since there are ways to run the Python version from several other languages (for example, our users have been able to run PyVBMC from Julia).
