[Question] BBB vs BBB w/ Local Reparameterization #14
Furthermore, regarding your implementation of the closed-form KL divergence (Bayesian-Neural-Networks/src/Bayes_By_Backprop_Local_Reparametrization/model.py, lines 25 to 28 in 022b9ce): I was wondering if you could provide any detail on how you arrived at the equation that you implemented in the code? Thanks again!
Hi @danielkelshaw, thanks for your question.

Similarly to the regular reparametrisation trick, the local reparametrisation trick is used to estimate gradients with respect to the parameters of a distribution. However, the local reparametrisation trick takes advantage of the fact that, for a fixed input and Gaussian distributions over the weights, the resulting distribution over activations is also Gaussian. Instead of sampling all the weights individually and then combining them with the inputs to compute a sample from the activations, we can sample directly from the distribution over activations. This results in a lower-variance gradient estimator, which in turn makes training faster and more stable. Using the local reparametrisation trick is recommended whenever possible.

The code for both gradient estimators is similar but not quite the same. In the code you referenced, if you look closely, you can see that we first sample the Gaussian weights:
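For concreteness, here is a minimal sketch of that first step (the variable names `W_mu`, `W_rho` and `std_w` are illustrative assumptions, not the repo's exact code):

```python
import torch
import torch.nn.functional as F

# Illustrative variational parameters for one layer (assumed names):
# W_mu holds the posterior means, W_rho the pre-softplus standard deviations.
in_features, out_features = 784, 200
W_mu = torch.zeros(in_features, out_features)
W_rho = torch.full((in_features, out_features), -3.0)

# sigma = softplus(rho) keeps the posterior standard deviation positive
std_w = 1e-6 + F.softplus(W_rho)

# Standard reparameterisation trick: W = mu + sigma * eps, with eps ~ N(0, I)
eps_W = torch.randn_like(W_mu)
W = W_mu + std_w * eps_W
```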
We then pass the input through a linear layer using the parameters we just sampled:
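Continuing the sketch above (still assumed names, not the repo's code), the sampled `W` is used exactly like a deterministic weight matrix:

```python
# One forward pass with a single weight sample; the bias, if any, is
# sampled in the same way and added here.
x = torch.randn(32, in_features)   # dummy mini-batch
output = torch.mm(x, W)            # activations for this weight sample
```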
On the other hand, for the local reparametrisation trick, we compute the parameters of the Gaussian over activations directly:
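A sketch of that step, reusing the assumed `W_mu` and `std_w` from above: for a factorised Gaussian over the weights, the pre-activations `a = xW` are Gaussian with mean `x W_mu` and variance `(x^2)(sigma_W^2)`, so we compute those two quantities directly instead of sampling `W`:

```python
# Local reparameterisation: moments of the Gaussian over pre-activations
act_mu = torch.mm(x, W_mu)                          # E[x W]
act_std = torch.sqrt(torch.mm(x ** 2, std_w ** 2))  # sqrt(Var[x W])
```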
We then sample from the distribution over activations:
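Continuing the same sketch, this draws one noise variable per activation rather than per weight, which is what lowers the variance of the gradient estimator:

```python
# Sample the activations directly: a = mu_a + sigma_a * eps, eps ~ N(0, I)
eps_a = torch.randn_like(act_mu)
activations = act_mu + act_std * eps_a
```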
With regard to the KL divergence: the form used in regular Bayes by Backprop is more general but requires MC sampling to estimate it. It has the benefit of allowing non-Gaussian priors and non-Gaussian approximate posteriors (note that our code implements the former but not the latter). We use the same weight samples to compute the model predictions and the KL divergence, saving compute and reducing variance through the use of common random numbers.

When running the local reparametrization trick, we sample activations instead of weights, so we don't have access to the weight samples needed to estimate the KL divergence. Because of this, we opted for the closed-form implementation. It restricts us to a Gaussian prior but has lower variance and results in faster convergence.

With regard to your second question: the KL divergence between two Gaussians can be obtained in closed form by solving a Gaussian integral. See https://stats.stackexchange.com/questions/60680/kl-divergence-between-two-multivariate-gaussians
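For reference, a sketch of how that closed-form KL between factorised Gaussians could be computed per weight and summed (the function name `gaussian_kl` and its signature are my own illustration, not the repo's implementation):

```python
import torch

def gaussian_kl(mu_q, sig_q, mu_p, sig_p):
    """Closed-form KL(q || p) between factorised Gaussians, summed over all
    weights: log(sig_p / sig_q) + (sig_q^2 + (mu_q - mu_p)^2) / (2 sig_p^2) - 1/2.
    Illustrative sketch only."""
    return (torch.log(sig_p / sig_q)
            + (sig_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sig_p ** 2)
            - 0.5).sum()

# e.g. against a zero-mean, unit-variance Gaussian prior:
# kl = gaussian_kl(W_mu, std_w, torch.zeros_like(W_mu), torch.ones_like(std_w))
```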
@JavierAntoran - thank you for taking the time to help explain this, I really appreciate it! I found your explanation of the local reparameterisation trick very intuitive and feel like I've got a much better grasp of it now. I'm very interested in learning more about Bayesian Neural Networks; do you have any recommended reading that would help get me up to speed with more of the theory?
For general ideas about re-casting learning as inference, I would check out chapter 41 of David MacKay's Information Theory, Inference, and Learning Algorithms. Yarin Gal's thesis is also a good source. On the more practical side, the tutorial made by the guys at Papercup is quite nice. Other than that, read the papers implemented in this repo and try to understand both the algorithm and the implementation.
Hi @JavierAntoran @stratisMarkou,
First of all, thanks for making all of this code available - it's been great to look through!
I'm currently spending some time working through the Weight Uncertainty in Neural Networks paper in order to implement Bayes-by-Backprop, and I was struggling to understand the difference between your implementation of Bayes-by-Backprop and Bayes-by-Backprop with Local Reparameterization.

I was under the impression that the local reparameterization was the following:
Bayesian-Neural-Networks/src/Bayes_By_Backprop/model.py, lines 58 to 66 in 022b9ce
However, this same approach is used in both methods.
The main difference I see in the code you've implemented is the calculation of the KL divergence in closed form in the Local Reparameterization version of the code, due to the use of a Gaussian prior/posterior distribution. I was wondering if my understanding of the local reparameterization method was wrong, or if I had simply misunderstood the code?
Any guidance would be much appreciated!