Fixes and improvements to experimental Gibbs
#2231
Pull Request Test Coverage Report for Build 9143463911
💛 - Coveralls |
A few things:
|
capture tail differences + remove subsampling of chains, since it doesn't really matter when we're using aggressive thinning and test statistics based on comparing order statistics
As there have been a few scenarios where we've hit some interesting snags wrt. failures of tests when only looking at "simple" statistics, e.g. mean, I'm trying to use something a bit more principled. Specifically, I've added tests using Anderson-Darling tests (similar to Kolmogorov-Smirnov, but integrating over the entire ECDF instead of just considering the supremum), where we have tests to make sure that the significance level is set in such a way that a) minor (both additive and multiplicative) perturbations to the "true" samples are caught, but also b) tests between "true" samples and the samples of interest pass. |
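The calibration idea in the comment above (perturbed "true" samples must be rejected, while "true" versus "true" must pass) can be sketched in a few lines. This is only an illustration under stated assumptions: it swaps in a two-sample Kolmogorov-Smirnov test with the usual asymptotic critical value instead of the Anderson-Darling test used in the PR, it is written in Python rather than Julia, and the names `ks_statistic`, `ks_rejects`, and `calibration_report` are hypothetical, not part of Turing.jl or HypothesisTests.jl.

```python
import bisect
import math


def ks_statistic(xs, ys):
    """Two-sample KS statistic: the largest gap between the two ECDFs."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    d = 0.0
    # Both ECDFs are right-continuous step functions, so the supremum of
    # their difference is attained at one of the pooled sample points.
    for v in xs + ys:
        fx = bisect.bisect_right(xs, v) / n
        fy = bisect.bisect_right(ys, v) / m
        d = max(d, abs(fx - fy))
    return d


def ks_rejects(xs, ys, alpha=0.01):
    """Asymptotic two-sample KS decision at significance level alpha."""
    n, m = len(xs), len(ys)
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return ks_statistic(xs, ys) > c * math.sqrt((n + m) / (n * m))


def calibration_report(true_a, true_b, alpha, shift, scale):
    """Check that alpha catches perturbations but lets "true" samples pass.

    shift and scale are hypothetical perturbation sizes chosen by the caller.
    """
    shifted = [x + shift for x in true_b]  # additive perturbation
    scaled = [x * scale for x in true_b]   # multiplicative perturbation
    return {
        "true_vs_true_passes": not ks_rejects(true_a, true_b, alpha),
        "shift_caught": ks_rejects(true_a, shifted, alpha),
        "scale_caught": ks_rejects(true_a, scaled, alpha),
    }
```

If any entry of the report is false, the significance level or the sample size probably needs adjusting before the test is used on samples from the sampler under test. Note that the asymptotic threshold assumes independent samples, which is why aggressive thinning of the chains matters here.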
… torfjelde/gibbs-new-improv
Pfft I think the way of testing the marginals here is a really good way to go, and it's almost there, but I'm starting to think that maybe Anderson-Darling is just a bit too strong of a test; it puts particular emphasis on the tails of the distributions, which of course can be a bit of a problem for any kind of MCMC output.
I'm thinking something like a Cramér-von Mises test would be perfect, as it doesn't inflate the differences in the tails of the distribution, but is still more suitable than a Kolmogorov-Smirnov test (which only considers the maximum difference between the two distributions). Buuut no implementation of Cramér-von Mises exists in HypothesisTests.jl, so that's a bit annoying (see related issue: JuliaStats/HypothesisTests.jl#201). I don't have time to implement it myself right now (might be worth just piggy-backing off of scipy's implementation or something to give it a try), but it's something to keep in mind for future reference.
Note that this "investigation" all started because we've had experiences in the past where just testing means and std isn't really good enough, and so we start comparing quantiles. But if we're comparing quantiles, then we might as well just do a proper ECDF hypothesis test, where we make sure the acceptance threshold is sufficient to capture certain differences.
One possible (though seemingly slightly hacky) alternative might be to just compare the underlying test statistics of samples from the inference method to be tested vs. test statistics from "perturbations" of the "true" samples. E.g. you scale and shift the "true" samples, compute the test statistic, and make sure that the test statistic of the samples from the inference alg is better than those of the perturbed ones. It seems a bit hacky, but there is probably some way of theoretically motivating it. |
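The perturbation-based alternative described above could look roughly like the following. It is a minimal sketch, assuming the two-sample Cramér-von Mises statistic in its ECDF form, T = nm/(n+m)^2 times the sum of squared ECDF differences over the pooled sample, with no special handling of ties and no p-value. It is written in Python for illustration only; `cvm_statistic` and `beats_perturbations`, along with the default shift and scale sizes, are hypothetical and do not come from the PR.

```python
import bisect


def cvm_statistic(xs, ys):
    """Two-sample Cramér-von Mises statistic computed from the two ECDFs."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    total = 0.0
    # Sum the squared ECDF difference over every point of the pooled sample,
    # so all regions contribute instead of only the point of maximum gap.
    for v in xs + ys:
        fx = bisect.bisect_right(xs, v) / n
        fy = bisect.bisect_right(ys, v) / m
        total += (fx - fy) ** 2
    return n * m / (n + m) ** 2 * total


def beats_perturbations(candidate, true_samples, shifts=(0.1,), scales=(1.2,)):
    """True if the candidate is closer to the "true" samples than every
    shifted or scaled copy of those "true" samples is."""
    baseline = cvm_statistic(candidate, true_samples)
    perturbed = [[x + s for x in true_samples] for s in shifts]
    perturbed += [[x * c for x in true_samples] for c in scales]
    return all(baseline < cvm_statistic(p, true_samples) for p in perturbed)
```

Here the perturbation sizes play the role that the significance level plays in a regular hypothesis test: they state which differences the check is required to detect.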
@torfjelde, to clarify, the new Gibbs sampler is now working with an |
Will get to this later today 👍 |
Turing.jl/src/experimental/gibbs.jl Line 311 in 3d3c944
(I can't seem to review lines that are not changed, so I copied the permalink.) Maybe I'm reading this wrong, but it feels like we are just throwing away varinfos[1]. Should there be an extra line that merges varinfos[1] into vi_base and then replaces it with vi_base?
If this needs to change, then the same applies to Turing.jl/src/experimental/gibbs.jl Line 357 in 3d3c944
|
So I did indeed consider dispatching on the adtype, but it's not quite enough for what we want to achieve here. Yes, it works with But I think the reasoning is fair, though it would be nice if, say, the |
Very nice catch! From a first glance, it indeed looks like we're missing a |
@torfjelde, please address @sunxd3’s comment. Then, let’s merge this PR as-is since it contains bug fixes. The remaining issues can be addressed via separate PRs. |
Done 👍 The bug was just in the initial step, so didn't really make much of a difference, but added the |
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2231      +/-   ##
==========================================
+ Coverage   83.09%   85.82%   +2.73%
==========================================
  Files          24       24
  Lines        1591     1623      +32
==========================================
+ Hits         1322     1393      +71
+ Misses        269      230      -39
|
CI is failing because of |
@torfjelde to clarify, do we still need tpapp/LogDensityProblemsAD.jl#33 after 7c4368e? |
Pull Request Test Coverage Report for Build 9946003565
💛 - Coveralls |
Chiming in before @torfjelde's reply. My judgement here is that we don't need the API (i.e., Turing.jl/src/mcmc/abstractmcmc.jl Line 51 in d40d82b
That being said, being able to dispatch on particular Although for personal taste, I don't think we need to get ahead of ourselves right now. |
We no longer need it, no. But it would make things cleaner. As @sunxd3 said, we've basically just implemented that thing ourselves here |
This PR looks ready to merge |
@torfjelde @yebai are we ready to release? |
Yeah this should be good to go now:) |
Great, I'll look into incorporating the slice samplers using the new interface! |
Turns out that there were some subtle bugs present in the impl of Turing.Experimental.Gibbs, which is likely part of the reason why we were seeing some strange results here and there.
This PR does the following:
- … condition instead of gibbs_condition.
- … initial_params.
Remaining TODOs:
- … externalsampler (part of the reason why I waited with this PR was that it needed some functionality from "Fixes to AD backend usage in externalsampler" #2223)
Fix #2230, fix #2234