Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes and improvements to experimental Gibbs #2231

Merged
merged 40 commits into from
Jul 16, 2024
Merged

Conversation

torfjelde
Copy link
Member

@torfjelde torfjelde commented May 18, 2024

Turns out that there were some subtle bugs present in the impl of Turing.Experimental.Gibbs which is likely part of the reason why we were seeing some strange results here and there.

This PR does the following:

  • Fixes these issues with, i.e. we're no longer accidentally hitting condition instead of gibbs_condition.
  • Properly handles initial_params.
  • More rigorous correctness testing of inference results.

Remaining TODOs:

Fix #2230 fix #2234

@coveralls
Copy link

Pull Request Test Coverage Report for Build 9143463911

Details

  • 0 of 30 (0.0%) changed or added relevant lines in 3 files are covered.
  • 3 unchanged lines in 2 files lost coverage.
  • Overall coverage remained the same at 0.0%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/mcmc/Inference.jl 0 1 0.0%
src/mcmc/abstractmcmc.jl 0 13 0.0%
src/experimental/gibbs.jl 0 16 0.0%
Files with Coverage Reduction New Missed Lines %
src/experimental/gibbs.jl 1 0.0%
src/mcmc/abstractmcmc.jl 2 0.0%
Totals Coverage Status
Change from base Build 9141965383: 0.0%
Covered Lines: 0
Relevant Lines: 1549

💛 - Coveralls

@torfjelde
Copy link
Member Author

A few things:

  1. Sampling from gdemo using CSMC in the gibbs sampler is very slow. Removing those tests (requires a large number of iterations to something reasonable).
  2. We're hitting OOM issues when using the SMC samplers. Seems somewhat crazy that this would happen.

torfjelde added 3 commits May 20, 2024 11:33
capture tail differences + remove subsampling of chains since it
doesn't really matter that when we're using aggressive thinning and
test statistics based on comparing order stats
@torfjelde
Copy link
Member Author

As there have been a few scenarios where we've hit some interesting snags wrt. failures of tests when only looking at "simple" statistics, e.g. mean, I'm trying to use something a bit more prinicpled. Specifically, I've added tests using Anderson-Darling tests (similar to Kolmogorov-Smirnov, but integrating over the entire ECDF instead of just considering the supremum) where we have tests to make sure that the significance level is set in such a way that a) that minor (both additive and mutliplicative) perturbations to the "true" samples are caught, but also b) tests between "true" samples and the samples of interest pass.

@torfjelde
Copy link
Member Author

Pfft I think the way of testing the marginals here is a really good way to go, and it's almost there, but I'm starting to think that maybe Anderson-Darling is just a bit too strong of a test; it puts particular emphasis on the tails of the distributions, which of course can be a bit of a problem for any kind of MCMC output.

I'm thinking something like a Cramer-von Mises test would be perfect, as it doesn't inflate the differences in the tails of distribution, but is still more suitable than a Kolmogorov-Smirnov test (which only considers the maximum difference between the two distributions). Buuut no implementation of Cramer-von Mises exists in HypothesisTests.jl, so that's a bit annoying (see related issue: JuliaStats/HypothesisTests.jl#201).

Don't have time to implement it myself right now (might be worth just piggy-backing off of scipy's implementation or something to give it a try), but something to keep in mind for future reference.

Note that this "investigation" all started because we've had experiences in the past where just testing means and std isn't really good enough, and so we start comparing quantiles. But if we're comparing quantiles, then we might as well just do a proper ECDF hypothesis test, where we make sure the acceptance threshold is sufficient to capture certain differences.

One possible (though seems lightly hacky) alternative might be to just compare the underlying test-statistics of samples from the inference method to be tested vs. test-statistics from "perturbations" of the "true" samples. E.g. you scale and shift the "true" samples, compute test-statistic, make sure that the test-statistic of samples from inference alg is better than these perturbed ones. Seems a bit hacky but also probably some way of theoretically motivating this.

@yebai
Copy link
Member

yebai commented Jun 12, 2024

Make it work for externalsampler

@torfjelde, to clarify, the new Gibbs sampler is now working with an external sampler, right? If so, let us resolve the merge conflict and get this PR merged.

@torfjelde
Copy link
Member Author

Will get to this later today 👍

@sunxd3
Copy link
Member

sunxd3 commented Jul 9, 2024

varinfos_new = DynamicPPL.setindex!!(varinfos, vi_base, 1)

(can't seem to review lines that are not changed, so copied permlink)
maybe I'm reading this wrong, this feels like we are just throwing away varinfos[1]. Should there be an extra line that merge varinfos[1] into vi_base and then replace it with vi_base?

If this needs change, then also

varinfos_new = DynamicPPL.setindex!!(

@torfjelde
Copy link
Member Author

A issue is that the type of f::LogDensityProblemsAD.ADGradientWrapper will be thrown away, and a new wrapper will be created according to sampler's AD config. As far as I can tell, this is what we are doing right now anyway, but may worth some consideration.

So I did indeed consider dispatching on the adtype, but it's not quite enough for what we want to achieve here. Yes, it works with ExternalSampler, but we wanted to use this as a playground to enable Gibbs even outside of Turing.jl, in which case you're no longer working with an ExternalSampler.

But I think the reasoning is fair, though it would be nice if, say, the adtype used in a gradient wrapper would also be stored with the gradient wrapper itself or something, so you could retrive this from the gradient wrapper itself 😕

@torfjelde
Copy link
Member Author

hould there be an extra line that merge varinfos[1] into vi_base and then replace it with vi_base?

Very nice catch! From a first glance, it indeed looks like we're missing a merge 👍

@yebai
Copy link
Member

yebai commented Jul 10, 2024

@torfjelde, please address @sunxd3’s comment. Then, let’s merge this PR as-is since it contains bug fixes. The remaining issues can be addressed via separate PRs.

@torfjelde
Copy link
Member Author

Done 👍

The bug was just in the initial step, so didn't really make much of a difference, but added the merge now.

Copy link

codecov bot commented Jul 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.82%. Comparing base (142dab3) to head (d40d82b).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2231      +/-   ##
==========================================
+ Coverage   83.09%   85.82%   +2.73%     
==========================================
  Files          24       24              
  Lines        1591     1623      +32     
==========================================
+ Hits         1322     1393      +71     
+ Misses        269      230      -39     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@torfjelde
Copy link
Member Author

CI is failliing because of check_model issues on Julia 1.7 that somehow weren't caught before. Have opened PR to address this in DPPL.

@yebai
Copy link
Member

yebai commented Jul 15, 2024

@torfjelde to clarify, do we still need tpapp/LogDensityProblemsAD.jl#33 after 7c4368e?

@coveralls
Copy link

coveralls commented Jul 15, 2024

Pull Request Test Coverage Report for Build 9946003565

Details

  • 0 of 42 (0.0%) changed or added relevant lines in 3 files are covered.
  • 1259 unchanged lines in 21 files lost coverage.
  • Overall coverage increased (+2.7%) to 86.041%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/mcmc/Inference.jl 0 1 0.0%
src/experimental/gibbs.jl 0 17 0.0%
src/mcmc/abstractmcmc.jl 0 24 0.0%
Files with Coverage Reduction New Missed Lines %
src/variational/VariationalInference.jl 4 0.0%
src/Turing.jl 9 0.0%
src/mcmc/gibbs_conditional.jl 12 0.0%
src/mcmc/is.jl 16 0.0%
src/stdlib/RandomMeasures.jl 22 0.0%
src/essential/container.jl 27 0.0%
ext/TuringDynamicHMCExt.jl 29 0.0%
src/mcmc/abstractmcmc.jl 34 0.0%
src/mcmc/emcee.jl 47 0.0%
ext/TuringOptimExt.jl 50 0.0%
Totals Coverage Status
Change from base Build 9872415851: 2.7%
Covered Lines: 1393
Relevant Lines: 1619

💛 - Coveralls

@sunxd3
Copy link
Member

sunxd3 commented Jul 16, 2024

Chiming in before @torfjelde's reply.

My judgement here is that we don't need the API (i.e., replace_l), as all it really does is

return LogDensityProblemsAD.ADgradient(adtype, setmodel(parent(f), model))

That being said, being able to dispatch on particular AdGradientWrapper subtype is better, because there might be different keyword arguments, e.g. https://github.com/tpapp/LogDensityProblemsAD.jl/blob/449e5661bc2667f7bef061e148a6ea5526cbb427/ext/LogDensityProblemsADForwardDiffExt.jl#L98-L101 to ADgradient we might want to be able to control.

Although for personal taste, I don't think we need to get ahead of ourselves right now.

@torfjelde
Copy link
Member Author

torfjelde commented Jul 16, 2024

@torfjelde to clarify, do we still need tpapp/LogDensityProblemsAD.jl#33 after 7c4368e?

We no longer need it, no. But it would make things cleaner

As @sunxd3 said, we've basically just implemented that thing ourselves here

@sunxd3
Copy link
Member

sunxd3 commented Jul 16, 2024

This PR looks ready to merge

@yebai yebai merged commit 29a1342 into master Jul 16, 2024
60 checks passed
@yebai yebai deleted the torfjelde/gibbs-new-improv branch July 16, 2024 08:52
@sunxd3
Copy link
Member

sunxd3 commented Jul 16, 2024

@torfjelde @yebai are we ready to release?

@torfjelde
Copy link
Member Author

Yeah this should be good to go now:)

@Red-Portal
Copy link
Member

Great I'll look into incorporating the slice samplers using the new interface!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Undeterministic test failure Experimental Gibbs seems to have higher variance than "stable" Gibbs (maybe)
6 participants