Replace old Gibbs sampler with the experimental one. #2328
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2328      +/-   ##
==========================================
- Coverage   86.30%   85.39%   -0.92%
==========================================
  Files          22       21       -1
  Lines        1577     1588      +11
==========================================
- Hits         1361     1356       -5
- Misses        216      232      +16 |
Pull Request Test Coverage Report for Build 12400670488
💛 - Coveralls |
@torfjelde, if you have a moment to take a look at the one remaining test failure, would be interested in your thoughts. We are sampling for a model with two vector variables,
@testset "dynamic model" begin
    @model function imm(y, alpha, ::Type{M}=Vector{Float64}) where {M}
        N = length(y)
        rpm = DirichletProcess(alpha)
        z = zeros(Int, N)
        cluster_counts = zeros(Int, N)
        fill!(cluster_counts, 0)
        for i in 1:N
            z[i] ~ ChineseRestaurantProcess(rpm, cluster_counts)
            cluster_counts[z[i]] += 1
        end
        Kmax = findlast(!iszero, cluster_counts)
        m = M(undef, Kmax)
        for k in 1:Kmax
            m[k] ~ Normal(1.0, 1.0)
        end
    end
    model = imm(Random.randn(100), 1.0)
    # https://github.com/TuringLang/Turing.jl/issues/1725
    # sample(model, Gibbs(MH(:z), HMC(0.01, 4, :m)), 100);
    sample(model, Gibbs(; z=PG(10), m=HMC(0.01, 4; adtype=adbackend)), 100)
end |
Will have a look at this in a bit @mhauru (just need to do some grocery shopping 😬 ) |
Think I found the error: if the number of Lines 57 to 65 in 6f9679a
doesn't hit the |
I'm a bit uncertain how we should best handle this @yebai @mhauru The first partially viable idea that comes to mind is to But this would not quite be equivalent to the current implementation of Another way is to explicitly add the
Thoughts? |
I lean towards the above approach and (maybe later) provide explicit APIs to inference algorithms. This will enable us to handle reversible jumps (varying model dimensions) in MCMC more flexibly. At the moment, this is only possible in particle Gibbs; if it happens in HMC/MH, inference will likely fail (silently).
EDIT: we can keep |
This does however complicate the new Gibbs sampling procedure quite drastically 😕 And it makes me bring up a question I really didn't think I'd be asking: is it then actually preferable to the current I guess we should first have a go at implementing this for the new Another point to add to the conversation that @mhauru brought to my attention the other day: we also want to support stuff like
So all in all, immediate things we need to address with Gibbs:
|
I've been trying to think of a way to fix this, that would also fix the problem where different Gibbs subsamplers can't sample the same variables (e.g. you can't first sample
Point 3. is maybe undesirable, but I think it’s minor compared to all the The only problem I see with this is combining the local state from the previous iteration of the current subsampler with the global The great benefit of sticking to one, global |
I can imagine two different philosophies to implementing a Gibbs sampler:
My above proposal would essentially be doing 2., but using code that's very much like the new sampler, where the information about which sampler modifies which variables is in the sampler/ The reason I'm leaning towards 2. is that 1. seems to run to some fundamental issues in cases where either
Both of those situations quite deeply violate the idea that the different subsamplers can operate mostly independently of each other. Any thoughts very welcome, I'm still very much trying to understand the landscape of the problem. |
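For readers who want something concrete, here is a minimal sketch of the "one global state" idea discussed above, assuming the reading that every subsampler reads from and writes to a single shared state and only touches the variables it owns. It is not the Turing.jl implementation: the toy target (a bivariate normal with correlation `RHO`), the `Dict` standing in for a global state, and the names `update_x!`, `update_y!` and `gibbs` are all hypothetical and chosen only for illustration.

```julia
using Random

# Hypothetical toy target: bivariate normal with zero means, unit variances
# and correlation RHO. Not taken from the PR; chosen because both
# conditionals are known in closed form.
const RHO = 0.5

# Each "subsampler" updates only the variable it owns, conditioning on
# whatever the rest of the shared state currently holds.
# x | y ~ Normal(RHO * y, sqrt(1 - RHO^2))
function update_x!(rng, state)
    state[:x] = RHO * state[:y] + sqrt(1 - RHO^2) * randn(rng)
    return state
end

# y | x ~ Normal(RHO * x, sqrt(1 - RHO^2))
function update_y!(rng, state)
    state[:y] = RHO * state[:x] + sqrt(1 - RHO^2) * randn(rng)
    return state
end

# One Gibbs sweep applies every updater in turn to the same global state,
# so each updater always sees the latest values written by the others.
function gibbs(rng, updaters, state, niters)
    draws = Vector{Dict{Symbol,Float64}}(undef, niters)
    for i in 1:niters
        for update! in updaters
            update!(rng, state)
        end
        draws[i] = copy(state)  # snapshot, since state is mutated in place
    end
    return draws
end

rng = MersenneTwister(1)
draws = gibbs(rng, (update_x!, update_y!), Dict(:x => 0.0, :y => 0.0), 2_000)
```

Because there is only one state object, nothing needs to be merged back between subsamplers. The cost, as noted in the discussion, is that each subsampler has to cope with a state that contains variables it does not own.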
Thanks, @mhauru, for the excellent summary of the problem and proposals. Storing conditioned variables in a context, like In addition, it's worth mentioning that we currently have two mechanisms for passing observations to models, i.e. (1) via model arguments, e.g. Among these options, (1) will hardcode observation information directly in the model while (2) stores them in a context. You could look at the DynamicPPL codebase for a more detailed picture of how it works. We want to unify these options, perhaps towards using (2) only. This Gibbs refactoring could be an excellent starting point for a |
Overall, I'm also in favour of this @mhauru 👍 I think your reasoning is solid here. The only other "option" I'm seeing is to keep track of which variables correspond to which varinfos (with each varinfo only containing the relevant information), but then we're effectively just re-implementing a lot of the functionality that is already provided in The only "issue" is that this does mean we have to support this "link / transform only part of the Doulby however, I think we can make this much nicer than the current approach by simply making all these But yeah, don't see how we can take approach (1) in a "nice" way, and so I'm also in favour of just trying to make (2) as painless as possible to maintain. |
Thanks for the comments both, this is very helpful.
Yeah, I think this is the way to go. |
Co-authored-by: Tor Erlend Fjelde <[email protected]>
I'm done making the changes I had in mind. I may still experiment with some performance improvements, but not sure if any will make it in here. I'll also try to reduce the iteration counts in some tests to make them faster, the only CI failure is because one job just timed out at 6h. Since both Tor and I seem to be happy, I'm gonna ping others in case they want to take a look: @penelopeysm, @willtebbutt, @sunxd3, @yebai. I think we can rely on @torfjelde giving an expert review, everyone else can judge for themselves how thorough a look they want to take, but I think everyone should be at least aware that this, somewhat major, change is happening. If you want to give this PR a review but haven't yet had time, self-request a review and we'll make sure to wait before merging. For help in reviewing: This PR does a few things:
Points 4-6 can be reviewed as usual, as a diff of a few hundred lines. Points 2-3 I think are better viewed as a new Gibbs sampler from scratch. The changes in point 3 are so extensive that reading them as a diff doesn't make much sense unless you know the old code really well. |
I'm happy to take a look next week, but doubt I'll get to it today as my head is already several layers deep in DynamicPPL stuff 😄 |
I managed to decrease the iteration counts on a lot of the heaviest tests, so the total runtime should be reduced substantially now. They seem to still pass somewhat robustly, i.e. I tried at least two random seeds. Also did some quick checks of performance overheads, and the previous large overheads are gone in my example cases. Now, rather than being e.g. 100-500% slower than the old Gibbs, we are more like 0-50% slower. This is for models dominated by overheads from outside model evaluation, i.e. fast models where performance is not a big deal. |
The Mooncake stack overflows are something @willtebbutt is aware of and knows the reason for, so we can ignore them for now. Would still hold off from merging until they are fixed. |
@penelopeysm, can you help resolve the merge conflicts so we can try to merge this before the new year? |
@yebai Sure! Are we happy otherwise with the PR, i.e. if conflicts are fixed and CI passes we can merge? |
I think so. |
@yebai CI pretty much passes fine, apart from:
One of them is in the Gibbs tests, on the dynamic Chinese restaurant process model. This test is slightly dubious anyway imo (Lines 475 to 485 in 96f8dd4).
The other one is in ESS:
Personally I don't think that either of these are serious enough to prevent us from merging this PR. I reckon that both should be tracked via new issues. If you agree, feel free to hit the button 😄 |
This is likely a Libtask issue on Julia 1.11. Hopefully, we will resolve this in #2427. cc @willtebbutt
EDIT: it is slightly odd that Gibbs runs faster on the master branch for Julia 1.11 before this PR. @penelopeysm can you open issues to track the other minor numerical issues on x86? This is likely due to an insufficient number of MCMC iterations. |
Many thanks to @mhauru, @torfjelde, @penelopeysm, and all who helped! |
🎉🎉 |
Closes #2318.
Work in progress.