Gibbs performance regression on Julia v1.11 #2445
Could you point me to a CI run where this is evident? I'm looking at this one as an example of the pre-merge times, and the times for experimental seem equally long (and wildly variable): https://github.com/TuringLang/Turing.jl/actions/runs/12325358448/job/34404526010

---
There's definitely something weird going on between Julia v1.10 and v1.11 though, the latter being 2-3 times slower in the cases I'm looking at. @yebai, you mentioned that you think this is due to a Libtask issue. Is the root of the problem known? Would it still be helpful to narrow this down, to confirm that if we leave out all the particle Gibbs tests then v1.10 and v1.11 become comparable in runtime?

---
I decided that CI is not to be trusted for timings, and ran the test suite using Julia v1.10 locally on my M1 Mac, both on current master and the last commit before the new Gibbs got merged. I only ran the
This seemed sensible, i.e. new Gibbs a bit slower but nothing dramatic. Note that some new tests were also added in adding the new Gibbs, so this isn't even all slowdown. To see how much of this is compilation I decided to repeat the runs in the same Julia sessions. This time,
So now I'm confused. Sure, I was running both test suites in parallel and doing other work at the same time, but both Julia sessions were only using a single core, and I wasn't doing anything heavy on the side. My laptop has 16 GB of RAM, which I guess it could have run out of at some point. Might try doing the same on Julia v1.11 tomorrow, and maybe try leaving out all tests that use particle methods. Note that I think the new Gibbs is ready to go; we just need to make a release. However, I would like to first make sure that there isn't some awful performance regression that we would have missed, hence this.

---
Okay, so there seems to be an across-the-board huge slowdown when using the new Gibbs with HMC on Julia v1.11. Case in point:

```julia
module MWE

using Turing
using Turing: DynamicPPL
using Random

Random.seed!(42)

num_iterations = 10_000
adbackend = AutoForwardDiff()

@model function m(x=1.5)
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    x ~ Normal(m, s)
    return nothing
end

model = m()
initial_params = [0.5, 0.5]
component_sampler = HMC(0.1, 32; adtype=adbackend)
sampler = Turing.Gibbs(@varname(s) => component_sampler, @varname(m) => component_sampler)

@info "Starting sampling"
chain = sample(model, sampler, num_iterations; initial_params=initial_params)

end
```

On v1.10 this runs in about 4 s, on v1.11 in about 30 s. The difference only gets worse with more samples, i.e. this is not overhead; every iteration is slow. This happens
This does not happen
---
Julia issue: JuliaLang/julia#57028

---
Did some profiling; the culprit is this loop (lines 460 to 486 in 7d6f8ed):
What I don't entirely understand is that if I unroll the for loop manually, i.e. replace

```julia
for index in 1:length(samplers)
    # Take the inner step.
    sampler_local = samplers[index]
    state_local = states[index]
    varnames_local = varnames[index]
    vi, new_state_local = gibbs_step_inner(
        rng, model, varnames_local, sampler_local, state_local, vi; kwargs...
    )
    states = Accessors.setindex(states, new_state_local, index)
end
```

with

```julia
sampler_local = samplers[1]
state_local = states[1]
varnames_local = varnames[1]
vi, new_state_local = gibbs_step_inner(
    rng, model, varnames_local, sampler_local, state_local, vi; kwargs...
)
states = Accessors.setindex(states, new_state_local, 1)

sampler_local = samplers[2]
state_local = states[2]
varnames_local = varnames[2]
vi, new_state_local = gibbs_step_inner(
    rng, model, varnames_local, sampler_local, state_local, vi; kwargs...
)
states = Accessors.setindex(states, new_state_local, 2)
```

the slowdown vanishes.

```julia
## 1.11 for loop
julia> @be AbstractMCMC.step(Xoshiro(468), m(), DynamicPPL.Sampler(spl), state)
Benchmark: 43 samples with 1 evaluation
 min    2.364 ms (3329 allocs: 148.078 KiB)
 median 2.371 ms (3329 allocs: 148.078 KiB)
 mean   2.377 ms (3329.21 allocs: 148.296 KiB)
 max    2.496 ms (3331 allocs: 150.109 KiB)

## 1.10 for loop
julia> @be AbstractMCMC.step(Xoshiro(468), m(), DynamicPPL.Sampler(spl), state)
Benchmark: 569 samples with 1 evaluation
 min    155.333 μs (2170 allocs: 152.656 KiB)
 median 158.125 μs (2170 allocs: 152.656 KiB)
 mean   169.251 μs (2170.47 allocs: 153.155 KiB, 0.17% gc time)
 max    5.924 ms (2171 allocs: 153.719 KiB, 94.48% gc time)

## 1.11 explicit indices
julia> @be AbstractMCMC.step(Xoshiro(468), m(), DynamicPPL.Sampler(spl), state)
Benchmark: 613 samples with 1 evaluation
 min    132.458 μs (3329 allocs: 148.078 KiB)
 median 136.375 μs (3329 allocs: 148.078 KiB)
 mean   157.633 μs (3329.25 allocs: 148.332 KiB, 0.16% gc time)
 max    12.649 ms (3332 allocs: 151.172 KiB, 97.21% gc time)

## 1.10 explicit indices
julia> @be AbstractMCMC.step(Xoshiro(468), m(), DynamicPPL.Sampler(spl), state)
Benchmark: 566 samples with 1 evaluation
 min    156.583 μs (2170 allocs: 152.656 KiB)
 median 159.459 μs (2170 allocs: 152.656 KiB)
 mean   170.477 μs (2170.48 allocs: 153.165 KiB, 0.17% gc time)
 max    5.886 ms (2171 allocs: 153.719 KiB, 94.53% gc time)
```
Type instability? 🤔

---
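One way to test that hypothesis in isolation (a sketch with hypothetical helper names, not the actual Gibbs code) is to ask inference what indexing a heterogeneous tuple returns with a runtime index versus a literal one:

```julia
t = (1, 2.0)  # stand-in for a tuple of differently-typed sampler states

getindex_runtime(t, i) = t[i]  # index known only at runtime
getindex_literal(t) = t[1]     # literal index, constant-propagated

# Inferred return types (debugging aid; Base.return_types is internal API).
rt_runtime = Base.return_types(getindex_runtime, Tuple{typeof(t), Int})[1]
rt_literal = Base.return_types(getindex_literal, Tuple{typeof(t)})[1]

println(rt_runtime)  # a Union of the element types, not concrete
println(rt_literal)  # Int64
```

`@code_warntype` on the actual `step` call would show the same thing with the Union highlighted in red.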
Since #2328 was merged, the time it takes to run `mcmc/gibbs` is far longer than the time it used to take to run `mcmc/gibbs` plus `experimental/gibbs`. Need to investigate the cause of this.