Gibbs performance regression on Julia v1.11 #2445

Closed
penelopeysm opened this issue Dec 19, 2024 · 7 comments · Fixed by #2464

@penelopeysm (Member)

Since #2328 was merged, the time it takes to run mcmc/gibbs is far longer than the time it used to take to run mcmc/gibbs plus experimental/gibbs.

We need to investigate the cause of this.

mhauru self-assigned this Jan 6, 2025
@mhauru (Member) commented Jan 7, 2025

Could you point me to a CI run where this is evident? I'm looking at this one as an example for the pre-merge times, and the times for experimental seem equally long (and wildly variable): https://github.com/TuringLang/Turing.jl/actions/runs/12325358448/job/34404526010

@mhauru
Copy link
Member

mhauru commented Jan 7, 2025

There's definitely something weird going on between Julia v1.10 and v1.11, though: the latter is 2-3 times slower in the cases I'm looking at. @yebai, you mentioned that you think this is due to a Libtask issue. Is the root of the problem known? Would it still be helpful to narrow this down, e.g. to confirm that v1.10 and v1.11 become comparable in runtime if we leave out all the particle Gibbs tests?

@mhauru (Member) commented Jan 9, 2025

I decided that CI is not to be trusted for timings, and ran the test suite using Julia v1.10 locally on my M1 Mac, both on current master and on the last commit before the new Gibbs got merged. I only ran the mcmc/gibbs.jl, mcmc/gibbs_conditional.jl, and experimental/gibbs.jl files. Times:

  • Old Gibbs: 3556s
  • New Gibbs: 4784s

This seemed sensible, i.e. the new Gibbs is a bit slower, but nothing dramatic. Note that some new tests were also added alongside the new Gibbs, so not all of this difference is slowdown.

To see how much of this is compilation, I decided to repeat the runs in the same Julia sessions. This time:

  • Old Gibbs: 7006s
  • New Gibbs: 7287s

So now I'm confused. Sure, I was running both test suites in parallel and doing other work at the same time, but both Julia sessions were only using a single core, and I wasn't doing anything heavy on the side. My laptop has 16 GB of RAM, which I guess it could have run out of at some point.

Might try doing the same on Julia v1.11 tomorrow, and maybe try leaving out all tests that use particle methods.

Note that I think the new Gibbs is ready to go; we just need to make a release. However, I would like to first make sure that there isn't some awful performance regression that we would have missed, hence this issue.
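
For reference, a rough sketch of how such a local timing run could be scripted (hypothetical; assumes a Turing.jl checkout and that the test files can be include-d from an environment with the test dependencies available, e.g. activated via TestEnv.jl):

    # Hypothetical timing harness for a subset of the Turing.jl test suite.
    # Run from the root of a Turing.jl checkout; the file paths are the ones
    # mentioned above.
    using TestEnv
    TestEnv.activate("Turing")

    for f in ("test/mcmc/gibbs.jl", "test/mcmc/gibbs_conditional.jl", "test/experimental/gibbs.jl")
        t = @elapsed include(f)
        @info "Finished test file" file = f seconds = round(t; digits=1)
    end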

@mhauru (Member) commented Jan 13, 2025

Okay, so there seems to be a huge, across-the-board slowdown when using the new Gibbs with HMC on Julia v1.11. Case in point:

module MWE

using Turing
using Turing: DynamicPPL
using Random

Random.seed!(42)

num_iterations = 10_000
adbackend = AutoForwardDiff()

@model function m(x=1.5)
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    x ~ Normal(m, s)
    return nothing
end

model = m()
initial_params = [0.5, 0.5]

component_sampler = HMC(0.1, 32; adtype=adbackend)
sampler = Turing.Gibbs(@varname(s) => component_sampler, @varname(m) => component_sampler)

@info "Starting sampling"
chain = sample(
    model, sampler, num_iterations; initial_params=initial_params
)

end

On v1.10 this runs in about 4s, on v1.11 in about 30s. The difference only gets worse with more samples, i.e. this is not a fixed overhead: every iteration is slower.
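
A rough way to check that the cost scales with the number of iterations rather than being a fixed startup overhead (sketch only; reuses model, sampler, and initial_params from the MWE above):

    # Sketch: if the v1.11 slowdown were only compilation/setup cost, doubling
    # the iteration count should not roughly double the wall time.
    using Random
    for n in (10_000, 20_000)
        t = @elapsed sample(Xoshiro(42), model, sampler, n; initial_params=initial_params, progress=false)
        @info "Timed sampling run" iterations = n seconds = round(t; digits=1)
    end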

This happens

  • for a range of models, e.g. all or almost all of our DynamicPPL test models
  • for all AD backends

This does not happen

  • when using the same HMC sampler not wrapped in Gibbs
  • when using MH instead of HMC

mhauru changed the title from "Gibbs test slowdown" to "Gibbs performance regression on Julia v1.11" Jan 13, 2025
mhauru added the performance label and removed the tests label Jan 13, 2025
@mhauru (Member) commented Jan 13, 2025

Julia issue: JuliaLang/julia#57028

@penelopeysm (Member, Author) commented Jan 13, 2025

Did some profiling; the culprit is step(rng, model, spl, state), i.e. the second MCMC step onwards:

Turing.jl/src/mcmc/gibbs.jl, lines 460 to 486 at 7d6f8ed:

function AbstractMCMC.step(
    rng::Random.AbstractRNG,
    model::DynamicPPL.Model,
    spl::DynamicPPL.Sampler{<:Gibbs},
    state::GibbsState;
    kwargs...,
)
    vi = varinfo(state)
    alg = spl.alg
    varnames = alg.varnames
    samplers = alg.samplers
    states = state.states
    @assert length(samplers) == length(state.states)
    # TODO: move this into a recursive function so we can unroll when reasonable?
    for index in 1:length(samplers)
        # Take the inner step.
        sampler_local = samplers[index]
        state_local = states[index]
        varnames_local = varnames[index]
        vi, new_state_local = gibbs_step_inner(
            rng, model, varnames_local, sampler_local, state_local, vi; kwargs...
        )
        states = Accessors.setindex(states, new_state_local, index)
    end
    return Transition(model, vi), GibbsState(vi, states)
end

What I don't entirely understand is that if I unroll the for loop manually, i.e. replace

    for index in 1:length(samplers)
        # Take the inner step.
        sampler_local = samplers[index]
        state_local = states[index]
        varnames_local = varnames[index]
        vi, new_state_local = gibbs_step_inner(
            rng, model, varnames_local, sampler_local, state_local, vi; kwargs...
        )
        states = Accessors.setindex(states, new_state_local, index)
    end

with

    sampler_local = samplers[1]
    state_local = states[1]
    varnames_local = varnames[1]
    vi, new_state_local = gibbs_step_inner(
        rng, model, varnames_local, sampler_local, state_local, vi; kwargs...
    )
    states = Accessors.setindex(states, new_state_local, 1)
    sampler_local = samplers[2]
    state_local = states[2]
    varnames_local = varnames[2]
    vi, new_state_local = gibbs_step_inner(
        rng, model, varnames_local, sampler_local, state_local, vi; kwargs...
    )
    states = Accessors.setindex(states, new_state_local, 2)

the slowdown vanishes.

## 1.11 for loop
julia> @be AbstractMCMC.step(Xoshiro(468), m(), DynamicPPL.Sampler(spl), state)
Benchmark: 43 samples with 1 evaluation
 min    2.364 ms (3329 allocs: 148.078 KiB)
 median 2.371 ms (3329 allocs: 148.078 KiB)
 mean   2.377 ms (3329.21 allocs: 148.296 KiB)
 max    2.496 ms (3331 allocs: 150.109 KiB)

## 1.10 for loop
julia> @be AbstractMCMC.step(Xoshiro(468), m(), DynamicPPL.Sampler(spl), state)
Benchmark: 569 samples with 1 evaluation
 min    155.333 μs (2170 allocs: 152.656 KiB)
 median 158.125 μs (2170 allocs: 152.656 KiB)
 mean   169.251 μs (2170.47 allocs: 153.155 KiB, 0.17% gc time)
 max    5.924 ms (2171 allocs: 153.719 KiB, 94.48% gc time)

## 1.11 explicit indices
julia> @be AbstractMCMC.step(Xoshiro(468), m(), DynamicPPL.Sampler(spl), state)
Benchmark: 613 samples with 1 evaluation
 min    132.458 μs (3329 allocs: 148.078 KiB)
 median 136.375 μs (3329 allocs: 148.078 KiB)
 mean   157.633 μs (3329.25 allocs: 148.332 KiB, 0.16% gc time)
 max    12.649 ms (3332 allocs: 151.172 KiB, 97.21% gc time)

## 1.10 explicit indices
julia> @be AbstractMCMC.step(Xoshiro(468), m(), DynamicPPL.Sampler(spl), state)
Benchmark: 566 samples with 1 evaluation
 min    156.583 μs (2170 allocs: 152.656 KiB)
 median 159.459 μs (2170 allocs: 152.656 KiB)
 mean   170.477 μs (2170.48 allocs: 153.165 KiB, 0.17% gc time)
 max    5.886 ms (2171 allocs: 153.719 KiB, 94.53% gc time)
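
The TODO comment in the snippet above already hints at one possible direction: recurse over the samplers/states/varnames so that each inner call is specialised on concrete types, instead of indexing a heterogeneous collection with a runtime index. A rough sketch of that idea, assuming those fields are tuples (illustration only, not necessarily what #2464 ends up doing):

    # Hypothetical sketch of the "recursive unrolling" idea from the TODO above.
    # Each recursive call sees the concrete types of first(samplers) etc., so the
    # call to gibbs_step_inner can be statically specialised.
    function gibbs_step_recursive(
        rng, model, varnames::Tuple, samplers::Tuple, states::Tuple, vi; kwargs...
    )
        isempty(samplers) && return vi, ()
        vi, new_state = gibbs_step_inner(
            rng, model, first(varnames), first(samplers), first(states), vi; kwargs...
        )
        vi, new_states_tail = gibbs_step_recursive(
            rng, model, Base.tail(varnames), Base.tail(samplers), Base.tail(states), vi; kwargs...
        )
        return vi, (new_state, new_states_tail...)
    end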

@torfjelde (Member)

Type-instability? 🤔
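
One way to check would be to inspect the benchmarked step call directly (sketch; reuses the m, spl, and state objects from the benchmarks above):

    # Look for abstractly-typed variables in the second-step call.
    using Random: Xoshiro
    using InteractiveUtils  # for @code_warntype
    @code_warntype AbstractMCMC.step(Xoshiro(468), m(), DynamicPPL.Sampler(spl), state)

    # Or get a whole-call-graph report of dynamic dispatches with JET.jl:
    using JET
    @report_opt AbstractMCMC.step(Xoshiro(468), m(), DynamicPPL.Sampler(spl), state)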
