Gibbs performance regression on Julia v1.11 #2445
Could you point me to a CI run where this is evident? I'm looking at this one as an example of the pre-merge times, and the times for experimental seem equally long (and wildly variable): https://github.com/TuringLang/Turing.jl/actions/runs/12325358448/job/34404526010

---
There's definitely something weird going on between Julia v1.10 and v1.11 though, the latter being 2-3 times slower in the cases I'm looking at. @yebai, you mentioned that you think this is due to a Libtask issue. Is the root of the problem known? Would it still be helpful to narrow this down, to confirm that if we leave out all the particle Gibbs tests then v1.10 and v1.11 become comparable in runtime?

---
I decided that CI is not to be trusted for timings, and ran the test suite using Julia v1.10 locally on my M1 Mac, both on current master and the last commit before the new Gibbs got merged. I only ran the
This seemed sensible, i.e. new Gibbs a bit slower but nothing dramatic. Note that some new tests were also added in adding the new Gibbs, so this isn't even all slowdown. To see how much of this is compilation I decided to repeat the runs in the same Julia sessions. This time,
So now I'm confused. Sure, I was running both test suites in parallel and doing other work at the same time, but both Julia sessions were only using a single core, and I wasn't doing anything heavy on the side. My laptop has 16 GB of RAM, which I guess it could have run out of at some point. Might try doing the same on Julia v1.11 tomorrow, and maybe try leaving out all tests that use particle methods. Note that I think the new Gibbs is ready to go; we just need to make a release. However, I would like to first make sure that there isn't some awful performance regression that we would have missed, hence this.

---
Okay, so there seems to be an across-the-board huge slowdown when using the new Gibbs with HMC on Julia v1.11. Case in point:

```julia
module MWE

using Turing
using Turing: DynamicPPL
using Random

Random.seed!(42)

num_iterations = 10_000
adbackend = AutoForwardDiff()

@model function m(x=1.5)
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    x ~ Normal(m, s)
    return nothing
end

model = m()
initial_params = [0.5, 0.5]
component_sampler = HMC(0.1, 32; adtype=adbackend)
sampler = Turing.Gibbs(@varname(s) => component_sampler, @varname(m) => component_sampler)

@info "Starting sampling"
chain = sample(model, sampler, num_iterations; initial_params=initial_params)

end
```

On v1.10 this runs in about 4 s, on v1.11 in about 30 s. The difference only gets worse with more samples, i.e. this is not overhead; every iteration is slow. This happens
This does not happen
---
Julia issue: JuliaLang/julia#57028

---
Did some profiling; the culprit is this loop (lines 460 to 486 in 7d6f8ed):
What I don't entirely understand is that if I unroll the for loop manually, i.e. replace

```julia
for index in 1:length(samplers)
    # Take the inner step.
    sampler_local = samplers[index]
    state_local = states[index]
    varnames_local = varnames[index]
    vi, new_state_local = gibbs_step_inner(
        rng, model, varnames_local, sampler_local, state_local, vi; kwargs...
    )
    states = Accessors.setindex(states, new_state_local, index)
end
```

with

```julia
sampler_local = samplers[1]
state_local = states[1]
varnames_local = varnames[1]
vi, new_state_local = gibbs_step_inner(
    rng, model, varnames_local, sampler_local, state_local, vi; kwargs...
)
states = Accessors.setindex(states, new_state_local, 1)

sampler_local = samplers[2]
state_local = states[2]
varnames_local = varnames[2]
vi, new_state_local = gibbs_step_inner(
    rng, model, varnames_local, sampler_local, state_local, vi; kwargs...
)
states = Accessors.setindex(states, new_state_local, 2)
```

the slowdown vanishes.

```julia
## 1.11 for loop
julia> @be AbstractMCMC.step(Xoshiro(468), m(), DynamicPPL.Sampler(spl), state)
Benchmark: 43 samples with 1 evaluation
 min    2.364 ms (3329 allocs: 148.078 KiB)
 median 2.371 ms (3329 allocs: 148.078 KiB)
 mean   2.377 ms (3329.21 allocs: 148.296 KiB)
 max    2.496 ms (3331 allocs: 150.109 KiB)

## 1.10 for loop
julia> @be AbstractMCMC.step(Xoshiro(468), m(), DynamicPPL.Sampler(spl), state)
Benchmark: 569 samples with 1 evaluation
 min    155.333 μs (2170 allocs: 152.656 KiB)
 median 158.125 μs (2170 allocs: 152.656 KiB)
 mean   169.251 μs (2170.47 allocs: 153.155 KiB, 0.17% gc time)
 max    5.924 ms (2171 allocs: 153.719 KiB, 94.48% gc time)

## 1.11 explicit indices
julia> @be AbstractMCMC.step(Xoshiro(468), m(), DynamicPPL.Sampler(spl), state)
Benchmark: 613 samples with 1 evaluation
 min    132.458 μs (3329 allocs: 148.078 KiB)
 median 136.375 μs (3329 allocs: 148.078 KiB)
 mean   157.633 μs (3329.25 allocs: 148.332 KiB, 0.16% gc time)
 max    12.649 ms (3332 allocs: 151.172 KiB, 97.21% gc time)

## 1.10 explicit indices
julia> @be AbstractMCMC.step(Xoshiro(468), m(), DynamicPPL.Sampler(spl), state)
Benchmark: 566 samples with 1 evaluation
 min    156.583 μs (2170 allocs: 152.656 KiB)
 median 159.459 μs (2170 allocs: 152.656 KiB)
 mean   170.477 μs (2170.48 allocs: 153.165 KiB, 0.17% gc time)
 max    5.886 ms (2171 allocs: 153.719 KiB, 94.53% gc time)
```
Type instability? 🤔

---
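One way to test that hypothesis in isolation (a sketch with hypothetical helper names, not the actual Gibbs code) is to ask inference what indexing a heterogeneous tuple returns with a runtime index versus a literal one:

```julia
t = (1, 2.0)  # stand-in for a tuple of differently-typed sampler states

getindex_runtime(t, i) = t[i]  # index known only at runtime
getindex_literal(t) = t[1]     # literal index, constant-propagated

# Inferred return types (debugging aid; Base.return_types is internal API).
rt_runtime = Base.return_types(getindex_runtime, Tuple{typeof(t), Int})[1]
rt_literal = Base.return_types(getindex_literal, Tuple{typeof(t)})[1]

println(rt_runtime)  # a Union of the element types, not concrete
println(rt_literal)  # Int64
```

`@code_warntype` on the actual `step` call would show the same thing with the Union highlighted in red.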
Since #2328 was merged, the time it takes to run `mcmc/gibbs` is far longer than the time it used to take to run `mcmc/gibbs` plus `experimental/gibbs`. Need to investigate the cause of this.