Enzyme segfaults on Turing model #650
@sethaxen can you extract this out so we can see the function being passed to autodiff? As it stands, it's hard to tell which function is being differentiated, which makes debugging difficult. |
@sethaxen this runs correctly for me on Enzyme#main and Julia 1.9. First run (for compilation) and second run below:
|
Sure, here's the version that contains the call to autodiff:
using Turing, Enzyme
using Turing.LogDensityProblems
@model function model()
m ~ Normal(0, 1)
s ~ InverseGamma()
x ~ Normal(m, s)
end
mod = model()
sampler = DynamicPPL.Sampler(NUTS())
vi = DynamicPPL.VarInfo(mod)
vi = DynamicPPL.link!!(vi, sampler, mod)
ℓ = Turing.LogDensityFunction(vi, mod, sampler, DynamicPPL.DefaultContext())
x = vi[sampler] # Vector{Float64}
∂ℓ_∂x = zero(x)
Enzyme.autodiff(
Reverse,
LogDensityProblems.logdensity,
Enzyme.Active,
Enzyme.Const(ℓ),
Enzyme.Duplicated(x, ∂ℓ_∂x),
)
Strange, because this also segfaults for me on Julia 1.9. |
Odd. Okay, would you be able to simplify the above, e.g. by simplifying and/or inlining as much as possible? |
No, sorry, I am not familiar with the inner workings of this code and have no time right now. |
No -- this issue is already fixed by using an immutable internal data structure ( PS. Here is a working example of Turing using Enzyme:
|
No, this is not fixed using |
As another data point, the example in #650 (comment) segfaults for me with Enzyme#main, EnzymeCore#main and Enzyme_jll#main on Julia 1.9 rc1. Using
julia> Enzyme.autodiff(
Reverse,
LogDensityProblems.logdensity,
Enzyme.Active,
Enzyme.Const(ℓ),
Enzyme.Duplicated(x, ∂ℓ_∂x),
)
ERROR: Return type inferred to be Union{}. Giving up.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] #s479#163
@ ~/.julia/packages/Enzyme/SUstD/src/compiler.jl:8160 [inlined]
[3] var"#s479#163"(F::Any, Fn::Any, DF::Any, A::Any, TT::Any, Mode::Any, ModifiedBetween::Any, width::Any, specid::Any, ReturnPrimal::Any, ShadowInit::Any, ::Any, #unused#::Type, f::Any, df::Any, #unused#::Type, tt::Any, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Any)
@ Enzyme.Compiler ./none:0
[4] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any})
@ Core ./boot.jl:602
[5] thunk(f::typeof(LogDensityProblems.logdensity), df::Nothing, ::Type{Duplicated{Union{}}}, tt::Type{Tuple{Const{LogDensityFunction{DynamicPPL.SimpleVarInfo{NamedTuple{(:m, :s, :x), Tuple{Float64, Float64, Float64}}, Float64, DynamicPPL.DynamicTransformation}, DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, DynamicPPL.SamplingContext{DynamicPPL.Sampler{NUTS{Turing.Essential.ForwardDiffAD{0}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext, Random._GLOBAL_RNG}}}, Duplicated{Vector{Float64}}}}, ::Val{Enzyme.API.DEM_ReverseModeGradient}, ::Val{1}, ::Val{(false, false, false)}, ::Val{false}, ::Val{true})
@ Enzyme.Compiler ~/.julia/packages/Enzyme/SUstD/src/compiler.jl:8218
[6] autodiff(::EnzymeCore.ReverseMode{false, false}, ::typeof(LogDensityProblems.logdensity), ::Type{Active}, ::Const{LogDensityFunction{DynamicPPL.SimpleVarInfo{NamedTuple{(:m, :s, :x), Tuple{Float64, Float64, Float64}}, Float64, DynamicPPL.DynamicTransformation}, DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, DynamicPPL.SamplingContext{DynamicPPL.Sampler{NUTS{Turing.Essential.ForwardDiffAD{0}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext, Random._GLOBAL_RNG}}}, ::Vararg{Any})
@ Enzyme ~/.julia/packages/Enzyme/SUstD/src/Enzyme.jl:185
[7] top-level scope
@ REPL[10]:1
Surprisingly, it seems the return type of the logdensity function can't be inferred even though we work with a simple
julia> @code_warntype LogDensityProblems.logdensity(ℓ, x)
MethodInstance for LogDensityProblems.logdensity(::LogDensityFunction{DynamicPPL.SimpleVarInfo{NamedTuple{(:m, :s, :x), Tuple{Float64, Float64, Float64}}, Float64, DynamicPPL.DynamicTransformation}, DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, DynamicPPL.SamplingContext{DynamicPPL.Sampler{NUTS{Turing.Essential.ForwardDiffAD{0}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext, Random._GLOBAL_RNG}}, ::Vector{Float64})
from logdensity(f::LogDensityFunction, θ::AbstractVector) @ DynamicPPL ~/.julia/packages/DynamicPPL/UFajj/src/logdensityfunction.jl:92
Arguments
#self#::Core.Const(LogDensityProblems.logdensity)
f::LogDensityFunction{DynamicPPL.SimpleVarInfo{NamedTuple{(:m, :s, :x), Tuple{Float64, Float64, Float64}}, Float64, DynamicPPL.DynamicTransformation}, DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, DynamicPPL.SamplingContext{DynamicPPL.Sampler{NUTS{Turing.Essential.ForwardDiffAD{0}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext, Random._GLOBAL_RNG}}
θ::Vector{Float64}
Locals
vi_new::DynamicPPL.SimpleVarInfo{NamedTuple{(:m, :s, :x), Tuple{Float64, Float64, Float64}}, Float64, DynamicPPL.DynamicTransformation}
Body::Union{}
1 ─ %1 = Base.getproperty(f, :varinfo)::DynamicPPL.SimpleVarInfo{NamedTuple{(:m, :s, :x), Tuple{Float64, Float64, Float64}}, Float64, DynamicPPL.DynamicTransformation}
│ %2 = Base.getproperty(f, :context)::DynamicPPL.SamplingContext{DynamicPPL.Sampler{NUTS{Turing.Essential.ForwardDiffAD{0}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext, Random._GLOBAL_RNG}
│ (vi_new = DynamicPPL.unflatten(%1, %2, θ))
│ %4 = Base.getproperty(f, :model)::Core.Const(DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}(model, NamedTuple(), NamedTuple(), DynamicPPL.DefaultContext()))
│ %5 = vi_new::DynamicPPL.SimpleVarInfo{NamedTuple{(:m, :s, :x), Tuple{Float64, Float64, Float64}}, Float64, DynamicPPL.DynamicTransformation}
│ %6 = Base.getproperty(f, :context)::DynamicPPL.SamplingContext{DynamicPPL.Sampler{NUTS{Turing.Essential.ForwardDiffAD{0}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext, Random._GLOBAL_RNG}
│ DynamicPPL.evaluate!!(%4, %5, %6)
│ Core.Const(:(DynamicPPL.last(%7)))
│ Core.Const(:(DynamicPPL.getlogp(%8)))
└── Core.Const(:(return %9)) |
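A `Body::Union{}` line in `@code_warntype` output means inference concluded that the call can never return a value (every path throws or never terminates), which is the condition behind Enzyme's "Return type inferred to be Union{}. Giving up." error above. A minimal Base-only sketch of that situation; the function `always_throws` is hypothetical and unrelated to DynamicPPL:

```julia
# Hypothetical stand-in for a log-density call that always errors.
always_throws(θ::AbstractVector) = error("this call can never succeed")

# `error` never returns normally, so inference gives the call the empty
# type Union{} (Julia's bottom type) as its return type.
rt = only(Base.return_types(always_throws, (Vector{Float64},)))
println(rt)  # prints Union{}

# `@code_warntype always_throws(zeros(3))` likewise reports `Body::Union{}`.
```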
What does executing the function normally yield? |
It should return a
ℓ = Turing.LogDensityFunction(mod, vi, DynamicPPL.DefaultContext())
instead. Then the logdensity can be evaluated correctly and Enzyme can compute the gradient (when using
julia> autodiff(
ReverseWithPrimal,
LogDensityProblems.logdensity,
ℓ,
Duplicated(x, ∂ℓ_∂x),
)
((nothing, nothing), -4.2993310577423145)
julia> ∂ℓ_∂x
3-element Vector{Float64}:
0.4264673357364165
-0.6593350799670791
0.5109041418560486
julia> LogDensityProblems.logdensity(ℓ, x)
-4.2993310577423145
julia> ForwardDiff.gradient(x -> logjoint(mod, DynamicPPL.SimpleVarInfo((m = x[1], s = x[2], x = x[3]), zero(eltype(x)), DynamicPPL.DynamicTransformation())), x)
3-element Vector{Float64}:
0.4264673357364165
-0.6593350799670789
0.5109041418560486 |
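The check above validates the Enzyme gradient against ForwardDiff. When a second AD package is not at hand, a central finite-difference comparison is a dependency-free alternative. A sketch under stated assumptions: `fd_gradient` is a hypothetical helper, and the step `h = 1e-6` is an assumed default that suits smooth, well-scaled functions:

```julia
# Central finite differences: ∂f/∂xᵢ ≈ (f(x + h*eᵢ) - f(x - h*eᵢ)) / (2h)
function fd_gradient(f, x::AbstractVector{<:Real}; h::Real = 1e-6)
    g = similar(x, Float64)
    xp = float.(x)          # Float64 working copy, so the caller's x is never mutated
    for i in eachindex(xp)
        xi = xp[i]
        xp[i] = xi + h
        fplus = f(xp)
        xp[i] = xi - h
        fminus = f(xp)
        xp[i] = xi          # restore the i-th coordinate
        g[i] = (fplus - fminus) / (2h)
    end
    return g
end

# Example: a standard-normal log density (up to a constant) has gradient -x.
f(x) = -sum(abs2, x) / 2
x = [0.3, -1.2, 2.0]
println(isapprox(fd_gradient(f, x), -x; atol = 1e-6))  # true
```

Finite differences are only approximate, so the comparison uses a tolerance rather than exact equality.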
@devmotion unfortunately, since I cannot reproduce the segfault on main, you're going to have to minimize it (which will hopefully also let me reproduce it) before I can start any investigation and/or fix. |
I don't understand how it's possible that you can successfully run the example in the OP. For me, the same happens as @sethaxen described above: When I run
using Turing, Enzyme
@model function model()
m ~ Normal(0, 1)
s ~ InverseGamma()
x ~ Normal(m, s)
end
sample(model() | (; x=0.5), NUTS{Turing.EnzymeAD}(), 10)
I get a lot of warnings and then Julia segfaults. I used the latest versions of Enzyme and Julia, and the Turing branch with Enzyme support:
julia> versioninfo()
Julia Version 1.9.0-rc1
Commit 3b2e0d8fbc1 (2023-03-07 07:51 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 8 × 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, tigerlake)
Threads: 3 on 8 virtual cores
Environment:
JULIA_NUM_THREADS = 3
JULIA_PKG_USE_CLI_GIT = true
JULIA_EDITOR = code
JULIA_PKG_SERVER = https://pumasai.juliahub.com
(enzyme) pkg> st
Status `~/sources/enzyme/Project.toml`
[7da242da] Enzyme v0.11.0-dev `https://github.com/EnzymeAD/Enzyme.jl.git#main`
[f151be2c] EnzymeCore v0.2.1 `https://github.com/EnzymeAD/Enzyme.jl.git:lib/EnzymeCore#main`
[fce5fe82] Turing v0.24.1 `https://github.com/TuringLang/Turing.jl.git#dw/enzyme`
[7cc45869] Enzyme_jll v0.0.51+0 `https://github.com/JuliaBinaryWrappers/Enzyme_jll.jl.git#main`
Does your setup differ from ours in some way? |
I found it 🎉 I had a final idea, based on the differences between sampling/computing derivatives with a single thread and with multiple threads (or the respective methods for these cases) that I had observed in #659. And indeed, when I erase the
┌ Warning: TypeAnalysisDepthLimit
│ {[]:Pointer, [0]:Pointer, [0,0]:Pointer, [0,0,0]:Integer, [0,8]:Integer, [0,9]:Integer, [0,10]:Integer, [0,11]:Integer, [0,12]:Integer, [0,13]:Integer, [0,14]:Integer, [0,15]:Integer, [0,16]:Integer, [0,17]:Integer, [0,18]:Integer, [0,19]:Integer, [0,20]:Integer, [0,21]:Integer, [0,22]:Integer, [0,23]:Integer, [0,24]:Integer, [0,25]:Integer, [0,26]:Integer, [0,27]:Integer, [0,28]:Integer, [0,29]:Integer, [0,30]:Integer, [0,31]:Integer, [0,32]:Integer, [0,33]:Integer, [0,34]:Integer, [0,35]:Integer, [0,36]:Integer, [0,37]:Integer, [0,38]:Integer, [0,39]:Integer, [0,40]:Integer, [8]:Pointer, [8,0]:Pointer, [8,0,0]:Pointer, [8,8]:Integer, [8,9]:Integer, [8,10]:Integer, [8,11]:Integer, [8,12]:Integer, [8,13]:Integer, [8,14]:Integer, [8,15]:Integer, [8,16]:Integer, [8,17]:Integer, [8,18]:Integer, [8,19]:Integer, [8,20]:Integer, [8,21]:Integer, [8,22]:Integer, [8,23]:Integer, [8,24]:Integer, [8,25]:Integer, [8,26]:Integer, [8,27]:Integer, [8,28]:Integer, [8,29]:Integer, [8,30]:Integer, [8,31]:Integer, [8,32]:Integer, [8,33]:Integer, [8,34]:Integer, [8,35]:Integer, [8,36]:Integer, [8,37]:Integer, [8,38]:Integer, [8,39]:Integer, [8,40]:Integer, [16]:Pointer, [16,0]:Pointer, [16,0,0]:Pointer, [16,0,0,0]:Pointer, [16,0,0,0,0]:Pointer, [16,0,0,0,0,0]:Integer, [16,0,0,0,0,1]:Integer, [16,0,0,0,0,2]:Integer, [16,0,0,0,0,3]:Integer, [16,0,0,0,0,4]:Integer, [16,0,0,0,0,5]:Integer, [16,0,0,0,0,6]:Integer, [16,0,0,0,0,7]:Integer, [16,0,0,0,8]:Integer, [16,0,0,0,9]:Integer, [16,0,0,0,10]:Integer, [16,0,0,0,11]:Integer, [16,0,0,0,12]:Integer, [16,0,0,0,13]:Integer, [16,0,0,0,14]:Integer, [16,0,0,0,15]:Integer, [16,0,0,0,16]:Integer, [16,0,0,0,17]:Integer, [16,0,0,0,18]:Integer, [16,0,0,0,19]:Integer, [16,0,0,0,20]:Integer, [16,0,0,0,21]:Integer, [16,0,0,0,22]:Integer, [16,0,0,0,23]:Integer, [16,0,0,0,24]:Integer, [16,0,0,0,25]:Integer, [16,0,0,0,26]:Integer, [16,0,0,0,27]:Integer, [16,0,0,0,28]:Integer, [16,0,0,0,29]:Integer, [16,0,0,0,30]:Integer, [16,0,0,0,31]:Integer, 
[16,0,0,0,32]:Integer, [16,0,0,0,33]:Integer, [16,0,0,0,34]:Integer, [16,0,0,0,35]:Integer, [16,0,0,0,36]:Integer, [16,0,0,0,37]:Integer, [16,0,0,0,38]:Integer, [16,0,0,0,39]:Integer, [16,0,0,0,40]:Integer, [16,0,0,8]:Integer, [16,0,0,9]:Integer, [16,0,0,10]:Integer, [16,0,0,11]:Integer, [16,0,0,12]:Integer, [16,0,0,13]:Integer, [16,0,0,14]:Integer, [16,0,0,15]:Integer, [16,0,0,16]:Integer, [16,0,0,17]:Integer, [16,0,0,18]:Integer, [16,0,0,19]:Integer, [16,0,0,20]:Integer, [16,0,0,21]:Integer, [16,0,0,22]:Integer, [16,0,0,23]:Integer, [16,8]:Integer, [16,9]:Integer, [16,10]:Integer, [16,11]:Integer, [16,12]:Integer, [16,13]:Integer, [16,14]:Integer, [16,15]:Integer, [16,16]:Integer, [16,17]:Integer, [16,18]:Integer, [16,19]:Integer, [16,20]:Integer, [16,21]:Integer, [16,22]:Integer, [16,23]:Integer, [16,24]:Integer, [16,25]:Integer, [16,26]:Integer, [16,27]:Integer, [16,28]:Integer, [16,29]:Integer, [16,30]:Integer, [16,31]:Integer, [16,32]:Integer, [16,33]:Integer, [16,34]:Integer, [16,35]:Integer, [16,36]:Integer, [16,37]:Integer, [16,38]:Integer, [16,39]:Integer, [16,40]:Integer, [24]:Integer, [25]:Integer, [26]:Integer, [27]:Integer, [28]:Integer, [29]:Integer, [30]:Integer, [31]:Integer, [32]:Integer, [33]:Integer, [34]:Integer, [35]:Integer, [36]:Integer, [37]:Integer, [38]:Integer, [39]:Integer, [40]:Integer, [41]:Integer, [42]:Integer, [43]:Integer, [44]:Integer, [45]:Integer, [46]:Integer, [47]:Integer, [48]:Integer, [49]:Integer, [50]:Integer, [51]:Integer, [52]:Integer, [53]:Integer, [54]:Integer, [55]:Integer, [56]:Integer, [57]:Integer, [58]:Integer, [59]:Integer, [60]:Integer, [61]:Integer, [62]:Integer, [63]:Integer}
└ @ Enzyme.Compiler ~/.julia/packages/GPUCompiler/S3TWf/src/utils.jl:50
not handling more than 6 pointer lookups deep dt:{[]:Pointer, [0]:Pointer, [0,0]:Pointer, [0,0,0]:Integer, [0,8]:Integer, [0,9]:Integer, [0,10]:Integer, [0,11]:Integer, [0,12]:Integer, [0,13]:Integer, [0,14]:Integer, [0,15]:Integer, [0,16]:Integer, [0,17]:Integer, [0,18]:Integer, [0,19]:Integer, [0,20]:Integer, [0,21]:Integer, [0,22]:Integer, [0,23]:Integer, [0,24]:Integer, [0,25]:Integer, [0,26]:Integer, [0,27]:Integer, [0,28]:Integer, [0,29]:Integer, [0,30]:Integer, [0,31]:Integer, [0,32]:Integer, [0,33]:Integer, [0,34]:Integer, [0,35]:Integer, [0,36]:Integer, [0,37]:Integer, [0,38]:Integer, [0,39]:Integer, [0,40]:Integer, [8]:Pointer, [8,0]:Pointer, [8,0,0]:Pointer, [8,8]:Integer, [8,9]:Integer, [8,10]:Integer, [8,11]:Integer, [8,12]:Integer, [8,13]:Integer, [8,14]:Integer, [8,15]:Integer, [8,16]:Integer, [8,17]:Integer, [8,18]:Integer, [8,19]:Integer, [8,20]:Integer, [8,21]:Integer, [8,22]:Integer, [8,23]:Integer, [8,24]:Integer, [8,25]:Integer, [8,26]:Integer, [8,27]:Integer, [8,28]:Integer, [8,29]:Integer, [8,30]:Integer, [8,31]:Integer, [8,32]:Integer, [8,33]:Integer, [8,34]:Integer, [8,35]:Integer, [8,36]:Integer, [8,37]:Integer, [8,38]:Integer, [8,39]:Integer, [8,40]:Integer, [16]:Pointer, [16,0]:Pointer, [16,0,0]:Pointer, [16,0,0,0]:Pointer, [16,0,0,0,0]:Pointer, [16,0,0,0,0,0]:Integer, [16,0,0,0,0,1]:Integer, [16,0,0,0,0,2]:Integer, [16,0,0,0,0,3]:Integer, [16,0,0,0,0,4]:Integer, [16,0,0,0,0,5]:Integer, [16,0,0,0,0,6]:Integer, [16,0,0,0,0,7]:Integer, [16,0,0,0,8]:Integer, [16,0,0,0,9]:Integer, [16,0,0,0,10]:Integer, [16,0,0,0,11]:Integer, [16,0,0,0,12]:Integer, [16,0,0,0,13]:Integer, [16,0,0,0,14]:Integer, [16,0,0,0,15]:Integer, [16,0,0,0,16]:Integer, [16,0,0,0,17]:Integer, [16,0,0,0,18]:Integer, [16,0,0,0,19]:Integer, [16,0,0,0,20]:Integer, [16,0,0,0,21]:Integer, [16,0,0,0,22]:Integer, [16,0,0,0,23]:Integer, [16,0,0,0,24]:Integer, [16,0,0,0,25]:Integer, [16,0,0,0,26]:Integer, [16,0,0,0,27]:Integer, [16,0,0,0,28]:Integer, [16,0,0,0,29]:Integer, 
[16,0,0,0,30]:Integer, [16,0,0,0,31]:Integer, [16,0,0,0,32]:Integer, [16,0,0,0,33]:Integer, [16,0,0,0,34]:Integer, [16,0,0,0,35]:Integer, [16,0,0,0,36]:Integer, [16,0,0,0,37]:Integer, [16,0,0,0,38]:Integer, [16,0,0,0,39]:Integer, [16,0,0,0,40]:Integer, [16,0,0,8]:Integer, [16,0,0,9]:Integer, [16,0,0,10]:Integer, [16,0,0,11]:Integer, [16,0,0,12]:Integer, [16,0,0,13]:Integer, [16,0,0,14]:Integer, [16,0,0,15]:Integer, [16,0,0,16]:Integer, [16,0,0,17]:Integer, [16,0,0,18]:Integer, [16,0,0,19]:Integer, [16,0,0,20]:Integer, [16,0,0,21]:Integer, [16,0,0,22]:Integer, [16,0,0,23]:Integer, [16,8]:Integer, [16,9]:Integer, [16,10]:Integer, [16,11]:Integer, [16,12]:Integer, [16,13]:Integer, [16,14]:Integer, [16,15]:Integer, [16,16]:Integer, [16,17]:Integer, [16,18]:Integer, [16,19]:Integer, [16,20]:Integer, [16,21]:Integer, [16,22]:Integer, [16,23]:Integer, [16,24]:Integer, [16,25]:Integer, [16,26]:Integer, [16,27]:Integer, [16,28]:Integer, [16,29]:Integer, [16,30]:Integer, [16,31]:Integer, [16,32]:Integer, [16,33]:Integer, [16,34]:Integer, [16,35]:Integer, [16,36]:Integer, [16,37]:Integer, [16,38]:Integer, [16,39]:Integer, [16,40]:Integer, [24]:Integer, [25]:Integer, [26]:Integer, [27]:Integer, [28]:Integer, [29]:Integer, [30]:Integer, [31]:Integer, [32]:Integer, [33]:Integer, [34]:Integer, [35]:Integer, [36]:Integer, [37]:Integer, [38]:Integer, [39]:Integer, [40]:Integer, [41]:Integer, [42]:Integer, [43]:Integer, [44]:Integer, [45]:Integer, [46]:Integer, [47]:Integer, [48]:Integer, [49]:Integer, [50]:Integer, [51]:Integer, [52]:Integer, [53]:Integer, [54]:Integer, [55]:Integer, [56]:Integer, [57]:Integer, [58]:Integer, [59]:Integer, [60]:Integer, [61]:Integer, [62]:Integer, [63]:Integer} only(56):
┌ Warning: TypeAnalysisDepthLimit
│ store {} addrspace(10)* %.fca.0.0.1.7.extract, {} addrspace(10)* addrspace(10)* %.fca.0.0.1.7.gep, align 8, !dbg !19
│ {[]:Pointer, [0]:Pointer, [0,0]:Pointer, [0,0,0]:Pointer, [0,0,0,0]:Integer, [0,0,8]:Integer, [0,0,9]:Integer, [0,0,10]:Integer, [0,0,11]:Integer, [0,0,12]:Integer, [0,0,13]:Integer, [0,0,14]:Integer, [0,0,15]:Integer, [0,0,16]:Integer, [0,0,17]:Integer, [0,0,18]:Integer, [0,0,19]:Integer, [0,0,20]:Integer, [0,0,21]:Integer, [0,0,22]:Integer, [0,0,23]:Integer, [0,0,24]:Integer, [0,0,25]:Integer, [0,0,26]:Integer, [0,0,27]:Integer, [0,0,28]:Integer, [0,0,29]:Integer, [0,0,30]:Integer, [0,0,31]:Integer, [0,0,32]:Integer, [0,0,33]:Integer, [0,0,34]:Integer, [0,0,35]:Integer, [0,0,36]:Integer, [0,0,37]:Integer, [0,0,38]:Integer, [0,0,39]:Integer, [0,0,40]:Integer, [0,8]:Pointer, [0,8,0]:Pointer, [0,8,0,0]:Pointer, [0,8,8]:Integer, [0,8,9]:Integer, [0,8,10]:Integer, [0,8,11]:Integer, [0,8,12]:Integer, [0,8,13]:Integer, [0,8,14]:Integer, [0,8,15]:Integer, [0,8,16]:Integer, [0,8,17]:Integer, [0,8,18]:Integer, [0,8,19]:Integer, [0,8,20]:Integer, [0,8,21]:Integer, [0,8,22]:Integer, [0,8,23]:Integer, [0,8,24]:Integer, [0,8,25]:Integer, [0,8,26]:Integer, [0,8,27]:Integer, [0,8,28]:Integer, [0,8,29]:Integer, [0,8,30]:Integer, [0,8,31]:Integer, [0,8,32]:Integer, [0,8,33]:Integer, [0,8,34]:Integer, [0,8,35]:Integer, [0,8,36]:Integer, [0,8,37]:Integer, [0,8,38]:Integer, [0,8,39]:Integer, [0,8,40]:Integer, [0,16]:Pointer, [0,16,0]:Pointer, [0,16,0,0]:Pointer, [0,16,0,0,0]:Pointer, [0,16,0,0,0,0]:Pointer, [0,16,0,0,0,8]:Integer, [0,16,0,0,0,9]:Integer, [0,16,0,0,0,10]:Integer, [0,16,0,0,0,11]:Integer, [0,16,0,0,0,12]:Integer, [0,16,0,0,0,13]:Integer, [0,16,0,0,0,14]:Integer, [0,16,0,0,0,15]:Integer, [0,16,0,0,0,16]:Integer, [0,16,0,0,0,17]:Integer, [0,16,0,0,0,18]:Integer, [0,16,0,0,0,19]:Integer, [0,16,0,0,0,20]:Integer, [0,16,0,0,0,21]:Integer, [0,16,0,0,0,22]:Integer, [0,16,0,0,0,23]:Integer, [0,16,0,0,0,24]:Integer, [0,16,0,0,0,25]:Integer, [0,16,0,0,0,26]:Integer, [0,16,0,0,0,27]:Integer, [0,16,0,0,0,28]:Integer, [0,16,0,0,0,29]:Integer, [0,16,0,0,0,30]:Integer, 
[0,16,0,0,0,31]:Integer, [0,16,0,0,0,32]:Integer, [0,16,0,0,0,33]:Integer, [0,16,0,0,0,34]:Integer, [0,16,0,0,0,35]:Integer, [0,16,0,0,0,36]:Integer, [0,16,0,0,0,37]:Integer, [0,16,0,0,0,38]:Integer, [0,16,0,0,0,39]:Integer, [0,16,0,0,0,40]:Integer, [0,16,0,0,8]:Integer, [0,16,0,0,9]:Integer, [0,16,0,0,10]:Integer, [0,16,0,0,11]:Integer, [0,16,0,0,12]:Integer, [0,16,0,0,13]:Integer, [0,16,0,0,14]:Integer, [0,16,0,0,15]:Integer, [0,16,0,0,16]:Integer, [0,16,0,0,17]:Integer, [0,16,0,0,18]:Integer, [0,16,0,0,19]:Integer, [0,16,0,0,20]:Integer, [0,16,0,0,21]:Integer, [0,16,0,0,22]:Integer, [0,16,0,0,23]:Integer, [0,16,8]:Integer, [0,16,9]:Integer, [0,16,10]:Integer, [0,16,11]:Integer, [0,16,12]:Integer, [0,16,13]:Integer, [0,16,14]:Integer, [0,16,15]:Integer, [0,16,16]:Integer, [0,16,17]:Integer, [0,16,18]:Integer, [0,16,19]:Integer, [0,16,20]:Integer, [0,16,21]:Integer, [0,16,22]:Integer, [0,16,23]:Integer, [0,16,24]:Integer, [0,16,25]:Integer, [0,16,26]:Integer, [0,16,27]:Integer, [0,16,28]:Integer, [0,16,29]:Integer, [0,16,30]:Integer, [0,16,31]:Integer, [0,16,32]:Integer, [0,16,33]:Integer, [0,16,34]:Integer, [0,16,35]:Integer, [0,16,36]:Integer, [0,16,37]:Integer, [0,16,38]:Integer, [0,16,39]:Integer, [0,16,40]:Integer, [0,24]:Integer, [0,25]:Integer, [0,26]:Integer, [0,27]:Integer, [0,28]:Integer, [0,29]:Integer, [0,30]:Integer, [0,31]:Integer, [0,32]:Integer, [0,33]:Integer, [0,34]:Integer, [0,35]:Integer, [0,36]:Integer, [0,37]:Integer, [0,38]:Integer, [0,39]:Integer, [0,40]:Integer, [0,41]:Integer, [0,42]:Integer, [0,43]:Integer, [0,44]:Integer, [0,45]:Integer, [0,46]:Integer, [0,47]:Integer, [0,48]:Integer, [0,49]:Integer, [0,50]:Integer, [0,51]:Integer, [0,52]:Integer, [0,53]:Integer, [0,54]:Integer, [0,55]:Integer, [0,56]:Integer, [0,57]:Integer, [0,58]:Integer, [0,59]:Integer, [0,60]:Integer, [0,61]:Integer, [0,62]:Integer, [0,63]:Integer}
│
│ Stacktrace:
│ [1] Fix1
│ @ ./operators.jl:0
└ @ Enzyme.Compiler ~/.julia/packages/GPUCompiler/S3TWf/src/utils.jl:50
So, apart from these warnings, the remaining question (as in #659) is why we see segfaults/different behaviour in the multithreaded case. The only difference is that in the multithreaded case the variable structure is put into a wrapper to ensure that |
Great find! Unfortunately, we will still need a minimal example to resolve this, so please try to simplify it similarly if you can! |
I reduced it down to the following, which still needs to be reduced a lot more before it can be debugged. @devmotion if you can assist, you'd be a lot faster than me, since I have no idea what any of these libraries/etc. are xD
using Distributions, DynamicPPL, LogDensityProblems, LogDensityProblemsAD, Enzyme, LinearAlgebra
using Turing
using Enzyme
using Turing.AbstractMCMC
using AdvancedHMC
@model function model()
m ~ Normal(0, 1)
s ~ InverseGamma()
x ~ Normal(m, s)
end
using Random
mod = model() | (; x=0.5)
alg = Turing.NUTS{Turing.EnzymeAD}()
spl = Sampler(alg, mod)
vi = DynamicPPL.default_varinfo(Random.GLOBAL_RNG, mod, spl)
vi = link!!(vi, spl, mod)
# Extract parameters.
theta = vi[spl]
# Create a Hamiltonian.
metricT = Turing.Inference.getmetricT(spl.alg)
metric = metricT(length(theta))
ℓ = LogDensityProblemsAD.ADgradient(
Turing.LogDensityFunction(vi, mod, spl, DynamicPPL.DefaultContext())
)
logπ = Base.Fix1(LogDensityProblems.logdensity, ℓ)
∂logπ∂θ(x) = LogDensityProblems.logdensity_and_gradient(ℓ, x)
hamiltonian = AdvancedHMC.Hamiltonian(metric, logπ, ∂logπ∂θ)
# Compute phase point z.
# r = rand(Random.GLOBAL_RNG, metricT, size(metric)...)
# r ./=
# r ./= metric.sqrtM⁻¹
# AdvancedHMC.rand(Random.GLOBAL_RNG, metric, hamiltonian.kinetic)
# AdvancedHMC.
# rand(Random.GLOBAL_RNG, metric, hamiltonian.kinetic)
AdvancedHMC.phasepoint(hamiltonian, theta, rand(Random.GLOBAL_RNG, metric, hamiltonian.kinetic))
# AbstractMCMC.step(Random.GLOBAL_RNG, mod, alg)
# mymcmcsample(Random.GLOBAL_RNG, mod, alg, 10)
# sample(model() | (; x=0.5), NUTS{Turing.EnzymeAD}(), 10) |
I'm happy to help but unfortunately it might take a few days before I find time for some more debugging. |
@devmotion any luck? |
@wsmoses here's a smaller example:
using Enzyme
using Turing.LogDensityProblems
using Turing.Distributions
using Turing: DynamicPPL, NUTS
DynamicPPL.@model function model()
m ~ Normal(0, 1)
s ~ InverseGamma()
x ~ Normal(m, s)
end
mod = model()
sampler = DynamicPPL.Sampler(NUTS())
vi = DynamicPPL.VarInfo(mod)
vi = DynamicPPL.link!!(vi, sampler, mod)
ℓ = DynamicPPL.LogDensityFunction(mod, vi, DynamicPPL.DefaultContext())
x = vi[sampler] # Vector{Float64}
∂ℓ_∂x = zero(x)
LogDensityProblems.logdensity(ℓ, x) # works
Enzyme.autodiff(
Reverse,
LogDensityProblems.logdensity,
Const(ℓ),
Duplicated(x, ∂ℓ_∂x),
)
On Enzyme v0.10, this segfaults for me regardless of whether
ERROR: MethodError: no method matching callconv!(::Ptr{LLVM.API.LLVMOpaqueValue}, ::UInt32)
Closest candidates are:
callconv!(::Union{LLVM.CallBrInst, LLVM.CallInst, LLVM.InvokeInst}, ::Any)
@ LLVM ~/.julia/packages/LLVM/TLGyi/src/core/instructions.jl:155
callconv!(::LLVM.Function, ::Any)
@ LLVM ~/.julia/packages/LLVM/TLGyi/src/core/function.jl:27
Stacktrace:
[1] jl_array_ptr_copy_fwd(B::Ptr{LLVM.API.LLVMOpaqueBuilder}, OrigCI::Ptr{LLVM.API.LLVMOpaqueValue}, gutils::Ptr{Nothing}, normalR::Ptr{Ptr{LLVM.API.LLVMOpaqueValue}}, shadowR::Ptr{Ptr{LLVM.API.LLVMOpaqueValue}})
@ Enzyme.Compiler ~/.julia/packages/Enzyme/EncRR/src/compiler.jl:4873
[2] jl_array_ptr_copy_augfwd(B::Ptr{LLVM.API.LLVMOpaqueBuilder}, OrigCI::Ptr{LLVM.API.LLVMOpaqueValue}, gutils::Ptr{Nothing}, normalR::Ptr{Ptr{LLVM.API.LLVMOpaqueValue}}, shadowR::Ptr{Ptr{LLVM.API.LLVMOpaqueValue}}, tapeR::Ptr{Ptr{LLVM.API.LLVMOpaqueValue}})
@ Enzyme.Compiler ~/.julia/packages/Enzyme/EncRR/src/compiler.jl:4892
[3] EnzymeCreatePrimalAndGradient(logic::Enzyme.Logic, todiff::LLVM.Function, retType::Enzyme.API.CDIFFE_TYPE, constant_args::Vector{Enzyme.API.CDIFFE_TYPE}, TA::Enzyme.TypeAnalysis, returnValue::Bool, dretUsed::Bool, mode::Enzyme.API.CDerivativeMode, width::Int64, additionalArg::Ptr{Nothing}, forceAnonymousTape::Bool, typeInfo::Enzyme.FnTypeInfo, uncacheable_args::Vector{Bool}, augmented::Ptr{Nothing}, atomicAdd::Bool)
@ Enzyme.API ~/.julia/packages/Enzyme/EncRR/src/api.jl:124
[4] enzyme!(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, mod::LLVM.Module, primalf::LLVM.Function, TT::Type, mode::Enzyme.API.CDerivativeMode, width::Int64, parallel::Bool, actualRetType::Type, wrap::Bool, modifiedBetween::Tuple{Bool, Bool, Bool}, returnPrimal::Bool, jlrules::Vector{String}, expectedTapeType::Type)
@ Enzyme.Compiler ~/.julia/packages/Enzyme/EncRR/src/compiler.jl:6680
[5] codegen(output::Symbol, job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}; libraries::Bool, deferred_codegen::Bool, optimize::Bool, ctx::LLVM.ThreadSafeContext, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
@ Enzyme.Compiler ~/.julia/packages/Enzyme/EncRR/src/compiler.jl:7921
[6] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, ctx::Nothing, postopt::Bool)
@ Enzyme.Compiler ~/.julia/packages/Enzyme/EncRR/src/compiler.jl:8434
[7] _thunk
@ ~/.julia/packages/Enzyme/EncRR/src/compiler.jl:8431 [inlined]
[8] cached_compilation
@ ~/.julia/packages/Enzyme/EncRR/src/compiler.jl:8469 [inlined]
[9] #s286#175
@ ~/.julia/packages/Enzyme/EncRR/src/compiler.jl:8527 [inlined]
[10] var"#s286#175"(FA::Any, A::Any, TT::Any, Mode::Any, ModifiedBetween::Any, width::Any, ReturnPrimal::Any, ShadowInit::Any, World::Any, ::Any, ::Any, ::Any, ::Any, tt::Any, ::Any, ::Any, ::Any, ::Any, ::Any)
@ Enzyme.Compiler ./none:0
[11] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any})
@ Core ./boot.jl:602
[12] thunk
@ ~/.julia/packages/Enzyme/EncRR/src/compiler.jl:8486 [inlined]
[13] autodiff
@ ~/.julia/packages/Enzyme/EncRR/src/Enzyme.jl:199 [inlined]
[14] autodiff
@ ~/.julia/packages/Enzyme/EncRR/src/Enzyme.jl:228 [inlined]
[15] autodiff(::EnzymeCore.ReverseMode{false}, ::typeof(LogDensityProblems.logdensity), ::Const{DynamicPPL.LogDensityFunction{DynamicPPL.TypedVarInfo{NamedTuple{(:m, :s, :x), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:m, Setfield.IdentityLens}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:m, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:s, Setfield.IdentityLens}, Int64}, Vector{InverseGamma{Float64}}, Vector{AbstractPPL.VarName{:s, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:x, Setfield.IdentityLens}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:x, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.DefaultContext}, DynamicPPL.DefaultContext}}, ::Duplicated{Vector{Float64}})
@ Enzyme ~/.julia/packages/Enzyme/EncRR/src/Enzyme.jl:214
[16] top-level scope
@ REPL[33]:1
Note that this now uses the release version of Turing, not the branch that integrates it with Enzyme (which is not up to date with Enzyme v0.11 compat):
julia> using Pkg; Pkg.status()
Status `/tmp/jl_eLfrOK/Project.toml`
[7da242da] Enzyme v0.11.0
[fce5fe82] Turing v0.24.3
julia> versioninfo()
Julia Version 1.9.0-rc2
Commit 72aec423c2a (2023-04-01 10:41 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 8 × 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, tigerlake)
Threads: 1 on 8 virtual cores
Environment:
JULIA_CMDSTAN_HOME = /home/sethaxen/software/cmdstan/2.30.1/
JULIA_EDITOR = code |
I've just fixed the 0.11 error you saw, on main. Your test now shows the previous behavior: it works fine single-threaded but segfaults multithreaded. Unfortunately, this means additional minimization is required. |
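Since the failure reportedly depends on whether Julia runs with one thread or several, it helps to record the thread count with every reproduction attempt. A small sketch; the script name `check_threads.jl` and the helper `threading_mode` are hypothetical, while `--threads` is Julia's standard startup flag:

```julia
# Label the configuration a reproduction attempt runs in.
threading_mode(n::Integer) = n > 1 ? "multithreaded" : "single-threaded"

# Start Julia with e.g. `julia --threads=1 check_threads.jl`, then again with
# `julia --threads=4 check_threads.jl`, to try both configurations.
nt = Threads.nthreads()
println("Running with ", nt, " thread(s): ", threading_mode(nt))
```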
No, I had to postpone it. Will return to it probably in ~ 2 weeks. |
Should be solved by #772; please reopen if it persists. |
For me, the example in #650 (comment) still segfaults (even without
Edit: I don't have permissions to reopen.
@sethaxen Can you make a minimal reproducer out of that comment? |
Unfortunately, it appeared to work on my system after the fix.
I'll try but I'm also not very familiar with DynamicPPL's internals. |
In any case, if you (or anyone) find segfault/GC issues and can minimize them (see my minimization inline above, for example), that will allow us to attempt to fix them.
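The GC corruption reported further down in this thread only surfaced after repeating the same `sample` call many times. Forcing a full collection after every call can sometimes make such intermittent memory bugs appear sooner, which helps when minimizing. A sketch; `stress_with_gc` is a hypothetical helper, not part of Enzyme or Turing, and the workload below is a cheap stand-in:

```julia
# Run `f` repeatedly and force a full garbage collection after each call,
# so corrupted heap objects are likely to be visited by the GC early.
function stress_with_gc(f; iterations::Integer = 100)
    for _ in 1:iterations
        f()
        GC.gc(true)  # full collection
    end
    return iterations
end

# Stand-in workload; in the thread this would be the failing call, e.g.
# () -> sample(model() | (; x=0.5), NUTS{Turing.EnzymeAD}(), 100)
ncalls = Ref(0)
stress_with_gc(() -> (ncalls[] += 1; sum(rand(1000))); iterations = 10)
println(ncalls[])  # 10
```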
And FWIW, in case you're able to minimize it to help us fix it: the performance improvement does appear to be significant, at least for this code:
julia> sample(model() | (; x=0.5), NUTS{Turing.EnzymeAD}(), 10000)
┌ Info: Found initial step size
└ ϵ = 3.2
Sampling 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:00
Chains MCMC chain (10000×14×1 Array{Float64, 3}):
Iterations = 1001:1:11000
Number of chains = 1
Samples per chain = 10000
Wall duration = 0.69 seconds
Compute duration = 0.69 seconds
parameters = m, s
internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size
Summary Statistics
  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ess_per_sec
      Symbol   Float64   Float64   Float64     Float64     Float64   Float64       Float64
           m    0.2315    0.6896    0.0113   3973.8777   4139.8132    0.9999     5784.3926
           s    1.5227    2.9910    0.0510   2829.1958   2915.2429    1.0009     4118.1890
Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64
           m   -1.2766   -0.1601    0.2890    0.6651    1.5145
           s    0.2375    0.5263    0.8791    1.5640    6.2744
julia> sample(model() | (; x=0.5), NUTS{Turing.ZygoteAD}(), 10000)
┌ Info: Found initial step size
└ ϵ = 0.8
Sampling 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:29
Chains MCMC chain (10000×14×1 Array{Float64, 3}):
Iterations = 1001:1:11000
Number of chains = 1
Samples per chain = 10000
Wall duration = 29.35 seconds
Compute duration = 29.35 seconds
parameters = m, s
internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size
Summary Statistics
  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ess_per_sec
      Symbol   Float64   Float64   Float64     Float64     Float64   Float64       Float64
           m    0.2465    0.6962    0.0111   4146.4375   4181.3396    1.0001      141.2804
           s    1.4766    2.9093    0.0483   2859.4063   2677.9875    1.0004       97.4277
Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64
           m   -1.2932   -0.1415    0.3066    0.6702    1.5565
           s    0.2348    0.5326    0.8819    1.5318    6.1036 |
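To put "significant" in numbers, the ratios below use only the figures printed in the two runs above (one run per backend, so treat them as rough indications rather than a benchmark):

```julia
# Figures copied from the two 10_000-sample NUTS runs above.
wall_enzyme, wall_zygote = 0.69, 29.35            # wall duration in seconds
ess_per_sec_enzyme = (m = 5784.3926, s = 4118.1890)
ess_per_sec_zygote = (m = 141.2804, s = 97.4277)

wall_speedup = wall_zygote / wall_enzyme
ess_speedup = map(/, ess_per_sec_enzyme, ess_per_sec_zygote)  # per-parameter ratio

println("wall-clock speedup ≈ ", round(wall_speedup; digits = 1))  # ≈ 42.5
println("ESS/s speedup ≈ ", map(v -> round(v; digits = 1), ess_speedup))
```

Both measures agree on a speedup of roughly 40x for Enzyme over Zygote on this model.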
I don't know if it's related, but I kept repeating the following line of @yebai's example in #650 (comment):
sample(model() | (; x=0.5), NUTS{Turing.EnzymeAD}(), 10000)
and it eventually failed for me with an error:
GC error (probable corruption) :
Allocations: 134280068 (Pool: 134191737; Big: 88331); GC: 134
<?#0x7ff99834bc60::(nil)>
0x7ff9d0fff010: Queued root: 0x7ffa0525ff10 :: 0x7ff9f7a58fa0 (bits: 3)
of type REPL.LineEdit.PromptState
0x7ff9d0fff028: Queued root: 0x7ffa0292b8b0 :: 0x7ff9f7a537f0 (bits: 3)
of type REPL.REPLHistoryProvider
0x7ff9d0fff040: Queued root: 0x7ffa01cfded0 :: 0x7ff9f7a55840 (bits: 3)
of type REPL.LineEdit.MIState
0x7ff9d0fff058: Queued root: 0x7ff9fc6d19b0 :: 0x7ff9f7993470 (bits: 7)
of type Base.IdDict{Any, Any}
0x7ff9d0fff070: r-- Stack frame 0x7ffcccf7ac30 -- 196 of 462 (direct)
0x7ff9d0fff098: `- Object (16bit) 0x7ff9750eb660 :: 0x7ff984cce5d1 -- [18, 19)
of type Tuple{Float64, Float64, NamedTuple{(Symbol("1"), Symbol("2")), Tuple{NamedTuple{(Symbol("1"), Symbol("2"), Symbol("3")), Tuple{Tuple{NamedTuple{(Symbol("1"), Symbol("2"), Sym
bol("3"), Symbol("4"), Symbol("5"), Symbol("6"), Symbol("7"), Symbol("8")), NTuple{8, Any}}, NamedTuple{(Symbol("1"), Symbol("2"), Symbol("3"), Symbol("4"), Symbol("5"), Symbol("6"), Symbol(
"7"), Symbol("8")), NTuple{8, Any}}}, Any, Any}}, Any}}}
[499273] signal (6.-6): Aborted
in expression starting at REPL[7]:1
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
raise at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
gc_assert_datatype_fail at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gc.c:1912
gc_mark_loop at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gc.c:3020
_jl_gc_collect at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gc.c:3400
ijl_gc_collect at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gc.c:3707
maybe_collect at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gc.c:1078 [inlined]
jl_gc_pool_alloc_inner at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gc.c:1443 [inlined]
jl_gc_pool_alloc_noinline at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gc.c:1504 [inlined]
jl_gc_alloc_ at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/julia_internal.h:460 [inlined]
jl_gc_alloc at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gc.c:3754
unknown function (ip: 0x7ff9a78233c8)
unknown function (ip: 0x7ff9a7827ff2)
unknown function (ip: 0x7ff9a781820d)
Allocations: 134280068 (Pool: 134191737; Big: 88331); GC: 134
Aborted (core dumped)
I'm on Linux:
(jl_ztgnWD) pkg> st
Status `/tmp/jl_ztgnWD/Project.toml`
[7da242da] Enzyme v0.11.2 `https://github.com/EnzymeAD/Enzyme.jl.git#main`
[fce5fe82] Turing v0.26.2 `https://github.com/TuringLang/Turing.jl.git#dw/enzyme`
julia> versioninfo()
Julia Version 1.9.1
Commit 147bdf428cd (2023-06-07 08:27 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 8 × 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, tigerlake)
Threads: 8 on 8 virtual cores
Environment:
JULIA_CMDSTAN_HOME = /home/sethaxen/software/cmdstan/2.30.1/
JULIA_NUM_THREADS = auto
JULIA_EDITOR = code
How much RAM does your system have? I suspect that in your case the GC runs more frequently, which exposes the issue.
31G
shell> free -h
total used free shared buff/cache available
Mem: 31Gi 17Gi 3.8Gi 6.4Gi 9.6Gi 5.0Gi
Swap: 19Gi 18Gi 1.1Gi
I think it's unlikely to have fixed this, but I just landed some minor GC fixes in case they change anything.
I ran the command in #650 (comment) a few dozen times without error, so I cautiously think it may be fixed for me now!
So, unfortunately, I reproduced it (log above), but hopefully that means I can debug it!
Okay, after much in-depth debugging, @gbaraldi and I found the actual source of the GC error (and subsequently fixed it here: EnzymeAD/Enzyme#1314 (review)). Once that lands in Enzyme proper, we'll cut a new jll and then land it here. At that point, please try it again (I'll ping people), and let's see if everything is happy!
Landed on main; can you try it, @sethaxen @devmotion @yebai?
It seems more stable. I can repeat the
Environment:
julia> versioninfo()
Julia Version 1.9.1
Commit 147bdf428cd (2023-06-07 08:27 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin22.4.0)
CPU: 8 × Apple M2
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, westmere)
Threads: 1 on 8 virtual cores
pkg> st
Status `~/projects/enzyme-turing/Project.toml`
[7da242da] Enzyme v0.11.3 `https://github.com/EnzymeAD/Enzyme.jl.git#main`
[f6369f11] ForwardDiff v0.10.35
[37e2e3b7] ReverseDiff v1.14.6
[fce5fe82] Turing v0.26.2 `https://github.com/TuringLang/Turing.jl.git#dw/enzyme`
[e88e6eb3] Zygote v0.6.62
P.S. The segfault is easier to reproduce on a machine with less RAM.
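The crash appears to depend on how often the garbage collector runs. On a machine with plenty of RAM, forcing collections explicitly may make it show up more reliably.

The sketch below makes some assumptions:
- It requires the Turing `dw/enzyme` branch listed in the `pkg> st` output above, which provides `NUTS{Turing.EnzymeAD}()`.
- `n_repeats` and `n_samples` are hypothetical parameters chosen for illustration.
- It is untested against this particular bug.

```julia
using Turing, Enzyme

# Model from the reproducer earlier in this thread.
@model function model()
    m ~ Normal(0, 1)
    s ~ InverseGamma()
    x ~ Normal(m, s)
end

# Hypothetical knobs for the stress loop; not part of any package API.
n_repeats = 50
n_samples = 1_000

chains = map(1:n_repeats) do i
    # NUTS{Turing.EnzymeAD}() is assumed to exist only on the Turing `dw/enzyme` branch.
    ch = sample(model() | (; x=0.5), NUTS{Turing.EnzymeAD}(), n_samples; progress=false)
    # Force a full collection after every run so that a heap corrupted during
    # the Enzyme gradient calls is more likely to be detected promptly.
    GC.gc()
    ch
end
```

Starting Julia 1.9 with `--heap-size-hint` (e.g. `julia --heap-size-hint=2G`) may have a similar effect, because it makes the collector run more eagerly. Whether either approach triggers this bug any faster than simply re-running the command is not established.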
; Function Attrs: mustprogress willreturn
define internal fastcc { { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, i8 } @augmented_julia__all_1912({ { [2 x [8 x {} addrspace(10)*]], {} addrspace(10)*, {} addrspace(10)* } } addrspace(11)* nocapture noundef nonnull readonly align 8 dereferenceable(144) %0, { { [2 x [8 x {} addrspace(10)*]], {} addrspace(10)*, {} addrspace(10)* } } addrspace(11)* nocapture align 8 %"'", {} addrspace(10)* noundef nonnull readonly align 16 dereferenceable(40) %1, {} addrspace(10)* align 16 %"'1") unnamed_addr #84 !dbg !5821 {
top:
%2 = alloca { { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, i8 }, align 8
%3 = getelementptr inbounds { { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, i8 }, { { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, i8 }* %2, i32 0, i32 0
%4 = getelementptr { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i64 0, i32 0
store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657769869320 to {}*) to {} addrspace(10)*), {} addrspace(10)** %4, align 8
%5 = getelementptr { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i64 0, i32 1
store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657769869320 to {}*) to {} addrspace(10)*), {} addrspace(10)** %5, align 8
%6 = getelementptr { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i64 0, i32 2
store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657769869320 to {}*) to {} addrspace(10)*), {} addrspace(10)** %6, align 8
%7 = getelementptr { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i64 0, i32 3
store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657769869320 to {}*) to {} addrspace(10)*), {} addrspace(10)** %7, align 8
%8 = getelementptr { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i64 0, i32 4
%9 = bitcast {} addrspace(10)* addrspace(10)** %8 to {} addrspace(10)**
store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657769869320 to {}*) to {} addrspace(10)*), {} addrspace(10)** %9, align 8
%10 = getelementptr { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i64 0, i32 5
%11 = bitcast {} addrspace(10)* addrspace(10)** %10 to {} addrspace(10)**
store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657769869320 to {}*) to {} addrspace(10)*), {} addrspace(10)** %11, align 8
%12 = getelementptr { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i64 0, i32 6
%13 = bitcast {} addrspace(10)* addrspace(10)** %12 to {} addrspace(10)**
store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657769869320 to {}*) to {} addrspace(10)*), {} addrspace(10)** %13, align 8
%14 = getelementptr { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i64 0, i32 7
%15 = bitcast {} addrspace(10)* addrspace(10)** %14 to {} addrspace(10)**
store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657769869320 to {}*) to {} addrspace(10)*), {} addrspace(10)** %15, align 8
%16 = getelementptr { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i64 0, i32 9
store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657769869320 to {}*) to {} addrspace(10)*), {} addrspace(10)** %16, align 8
%17 = getelementptr { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i64 0, i32 13
%18 = bitcast {} addrspace(10)* addrspace(10)** %17 to {} addrspace(10)**
store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657769869320 to {}*) to {} addrspace(10)*), {} addrspace(10)** %18, align 8
%"iv'ac" = alloca i64, align 8
%loopLimit_cache = alloca i64, align 8
%_cache = alloca {} addrspace(10)* addrspace(10)*, align 8
%"'mi13_cache" = alloca {} addrspace(10)* addrspace(10)*, align 8
%_cache14 = alloca {} addrspace(10)* addrspace(10)*, align 8
%"'ipl16_cache" = alloca {} addrspace(10)* addrspace(10)*, align 8
%.not16_cache = alloca i1*, align 8
%.not17_cache = alloca i1*, align 8
%_cache18 = alloca {} addrspace(10)* addrspace(10)*, align 8
%19 = call {}*** @julia.get_pgcstack()
%20 = call {}*** @julia.get_pgcstack()
%21 = call {}*** @julia.get_pgcstack()
%22 = call {}*** @julia.get_pgcstack()
%23 = call {}*** @julia.get_pgcstack()
%24 = call {}*** @julia.get_pgcstack()
%25 = call {}*** @julia.get_pgcstack()
%26 = call {}*** @julia.get_pgcstack()
%27 = call {}*** @julia.get_pgcstack()
%28 = call {}*** @julia.get_pgcstack()
%29 = call {}*** @julia.get_pgcstack()
%30 = call {}*** @julia.get_pgcstack()
%31 = call {}*** @julia.get_pgcstack()
%32 = call {}*** @julia.get_pgcstack()
%33 = call {}*** @julia.get_pgcstack()
%34 = call {}*** @julia.get_pgcstack() #86
%35 = bitcast {} addrspace(10)* %1 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !5826
%36 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %35 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !5826
%37 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %36, i64 0, i32 1, !dbg !5826
%38 = load i64, i64 addrspace(11)* %37, align 8, !dbg !5826, !tbaa !162, !range !165, !alias.scope !5830, !noalias !5833
%.not = icmp eq i64 %38, 0, !dbg !5835
%39 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 8, !dbg !5827
store i1 %.not, i1* %39, align 1, !dbg !5827
br i1 %.not, label %common.ret, label %L9, !dbg !5827
L9: ; preds = %top
%"'ipc" = bitcast {} addrspace(10)* %"'1" to {} addrspace(10)* addrspace(13)* addrspace(10)*, !dbg !5837
%40 = bitcast {} addrspace(10)* %1 to {} addrspace(10)* addrspace(13)* addrspace(10)*, !dbg !5837
%"'ipc8" = addrspacecast {} addrspace(10)* addrspace(13)* addrspace(10)* %"'ipc" to {} addrspace(10)* addrspace(13)* addrspace(11)*, !dbg !5837
%41 = addrspacecast {} addrspace(10)* addrspace(13)* addrspace(10)* %40 to {} addrspace(10)* addrspace(13)* addrspace(11)*, !dbg !5837
%"'ipl9" = load {} addrspace(10)* addrspace(13)*, {} addrspace(10)* addrspace(13)* addrspace(11)* %"'ipc8", align 8, !dbg !5837, !tbaa !198, !alias.scope !5838, !noalias !5841, !nonnull !93
%42 = load {} addrspace(10)* addrspace(13)*, {} addrspace(10)* addrspace(13)* addrspace(11)* %41, align 8, !dbg !5837, !tbaa !198, !alias.scope !5842, !noalias !5833, !nonnull !93
%"'ipl" = load {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %"'ipl9", align 8, !dbg !5837, !tbaa !648, !alias.scope !5843, !noalias !5846
%43 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 0, !dbg !5837
store {} addrspace(10)* %"'ipl", {} addrspace(10)** %43, align 8, !dbg !5837
%44 = load {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %42, align 8, !dbg !5837, !tbaa !648, !alias.scope !5848, !noalias !5849
%45 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 9, !dbg !5837
store {} addrspace(10)* %44, {} addrspace(10)** %45, align 8, !dbg !5837
%.not14 = icmp eq {} addrspace(10)* %44, null, !dbg !5837
br i1 %.not14, label %fail, label %L17, !dbg !5837
L17: ; preds = %L9
%current_task515 = getelementptr inbounds {}**, {}*** %34, i64 -13, !dbg !5850
%current_task5 = bitcast {}*** %current_task515 to {}**, !dbg !5850
%46 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task5, i64 144, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657259329040 to {}*) to {} addrspace(10)*)) #87, !dbg !5850
%47 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 3, !dbg !5850
store {} addrspace(10)* %46, {} addrspace(10)** %47, align 8, !dbg !5850
%"'mi" = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task5, i64 144, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657259329040 to {}*) to {} addrspace(10)*)) #87, !dbg !5850
%48 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 2, !dbg !5850
store {} addrspace(10)* %"'mi", {} addrspace(10)** %48, align 8, !dbg !5850
%49 = bitcast {} addrspace(10)* %"'mi" to i8 addrspace(10)*, !dbg !5850
call void @llvm.memset.p10i8.i64(i8 addrspace(10)* nonnull dereferenceable(144) dereferenceable_or_null(144) %49, i8 0, i64 144, i1 false), !dbg !5850
%"'ipc10" = bitcast {} addrspace(10)* %"'mi" to i8 addrspace(10)*, !dbg !5850
%50 = bitcast {} addrspace(10)* %46 to i8 addrspace(10)*, !dbg !5850
%"'ipc11" = bitcast { { [2 x [8 x {} addrspace(10)*]], {} addrspace(10)*, {} addrspace(10)* } } addrspace(11)* %"'" to i8 addrspace(11)*, !dbg !5850
%51 = bitcast { { [2 x [8 x {} addrspace(10)*]], {} addrspace(10)*, {} addrspace(10)* } } addrspace(11)* %0 to i8 addrspace(11)*, !dbg !5850
call void @llvm.memcpy.p10i8.p11i8.i64(i8 addrspace(10)* noundef nonnull align 8 dereferenceable(144) %"'ipc10", i8 addrspace(11)* noundef nonnull align 8 dereferenceable(144) %"'ipc11", i64 144, i1 false) #86, !dbg !5850
call void @llvm.memcpy.p10i8.p11i8.i64(i8 addrspace(10)* noundef nonnull align 8 dereferenceable(144) %50, i8 addrspace(11)* noundef nonnull align 8 dereferenceable(144) %51, i64 144, i1 false) #86, !dbg !5850, !tbaa !327, !alias.scope !651, !noalias !5851
%52 = call {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)*, {} addrspace(10)*, ...) @julia.call({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)* @ijl_apply_generic, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657346395152 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657073954528 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657545686608 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657073954528 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657073954512 to {}*) to {} addrspace(10)*), {} addrspace(10)* %46, {} addrspace(10)* %"'mi", {} addrspace(10)* %44, {} addrspace(10)* %"'ipl"), !dbg !5850
%53 = addrspacecast {} addrspace(10)* %52 to {} addrspace(11)*, !dbg !5850
%54 = bitcast {} addrspace(11)* %53 to [3 x {} addrspace(10)*] addrspace(11)*, !dbg !5850
%55 = getelementptr inbounds [3 x {} addrspace(10)*], [3 x {} addrspace(10)*] addrspace(11)* %54, i64 0, i64 1, !dbg !5850
%56 = load {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %55, align 8, !dbg !5850
%57 = getelementptr inbounds [3 x {} addrspace(10)*], [3 x {} addrspace(10)*] addrspace(11)* %54, i64 0, i64 0, !dbg !5850
%58 = load {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %57, align 8, !dbg !5850
%59 = getelementptr inbounds [3 x {} addrspace(10)*], [3 x {} addrspace(10)*] addrspace(11)* %54, i64 0, i64 2, !dbg !5850
%60 = load {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %59, align 8, !dbg !5850
%61 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 1, !dbg !5854
store {} addrspace(10)* %60, {} addrspace(10)** %61, align 8, !dbg !5854
%62 = bitcast {} addrspace(10)* %58 to i8 addrspace(10)*, !dbg !5854
%63 = load i8, i8 addrspace(10)* %62, align 1, !dbg !5854, !tbaa !399, !range !654, !alias.scope !5855, !noalias !5858
%.not1622 = icmp eq i8 %63, 0, !dbg !5854
%64 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 10, !dbg !5854
store i1 %.not1622, i1* %64, align 1, !dbg !5854
br i1 %.not1622, label %common.ret, label %L23.preheader, !dbg !5854
L23.preheader: ; preds = %L17
store {} addrspace(10)* addrspace(10)* addrspacecast ({} addrspace(10)** inttoptr (i64 140657769869320 to {} addrspace(10)**) to {} addrspace(10)* addrspace(10)*), {} addrspace(10)* addrspace(10)** %_cache, align 8, !dbg !5860
store {} addrspace(10)* addrspace(10)* addrspacecast ({} addrspace(10)** inttoptr (i64 140657769869320 to {} addrspace(10)**) to {} addrspace(10)* addrspace(10)*), {} addrspace(10)* addrspace(10)** %"'mi13_cache", align 8, !dbg !5860
store {} addrspace(10)* addrspace(10)* addrspacecast ({} addrspace(10)** inttoptr (i64 140657769869320 to {} addrspace(10)**) to {} addrspace(10)* addrspace(10)*), {} addrspace(10)* addrspace(10)** %_cache14, align 8, !dbg !5860
store {} addrspace(10)* addrspace(10)* addrspacecast ({} addrspace(10)** inttoptr (i64 140657769869320 to {} addrspace(10)**) to {} addrspace(10)* addrspace(10)*), {} addrspace(10)* addrspace(10)** %"'ipl16_cache", align 8, !dbg !5860
store i1* null, i1** %.not16_cache, align 8, !dbg !5860
store i1* null, i1** %.not17_cache, align 8, !dbg !5860
store {} addrspace(10)* addrspace(10)* addrspacecast ({} addrspace(10)** inttoptr (i64 140657769869320 to {} addrspace(10)**) to {} addrspace(10)* addrspace(10)*), {} addrspace(10)* addrspace(10)** %_cache18, align 8, !dbg !5860
br label %L23, !dbg !5860
L19: ; preds = %L32
%65 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task5, i64 144, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657259329040 to {}*) to {} addrspace(10)*)) #87, !dbg !5850
%"'mi13" = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task5, i64 144, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657259329040 to {}*) to {} addrspace(10)*)) #87, !dbg !5850
%66 = bitcast {} addrspace(10)* %"'mi13" to i8 addrspace(10)*, !dbg !5850
call void @llvm.memset.p10i8.i64(i8 addrspace(10)* nonnull dereferenceable(144) dereferenceable_or_null(144) %66, i8 0, i64 144, i1 false), !dbg !5850
%"'ipc12" = bitcast {} addrspace(10)* %"'mi13" to i8 addrspace(10)*, !dbg !5850
%67 = bitcast {} addrspace(10)* %65 to i8 addrspace(10)*, !dbg !5850
call void @llvm.memcpy.p10i8.p11i8.i64(i8 addrspace(10)* noundef nonnull align 8 dereferenceable(144) %"'ipc12", i8 addrspace(11)* noundef nonnull align 8 dereferenceable(144) %"'ipc11", i64 144, i1 false) #86, !dbg !5850
call void @llvm.memcpy.p10i8.p11i8.i64(i8 addrspace(10)* noundef nonnull align 8 dereferenceable(144) %67, i8 addrspace(11)* noundef nonnull align 8 dereferenceable(144) %51, i64 144, i1 false) #86, !dbg !5850, !tbaa !327, !alias.scope !651, !noalias !5851
%68 = call {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)*, {} addrspace(10)*, ...) @julia.call({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)* @ijl_apply_generic, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657346395152 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657073954528 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657545686608 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657073954528 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657073954512 to {}*) to {} addrspace(10)*), {} addrspace(10)* %65, {} addrspace(10)* %"'mi13", {} addrspace(10)* %308, {} addrspace(10)* %"'ipl16"), !dbg !5850
%69 = addrspacecast {} addrspace(10)* %68 to {} addrspace(11)*, !dbg !5850
%70 = bitcast {} addrspace(11)* %69 to [3 x {} addrspace(10)*] addrspace(11)*, !dbg !5850
%71 = getelementptr inbounds [3 x {} addrspace(10)*], [3 x {} addrspace(10)*] addrspace(11)* %70, i64 0, i64 1, !dbg !5850
%72 = load {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %71, align 8, !dbg !5850
%73 = getelementptr inbounds [3 x {} addrspace(10)*], [3 x {} addrspace(10)*] addrspace(11)* %70, i64 0, i64 0, !dbg !5850
%74 = load {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %73, align 8, !dbg !5850
%75 = getelementptr inbounds [3 x {} addrspace(10)*], [3 x {} addrspace(10)*] addrspace(11)* %70, i64 0, i64 2, !dbg !5850
%76 = load {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %75, align 8, !dbg !5850
%77 = load {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)** %_cache, align 8, !dbg !5850, !dereferenceable !140, !invariant.group !5862
%78 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(10)* %77, i64 %iv, !dbg !5850
store {} addrspace(10)* %76, {} addrspace(10)* addrspace(10)* %78, align 8, !dbg !5850, !invariant.group !5863
%79 = load {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)** %_cache14, align 8, !dbg !5850, !dereferenceable !140, !invariant.group !5864
%80 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(10)* %79, i64 %iv, !dbg !5850
store {} addrspace(10)* %65, {} addrspace(10)* addrspace(10)* %80, align 8, !dbg !5850, !invariant.group !5865
%81 = bitcast {} addrspace(10)* addrspace(10)* %79 to {} addrspace(10)*, !dbg !5850
call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* %81, {} addrspace(10)* %65), !dbg !5850
%82 = load {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)** %"'mi13_cache", align 8, !dbg !5850, !dereferenceable !140, !invariant.group !5866
%83 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(10)* %82, i64 %iv, !dbg !5850
store {} addrspace(10)* %"'mi13", {} addrspace(10)* addrspace(10)* %83, align 8, !dbg !5850, !invariant.group !5867
%84 = bitcast {} addrspace(10)* addrspace(10)* %82 to {} addrspace(10)*, !dbg !5850
call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* %84, {} addrspace(10)* %"'mi13"), !dbg !5850
%85 = bitcast {} addrspace(10)* addrspace(10)* %77 to {} addrspace(10)*, !dbg !5850
call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* %85, {} addrspace(10)* %76), !dbg !5850
%86 = bitcast {} addrspace(10)* %74 to i8 addrspace(10)*, !dbg !5854
%87 = load i8, i8 addrspace(10)* %86, align 1, !dbg !5854, !tbaa !399, !range !654, !alias.scope !5868, !noalias !5871
%.not16 = icmp eq i8 %87, 0, !dbg !5854
%88 = load i1*, i1** %.not16_cache, align 8, !dbg !5854, !dereferenceable !140, !invariant.group !5873
%89 = getelementptr inbounds i1, i1* %88, i64 %iv, !dbg !5854
store i1 %.not16, i1* %89, align 1, !dbg !5854, !invariant.group !5874
br i1 %.not16, label %common.ret.loopexit, label %L23, !dbg !5854
L23: ; preds = %L19, %L23.preheader
%iv = phi i64 [ 0, %L23.preheader ], [ %iv.next, %L19 ]
%iv.next = add nuw nsw i64 %iv, 1, !dbg !5875
%90 = load {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)** %_cache18, align 8, !dbg !5875
%91 = bitcast {} addrspace(10)* addrspace(10)* %90 to {} addrspace(10)*, !dbg !5875
%92 = call {}*** @julia.get_pgcstack() #88, !dbg !5875
%93 = and i64 %iv.next, 1, !dbg !5875
%94 = icmp ne i64 %93, 0, !dbg !5875
%95 = call i64 @llvm.ctpop.i64(i64 %iv.next) #88, !dbg !5875
%96 = icmp ult i64 %95, 3, !dbg !5875
%97 = and i1 %96, %94, !dbg !5875
br i1 %97, label %grow.i, label %"[email protected]", !dbg !5875
grow.i: ; preds = %L23
%98 = call i64 @llvm.ctlz.i64(i64 %iv.next, i1 true) #88, !dbg !5875
%99 = sub nuw nsw i64 64, %98, !dbg !5875
%100 = shl i64 8, %99, !dbg !5875
%101 = lshr i64 %100, 1, !dbg !5875
%102 = icmp eq i64 %iv.next, 1, !dbg !5875
%103 = select i1 %102, i64 0, i64 %101, !dbg !5875
%104 = udiv exact i64 %100, 8, !dbg !5875
%105 = call {} addrspace(10)* @ijl_box_int64(i64 %104) #88, !dbg !5875
%106 = call {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)*, {} addrspace(10)*, ...) @julia.call({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)* @jl_f_apply_type, {} addrspace(10)* null, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657551139328 to {}*) to {} addrspace(10)*), {} addrspace(10)* %105, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657106768208 to {}*) to {} addrspace(10)*)) #88, !dbg !5875
%107 = bitcast {}*** %92 to {}**, !dbg !5875
%108 = getelementptr inbounds {}*, {}** %107, i64 -13, !dbg !5875
%109 = getelementptr inbounds {}*, {}** %108, i64 15, !dbg !5875
%110 = bitcast {}** %109 to i8**, !dbg !5875
%111 = load i8*, i8** %110, align 8, !dbg !5875
%112 = call noalias nonnull {} addrspace(10)* @jl_gc_alloc_typed(i8* %111, i64 %100, {} addrspace(10)* %106) #88, !dbg !5875
%113 = sub i64 %100, %103, !dbg !5875
%114 = bitcast {} addrspace(10)* %112 to i8 addrspace(10)*, !dbg !5875
%115 = getelementptr inbounds i8, i8 addrspace(10)* %114, i64 %103, !dbg !5875
%116 = bitcast i8 addrspace(10)* %115 to {} addrspace(10)*, !dbg !5875
call void @zeroType.38({} addrspace(10)* %116, i8 0, i64 %113) #88, !dbg !5875
%117 = bitcast {} addrspace(10)* %112 to {} addrspace(10)* addrspace(10)*, !dbg !5875
%118 = bitcast {} addrspace(10)* addrspace(10)* %117 to i8 addrspace(10)*, !dbg !5875
%119 = bitcast {} addrspace(10)* %91 to i8 addrspace(10)*, !dbg !5875
call void @llvm.memcpy.p10i8.p10i8.i64(i8 addrspace(10)* %118, i8 addrspace(10)* %119, i64 %103, i1 false) #88, !dbg !5875
%120 = bitcast i8 addrspace(10)* %118 to {} addrspace(10)*, !dbg !5875
br label %"[email protected]", !dbg !5875
"[email protected]": ; preds = %L23, %grow.i
%121 = phi {} addrspace(10)* [ %120, %grow.i ], [ %91, %L23 ], !dbg !5875
%122 = bitcast {} addrspace(10)* %121 to {} addrspace(10)* addrspace(10)*, !dbg !5875
%123 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 13, !dbg !5875
store {} addrspace(10)* addrspace(10)* %122, {} addrspace(10)* addrspace(10)** %123, align 8, !dbg !5875
store {} addrspace(10)* addrspace(10)* %122, {} addrspace(10)* addrspace(10)** %_cache18, align 8, !dbg !5875
%124 = load i1*, i1** %.not17_cache, align 8, !dbg !5875
%125 = bitcast i1* %124 to i8*, !dbg !5875
%126 = and i64 %iv.next, 1, !dbg !5875
%127 = icmp ne i64 %126, 0, !dbg !5875
%128 = call i64 @llvm.ctpop.i64(i64 %iv.next) #88, !dbg !5875
%129 = icmp ult i64 %128, 3, !dbg !5875
%130 = and i1 %129, %127, !dbg !5875
br i1 %130, label %grow.i1, label %__enzyme_exponentialallocationzero.exit, !dbg !5875
grow.i1: ; preds = %"[email protected]"
%131 = call i64 @llvm.ctlz.i64(i64 %iv.next, i1 true) #88, !dbg !5875
%132 = sub nuw nsw i64 64, %131, !dbg !5875
%133 = shl i64 1, %132, !dbg !5875
%134 = lshr i64 %133, 1, !dbg !5875
%135 = icmp eq i64 %iv.next, 1, !dbg !5875
%136 = select i1 %135, i64 0, i64 %134, !dbg !5875
%137 = call i8* @realloc(i8* %125, i64 %133) #88, !dbg !5875
%138 = sub i64 %133, %136, !dbg !5875
%139 = getelementptr inbounds i8, i8* %137, i64 %136, !dbg !5875
call void @llvm.memset.p0i8.i64(i8* %139, i8 0, i64 %138, i1 false) #88, !dbg !5875
br label %__enzyme_exponentialallocationzero.exit, !dbg !5875
__enzyme_exponentialallocationzero.exit: ; preds = %"[email protected]", %grow.i1
%140 = phi i8* [ %137, %grow.i1 ], [ %125, %"[email protected]" ], !dbg !5875
%141 = bitcast i8* %140 to i1*, !dbg !5875
%142 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 12, !dbg !5875
store i1* %141, i1** %142, align 8, !dbg !5875
store i1* %141, i1** %.not17_cache, align 1, !dbg !5875
%143 = load i1*, i1** %.not16_cache, align 8, !dbg !5875
%144 = bitcast i1* %143 to i8*, !dbg !5875
%145 = and i64 %iv.next, 1, !dbg !5875
%146 = icmp ne i64 %145, 0, !dbg !5875
%147 = call i64 @llvm.ctpop.i64(i64 %iv.next) #88, !dbg !5875
%148 = icmp ult i64 %147, 3, !dbg !5875
%149 = and i1 %148, %146, !dbg !5875
br i1 %149, label %grow.i2, label %__enzyme_exponentialallocationzero.exit3, !dbg !5875
grow.i2: ; preds = %__enzyme_exponentialallocationzero.exit
%150 = call i64 @llvm.ctlz.i64(i64 %iv.next, i1 true) #88, !dbg !5875
%151 = sub nuw nsw i64 64, %150, !dbg !5875
%152 = shl i64 1, %151, !dbg !5875
%153 = lshr i64 %152, 1, !dbg !5875
%154 = icmp eq i64 %iv.next, 1, !dbg !5875
%155 = select i1 %154, i64 0, i64 %153, !dbg !5875
%156 = call i8* @realloc(i8* %144, i64 %152) #88, !dbg !5875
%157 = sub i64 %152, %155, !dbg !5875
%158 = getelementptr inbounds i8, i8* %156, i64 %155, !dbg !5875
call void @llvm.memset.p0i8.i64(i8* %158, i8 0, i64 %157, i1 false) #88, !dbg !5875
br label %__enzyme_exponentialallocationzero.exit3, !dbg !5875
__enzyme_exponentialallocationzero.exit3: ; preds = %__enzyme_exponentialallocationzero.exit, %grow.i2
%159 = phi i8* [ %156, %grow.i2 ], [ %144, %__enzyme_exponentialallocationzero.exit ], !dbg !5875
%160 = bitcast i8* %159 to i1*, !dbg !5875
%161 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 11, !dbg !5875
store i1* %160, i1** %161, align 8, !dbg !5875
store i1* %160, i1** %.not16_cache, align 1, !dbg !5875
%162 = load {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)** %"'ipl16_cache", align 8, !dbg !5875
%163 = bitcast {} addrspace(10)* addrspace(10)* %162 to {} addrspace(10)*, !dbg !5875
%164 = call {}*** @julia.get_pgcstack() #88, !dbg !5875
%165 = and i64 %iv.next, 1, !dbg !5875
%166 = icmp ne i64 %165, 0, !dbg !5875
%167 = call i64 @llvm.ctpop.i64(i64 %iv.next) #88, !dbg !5875
%168 = icmp ult i64 %167, 3, !dbg !5875
%169 = and i1 %168, %166, !dbg !5875
br i1 %169, label %grow.i4, label %"[email protected]", !dbg !5875
grow.i4: ; preds = %__enzyme_exponentialallocationzero.exit3
%170 = call i64 @llvm.ctlz.i64(i64 %iv.next, i1 true) #88, !dbg !5875
%171 = sub nuw nsw i64 64, %170, !dbg !5875
%172 = shl i64 8, %171, !dbg !5875
%173 = lshr i64 %172, 1, !dbg !5875
%174 = icmp eq i64 %iv.next, 1, !dbg !5875
%175 = select i1 %174, i64 0, i64 %173, !dbg !5875
%176 = udiv exact i64 %172, 8, !dbg !5875
%177 = call {} addrspace(10)* @ijl_box_int64(i64 %176) #88, !dbg !5875
%178 = call {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)*, {} addrspace(10)*, ...) @julia.call({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)* @jl_f_apply_type, {} addrspace(10)* null, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657551139328 to {}*) to {} addrspace(10)*), {} addrspace(10)* %177, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657106768208 to {}*) to {} addrspace(10)*)) #88, !dbg !5875
%179 = bitcast {}*** %164 to {}**, !dbg !5875
%180 = getelementptr inbounds {}*, {}** %179, i64 -13, !dbg !5875
%181 = getelementptr inbounds {}*, {}** %180, i64 15, !dbg !5875
%182 = bitcast {}** %181 to i8**, !dbg !5875
%183 = load i8*, i8** %182, align 8, !dbg !5875
%184 = call noalias nonnull {} addrspace(10)* @jl_gc_alloc_typed(i8* %183, i64 %172, {} addrspace(10)* %178) #88, !dbg !5875
%185 = sub i64 %172, %175, !dbg !5875
%186 = bitcast {} addrspace(10)* %184 to i8 addrspace(10)*, !dbg !5875
%187 = getelementptr inbounds i8, i8 addrspace(10)* %186, i64 %175, !dbg !5875
%188 = bitcast i8 addrspace(10)* %187 to {} addrspace(10)*, !dbg !5875
call void @zeroType.38({} addrspace(10)* %188, i8 0, i64 %185) #88, !dbg !5875
%189 = bitcast {} addrspace(10)* %184 to {} addrspace(10)* addrspace(10)*, !dbg !5875
%190 = bitcast {} addrspace(10)* addrspace(10)* %189 to i8 addrspace(10)*, !dbg !5875
%191 = bitcast {} addrspace(10)* %163 to i8 addrspace(10)*, !dbg !5875
call void @llvm.memcpy.p10i8.p10i8.i64(i8 addrspace(10)* %190, i8 addrspace(10)* %191, i64 %175, i1 false) #88, !dbg !5875
%192 = bitcast i8 addrspace(10)* %190 to {} addrspace(10)*, !dbg !5875
br label %"[email protected]", !dbg !5875
"[email protected]": ; preds = %__enzyme_exponentialallocationzero.exit3, %grow.i4
%193 = phi {} addrspace(10)* [ %192, %grow.i4 ], [ %163, %__enzyme_exponentialallocationzero.exit3 ], !dbg !5875
%194 = bitcast {} addrspace(10)* %193 to {} addrspace(10)* addrspace(10)*, !dbg !5875
%195 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 7, !dbg !5875
store {} addrspace(10)* addrspace(10)* %194, {} addrspace(10)* addrspace(10)** %195, align 8, !dbg !5875
store {} addrspace(10)* addrspace(10)* %194, {} addrspace(10)* addrspace(10)** %"'ipl16_cache", align 8, !dbg !5875
%196 = load {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)** %_cache14, align 8, !dbg !5875
%197 = bitcast {} addrspace(10)* addrspace(10)* %196 to {} addrspace(10)*, !dbg !5875
%198 = call {}*** @julia.get_pgcstack() #88, !dbg !5875
%199 = and i64 %iv.next, 1, !dbg !5875
%200 = icmp ne i64 %199, 0, !dbg !5875
%201 = call i64 @llvm.ctpop.i64(i64 %iv.next) #88, !dbg !5875
%202 = icmp ult i64 %201, 3, !dbg !5875
%203 = and i1 %202, %200, !dbg !5875
br i1 %203, label %grow.i6, label %"[email protected]", !dbg !5875
grow.i6: ; preds = %"[email protected]"
%204 = call i64 @llvm.ctlz.i64(i64 %iv.next, i1 true) #88, !dbg !5875
%205 = sub nuw nsw i64 64, %204, !dbg !5875
%206 = shl i64 8, %205, !dbg !5875
%207 = lshr i64 %206, 1, !dbg !5875
%208 = icmp eq i64 %iv.next, 1, !dbg !5875
%209 = select i1 %208, i64 0, i64 %207, !dbg !5875
%210 = udiv exact i64 %206, 8, !dbg !5875
%211 = call {} addrspace(10)* @ijl_box_int64(i64 %210) #88, !dbg !5875
%212 = call {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)*, {} addrspace(10)*, ...) @julia.call({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)* @jl_f_apply_type, {} addrspace(10)* null, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657551139328 to {}*) to {} addrspace(10)*), {} addrspace(10)* %211, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657106768208 to {}*) to {} addrspace(10)*)) #88, !dbg !5875
%213 = bitcast {}*** %198 to {}**, !dbg !5875
%214 = getelementptr inbounds {}*, {}** %213, i64 -13, !dbg !5875
%215 = getelementptr inbounds {}*, {}** %214, i64 15, !dbg !5875
%216 = bitcast {}** %215 to i8**, !dbg !5875
%217 = load i8*, i8** %216, align 8, !dbg !5875
%218 = call noalias nonnull {} addrspace(10)* @jl_gc_alloc_typed(i8* %217, i64 %206, {} addrspace(10)* %212) #88, !dbg !5875
%219 = sub i64 %206, %209, !dbg !5875
%220 = bitcast {} addrspace(10)* %218 to i8 addrspace(10)*, !dbg !5875
%221 = getelementptr inbounds i8, i8 addrspace(10)* %220, i64 %209, !dbg !5875
%222 = bitcast i8 addrspace(10)* %221 to {} addrspace(10)*, !dbg !5875
call void @zeroType.38({} addrspace(10)* %222, i8 0, i64 %219) #88, !dbg !5875
%223 = bitcast {} addrspace(10)* %218 to {} addrspace(10)* addrspace(10)*, !dbg !5875
%224 = bitcast {} addrspace(10)* addrspace(10)* %223 to i8 addrspace(10)*, !dbg !5875
%225 = bitcast {} addrspace(10)* %197 to i8 addrspace(10)*, !dbg !5875
call void @llvm.memcpy.p10i8.p10i8.i64(i8 addrspace(10)* %224, i8 addrspace(10)* %225, i64 %209, i1 false) #88, !dbg !5875
%226 = bitcast i8 addrspace(10)* %224 to {} addrspace(10)*, !dbg !5875
br label %"[email protected]", !dbg !5875
"[email protected]": ; preds = %"[email protected]", %grow.i6
%227 = phi {} addrspace(10)* [ %226, %grow.i6 ], [ %197, %"[email protected]" ], !dbg !5875
%228 = bitcast {} addrspace(10)* %227 to {} addrspace(10)* addrspace(10)*, !dbg !5875
%229 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 6, !dbg !5875
store {} addrspace(10)* addrspace(10)* %228, {} addrspace(10)* addrspace(10)** %229, align 8, !dbg !5875
store {} addrspace(10)* addrspace(10)* %228, {} addrspace(10)* addrspace(10)** %_cache14, align 8, !dbg !5875
%230 = load {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)** %"'mi13_cache", align 8, !dbg !5875
%231 = bitcast {} addrspace(10)* addrspace(10)* %230 to {} addrspace(10)*, !dbg !5875
%232 = call {}*** @julia.get_pgcstack() #88, !dbg !5875
%233 = and i64 %iv.next, 1, !dbg !5875
%234 = icmp ne i64 %233, 0, !dbg !5875
%235 = call i64 @llvm.ctpop.i64(i64 %iv.next) #88, !dbg !5875
%236 = icmp ult i64 %235, 3, !dbg !5875
%237 = and i1 %236, %234, !dbg !5875
br i1 %237, label %grow.i8, label %"[email protected]", !dbg !5875
grow.i8: ; preds = %"[email protected]"
%238 = call i64 @llvm.ctlz.i64(i64 %iv.next, i1 true) #88, !dbg !5875
%239 = sub nuw nsw i64 64, %238, !dbg !5875
%240 = shl i64 8, %239, !dbg !5875
%241 = lshr i64 %240, 1, !dbg !5875
%242 = icmp eq i64 %iv.next, 1, !dbg !5875
%243 = select i1 %242, i64 0, i64 %241, !dbg !5875
%244 = udiv exact i64 %240, 8, !dbg !5875
%245 = call {} addrspace(10)* @ijl_box_int64(i64 %244) #88, !dbg !5875
%246 = call {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)*, {} addrspace(10)*, ...) @julia.call({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)* @jl_f_apply_type, {} addrspace(10)* null, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657551139328 to {}*) to {} addrspace(10)*), {} addrspace(10)* %245, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657106768208 to {}*) to {} addrspace(10)*)) #88, !dbg !5875
%247 = bitcast {}*** %232 to {}**, !dbg !5875
%248 = getelementptr inbounds {}*, {}** %247, i64 -13, !dbg !5875
%249 = getelementptr inbounds {}*, {}** %248, i64 15, !dbg !5875
%250 = bitcast {}** %249 to i8**, !dbg !5875
%251 = load i8*, i8** %250, align 8, !dbg !5875
%252 = call noalias nonnull {} addrspace(10)* @jl_gc_alloc_typed(i8* %251, i64 %240, {} addrspace(10)* %246) #88, !dbg !5875
%253 = sub i64 %240, %243, !dbg !5875
%254 = bitcast {} addrspace(10)* %252 to i8 addrspace(10)*, !dbg !5875
%255 = getelementptr inbounds i8, i8 addrspace(10)* %254, i64 %243, !dbg !5875
%256 = bitcast i8 addrspace(10)* %255 to {} addrspace(10)*, !dbg !5875
call void @zeroType.38({} addrspace(10)* %256, i8 0, i64 %253) #88, !dbg !5875
%257 = bitcast {} addrspace(10)* %252 to {} addrspace(10)* addrspace(10)*, !dbg !5875
%258 = bitcast {} addrspace(10)* addrspace(10)* %257 to i8 addrspace(10)*, !dbg !5875
%259 = bitcast {} addrspace(10)* %231 to i8 addrspace(10)*, !dbg !5875
call void @llvm.memcpy.p10i8.p10i8.i64(i8 addrspace(10)* %258, i8 addrspace(10)* %259, i64 %243, i1 false) #88, !dbg !5875
%260 = bitcast i8 addrspace(10)* %258 to {} addrspace(10)*, !dbg !5875
br label %"[email protected]", !dbg !5875
"[email protected]": ; preds = %"[email protected]", %grow.i8
%261 = phi {} addrspace(10)* [ %260, %grow.i8 ], [ %231, %"[email protected]" ], !dbg !5875
%262 = bitcast {} addrspace(10)* %261 to {} addrspace(10)* addrspace(10)*, !dbg !5875
%263 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 5, !dbg !5875
store {} addrspace(10)* addrspace(10)* %262, {} addrspace(10)* addrspace(10)** %263, align 8, !dbg !5875
store {} addrspace(10)* addrspace(10)* %262, {} addrspace(10)* addrspace(10)** %"'mi13_cache", align 8, !dbg !5875
%264 = load {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)** %_cache, align 8, !dbg !5875
%265 = bitcast {} addrspace(10)* addrspace(10)* %264 to {} addrspace(10)*, !dbg !5875
%266 = call {}*** @julia.get_pgcstack() #88, !dbg !5875
%267 = and i64 %iv.next, 1, !dbg !5875
%268 = icmp ne i64 %267, 0, !dbg !5875
%269 = call i64 @llvm.ctpop.i64(i64 %iv.next) #88, !dbg !5875
%270 = icmp ult i64 %269, 3, !dbg !5875
%271 = and i1 %270, %268, !dbg !5875
br i1 %271, label %grow.i10, label %"[email protected]", !dbg !5875
grow.i10: ; preds = %"[email protected]"
%272 = call i64 @llvm.ctlz.i64(i64 %iv.next, i1 true) #88, !dbg !5875
%273 = sub nuw nsw i64 64, %272, !dbg !5875
%274 = shl i64 8, %273, !dbg !5875
%275 = lshr i64 %274, 1, !dbg !5875
%276 = icmp eq i64 %iv.next, 1, !dbg !5875
%277 = select i1 %276, i64 0, i64 %275, !dbg !5875
%278 = udiv exact i64 %274, 8, !dbg !5875
%279 = call {} addrspace(10)* @ijl_box_int64(i64 %278) #88, !dbg !5875
%280 = call {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)*, {} addrspace(10)*, ...) @julia.call({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)* @jl_f_apply_type, {} addrspace(10)* null, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657551139328 to {}*) to {} addrspace(10)*), {} addrspace(10)* %279, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140657106768208 to {}*) to {} addrspace(10)*)) #88, !dbg !5875
%281 = bitcast {}*** %266 to {}**, !dbg !5875
%282 = getelementptr inbounds {}*, {}** %281, i64 -13, !dbg !5875
%283 = getelementptr inbounds {}*, {}** %282, i64 15, !dbg !5875
%284 = bitcast {}** %283 to i8**, !dbg !5875
%285 = load i8*, i8** %284, align 8, !dbg !5875
%286 = call noalias nonnull {} addrspace(10)* @jl_gc_alloc_typed(i8* %285, i64 %274, {} addrspace(10)* %280) #88, !dbg !5875
%287 = sub i64 %274, %277, !dbg !5875
%288 = bitcast {} addrspace(10)* %286 to i8 addrspace(10)*, !dbg !5875
%289 = getelementptr inbounds i8, i8 addrspace(10)* %288, i64 %277, !dbg !5875
%290 = bitcast i8 addrspace(10)* %289 to {} addrspace(10)*, !dbg !5875
call void @zeroType.38({} addrspace(10)* %290, i8 0, i64 %287) #88, !dbg !5875
%291 = bitcast {} addrspace(10)* %286 to {} addrspace(10)* addrspace(10)*, !dbg !5875
%292 = bitcast {} addrspace(10)* addrspace(10)* %291 to i8 addrspace(10)*, !dbg !5875
%293 = bitcast {} addrspace(10)* %265 to i8 addrspace(10)*, !dbg !5875
call void @llvm.memcpy.p10i8.p10i8.i64(i8 addrspace(10)* %292, i8 addrspace(10)* %293, i64 %277, i1 false) #88, !dbg !5875
%294 = bitcast i8 addrspace(10)* %292 to {} addrspace(10)*, !dbg !5875
br label %"[email protected]", !dbg !5875
"[email protected]": ; preds = %"[email protected]", %grow.i10
%295 = phi {} addrspace(10)* [ %294, %grow.i10 ], [ %265, %"[email protected]" ], !dbg !5875
%296 = bitcast {} addrspace(10)* %295 to {} addrspace(10)* addrspace(10)*, !dbg !5875
%297 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }* %3, i32 0, i32 4, !dbg !5875
store {} addrspace(10)* addrspace(10)* %296, {} addrspace(10)* addrspace(10)** %297, align 8, !dbg !5875
store {} addrspace(10)* addrspace(10)* %296, {} addrspace(10)* addrspace(10)** %_cache, align 8, !dbg !5875
%298 = add i64 %iv, 2, !dbg !5875
%299 = add nsw i64 %298, -1, !dbg !5875
%300 = load i64, i64 addrspace(11)* %37, align 8, !dbg !5877, !tbaa !162, !range !165, !alias.scope !5830, !noalias !5833
%.not17 = icmp ult i64 %299, %300, !dbg !5878
%301 = load i1*, i1** %.not17_cache, align 8, !dbg !5860, !dereferenceable !140, !invariant.group !5880
%302 = getelementptr inbounds i1, i1* %301, i64 %iv, !dbg !5860
store i1 %.not17, i1* %302, align 1, !dbg !5860, !invariant.group !5881
br i1 %.not17, label %L32, label %common.ret.loopexit, !dbg !5860
L32: ; preds = %"[email protected]"
%"'ipl17" = load {} addrspace(10)* addrspace(13)*, {} addrspace(10)* addrspace(13)* addrspace(11)* %"'ipc8", align 8, !dbg !5882, !tbaa !198, !alias.scope !5838, !noalias !5841, !nonnull !93
%303 = load {} addrspace(10)* addrspace(13)*, {} addrspace(10)* addrspace(13)* addrspace(11)* %41, align 8, !dbg !5882, !tbaa !198, !alias.scope !5842, !noalias !5833, !nonnull !93
%"'ipg" = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %"'ipl17", i64 %299, !dbg !5882
%304 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %303, i64 %299, !dbg !5882
%"'ipl16" = load {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %"'ipg", align 8, !dbg !5882, !tbaa !648, !alias.scope !5883, !noalias !5886
%305 = load {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)** %"'ipl16_cache", align 8, !dbg !5882, !dereferenceable !140, !invariant.group !5888
%306 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(10)* %305, i64 %iv, !dbg !5882
store {} addrspace(10)* %"'ipl16", {} addrspace(10)* addrspace(10)* %306, align 8, !dbg !5882, !tbaa !648, !invariant.group !5889
%307 = bitcast {} addrspace(10)* addrspace(10)* %305 to {} addrspace(10)*, !dbg !5882
call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* %307, {} addrspace(10)* %"'ipl16"), !dbg !5882
%308 = load {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %304, align 8, !dbg !5882, !tbaa !648, !alias.scope !5890, !noalias !5891
%309 = load {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)** %_cache18, align 8, !dbg !5882, !dereferenceable !140, !invariant.group !5892
%310 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(10)* %309, i64 %iv, !dbg !5882
store {} addrspace(10)* %308, {} addrspace(10)* addrspace(10)* %310, align 8, !dbg !5882, !tbaa !648, !invariant.group !5893
%311 = bitcast {} addrspace(10)* addrspace(10)* %309 to {} addrspace(10)*, !dbg !5882
call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* %311, {} addrspace(10)* %308), !dbg !5882
%.not18 = icmp eq {} addrspace(10)* %308, null, !dbg !5882
br i1 %.not18, label %fail6, label %L19, !dbg !5882
common.ret.loopexit: ; preds = %"[email protected]", %L19
%312 = phi i64 [ %iv, %"[email protected]" ], [ %iv, %L19 ]
%common.ret.op.ph = phi i8 [ 0, %L19 ], [ 1, %"[email protected]" ]
store i64 %312, i64* %loopLimit_cache, align 8, !dbg !5894, !invariant.group !5895
br label %common.ret, !dbg !5894
common.ret: ; preds = %common.ret.loopexit, %L17, %top
%common.ret.op = phi i8 [ 1, %top ], [ 0, %L17 ], [ %common.ret.op.ph, %common.ret.loopexit ]
%313 = insertvalue { {} addrspace(10)*, i8 } undef, i8 %common.ret.op, 1, !dbg !5894
%314 = getelementptr inbounds { { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, i8 }, { { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, i8 }* %2, i32 0, i32 1, !dbg !5894
store i8 %common.ret.op, i8* %314, align 1, !dbg !5894
%315 = load { { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, i8 }, { { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, i8 }* %2, align 8, !dbg !5894
ret { { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1, {} addrspace(10)*, i1, i1*, i1*, {} addrspace(10)* addrspace(10)* }, i8 } %315, !dbg !5894
fail: ; preds = %L9
call void @ijl_throw({} addrspace(12)* addrspacecast ({}* inttoptr (i64 140657625688192 to {}*) to {} addrspace(12)*)) #86, !dbg !5837
unreachable, !dbg !5837
fail6: ; preds = %L32
call void @ijl_throw({} addrspace(12)* addrspacecast ({}* inttoptr (i64 140657625688192 to {}*) to {} addrspace(12)*)) #86, !dbg !5882
unreachable, !dbg !5882
}
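In the `grow.i*` blocks of this dump, every per-iteration cache appears to follow the same growth rule. This covers `%_cache`, `%_cache14`, `%_cache18`, `%"'mi13_cache"`, `%"'ipl16_cache"`, and the `i1` caches `%.not16_cache`/`%.not17_cache`. When `%iv.next` is odd and has fewer than three set bits (that is, 1 or 2^k + 1), the cache is reallocated to 2^bitlength(`%iv.next`) elements, the old contents are copied over, and the new tail is zeroed. The Julia sketch below is one reading of that IR, not Enzyme's implementation. `should_grow`, `grown_capacity`, `GrowableCache` and `record!` are hypothetical names used only for illustration, and the sketch ignores GC allocation and write barriers.

```julia
# Hypothetical model of the exponential cache growth seen in the grow.i* blocks.
# Not Enzyme API; names are illustrative only.

# IR test: (iv.next & 1) != 0 && ctpop(iv.next) < 3  =>  n == 1 or n == 2^k + 1
should_grow(n::Integer) = isodd(n) && count_ones(n) < 3

# IR: bits = 64 - ctlz(n); new size = 1 << bits elements;
#     old size = (n == 1) ? 0 : new size >> 1
function grown_capacity(n::Integer)
    bits = 64 - leading_zeros(UInt64(n))
    newcap = 1 << bits
    oldcap = n == 1 ? 0 : newcap >> 1
    return oldcap, newcap
end

mutable struct GrowableCache{T}
    data::Vector{T}
end
GrowableCache{T}() where {T} = GrowableCache{T}(T[])

# Store `value` for loop iteration `iv` (0-based, as %iv in the IR).
function record!(c::GrowableCache{T}, iv::Integer, value::T) where {T}
    n = iv + 1                                  # %iv.next
    if should_grow(n)
        oldcap, newcap = grown_capacity(n)
        @assert length(c.data) == oldcap
        newdata = zeros(T, newcap)              # zero-filled, like zeroType/memset
        copyto!(newdata, 1, c.data, 1, oldcap)  # copy old contents, like memcpy
        c.data = newdata
    end
    c.data[iv + 1] = value                      # Julia arrays are 1-based
    return c
end
```

Under this rule the capacity stays at least `iv + 1` on every iteration, and only a logarithmic number of reallocations is needed as the loop count grows.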
Problem appears to be that this is unset:
@sethaxen @devmotion @yebai found and pushed a fix for the subsequent segfault you found; could you retry?
I just ran @yebai's example above with warnings disabled but unexpectedly ended up with an error:
julia> Enzyme.API.runtimeActivity!(true)
julia> Enzyme.API.typeWarning!(false)
julia> sample(model() | (; x=0.5), NUTS{Turing.EnzymeAD}(), 10) # this works!
warning: didn't implement memmove, using memcpy as fallback which can result in errors
warning: didn't implement memmove, using memcpy as fallback which can result in errors
Sampling 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:05
ERROR: UndefVarError: `b` not defined
Stacktrace:
[1] jl_array_ptr_copy_fwd(B::LLVM.IRBuilder, orig::LLVM.CallInst, gutils::Enzyme.Compiler.GradientUtils, normalR::Ptr{Ptr{LLVM.API.LLVMOpaqueValue}}, shadowR::Ptr{Ptr{LLVM.API.LLVMOpaqueValue}})
@ Enzyme.Compiler ~/.julia/packages/Enzyme/ph9NM/src/compiler.jl:5210
[2] jl_array_ptr_copy_augfwd
@ ~/.julia/packages/Enzyme/ph9NM/src/compiler.jl:5230 [inlined]
[3] (::Enzyme.Compiler.var"#304#305")(B::Ptr{LLVM.API.LLVMOpaqueBuilder}, OrigCI::Ptr{LLVM.API.LLVMOpaqueValue}, gutils::Ptr{Nothing}, normalR::Ptr{Ptr{LLVM.API.LLVMOpaqueValue}}, shadowR::Ptr{Ptr{LLVM.API.LLVMOpaqueValue}}, tapeR::Ptr{Ptr{LLVM.API.LLVMOpaqueValue}})
@ Enzyme.Compiler ~/.julia/packages/Enzyme/ph9NM/src/compiler.jl:6426
[4] EnzymeCreatePrimalAndGradient(logic::Enzyme.Logic, todiff::LLVM.Function, retType::Enzyme.API.CDIFFE_TYPE, constant_args::Vector{Enzyme.API.CDIFFE_TYPE}, TA::Enzyme.TypeAnalysis, returnValue::Bool, dretUsed::Bool, mode::Enzyme.API.CDerivativeMode, width::Int64, additionalArg::Ptr{Nothing}, forceAnonymousTape::Bool, typeInfo::Enzyme.FnTypeInfo, uncacheable_args::Vector{Bool}, augmented::Ptr{Nothing}, atomicAdd::Bool)
@ Enzyme.API ~/.julia/packages/Enzyme/ph9NM/src/api.jl:128
[5] enzyme!(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, mod::LLVM.Module, primalf::LLVM.Function, TT::Type, mode::Enzyme.API.CDerivativeMode, width::Int64, parallel::Bool, actualRetType::Type, wrap::Bool, modifiedBetween::Tuple{Bool, Bool, Bool}, returnPrimal::Bool, jlrules::Vector{String}, expectedTapeType::Type)
@ Enzyme.Compiler ~/.julia/packages/Enzyme/ph9NM/src/compiler.jl:7418
[6] codegen(output::Symbol, job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}; libraries::Bool, deferred_codegen::Bool, optimize::Bool, toplevel::Bool, ctx::LLVM.ThreadSafeContext, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
@ Enzyme.Compiler ~/.julia/packages/Enzyme/ph9NM/src/compiler.jl:8922
[7] codegen
@ ~/.julia/packages/Enzyme/ph9NM/src/compiler.jl:8530 [inlined]
[8] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, ctx::Nothing, postopt::Bool)
@ Enzyme.Compiler ~/.julia/packages/Enzyme/ph9NM/src/compiler.jl:9456
[9] _thunk
@ ~/.julia/packages/Enzyme/ph9NM/src/compiler.jl:9453 [inlined]
[10] cached_compilation
@ ~/.julia/packages/Enzyme/ph9NM/src/compiler.jl:9491 [inlined]
[11] #s291#430
@ ~/.julia/packages/Enzyme/ph9NM/src/compiler.jl:9553 [inlined]
[12] var"#s291#430"(FA::Any, A::Any, TT::Any, Mode::Any, ModifiedBetween::Any, width::Any, ReturnPrimal::Any, ShadowInit::Any, World::Any, ABI::Any, ::Any, #unused#::Type, #unused#::Type, #unused#::Type, tt::Any, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Any)
@ Enzyme.Compiler ./none:0
[13] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any})
@ Core ./boot.jl:602
[14] autodiff
@ ~/.julia/packages/Enzyme/ph9NM/src/Enzyme.jl:207 [inlined]
[15] autodiff
@ ~/.julia/packages/Enzyme/ph9NM/src/Enzyme.jl:222 [inlined]
[16] logdensity_and_gradient(∇ℓ::LogDensityProblemsADEnzymeExt.EnzymeGradientLogDensity{LogDensityFunction{DynamicPPL.TypedVarInfo{NamedTuple{(:m, :s), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:m, Setfield.IdentityLens}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:m, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:s, Setfield.IdentityLens}, Int64}, Vector{InverseGamma{Float64}}, Vector{AbstractPPL.VarName{:s, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.ConditionContext{NamedTuple{(:x,), Tuple{Float64}}, DynamicPPL.DefaultContext}}, DynamicPPL.SamplingContext{DynamicPPL.Sampler{NUTS{Turing.Essential.EnzymeAD, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext, Random.TaskLocalRNG}}, ReverseMode{false, FFIABI}, Nothing}, x::Vector{Float64})
@ LogDensityProblemsADEnzymeExt ~/.julia/packages/LogDensityProblemsAD/JoNjv/ext/LogDensityProblemsADEnzymeExt.jl:73
[17] ∂logπ∂θ
@ ~/.julia/packages/Turing/PbWOa/src/inference/hmc.jl:172 [inlined]
[18] ∂H∂θ(h::AdvancedHMC.Hamiltonian{AdvancedHMC.DiagEuclideanMetric{Float64, Vector{Float64}}, AdvancedHMC.GaussianKinetic, Base.Fix1{typeof(LogDensityProblems.logdensity), LogDensityProblemsADEnzymeExt.EnzymeGradientLogDensity{LogDensityFunction{DynamicPPL.TypedVarInfo{NamedTuple{(:m, :s), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:m, Setfield.IdentityLens}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:m, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:s, Setfield.IdentityLens}, Int64}, Vector{InverseGamma{Float64}}, Vector{AbstractPPL.VarName{:s, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.ConditionContext{NamedTuple{(:x,), Tuple{Float64}}, DynamicPPL.DefaultContext}}, DynamicPPL.SamplingContext{DynamicPPL.Sampler{NUTS{Turing.Essential.EnzymeAD, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext, Random.TaskLocalRNG}}, ReverseMode{false, FFIABI}, Nothing}}, Turing.Inference.var"#∂logπ∂θ#44"{LogDensityProblemsADEnzymeExt.EnzymeGradientLogDensity{LogDensityFunction{DynamicPPL.TypedVarInfo{NamedTuple{(:m, :s), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:m, Setfield.IdentityLens}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:m, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:s, Setfield.IdentityLens}, Int64}, Vector{InverseGamma{Float64}}, Vector{AbstractPPL.VarName{:s, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.ConditionContext{NamedTuple{(:x,), Tuple{Float64}}, DynamicPPL.DefaultContext}}, DynamicPPL.SamplingContext{DynamicPPL.Sampler{NUTS{Turing.Essential.EnzymeAD, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext, Random.TaskLocalRNG}}, ReverseMode{false, FFIABI}, Nothing}}}, θ::Vector{Float64})
@ AdvancedHMC ~/.julia/packages/AdvancedHMC/2MdYL/src/hamiltonian.jl:38
[19] phasepoint(h::AdvancedHMC.Hamiltonian{AdvancedHMC.DiagEuclideanMetric{Float64, Vector{Float64}}, AdvancedHMC.GaussianKinetic, Base.Fix1{typeof(LogDensityProblems.logdensity), LogDensityProblemsADEnzymeExt.EnzymeGradientLogDensity{LogDensityFunction{DynamicPPL.TypedVarInfo{NamedTuple{(:m, :s), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:m, Setfield.IdentityLens}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:m, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:s, Setfield.IdentityLens}, Int64}, Vector{InverseGamma{Float64}}, Vector{AbstractPPL.VarName{:s, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.ConditionContext{NamedTuple{(:x,), Tuple{Float64}}, DynamicPPL.DefaultContext}}, DynamicPPL.SamplingContext{DynamicPPL.Sampler{NUTS{Turing.Essential.EnzymeAD, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext, Random.TaskLocalRNG}}, ReverseMode{false, FFIABI}, Nothing}}, Turing.Inference.var"#∂logπ∂θ#44"{LogDensityProblemsADEnzymeExt.EnzymeGradientLogDensity{LogDensityFunction{DynamicPPL.TypedVarInfo{NamedTuple{(:m, :s), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:m, Setfield.IdentityLens}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:m, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:s, Setfield.IdentityLens}, Int64}, Vector{InverseGamma{Float64}}, Vector{AbstractPPL.VarName{:s, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.ConditionContext{NamedTuple{(:x,), Tuple{Float64}}, DynamicPPL.DefaultContext}}, DynamicPPL.SamplingContext{DynamicPPL.Sampler{NUTS{Turing.Essential.EnzymeAD, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext, Random.TaskLocalRNG}}, ReverseMode{false, FFIABI}, Nothing}}}, θ::Vector{Float64}, r::Vector{Float64})
@ AdvancedHMC ~/.julia/packages/AdvancedHMC/2MdYL/src/hamiltonian.jl:80
[20] phasepoint
@ ~/.julia/packages/AdvancedHMC/2MdYL/src/hamiltonian.jl:159 [inlined]
[21] initialstep(rng::Random.TaskLocalRNG, model::DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.ConditionContext{NamedTuple{(:x,), Tuple{Float64}}, DynamicPPL.DefaultContext}}, spl::DynamicPPL.Sampler{NUTS{Turing.Essential.EnzymeAD, (), AdvancedHMC.DiagEuclideanMetric}}, vi::DynamicPPL.TypedVarInfo{NamedTuple{(:m, :s), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:m, Setfield.IdentityLens}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:m, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:s, Setfield.IdentityLens}, Int64}, Vector{InverseGamma{Float64}}, Vector{AbstractPPL.VarName{:s, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}; init_params::Nothing, nadapts::Int64, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ Turing.Inference ~/.julia/packages/Turing/PbWOa/src/inference/hmc.jl:176
[22] step(rng::Random.TaskLocalRNG, model::DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.ConditionContext{NamedTuple{(:x,), Tuple{Float64}}, DynamicPPL.DefaultContext}}, spl::DynamicPPL.Sampler{NUTS{Turing.Essential.EnzymeAD, (), AdvancedHMC.DiagEuclideanMetric}}; resume_from::Nothing, init_params::Nothing, kwargs::Base.Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:nadapts,), Tuple{Int64}}})
@ DynamicPPL ~/.julia/packages/DynamicPPL/oJMmE/src/sampler.jl:111
[23] step
@ ~/.julia/packages/DynamicPPL/oJMmE/src/sampler.jl:84 [inlined]
[24] macro expansion
@ ~/.julia/packages/AbstractMCMC/fWWW0/src/sample.jl:125 [inlined]
[25] macro expansion
@ ~/.julia/packages/ProgressLogging/6KXlp/src/ProgressLogging.jl:328 [inlined]
[26] macro expansion
@ ~/.julia/packages/AbstractMCMC/fWWW0/src/logging.jl:9 [inlined]
[27] mcmcsample(rng::Random.TaskLocalRNG, model::DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.ConditionContext{NamedTuple{(:x,), Tuple{Float64}}, DynamicPPL.DefaultContext}}, sampler::DynamicPPL.Sampler{NUTS{Turing.Essential.EnzymeAD, (), AdvancedHMC.DiagEuclideanMetric}}, N::Int64; progress::Bool, progressname::String, callback::Nothing, discard_initial::Int64, thinning::Int64, chain_type::Type, kwargs::Base.Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:nadapts,), Tuple{Int64}}})
@ AbstractMCMC ~/.julia/packages/AbstractMCMC/fWWW0/src/sample.jl:116
[28] sample(rng::Random.TaskLocalRNG, model::DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.ConditionContext{NamedTuple{(:x,), Tuple{Float64}}, DynamicPPL.DefaultContext}}, sampler::DynamicPPL.Sampler{NUTS{Turing.Essential.EnzymeAD, (), AdvancedHMC.DiagEuclideanMetric}}, N::Int64; chain_type::Type, resume_from::Nothing, progress::Bool, nadapts::Int64, discard_adapt::Bool, discard_initial::Int64, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ Turing.Inference ~/.julia/packages/Turing/PbWOa/src/inference/hmc.jl:133
[29] sample
@ ~/.julia/packages/Turing/PbWOa/src/inference/hmc.jl:103 [inlined]
[30] #sample#2
@ ~/.julia/packages/Turing/PbWOa/src/inference/Inference.jl:146 [inlined]
[31] sample
@ ~/.julia/packages/Turing/PbWOa/src/inference/Inference.jl:139 [inlined]
[32] #sample#1
@ ~/.julia/packages/Turing/PbWOa/src/inference/Inference.jl:136 [inlined]
[33] sample(model::DynamicPPL.Model{typeof(model), (), (), (), Tuple{}, Tuple{}, DynamicPPL.ConditionContext{NamedTuple{(:x,), Tuple{Float64}}, DynamicPPL.DefaultContext}}, alg::NUTS{Turing.Essential.EnzymeAD, (), AdvancedHMC.DiagEuclideanMetric}, N::Int64)
@ Turing.Inference ~/.julia/packages/Turing/PbWOa/src/inference/Inference.jl:130
[34] top-level scope
@ REPL[12]:1
I just checked, and now that #934 is merged I can run the example successfully again 🙂 I was able to repeat the sampling with 10000 samples about 20 times (then I stopped) and sampling 1_000_000 samples in a single call also worked without issues. I should say, though, that I use Linux and don't have access to a Mac, so I don't know whether the issues observed above on macOS are fixed.
Great work! I can confirm it runs successfully on my Mac too!
I've now bumped the Enzyme patch version. Could you retry the full Turing example once that release hits the General registry and let me know how it goes? Regardless, I'm closing this issue as complete.
I just checked the model in TuringLang/Turing.jl#1887 (comment) again on that branch, and it once again segfaults after emitting warnings (it had previously been fixed in #457). Below is the complete code sample:
The full stacktrace can be found at https://gist.github.com/sethaxen/5666e1c6c9d8194e0370c60eb70de49e#file-log-txt
It also fails on the latest release of Enzyme.