Second order derivative crashes on CUDA GPU #2189

Open
utkinis opened this issue Dec 9, 2024 · 1 comment

utkinis commented Dec 9, 2024

Description

Vaguely related to #2187: I'm trying to solve the same problem, but with pure Enzyme. This works on the CPU, but on the GPU Enzyme crashes during compilation.

MWE:

using Enzyme, StaticArrays, CUDA

d_dx(S::StaticMatrix{3,<:Any}) = S[SVector(2, 3), :] - S[SVector(1, 2), :]

F(x) = sum(d_dx(x))

# Working CPU version
function foo_cpu(A, B)
    Bl   = @SMatrix [B[1] for _ in 1:3, _ in 1:3]
    r̄, r = Enzyme.autodiff(Enzyme.ReverseWithPrimal, Const(F), Active, Active(Bl))
    A[1] = r / sum(r̄[1])
    return
end

A  = zeros(1)
B  = zeros(1)
dA = zeros(1)
dB = zeros(1)

foo_cpu(A, B)
Enzyme.autodiff(Enzyme.Reverse, Const(foo_cpu), Const, Duplicated(A, dA), Duplicated(B, dB)) # works

# Broken GPU version
function foo_gpu(A, B)
    Bl   = @SMatrix [B[1] for _ in 1:3, _ in 1:3]
    r̄, r = Enzyme.autodiff_deferred(Enzyme.ReverseWithPrimal, Const(F), Active, Active(Bl))
    A[1] = r / sum(r̄[1])
    return
end

function dfoo_gpu(A, B)
    Enzyme.autodiff_deferred(Enzyme.Reverse, Const(foo_gpu), Const, A, B)
    return
end

A  = CUDA.zeros(Float64, 1)
B  = CUDA.zeros(Float64, 1)
dA = CUDA.zeros(Float64, 1)
dB = CUDA.zeros(Float64, 1)

@cuda foo_gpu(A, B)
@cuda dfoo_gpu(Duplicated(A, dA), Duplicated(B, dB)) # crashes

Running this yields:

Error log

julia: /workspace/srcdir/Enzyme/enzyme/Enzyme/AdjointGenerator.h:5202: void AdjointGenerator::recursivelyHandleSubfunction(llvm::CallInst&, llvm::Function*, const std::vector<bool>&, bool, DIFFE_TYPE, bool): Assertion `whatType(argType, Mode) == DIFFE_TYPE::DUP_ARG || whatType(argType, Mode) == DIFFE_TYPE::CONSTANT' failed.

[2248482] signal (6.-6): Aborted
in expression starting at /scratch-1/iutkin/Glaide.jl/app/enzyme_mwe.jl:40
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7f97b49cf728)
__assert_fail at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
recursivelyHandleSubfunction at /workspace/srcdir/Enzyme/enzyme/Enzyme/AdjointGenerator.h:5202
visitCallInst at /workspace/srcdir/Enzyme/enzyme/Enzyme/AdjointGenerator.h:6479
visit at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/InstVisitor.h:111 [inlined]
CreatePrimalAndGradient at /workspace/srcdir/Enzyme/enzyme/Enzyme/EnzymeLogic.cpp:4305
EnzymeCreatePrimalAndGradient at /workspace/srcdir/Enzyme/enzyme/Enzyme/CApi.cpp:633
EnzymeCreatePrimalAndGradient at /home/iutkin/.julia/packages/Enzyme/6C71q/src/api.jl:268
jfptr_EnzymeCreatePrimalAndGradient_21516 at /home/iutkin/.julia/compiled/v1.10/Enzyme/G1p5n_Pi91X.so (unknown line)
_jl_invoke at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/gf.c:3077
enzyme! at /home/iutkin/.julia/packages/Enzyme/6C71q/src/compiler.jl:1576
#codegen#18938 at /home/iutkin/.julia/packages/Enzyme/6C71q/src/compiler.jl:4425
codegen at /home/iutkin/.julia/packages/Enzyme/6C71q/src/compiler.jl:3223 [inlined]
#189 at /home/iutkin/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:224
get! at ./dict.jl:479
unknown function (ip: 0x7f965907e480)
_jl_invoke at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/gf.c:3077
macro expansion at /home/iutkin/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:223 [inlined]
#emit_llvm#188 at /home/iutkin/.julia/packages/GPUCompiler/2CW9L/src/utils.jl:108
unknown function (ip: 0x7f979d19a1c6)
unknown function (ip: 0x7f979d177779)
unknown function (ip: 0x7f979d17774f)
emit_llvm at /home/iutkin/.julia/packages/GPUCompiler/2CW9L/src/utils.jl:106 [inlined]
#codegen#186 at /home/iutkin/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:100
codegen at /home/iutkin/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:82
unknown function (ip: 0x7f979d1777f9)
_jl_invoke at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/gf.c:3077
#compile#185 at /home/iutkin/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:79
compile at /home/iutkin/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:74 [inlined]
#1145 at /home/iutkin/.julia/packages/CUDA/2kjXI/src/compiler/compilation.jl:250 [inlined]
#JuliaContext#184 at /home/iutkin/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:34
unknown function (ip: 0x7f979d176a56)
_jl_invoke at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/gf.c:3077
JuliaContext at /home/iutkin/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:25
compile at /home/iutkin/.julia/packages/CUDA/2kjXI/src/compiler/compilation.jl:249
actual_compilation at /home/iutkin/.julia/packages/GPUCompiler/2CW9L/src/execution.jl:237
unknown function (ip: 0x7f979d1765b9)
_jl_invoke at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/gf.c:3077
cached_compilation at /home/iutkin/.julia/packages/GPUCompiler/2CW9L/src/execution.jl:151
macro expansion at /home/iutkin/.julia/packages/CUDA/2kjXI/src/compiler/execution.jl:380 [inlined]
macro expansion at ./lock.jl:267 [inlined]
#cufunction#1169 at /home/iutkin/.julia/packages/CUDA/2kjXI/src/compiler/execution.jl:375
cufunction at /home/iutkin/.julia/packages/CUDA/2kjXI/src/compiler/execution.jl:372
unknown function (ip: 0x7f9659087a43)
_jl_invoke at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
do_call at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/interpreter.c:126
eval_value at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/interpreter.c:223
eval_body at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/interpreter.c:489
jl_interpret_toplevel_thunk at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/interpreter.c:775
jl_toplevel_eval_flex at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/toplevel.c:877
ijl_toplevel_eval_in at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/toplevel.c:985
eval at ./boot.jl:385 [inlined]
include_string at ./loading.jl:2139
_jl_invoke at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/gf.c:3077
_include at ./loading.jl:2199
include at ./client.jl:494
unknown function (ip: 0x7f979d129eb5)
_jl_invoke at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
do_call at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/interpreter.c:126
eval_value at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/interpreter.c:223
eval_stmt_value at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/interpreter.c:174 [inlined]
eval_body at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/interpreter.c:617
jl_interpret_toplevel_thunk at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/interpreter.c:775
jl_toplevel_eval_flex at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/toplevel.c:877
ijl_toplevel_eval_in at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/toplevel.c:985
eval at ./boot.jl:385 [inlined]
eval_user_input at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:150
repl_backend_loop at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:246
#start_repl_backend#46 at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:231
start_repl_backend at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:228
_jl_invoke at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/gf.c:3077
#run_repl#59 at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:389
run_repl at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:375
jfptr_run_repl_92142.1 at /home/iutkin/.julia/juliaup/julia-1.10.7+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/gf.c:3077
#1014 at ./client.jl:437
jfptr_YY.1014_83099.1 at /home/iutkin/.julia/juliaup/julia-1.10.7+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_f__call_latest at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/builtins.c:812
#invokelatest#2 at ./essentials.jl:892 [inlined]
invokelatest at ./essentials.jl:889 [inlined]
run_main_repl at ./client.jl:421
exec_options at ./client.jl:338
_start at ./client.jl:557
jfptr__start_83125.1 at /home/iutkin/.julia/juliaup/julia-1.10.7+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
true_main at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/jlapi.c:582
jl_repl_entrypoint at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/src/jlapi.c:731
main at /cache/build/tester-amdci4-10/julialang/julia-release-1-dot-10/cli/loader_exe.c:58
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 25071649 (Pool: 25019003; Big: 52646); GC: 33
Aborted (core dumped)
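
A side note on the MWE itself, independent of the crash: the sketch below checks by hand what `F` computes, without Enzyme or CUDA. The gradient `grad_F` is derived on paper here (it is an assumption of this sketch, not Enzyme output): `F` telescopes to `sum(S[3, :]) - sum(S[1, :])`, so its gradient is -1 in row 1, 0 in row 2 and +1 in row 3. If that derivation is right, then for the constant `Bl` used in `foo_cpu`/`foo_gpu` both `r` and `sum(r̄[1])` are zero, so `A[1]` would come out as `NaN` even where the code runs. That does not matter for reproducing the assertion failure, but it is worth knowing when checking the CPU result.

```julia
using StaticArrays

# Same helpers as in the MWE.
d_dx(S::StaticMatrix{3,<:Any}) = S[SVector(2, 3), :] - S[SVector(1, 2), :]
F(x) = sum(d_dx(x))

# Hand-derived gradient of F for a 3×3 input (assumption of this sketch):
# entries are listed column by column, so each column is (-1, 0, 1).
grad_F = SMatrix{3,3}(-1.0, 0.0, 1.0, -1.0, 0.0, 1.0, -1.0, 0.0, 1.0)

# Non-constant test input whose rows are 1, 2 and 4.
S = SMatrix{3,3}(1.0, 2.0, 4.0, 1.0, 2.0, 4.0, 1.0, 2.0, 4.0)
@assert F(S) == sum(S[3, :]) - sum(S[1, :])

# F is linear, so F(S) equals the dot product of the gradient with S.
@assert F(S) == sum(grad_F .* S)

# Constant input, as built from B[1] in the MWE.
Bl = @SMatrix [0.0 for _ in 1:3, _ in 1:3]
r = F(Bl)
@assert r == 0.0
@assert sum(grad_F) == 0.0
@assert isnan(r / sum(grad_F))  # 0.0 / 0.0
```

Using a non-constant `Bl`, or a different normalisation than `sum(r̄[1])`, would give a finite value to compare between CPU and GPU once the GPU path compiles.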

Reproducibility

julia> versioninfo()
Julia Version 1.10.7
Commit 4976d05258e (2024-11-26 15:57 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × AMD EPYC 7282 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver2)
Threads: 4 default, 0 interactive, 2 GC (on 32 virtual cores)
Environment:
  JULIA_HDF5_PATH = /scratch-1/soft/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.4.0/hdf5-1.14.3-hzicuphosrqicmxhl66fzdjdkjkmzxdy
  JULIA_LOAD_PATH = :/scratch-1/julia_prefs/:/scratch-1/julia_prefs/
  JULIA_NUM_THREADS = 4
julia> CUDA.versioninfo()
CUDA runtime 12.3, local installation
CUDA driver 12.3
NVIDIA driver 545.23.8

CUDA libraries: 
- CUBLAS: 12.3.4
- CURAND: 10.3.4
- CUFFT: 11.0.12
- CUSOLVER: 11.5.4
- CUSPARSE: 12.2.0
- CUPTI: 2023.3.1 (API 21.0.0)
- NVML: 12.0.0+545.23.8

Julia packages: 
- CUDA: 5.5.2
- CUDA_Driver_jll: 0.10.4+0
- CUDA_Runtime_jll: 0.15.5+0
- CUDA_Runtime_Discovery: 0.3.5

Toolchain:
- Julia: 1.10.7
- LLVM: 15.0.7

Preferences:
- CUDA_Runtime_jll.version: local

8 devices:
  0: NVIDIA A100-SXM4-40GB (sm_80, 39.385 GiB / 40.000 GiB available)
  1: NVIDIA A100-SXM4-40GB (sm_80, 39.385 GiB / 40.000 GiB available)
  2: NVIDIA A100-SXM4-40GB (sm_80, 30.848 GiB / 40.000 GiB available)
  3: NVIDIA A100-SXM4-40GB (sm_80, 30.848 GiB / 40.000 GiB available)
  4: NVIDIA A100-SXM4-40GB (sm_80, 30.823 GiB / 40.000 GiB available)
  5: NVIDIA A100-SXM4-40GB (sm_80, 30.848 GiB / 40.000 GiB available)
  6: NVIDIA A100-SXM4-40GB (sm_80, 30.848 GiB / 40.000 GiB available)
  7: NVIDIA A100-SXM4-40GB (sm_80, 39.385 GiB / 40.000 GiB available)

Manifest.toml.txt

Member

wsmoses commented Dec 15, 2024

Yeah, this is related to GPUCompiler's deferred codegen not handling several levels of nesting well (which @vchuravy has been working on resolving).
