ceil or % error in cuda backpropagation #1259
Comments
This should hopefully fix the ceil one: EnzymeAD/Enzyme#1647 The mod one is more strange. |
Thank you for the fast response! @wsmoses |
You need to either build Enzyme from source https://github.com/EnzymeAD/Enzyme.jl/blob/main/deps/build_local.jl |
The Enzyme_jll has now landed, so you should see the fix if you use Enzyme.jl#main. However, I cannot reproduce your issue locally; instead I run into:
|
Thanks! On my device the problems still occur. Can you check on your device? In my case rounding works; round(x+0.5) also works, which is basically equivalent to ceil. So changing
into
or
|
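The claim that round(x+0.5) is basically equivalent to ceil can be checked on the CPU. The sketch below is not from the thread: the sample values and variable names are illustrative assumptions, and it relies only on Julia's default rounding mode (round to nearest, ties to even). Under that mode the two expressions agree for most inputs but differ at odd integers.

```julia
# CPU-only comparison; neither CUDA nor Enzyme is needed.
# The sample values are arbitrary and chosen only for illustration.
xs = Float32[-1.5, -0.5, 0.0, 0.3, 1.0, 1.5, 2.0, 3.0]

ceil_vals  = ceil.(xs)             # reference result
round_vals = round.(xs .+ 0.5f0)   # the workaround mentioned in the thread

# Inputs where the workaround disagrees with ceil.
# Ties round to even, so x = 1 gives round(1.5) = 2 and x = 3 gives round(3.5) = 4.
mismatches = xs[ceil_vals .!= round_vals]
println(mismatches)
```

Both functions are piecewise constant, so the gradient with respect to the input is zero wherever it is defined; the workaround changes which code path the compiler sees, not the derivative one would expect.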
Can you paste the error messages that arise on your system? |
Of course
|
And what version of Enzyme.jl are you on (since you need main what commit), and can you also show the output of st? Specifically the thing to check for is being on Enzyme_jll 0.0.99 |
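One way to collect the requested information from a script rather than the Pkg REPL is sketched below. It assumes Enzyme is installed in the active environment and that Enzyme_jll is only an indirect dependency, which is why it is looked up in the manifest; neither assumption is confirmed by the thread.

```julia
using Pkg

# Direct dependency: prints the Enzyme.jl version and, when the package
# is tracked from git, the branch or revision it follows.
Pkg.status("Enzyme")

# Enzyme_jll is assumed to be an indirect dependency, so query the
# manifest instead of the project file.
Pkg.status("Enzyme_jll"; mode = Pkg.PKGMODE_MANIFEST)
```

The second call is the one that shows whether the environment resolved to Enzyme_jll 0.0.99.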
I reloaded it by pkg add url="..."
|
Hm, since I have trouble reproducing this, can you make a MWE with just
autodiff and the kernel call (aka no Lux/etc)?
On Sat, Feb 10, 2024 at 12:05 PM Jakub Mitura wrote:
I reloaded it by pkg add url="..."
(@v1.10) pkg> status Enzyme
Status `~/.julia/environments/v1.10/Project.toml`
[7da242da] Enzyme v0.11.15 `https://github.com/EnzymeAD/Enzyme.jl.git#main`
|
with code like below
error is
|
@jakubMitura14 not quite, I was thinking something more like this (which basically is just the Enzyme autodiff call):
using CUDA, Enzyme, Random
Enzyme.API.printall!(true)
Nx, Ny, Nz = 8, 8, 8
threads = (2, 2, 2)
blocks = (1, 1, 1)
#### main dummy kernel
function testKern(prim_A, A, p, Aout, Nx)
    # global 1-based thread indices
    x = (threadIdx().x + ((blockIdx().x - 1) * CUDA.blockDim_x()))
    y = (threadIdx().y + ((blockIdx().y - 1) * CUDA.blockDim_y()))
    z = (threadIdx().z + ((blockIdx().z - 1) * CUDA.blockDim_z()))
    # with 2x2x2 threads this maps (x, y, z) to linear indices 1..8
    Aout[(x-1)*4+(y-1)*2+z] = ceil(A[x, y, z])
    return nothing
end
# wrapper kernel that differentiates testKern on the device
function testKernDeff(prim_A, dprim_A, A, dA, p, dp, Aout, dAout, Nx)
    Enzyme.autodiff_deferred(Reverse, testKern, Const, Duplicated(prim_A, dprim_A), Duplicated(A, dA), Duplicated(p, dp), Duplicated(Aout, dAout), Const(Nx))
    return nothing
end
# forward-only launch, kept for reference
function calltestKern(prim_A, A, p, Nx)
    Aout = CUDA.zeros(Float32, 8)
    @cuda threads = threads blocks = blocks testKern(prim_A, A, p, Aout, Nx)
    return Aout
end
prim_A = CUDA.zeros(Float32, Nx, Ny, Nz)
dprim_A = CUDA.zeros(Float32, Nx, Ny, Nz)
A = CUDA.zeros(Float32, Nx, Ny, Nz)
dA = CUDA.zeros(Float32, Nx, Ny, Nz)
p = CUDA.zeros(Float32, Nx, Ny, Nz)
dp = CUDA.zeros(Float32, Nx, Ny, Nz)
Aout = CUDA.zeros(Float32, 8)
dAout = CUDA.zeros(Float32, 8)
@cuda threads = threads blocks = blocks testKernDeff(prim_A, dprim_A, A, dA, p, dp, Aout, dAout, Nx)
|
Thanks for the example! Executing your code gave:
|
Should be fixed by #1281 please reopen if not. |
Hello, I have a strange issue when differentiating through a CUDA kernel. The kernel executes fine, but during backpropagation, if I use ceil, floor, or the "%" operation in the kernel, the application breaks. However, when I use round, everything compiles without problems.
Usually Julia just crashes or reports issues with the garbage collector.
I use Julia 1.10, Ubuntu 20.04, an RTX 3090 GPU, and a fresh installation of CUDA (v5.1.2) and Enzyme (v0.11.12).
Code to reproduce:
Using the ceil function in the kernel gives an error like the one below:
Using x%1 in the kernel gives an error like the one below:
so changing line
into
changing line
into
and all is working