
GPU Recursion #26

Open
a-camuto opened this issue Nov 22, 2016 · 3 comments

Comments

@a-camuto

Is it possible to use GPU computing for recursive processes, even if the kernel isn't being called recursively? I keep getting a segmentation fault when trying to write from my output buffer to my output vector.

@m8pple
Contributor

m8pple commented Nov 22, 2016

Generally speaking you can't do recursion on GPUs, as they don't have a stack (although this
is less true of newer GPUs, especially NVIDIA ones).

However, it sounds like only the thing calling the kernel is recursive, which should
be completely fine. It's probably a more normal bug.

Are you absolutely sure that your buffer is of the correct size, and that your kernel is
not reading/writing off the end of the buffer?

@a-camuto
Author

The sizing seems correct, but let's say the kernel calls a recursive function within its body. Is that still feasible?

@m8pple
Contributor

m8pple commented Nov 22, 2016

Ah - the ban on recursion within the kernel covers both direct and indirect
recursion. So within the kernel you can call any number of functions, as
long as none of them calls itself, either directly or indirectly.

If you remember the rough sketch of how the GPU does parallelism from the
lectures, each work-item maintains its state in registers. As long as there
is no recursion this works fine, as the compiler will effectively inline all the
functions called by the kernel (it won't always inline, but the effect is the same).

Once you have any kind of recursion, then there needs to be some kind of
stack. However, that stack is then per work-item, which means that you can
end up with huge numbers of memory reads/writes for each function call in
the kernel, and you need somewhere to store all those stacks. So the original
approach in OpenCL was to ban recursion. However, a lot of OpenCL drivers
appear to allow recursion, then just crash at run-time. I'm not sure whether
that behaviour is in-spec or not; they should probably give an error when
you try to compile the kernel.

Note that if you use CUDA rather than OpenCL, recursion is available
to you, as the newer GPUs can support it. This is the general
tradeoff of platform-specific versus general-purpose APIs, where with the
general-purpose one you end up limited to the lowest-common-denominator feature-set.

The AWS part does support CUDA, though, so if you want to go in that
direction I wouldn't mind - it just requires refining the environment spec
a bit.
