
GPU Recursion #26

Open
a-camuto opened this issue Nov 22, 2016 · 3 comments

Comments

@a-camuto

Is it possible to use GPU computing for recursive processes, even if the kernel isn't being called recursively? I keep getting a segmentation fault when trying to write from my output buffer to my output vector.

@m8pple
Contributor

m8pple commented Nov 22, 2016

Generally speaking you can't do recursion on GPUs, as they don't have a stack (although this
is less true of newer GPUs, especially NVIDIA ones).

However, it sounds like only the thing calling the kernel is recursive, which should
be completely fine. It's probably a more normal bug.

Are you absolutely sure that your buffer is of the correct size, and that your kernel is
not reading/writing off the end of the buffer?

@a-camuto
Author

The sizing seems correct, but let's say the kernel calls a recursive function within its body. Is that still feasible?

@m8pple
Contributor

m8pple commented Nov 22, 2016

Ah - the ban on recursion within the kernel covers both direct and indirect
recursion. So within the kernel you can call any number of functions, as
long as none of them calls itself, either directly or indirectly.

If you remember the rough sketch of how the GPU does parallelism from the
lectures, each work-item maintains its state in registers. As long as there
is no recursion this works fine, as the compiler will effectively inline all the
functions called by the kernel (it won't always inline, but the effect is the same).

Once you have any kind of recursion, then there needs to be some kind of
stack. However, that stack is then per work-item, which means that you can
end up with huge numbers of memory reads/writes for each function call in
the kernel, and you need somewhere to store all those stacks. So the original
approach in OpenCL was to ban recursion. However, a lot of OpenCL drivers
appear to allow recursion, then just crash at run-time. I'm not sure whether
that behaviour is in-spec or not; they should probably give an error when
you try to compile the kernel.

Note that if you use CUDA rather than OpenCL, recursion is available
to you, as the newer GPUs can support it. This is the general
tradeoff of platform-specific versus general-purpose APIs, where with the
general-purpose one you end up limited to the lowest-common-denominator feature-set.

The AWS part does support CUDA, though, so if you want to go in that
direction I wouldn't mind - it just requires refining the environment spec
a bit.
