Support compiling with clang cuda #1293
base: develop
I can't believe shfl is still broken either; it's been broken as long as I can remember. Does it make sense to put the shfl workaround in camp?
The solution turned out to be trivial, and it's on its way in so it'll be in llvm 15 most likely. That said, there are a lot of older versions around, so if anything outside RAJA wants to use it I'd have no issue moving it over there.
Upstream patch now up for review: https://reviews.llvm.org/D129536
Assuming this passes, anyone willing to review/merge? As far as I know this is working, and the patch has been merged upstream.
@trws we need to pull the branch from the fork into our repo and make a new PR for Gitlab CI to run |
3b88c34 to cc9ce6f
The full summary is above.
RAJA_CUDA_COMPILER_*
One note: if compiling with CUDA 10.1, so that the submodule version of cub is used, we must define
-DCUB_USE_COOPERATIVE_GROUPS=1
because cub misidentifies CUDA as a very old version and, without that define, uses incorrect unsynchronized shuffles. Newer versions of cub fix this but require clang 14 or later, and the clang currently installed on LC (14.0.4) was built without CUDA support, so test with a bit of care.