Support compiling with clang cuda #1293

Open: wants to merge 4 commits into base: develop

trws
Member

@trws trws commented Jul 11, 2022

The full summary is above.

  • Adds macros to identify the CUDA compiler currently in use (see RAJA_CUDA_COMPILER_*).
  • Adds a workaround for the unsigned long long __shfl(_sync)? intrinsics, which have apparently been broken in upstream clang for years: the unsigned long long overload recurses infinitely. A fix is already in submission upstream, but the workaround lets us build now. (AFAICT this bug has been there since we played with this at Stony Brook; I still can't believe it, see here.)
  • Flags known failures, which exactly match those for HIP. Presumably we violate the overload or declaration semantics clang requires somewhere, but aside from this everything basically works.
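
To make the first two bullets concrete, here is a minimal sketch, not the code in this PR. The `EXAMPLE_CUDA_COMPILER_*` names and both functions are hypothetical; the real macros use the `RAJA_CUDA_COMPILER_*` prefix and may be defined differently. The detection relies only on predefined macros (`__clang__` plus `__CUDA__` for clang in CUDA mode, `__NVCC__` for nvcc). The shuffle helper shows one generic way to avoid a broken 64-bit overload, splitting the value into two 32-bit halves, which is an assumption about how such a workaround could look rather than a description of what RAJA does.

```cpp
#include <cstdint>

// Hypothetical macro names for illustration only; the PR's real macros use
// the RAJA_CUDA_COMPILER_* prefix.
#if defined(__clang__) && defined(__CUDA__)
#define EXAMPLE_CUDA_COMPILER_CLANG 1  // clang compiling a CUDA translation unit
#elif defined(__NVCC__)
#define EXAMPLE_CUDA_COMPILER_NVCC 1   // NVIDIA's nvcc driver
#else
#define EXAMPLE_CUDA_COMPILER_NONE 1   // plain host C++ compiler, no CUDA
#endif

// Report which branch of the detection above was taken.
inline const char* example_cuda_compiler_name()
{
#if defined(EXAMPLE_CUDA_COMPILER_CLANG)
  return "clang";
#elif defined(EXAMPLE_CUDA_COMPILER_NVCC)
  return "nvcc";
#else
  return "none";
#endif
}

// Shuffle a 64-bit value using only a 32-bit shuffle primitive.
// `shuffle32` is any callable taking and returning std::uint32_t. In device
// code it could wrap a 32-bit warp shuffle; taking it as a parameter keeps
// this sketch compilable and checkable on the host.
template <typename Shuffle32>
std::uint64_t shuffle_u64_by_halves(std::uint64_t value, Shuffle32 shuffle32)
{
  // Split into low and high 32-bit words.
  const std::uint32_t lo = static_cast<std::uint32_t>(value);
  const std::uint32_t hi = static_cast<std::uint32_t>(value >> 32);

  // Shuffle each word independently, then widen before recombining.
  const std::uint64_t new_lo = shuffle32(lo);
  const std::uint64_t new_hi = shuffle32(hi);
  return (new_hi << 32) | new_lo;
}
```

On the host, passing an identity callable such as `[](std::uint32_t x) { return x; }` returns the input unchanged, which is a quick check that the split and recombine steps are lossless.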

One note: when compiling with CUDA 10.1, so that the submodule version of cub is used, we must define -DCUB_USE_COOPERATIVE_GROUPS=1. Without it, cub mis-identifies CUDA as a very old version and uses incorrect unsynchronized shuffles. Newer versions of cub have this fixed but require clang 14 or newer, and the version currently installed on LC (14.0.4) was built without CUDA support, so test with a bit of care.

@MrBurmark
Member

I can't believe shfl is still broken either; it's been broken as long as I can remember. Does it make sense to put the shfl workaround in camp?

@trws
Member Author

trws commented Jul 11, 2022

The solution turned out to be trivial, and it's on its way in, so it'll most likely be in LLVM 15. That said, there are a lot of older versions around, so if anything outside RAJA wants to use it I'd have no issue moving it over there.

@trws
Member Author

trws commented Jul 12, 2022

Upstream patch now up for review: https://reviews.llvm.org/D129536

@trws
Member Author

trws commented Aug 17, 2022

Assuming this passes, anyone willing to review/merge? As far as I know this is working, and the patch has been merged upstream.

@rhornung67
Member

@trws we need to pull the branch from the fork into our repo and make a new PR so that GitLab CI can run.

@trws trws force-pushed the feature/trws/clang-cuda branch from 3b88c34 to cc9ce6f Compare October 19, 2022 16:32
@trws trws mentioned this pull request Oct 19, 2022