Support compiling with clang cuda #1293

trws · 2022-07-11T23:16:08Z

The full summary is above.

Adds macros to identify the CUDA compiler currently in use (see RAJA_CUDA_COMPILER_*).
Adds a workaround for the unsigned long long __shfl(_sync)? intrinsics, which have apparently been broken in upstream clang for years. A fix is already in submission upstream, but the workaround lets us use them now. (AFAICT, this bug has been there since we played with this at Stony Brook: infinite recursion for unsigned long long. Still can't believe it, see here.)
Flags known failures, which exactly match those for HIP. Presumably we violate the overload or declaration semantics clang requires somewhere, but aside from this everything basically works.
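
To make the first two items concrete, here is a minimal sketch of what compiler detection plus a shuffle workaround could look like. It is not the code in this PR: every `EXAMPLE_*` macro and `example_*` function is a hypothetical name chosen for illustration, and the workaround assumes, as described above, that only the unsigned long long overload is affected, so the value is routed through the signed long long overload instead. `__NVCC__`, `__clang__`, `__CUDA__`, and `__CUDACC__` are the predefined macros the compilers themselves set.

```cpp
#include <string>

// Hypothetical detection macros; RAJA's real ones follow the
// RAJA_CUDA_COMPILER_* pattern and may be defined differently.
#if defined(__NVCC__)
#define EXAMPLE_CUDA_COMPILER_NVCC 1
#elif defined(__clang__) && defined(__CUDA__)
#define EXAMPLE_CUDA_COMPILER_CLANG 1
#endif

// Let the helpers below compile both as plain C++ and as CUDA.
#if defined(__CUDACC__)
#define EXAMPLE_HOST_DEVICE __host__ __device__
#else
#define EXAMPLE_HOST_DEVICE
#endif

// Report which CUDA compiler (if any) is building this translation unit.
inline std::string example_cuda_compiler_name()
{
#if defined(EXAMPLE_CUDA_COMPILER_NVCC)
  return "nvcc";
#elif defined(EXAMPLE_CUDA_COMPILER_CLANG)
  return "clang";
#else
  return "none";
#endif
}

// Same-width signed/unsigned conversions keep the 64-bit pattern on the
// two's-complement targets CUDA supports, so a round trip is lossless.
EXAMPLE_HOST_DEVICE inline long long example_as_signed(unsigned long long v)
{
  return static_cast<long long>(v);
}

EXAMPLE_HOST_DEVICE inline unsigned long long example_as_unsigned(long long v)
{
  return static_cast<unsigned long long>(v);
}

#if defined(__CUDACC__)
// Hypothetical wrapper: under clang CUDA, avoid the unsigned long long
// overload of __shfl_sync by shuffling the value as a signed long long.
__device__ inline unsigned long long example_shfl_sync_u64(unsigned mask,
                                                           unsigned long long v,
                                                           int src_lane)
{
#if defined(EXAMPLE_CUDA_COMPILER_CLANG)
  return example_as_unsigned(__shfl_sync(mask, example_as_signed(v), src_lane));
#else
  return __shfl_sync(mask, v, src_lane);
#endif
}
#endif
```

Keeping the cast helpers separate from the device wrapper means the bit-preserving round trip can be checked on the host without a GPU. Once the upstream fix is available everywhere, a wrapper like this could be dropped or narrowed with a clang version check.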

One note: if compiling with CUDA 10.1 such that the submodule version of cub is used, we must define -DCUB_USE_COOPERATIVE_GROUPS=1, because without it cub mis-identifies CUDA as a very old version and uses incorrect unsynchronized shuffles. Newer versions of cub have this fixed but require clang 14 or later, and the version currently installed on LC (14.0.4) was built without CUDA support, so test with a bit of care.
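
A build that depends on this flag could fail early rather than silently pick the unsynchronized shuffles. The sketch below is a hypothetical guard, not part of RAJA or cub: the `EXAMPLE_*` macros and `example_cub_flag_missing` are made-up names, and the check only makes sense for the configuration described above (clang CUDA with the older submodule cub).

```cpp
// Hypothetical guard for the clang CUDA + old submodule cub configuration.
#if defined(__clang__) && defined(__CUDA__)
#define EXAMPLE_IS_CLANG_CUDA 1
#else
#define EXAMPLE_IS_CLANG_CUDA 0
#endif

#if defined(CUB_USE_COOPERATIVE_GROUPS)
#define EXAMPLE_HAS_CUB_COOP_GROUPS 1
#else
#define EXAMPLE_HAS_CUB_COOP_GROUPS 0
#endif

// True only when clang is compiling CUDA and the flag was not passed.
constexpr bool example_cub_flag_missing()
{
  return EXAMPLE_IS_CLANG_CUDA && !EXAMPLE_HAS_CUB_COOP_GROUPS;
}

// Stop the build with a readable message instead of miscompiling shuffles.
// Remove this check when building against a newer cub that no longer
// needs the flag.
static_assert(!example_cub_flag_missing(),
              "clang CUDA with the submodule cub needs "
              "-DCUB_USE_COOPERATIVE_GROUPS=1");
```

The guard has to be seen before any cub header is included, since it inspects the same macro the command-line define sets.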

include/RAJA/policy/cuda/reduce.hpp

MrBurmark · 2022-07-11T23:35:59Z

I can't believe shfl is still broken either; it's been broken for as long as I can remember. Does it make sense to put the shfl workaround in camp?

trws · 2022-07-11T23:37:28Z

The solution turned out to be trivial, and it's on its way in so it'll be in llvm 15 most likely. That said, there are a lot of older versions around, so if anything outside RAJA wants to use it I'd have no issue moving it over there.

trws · 2022-07-12T15:11:32Z

Upstream patch now up for review: https://reviews.llvm.org/D129536

trws · 2022-08-17T17:38:34Z

Assuming this passes, anyone willing to review/merge? As far as I know this is working, and the patch has been merged upstream.

rhornung67 · 2022-08-17T17:54:07Z

@trws we need to pull the branch from the fork into our repo and open a new PR so that GitLab CI can run.

trws requested review from davidbeckingsale, rhornung67 and ajkunen July 11, 2022 23:16

MrBurmark reviewed Jul 11, 2022

include/RAJA/policy/cuda/reduce.hpp

rhornung67 mentioned this pull request Jul 20, 2022

long #include times for axom headers LLNL/axom#872

Open

trws and others added 4 commits October 19, 2022 09:29

add detection for specific cuda compilers

254bfaa

fix cuda shfl reduction on clang cuda

5231691

work around tests that are broken on both hip and cuda clang

ff1b9c4

Update include/RAJA/policy/cuda/reduce.hpp

cc9ce6f

Co-authored-by: Jason Burmark <[email protected]>

trws force-pushed the feature/trws/clang-cuda branch from 3b88c34 to cc9ce6f Compare October 19, 2022 16:32

trws mentioned this pull request Oct 19, 2022

Feature/trws/clang cuda #1350

Open
