Skip to content

Commit

Permalink
Hommexx: Limit HIP team size in debug builds.
Browse files Browse the repository at this point in the history
Also clean up the team_num_threads_vectors routine to consolidate
the rules.

Fixes SCREAM issue 2058.

[BFB]
  • Loading branch information
ambrad committed Nov 28, 2022
1 parent fb7d271 commit 7c95fc7
Showing 1 changed file with 14 additions and 22 deletions.
36 changes: 14 additions & 22 deletions components/homme/src/share/cxx/ExecSpaceDefs.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -173,39 +173,31 @@ team_num_threads_vectors (const int num_parallel_iterations,
// fewer than 4 warps/thread block limits the thread occupancy to that
// number/4. That seems to be in Cuda specs, but I don't know of a function
// that provides this number. Use a configuration option that defaults to 4.
const int min_num_warps = HOMMEXX_CUDA_MIN_WARP_PER_TEAM;
int min_num_warps = HOMMEXX_CUDA_MIN_WARP_PER_TEAM;

int max_num_warps = HOMMEXX_CUDA_MAX_WARP_PER_TEAM;
#ifdef KOKKOS_ENABLE_DEBUG
// In debug builds, team size must be smaller because of Kokkos-side data.
max_num_warps = std::min(max_num_warps, 8);
#endif

#ifdef KOKKOS_ENABLE_CUDA
const int num_warps_device = Kokkos::Impl::cuda_internal_maximum_concurrent_block_count();
const int num_threads_warp = Kokkos::Impl::CudaTraits::WarpSize;
// The number on P100 is 16, but for some reason
// Kokkos::Impl::cuda_internal_maximum_grid_count() returns 8. I may be
// misusing the function. I have an open issue with the Kokkos team to resolve
// this. For now:
# ifdef KOKKOS_ENABLE_DEBUG
// Work around spurious "terminate called after throwing an instance
// of 'std::runtime_error'" message.
const int max_num_warps = 8;
# else
const int max_num_warps = HOMMEXX_CUDA_MAX_WARP_PER_TEAM; //Kokkos::Impl::cuda_internal_maximum_grid_count();
# endif

#elif defined(KOKKOS_ENABLE_HIP)

//use 64 wavefronts per CU and 120 CUs
// Use 64 wavefronts per CU and 120 CUs.
const int num_warps_device = 120*64; // no such thing Kokkos::Impl::hip_internal_maximum_warp_count();
const int max_num_warps = HOMMEXX_CUDA_MAX_WARP_PER_TEAM;
const int num_threads_warp = Kokkos::Experimental::Impl::HIPTraits::WarpSize;

#else

// I want thread-distribution rules to be unit-testable even when Cuda is
// off. Thus, make up a P100-like machine:
// I want thread-distribution rules to be unit-testable even when GPU spaces
// are off. Thus, make up a GPU-like machine:
const int num_warps_device = 1792;
const int num_threads_warp = 32;
const int max_num_warps = 16;

max_num_warps = 16;
#endif

min_num_warps = std::min(min_num_warps, max_num_warps);

return Parallel::team_num_threads_vectors_for_gpu(
num_warps_device, num_threads_warp,
min_num_warps, max_num_warps,
Expand Down

0 comments on commit 7c95fc7

Please sign in to comment.