
SHOC support for small kernels #1940

Merged: 19 commits from jgfouca/shoc_small_kernels into master on Oct 13, 2022

Conversation

jgfouca (Member) commented Sep 13, 2022

This is a major refactor of SHOC to support small kernels, which are needed on Frontier/Crusher.

The only significant challenge so far has been how to maintain workspaces across kernels. This is new for us because all previous implementations were monolithic kernels. The approach that appears to work best is to use integer slot ids as handles to the workspace.

Another challenge is that temporary scalars in the monolithic kernel have to become views of shcol scalars (one value per SHOC column) for small kernels.
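
For example, a hedged sketch of that transformation (variable names are illustrative, not from this PR):

// Monolithic kernel: a temporary scalar is just a per-column local that
// lives for the whole (single) kernel launch.
Scalar se_b = 0;

// Small kernels: the value must survive between kernel launches, so it
// becomes one element of a shcol-sized view, indexed by the column a team
// owns:
view_1d<Scalar> se_b_view("se_b", shcol);
// ... inside each small kernel, the team for column i uses se_b_view(i) ...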

I've only transitioned a couple of SHOC functions to support small kernels. I wanted to get approval for the pattern I'm using before doing the rest, since that's going to be a considerable amount of work.

There are a few minor changes needed in EKAT for this to build that you can't see here. The only interesting change was the addition of this WSM::Workspace method:

template <typename T, typename D>
template <typename S>
KOKKOS_FORCEINLINE_FUNCTION
Unmanaged<typename WorkspaceManager<T, D>::template view_1d<S> >
WorkspaceManager<T, D>::Workspace::get_space_in_slot(
  const int slot) const
{
  EKAT_KERNEL_ASSERT_MSG(slot < m_parent.m_max_used, "Asked for out-of-bounds slot\n");
  auto result = m_parent.get_space_in_slot<S>(m_ws_idx, slot);
  EKAT_KERNEL_ASSERT_MSG(is_active<S>(result), "Asked for inactive slot\n");

  return result;
}
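
For context, here is a hedged usage sketch of the slot-id pattern; the policy/lambda scaffolding and the way se_b_slot was recorded are assumptions, not code from this PR:

// In a later small kernel, re-acquire persistent space by its slot id.
Kokkos::parallel_for(policy, KOKKOS_LAMBDA(const MemberType& team) {
  auto workspace = workspace_mgr.get_workspace(team);
  // se_b_slot is the integer handle recorded when the space was taken in an
  // earlier kernel; the slot id is what gets passed between kernels.
  const auto se_b = workspace.get_space_in_slot<Scalar>(se_b_slot);
  // ... use se_b; it is not released here, so the slot stays active ...
});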

Fixes #1903

// Need a second workspace for storing shcol local scalars. Need workspaces
// of size shcol, not nlev, to store temporary scalars for small kernels
const auto policy = ekat::ExeSpaceUtils<ExeSpace>::get_default_team_policy(shcol, 1);
ScalarWorkspaceMgr wsm_local(shcol, num_local_scalars, policy, 1.e10); // huge overprovision => no ws sharing
Member Author:

Since both the workspace size and type are different, I decided to make a new WSM for managing local scalars instead of trying to leverage the existing WSM.

@@ -192,6 +197,10 @@ set(SCREAM_LIB_ONLY ${DEFAULT_LIB_ONLY} CACHE BOOL "Only build libraries, no exe
set(NetCDF_Fortran_PATH ${DEFAULT_NetCDF_Fortran_PATH} CACHE FILEPATH "Path to netcdf fortran installation")
set(NetCDF_C_PATH ${DEFAULT_NetCDF_C_PATH} CACHE FILEPATH "Path to netcdf C installation")
set(SCREAM_MACHINE ${DEFAULT_SCREAM_MACHINE} CACHE STRING "The CIME/SCREAM name for the current machine")
set(SCREAM_MONOLITHIC_KERNELS ${DEFAULT_MONOLITHIC_KERNELS} CACHE STRING "Use monolithic kokkos kernels")
if (NOT SCREAM_MONOLITHIC_KERNELS)
set(EKAT_DISABLE_WORKSPACE_SHARING TRUE CACHE STRING "")
Member Author:

I'm not 100% sure this is needed, or even a good design choice. I think we should leave it to SCREAM to set up its WSMs properly. Any WSM that might be used for small kernels should have its overprovision factor set to a huge number; this will prevent any workspace sharing. The top-level functions of small kernels should assert that their WSMs have one slot per team.

Contributor:

I agree. I'm not 100% sure what this option would do in EKAT, but I feel like we don't want to set global EKAT configurations just because a few use cases need them. If we can make SCREAM create WSMs differently, each with its own config/ctor params, I think that would be better.

Member:

EKAT's WSM should support a new option to make the workspace size proportional to the league size (the total number of physics columns) rather than using the overprovision factor. This would follow what Hommexx does.

Member Author:

@ambrad , that's easily doable. I agree that I am abusing the overprovision argument here.

# Add dispatch source files if monolithic kernels are off
if (NOT SCREAM_MONOLITHIC_KERNELS)
list(APPEND SHOC_SRCS
shoc_energy_integrals_disp.cpp
Contributor:

Not sure if you want to do it in this PR, and it might be a matter of personal taste, but in case you agree: the shoc (and p3) folders are quite big, given the large number of _impl.cpp and .cpp files (96 files in the shoc folder). This makes navigating the repo a bit hard (imho). I would vote for adding subfolders: an impl subfolder for the impl files, an eti subfolder for the ETI cpp files, and a dispatch subfolder for the new files you are adding. That would also allow less experienced users to recognize the difference between eti and dispatch *.cpp files.

Member Author:

I agree. This directory now has 98 files and will grow to nearly 150 when this is done.


template<>
void Functions<Real,DefaultDevice>
::shoc_energy_fixer_disp(
Contributor:

My personal taste here would be for spelling out dispatch, but that's very subjective.

Member Author:

Those extra 4 keystrokes would give me carpal tunnel!

const Int& nadv,
const view_2d<const Spack>& zt_grid,
const view_2d<const Spack>& zi_grid,
const Int& se_b_slot,
Contributor:

Maybe we could group these slot indices in a verbose struct, like shoc::SmallKernelsSlotsIds or something.

@@ -611,6 +659,65 @@ struct Functions
const uview_1d<Spack>& wqls_sec,
const uview_1d<Spack>& brunt,
const uview_1d<Spack>& isotropy);
#else
static void shoc_main_internal(
Contributor:

Maybe we could give these functions two different names, like shoc_main_monolithic and shoc_main_multiple_kernels, or something.

bartgol (Contributor) left a comment:

I think it looks fine overall. My biggest concern is potential confusion from code duplication and the bloating of files, which is why, at some point (perhaps at the end of the refactor), I would prefer that we spend a bit of effort organizing code/files in a self-documenting way.

ambrad (Member) commented Sep 13, 2022

Good start. I'd like to offer two alternatives for handling persistent workspace slots.

(1) Rather than getting pointers (*_slot indices) and passing those around, make a setup/teardown pair of routines that just does the usual thing, e.g.

  workspace.template take_many_and_reset<5>(
    {"rho_zt", "shoc_qv", "dz_zt", "dz_zi", "tkh"},
    {&rho_zt, &shoc_qv, &dz_zt, &dz_zi, &tkh});

and then releases it all at the end of the kernel. Each small kernel always calls this setup/teardown pair and thus always gets the right persistent memory for each variable. You could package the persistent arrays in a structure. This pattern would be useful in the case of more complicated partial-persistence patterns where the setup/teardown would change throughout the parameterization. SHOC doesn't seem to need that, though. But this pattern is what I had in mind in #1903, where I was thinking about the general case.

(2) Instead, this immediately suggests a simpler option for SHOC: In the case of nonmono kernels, just have one additional buffer struct for these arrays, and when requesting buffer space, include this additional buffer struct in the request. Then no extra workspace gymnastics are needed in SHOC.

For the scalars, I recommend making an array to hold all scalars and then providing an enum in the new buffer struct that indexes this array, like this:

enum : int { se_b = 0, ke_b, wv_b, ... };
view1d<Scalar> scals;
// usage
b.scals[b.se_b]
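
A hedged expansion of that sketch (the struct name and the 2d, per-column layout are assumptions):

struct SmallKernelScalars {
  enum : int { se_b = 0, ke_b, wv_b, num_scalars };
  view_2d<Scalar> scals; // (shcol, num_scalars), allocated once at init
};
// usage inside a kernel handling column i:
// const Scalar se = b.scals(i, SmallKernelScalars::se_b);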

bartgol (Contributor) commented Sep 13, 2022

I'm not a fan of the setup/teardown approach, which relies on the order of fields always being the same. That is a somewhat subtle assumption, which risks being forgotten three years down the road when another developer is adding a feature to shoc. Of course, inline documentation would help, but it would still be a bit fragile.

Option 2 seems more attractive. If I understand correctly, you want to use "normal" buffer views for persistent data, without using the WSM at all, right? Then, inside each top-level kernel, you would subview those buffers rather than grab slots from the WSM. Imho, this is much simpler than hacking the WSM to get non-scratch memory.
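
For instance, a minimal sketch of the subview pattern (buffer and variable names are hypothetical):

// Persistent 2d buffer allocated at init time, one row per column:
// view_2d<Spack> rho_zt_buf("rho_zt", shcol, nlev_packs);
// Inside a top-level kernel, the team for column i slices its row instead
// of taking a WSM slot; ekat::subview returns an unmanaged 1d view:
const int i = team.league_rank();
const auto rho_zt_i = ekat::subview(rho_zt_buf, i);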

ambrad (Member) commented Sep 13, 2022

@bartgol re: option 2, yes. Since it's all we need for SHOC, we can table discussion of more complicated temp-buffer use cases.

jgfouca (Member Author) commented Sep 14, 2022

@ambrad, @bartgol, OK, see if you like this better. I agree that if you aren't sharing or releasing workspaces, it raises the question of why use the WSM at all. The code is definitely simpler without it.

const view_2d<Spack>& isotropy)
{
// Create space for temporary scalars
view_1d<Scalar>
Member:

These views shouldn't be allocated each time SHOC is called. They should be part of the init-time allocation using buffer memory. I suggest using a struct to hold these.

jgfouca (Member Author) commented Sep 19, 2022

@ambrad, @bartgol , see if you like this latest commit. I know that I still need to reorganize the directory (impls, eti, and dispatch subdirectories).

@@ -2693,7 +2693,12 @@ int shoc_init_f(Int nlev, Real *pref_mid, Int nbot_shoc, Int ntop_shoc)
ekat::host_to_device({pref_mid}, nlev, temp_d);
view_1d pref_mid_d(temp_d[0]);

#ifndef SCREAM_MONOLITHIC_KERNELS
SHF::SHOCTemporaries temporaries; // we won't keep these
Member Author:

This is a little hacky, but fine IMO. The f90->cxx bridge is already slow, so it doesn't matter if we reallocate temporaries on each shoc_main_f call.

shoc_history_output.shoc_mix, shoc_history_output.w_sec, shoc_history_output.thl_sec, shoc_history_output.qw_sec, shoc_history_output.qwthl_sec, // Diagnostic Output Variables
shoc_history_output.wthl_sec, shoc_history_output.wqw_sec, shoc_history_output.wtke_sec, shoc_history_output.uw_sec, shoc_history_output.vw_sec, // Diagnostic Output Variables
shoc_history_output.w3, shoc_history_output.wqls_sec, shoc_history_output.brunt, shoc_history_output.isotropy, // Diagnostic Output Variables
shoc_temporaries.se_b,
Member Author:

I followed the existing pattern of unpacking the struct before calling shoc_main_internal even though this function has a crazy number of arguments.

@@ -58,6 +63,33 @@ Int Functions<S,D>::shoc_init(
const auto host_view = Kokkos::create_mirror_view(npbl_d);
Kokkos::deep_copy(host_view, npbl_d);

// Allocate temporaries if using small kernels
#ifndef SCREAM_MONOLITHIC_KERNELS
Member:

If I understand correctly, this part does not follow the established pattern for processes to allocate workspace. I believe this code should be in SHOCMacrophysics::init_buffers and the memory allocated using the buffer_manager.get_memory() pointer, as is done for m_buffer. That way, no process holds onto a bunch of device memory that can't be used by any other process.

Member Author:

@ambrad , thanks, I was wondering if I should use that (Buffer). I have pushed a commit that does this.

m_buffer.zi_grid = decltype(m_buffer.zi_grid)(s_mem, m_num_cols, nlevi_packs);
s_mem += m_buffer.zi_grid.size();
using spack_2d_view_t = decltype(m_buffer.z_mid);
spack_2d_view_t* _2d_spack_mid_view_ptrs[Buffer::num_2d_vector_mid] = {
Member Author:

I took the opportunity to clean up some of the repetitive code in this method. The changes to the 1d code should be semantically identical to the previous impl. The 2d code does not behave exactly as before, because the order of the views in the buffer has changed: all the "mid" views and "int" views are now grouped separately. I assume this order doesn't matter; if it does, I will refactor this to preserve the original order.
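
For reference, a hedged sketch of the loop this grouping enables (the pointer-array name and cursor come from the surrounding diff; the loop body and nlev_packs are assumptions about the final code):

// Initialize every "mid" 2d view from the raw buffer memory in one loop,
// advancing the cursor by each view's element count:
for (auto ptr : _2d_spack_mid_view_ptrs) {
  *ptr = spack_2d_view_t(s_mem, m_num_cols, nlev_packs);
  s_mem += ptr->size();
}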


ambrad (Member) commented Sep 20, 2022

@jgfouca, after the last changes, this code looks great. Are you wanting to merge this PR, or are you looking for approval from everyone following our discussions above and then you'll continue working on this branch using the patterns you established?

jgfouca (Member Author) commented Sep 20, 2022

@ambrad , thanks for the good feedback.

Are you wanting to merge this PR, or are you looking for approval from everyone following our discussions above and then you'll continue working on this branch using the patterns you established?

I'm going to continue working on this using the established pattern.

ambrad (Member) commented Oct 12, 2022

@bartgol, this is an example of where a global var shared among all threads in a team will lead to an error:
https://github.com/E3SM-Project/EKAT/blob/dbe3d1c2e3d6242ce998865b2d4676676d80bb69/src/ekat/kokkos/ekat_kokkos_utils.hpp#L98
It should be protected by a Kokkos::single in that case.
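
A minimal sketch of that guard (names are illustrative; this is not the EKAT code itself):

// Let only one thread in the team update the team-shared variable, then
// synchronize so every thread sees the new value:
Kokkos::single(Kokkos::PerTeam(team), [&]() {
  shared_result += contribution;
});
team.team_barrier();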

bartgol (Contributor) commented Oct 12, 2022

@bartgol, this is an example of where a global var shared among all threads in a team will lead to an error: https://github.com/E3SM-Project/EKAT/blob/dbe3d1c2e3d6242ce998865b2d4676676d80bb69/src/ekat/kokkos/ekat_kokkos_utils.hpp#L98 It should be protected by a Kokkos::single in that case.

Yes, that's a problem. We also have another += on the shared var in the view_reduction function, for Serialize=false. But for Serialize=true, all += on shared vars are performed only by team rank 0.

ambrad (Member) commented Oct 12, 2022

Yes, that's a problem.

But note that the case of an auto var passed to view_reduction must also be handled; a single won't work there. Some thought needs to be given to handling both situations.

bartgol (Contributor) commented Oct 12, 2022

For clarity: I tried adding team_barrier() and team_broadcast(result,0) calls all over the two functions, and changed PerThread into PerTeam, to no avail. I don't see where += can be performed by 2+ threads on the shared-memory var, for the Serialize=true case, I mean.

E3SM-Bot (Collaborator):

Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing.

E3SM-Bot (Collaborator):

Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request.

bartgol (Contributor) commented Oct 12, 2022

Also, to be clear, I'm fine with Jim's mod. I just want to understand where the shared memory was causing a problem with Serialize=true.

E3SM-Bot (Collaborator):

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING:

  • SCREAM_PullRequest_Autotester_Mappy, Build 3243: STARTED
  • SCREAM_PullRequest_Autotester_Weaver, Build 3860: STARTED
  • SCREAM_PullRequest_Autotester_Blake, Build 3978: STARTED

Jenkins parameters (identical for all three jobs):

PR_LABELS: AT: RETEST;AT: AUTOMERGE;CIME
PULLREQUESTNUM: 1940
SCREAM_SOURCE_BRANCH: jgfouca/shoc_small_kernels
SCREAM_SOURCE_REPO: https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA: f46223c
SCREAM_TARGET_BRANCH: master
SCREAM_TARGET_REPO: https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA: 6b9360e
TEST_REPO_ALIAS: SCREAM

Using Repos:

Repo: SCREAM (E3SM-Project/scream)
  • Branch: jgfouca/shoc_small_kernels
  • SHA: f46223c
  • Mode: TEST_REPO

Pull Request Author: jgfouca
ambrad (Member) commented Oct 12, 2022

@bartgol re: Serialize, e.g. here: https://github.com/E3SM-Project/EKAT/blob/dbe3d1c2e3d6242ce998865b2d4676676d80bb69/src/ekat/kokkos/ekat_kokkos_utils.hpp#L155

bartgol (Contributor) commented Oct 12, 2022

@bartgol re: Serialize, e.g. here: https://github.com/E3SM-Project/EKAT/blob/dbe3d1c2e3d6242ce998865b2d4676676d80bb69/src/ekat/kokkos/ekat_kokkos_utils.hpp#L155

Yes, that's the one I was referring to with "We also have another += on the shared var in the view_reduction function". But that code only happens if Serialize=false. If Serialize=true, none of the += on shared memory happens.

ambrad (Member) commented Oct 12, 2022

Ok, what about E3SM-Project/EKAT@dbe3d1c/src/ekat/kokkos/ekat_kokkos_utils.hpp#L133? However, this applies only if pack size > 1.

bartgol (Contributor) commented Oct 12, 2022

Ok, what about E3SM-Project/EKAT@dbe3d1c/src/ekat/kokkos/ekat_kokkos_utils.hpp#L133? However, this applies only if pack size > 1.

Yes, that is a problem, since all team members execute that line. But I tried to switch PerThread with PerTeam, adding a team.team_broadcast(result,0) right after the single, and still got non-bfb (not bit-for-bit) results.

ambrad (Member) commented Oct 12, 2022

Ah, ok, maybe this somewhat more subtle situation: In parallel_reduce, thread 1 writes to result before thread 2 executes local_tmp = result.

bartgol (Contributor) commented Oct 12, 2022

Ah, ok, maybe this somewhat more subtle situation: In parallel_reduce, thread 1 writes to result before thread 2 executes local_tmp = result.

Maybe. I stuffed the function with plenty of team_barrier() calls, though, so I am not really sure. I will raise my hands and give up. Though I think we should fix this in EKAT, to avoid making the same mistake again somewhere else.

bartgol (Contributor) commented Oct 12, 2022

Uhm, I modified view_reduction to do

{
  auto temp = result;
  [same code but updating temp instead of result]
  result = temp;
}

and called it passing se_int (which is from a view) as the arg for result, but it's not passing the tests. I think this should be the same as doing

Scalar se_tmp(0);
view_reduction(...,se_tmp);
se_int = se_tmp;

with the original code, no?

ambrad (Member) commented Oct 12, 2022

Luca, I don't think it's the same. I believe the thread race condition I mentioned is the key here. In the first case you quote, temp = result in thread 2 could be executed after thread 1 has set result = temp. In the second case you quote, se_tmp is not set to se_int initially; rather, it's set to 0. Thus, no race condition occurs.

E3SM-Bot (Collaborator):

Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED

Pull Request Auto Testing has PASSED:

  • SCREAM_PullRequest_Autotester_Mappy, Build 3243: PASSED
  • SCREAM_PullRequest_Autotester_Weaver, Build 3860: PASSED
  • SCREAM_PullRequest_Autotester_Blake, Build 3978: PASSED

(Jenkins parameters identical to the STARTING report above.)
E3SM-Bot (Collaborator):

Status Flag 'Pre-Merge Inspection' - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging.
THE LAST COMMIT TO THIS PULL REQUEST HAS NOT BEEN REVIEWED YET!

E3SM-Bot (Collaborator):

All Jobs Finished; status = PASSED. However, Inspection must be performed before merge can occur...

bartgol (Contributor) commented Oct 12, 2022

Luca, I don't think it's the same. I believe the thread race condition I mentioned is the key here. In the first case you quote, temp = result in thread 2 could be executed after thread 1 has set result = temp. In the second case you quote, se_tmp is not set to se_int initially; rather, it's set to 0. Thus, no race condition occurs.

Mmm, ok, but then adding a team_barrier() after temp=result should restore bfb, while it doesn't.

Anyway, I should probably stop worrying about this. I was just trying to emulate Jim's fix inside EKAT, so that we would not have this issue again in the future when calling view_reduction with a shared-memory output variable.

Edit: never mind, I was doing result=temp inside the has_garbage_end block, which is not always executed. I think I'd like to make the view_reduction function robust in EKAT. SHOC can, of course, still create the temporary, but it should not need to.

jgfouca (Member Author) commented Oct 13, 2022

@bartgol, @ambrad, thanks for the deep dive on view_reduction. I made an EKAT issue a while back (E3SM-Project/EKAT#254), so maybe we should document some of our discoveries there.

In the meantime, can one of you approve this PR?

E3SM-Bot (Collaborator):

Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ bartgol ]!

E3SM-Bot (Collaborator):

Status Flag 'Pull Request AutoTester' - Pull Request will be Automerged

E3SM-Bot merged commit 6ade98c into master on Oct 13, 2022.
E3SM-Bot deleted the jgfouca/shoc_small_kernels branch on October 13, 2022 at 15:41.
E3SM-Bot (Collaborator):

Merge on Pull Request #1940: IS A SUCCESS - Pull Request successfully merged

ambrad (Member) commented Oct 13, 2022

#1797 is using this new capability; see #1797 (comment) for a preliminary report of success.

Labels: CIME (concerns infrastructure related to CIME)

Successfully merging this pull request may close these issues:
  • Support for both monolithic and many small kernels to address Crusher/Frontier challenges (#1903)

5 participants