From 067fa9301bd2598fc66e2db91391c2e883ef565b Mon Sep 17 00:00:00 2001 From: mahf708 Date: Sat, 9 Nov 2024 13:13:53 -0600 Subject: [PATCH 1/2] update pam to match new p3 signature --- components/eam/src/physics/crm/pam/external | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/components/eam/src/physics/crm/pam/external b/components/eam/src/physics/crm/pam/external index c3b6522c57b7..1c37054d1ff9 160000 --- a/components/eam/src/physics/crm/pam/external +++ b/components/eam/src/physics/crm/pam/external @@ -1 +1 @@ -Subproject commit c3b6522c57b754d073e1deaad5ce8125f7f88325 +Subproject commit 1c37054d1ff9b160290cc286dcbd3cdc6cd7e7f6 From 4cf27bcb7fc4cca58e02c34a6d348797537035a5 Mon Sep 17 00:00:00 2001 From: mahf708 Date: Sat, 9 Nov 2024 13:35:11 -0600 Subject: [PATCH 2/2] fix formatting of changed md files --- components/eamxx/docs/developer/ci_nightly.md | 6 +- .../eamxx/docs/developer/cime_testing.md | 33 +-- components/eamxx/docs/developer/field.md | 95 +++++---- components/eamxx/docs/developer/grid.md | 43 ++-- components/eamxx/docs/developer/index.md | 2 - components/eamxx/docs/developer/io.md | 6 +- .../eamxx/docs/developer/kokkos_ekat.md | 196 ++++++++++++------ components/eamxx/docs/developer/managers.md | 2 +- components/eamxx/docs/developer/processes.md | 147 +++++++------ .../eamxx/docs/developer/source_tree.md | 1 - .../docs/developer/standalone_testing.md | 37 ++-- .../eamxx/docs/developer/style_guide.md | 1 - 12 files changed, 345 insertions(+), 224 deletions(-) diff --git a/components/eamxx/docs/developer/ci_nightly.md b/components/eamxx/docs/developer/ci_nightly.md index b222139dd557..0716ce4c9f3e 100644 --- a/components/eamxx/docs/developer/ci_nightly.md +++ b/components/eamxx/docs/developer/ci_nightly.md @@ -1,17 +1,17 @@ # Continuous Integration and Nightly Testing -## Autotester ## +## Autotester EAMxx using github actions and a Sandia product called Autotester 2 to run CI testing on a CPU and GPU machine for every github pull 
request. By default, we run the e3sm_scream_v1_at suite and the standalone eamxx tests (test-all-scream). -## Nightly overview, CDash ## +## Nightly overview, CDash Our nightly testing is much more extensive than the CI testing. You can see our dashboard here under the section "E3SM_SCREAM": -https://my.cdash.org/index.php?project=E3SM + We run a variety of CIME test suites and standalone testing on a number of platforms. We even do some performance testing on frontier. diff --git a/components/eamxx/docs/developer/cime_testing.md b/components/eamxx/docs/developer/cime_testing.md index 71233a245b4b..667488960f64 100644 --- a/components/eamxx/docs/developer/cime_testing.md +++ b/components/eamxx/docs/developer/cime_testing.md @@ -4,33 +4,37 @@ Full model system testing of eamxx is done through CIME test cases (much like the rest of E3SM). We offer a number of test suites, including: + * e3sm_scream_v0: Test the full set of V0 (pre-C++) tests * e3sm_scream_v1: Test the full set of V1 (C++) tests * e3sm_scream_v1_at: A smaller and quicker set of tests for autotesting * e3sm_scream_hires: A small number of bigger, longer-running tests to measure performance Example for running a suite: -``` -% cd $repo/cime/scripts -% ./create_test e3sm_scream_v1_at --wait + +```shell +cd $repo/cime/scripts +./create_test e3sm_scream_v1_at --wait ``` Example for running a single test case: -``` -% cd $repo/cime/scripts -% ./create_test SMS.ne4_ne4.F2010-SCREAMv1 --wait + +```shell +cd $repo/cime/scripts +./create_test SMS.ne4_ne4.F2010-SCREAMv1 --wait ``` There are many behavioral tweaks you can make to a test case, like changing the run length, test type, etc. Most of this is not specific to eamxx and works for any CIME case. This generic stuff is well-documentated here: -http://esmci.github.io/cime/versions/master/html/users_guide/testing.html + When it comes to things specific to eamxx, you have grids, compsets, and testmods. 
Common EAMxx grids are: + * ne4_ne4 (low resolution) * ne4pg2_ne4pg2 (low resolution with phys grid) * ne30_ne30 (med resolution) @@ -38,9 +42,10 @@ Common EAMxx grids are: * ne1024pg2_ne1024pg2 (ultra high with phys grid) More grid info can be found here: -https://acme-climate.atlassian.net/wiki/spaces/DOC/pages/933986549/ATM+Grid+Resolution+Summary + Common EAMxx compsets are: + * F2010-SCREAM-LR: V0 low res compset with eamxx V0 atmosphere * F2010-SCREAMv1: V1 standard compset with eamxx V1 atmosphere * FIOP-SCREAMv1-DP: V1 with dpxx (doubly-periodic lateral boundary condition in C++) @@ -50,10 +55,14 @@ Full info on supported compsets can be found by looking at this file: `$scream_repo/components/eamxx/cime_config/config_compsets.xml` Common EAMxx testmods are: -* small_kernels: Enable smaller-granularity kernels, can improve performance on some systems -* scream-output-preset-[1-6]: Our 6 output presets. These turn some combination of our three output streams (phys_dyn, phys, and diags), + +* small_kernels: Enable smaller-granularity kernels, + can improve performance on some systems +* scream-output-preset-[1-6]: Our 6 output presets. + These turn some combination of our three output streams + (phys_dyn, phys, and diags), various remaps, etc. -* bfbhash: Turns on bit-for-bit hash output: https://acme-climate.atlassian.net/wiki/spaces/NGDNA/pages/3831923056/EAMxx+BFB+hashing +* bfbhash: Turns on bit-for-bit hash output: More info on running EAMxx can be found here: -https://acme-climate.atlassian.net/wiki/spaces/DOC/pages/3386015745/How+To+Run+EAMxx+SCREAMv1 + diff --git a/components/eamxx/docs/developer/field.md b/components/eamxx/docs/developer/field.md index 4170b28ac2b2..8df83440a2fd 100644 --- a/components/eamxx/docs/developer/field.md +++ b/components/eamxx/docs/developer/field.md @@ -1,45 +1,58 @@ -## Field +# Field -In EAMxx, a `Field` is a data structure holding two things: pointers to the data and pointers to metadata. 
-Both the data and metadata are stored in `std::shared_ptr` instances, to ensure consistency across all copies -of the field. This allows for fast shallow copy semantic for this class. +In EAMxx, a `Field` is a data structure holding two things: pointers to the +data and pointers to metadata. Both the data and metadata are stored in +`std::shared_ptr` instances, to ensure consistency across all copies of +the field. This allows for fast shallow copy semantic for this class. -The data is stored on both CPU and device memory (these may be the same, depending on the Kokkos -backend). In EAMxx, we always assume and guarantee that the device data is up to date. That implies that the data -must be explicitly synced to host before using it on host, and explicitly synced to device after host manipulation, -in order to ensure correctness. In order to access the data, users must use the `get_view`/'get_strided_view' methods, -which takes two template arguments: the data type, and an enum specifying whether CPU or device data is needed. -The data type is used to reinterpret the generic pointer stored inside to a view of the correct scalar type and layout. -It is a possibly const-qualified type, and if the field was marked as "read-only", the method ensures that the -provided data type is const. A read-only field can be created via the `getConst` method, which returns a shallow -copy of the field, but marked as read-only. The enum specifying host or device data is optional, with device being the default. +The data is stored on both CPU and device memory (these may be the same, +depending on the Kokkos backend). In EAMxx, we always assume and guarantee +that the device data is up to date. That implies that the data must be +explicitly synced to host before using it on host, and explicitly synced +to device after host manipulation, in order to ensure correctness. 
+In order to access the data, users must use the `get_view`/ +`get_strided_view` methods, which takes two template arguments: +the data type, and an enum specifying whether CPU or device data is needed. +The data type is used to reinterpret the generic pointer stored inside +to a view of the correct scalar type and layout. It is a possibly +const-qualified type, and if the field was marked as "read-only", +the method ensures that the provided data type is const. A read-only field +can be created via the `getConst` method, which returns a shallow copy of +the field, but marked as read-only. The enum specifying host or device data +is optional, with device being the default. -The metadata is a collection of information on the field, such as name, layout, units, allocation size, and more. -Part of the metadata is immutable after creation (e.g., name, units, or layout), while some metadata can be -partially or completely modified. The metadata is contained in the `FieldHeader` data structure, which contains -four parts: +The metadata is a collection of information on the field, such as name, layout, units, +allocation size, and more. Part of the metadata is immutable after creation (e.g., +name, units, or layout), while some metadata can be partially or completely modified. +The metadata is contained in the `FieldHeader` data structure, which contains four +parts: -* `FieldIdentifier`: stores the field's name, layout, units, data type, and name of the grid where it's defined. - These information are condensed in a single string, that can be used to uniquely identify a field, - allowing to distinguish between different version of the same field. The layout is stored in the `FieldLayout` - data structure, which includes: - * the field tags: stored as a `std::vector`, they give context to the field's extents. - * the field dims: stored both as a `std::vector`, as well as a 1d `Kokkos::View`. 
-* `FieldTracking`: stores information on the usage of the field, as well as its possible connections to other - fields. In particular, the tracked items are: - * the field time stamp: the time stamp when the field was last updated. - * the field accumulation start time: used for fields that are accumulated over several time steps - (or time step subcycles). For instance, it allows to reconstruct fluxes from raw accumulations. - * the providers/customers: lists of atmosphere processes (see below) that respectively require/compute - the field in their calculations. - * the field groups: a list of field groups that this field belongs too. Field groups are used to access - a group of fields without explicit prior knowledge about the number and/or names of the fields. -* `FieldAllocProp`: stores information about the allocation. While the field is not yet allocated, users can - request special allocations for the field, for instance to accommodate packing (for SIMD), which may - require padding. Upon allocation, this information is then used by the Field structure to extract the - actual data, wrapped in a properly shaped `Kokkos::View`. The alloc props are also responsible of tracking - additional information in case the field is a "slice" of a higher-dimensional one, a fact that can affect - how the data is accessed. -* Extra data: stored as a `std::map`, allows to catch any metadata that does not fit - in the above structures. This is a last resort structure, intended to accommodate the most peculiar - corner cases, and should be used sparingly. +* `FieldIdentifier`: stores the field's name, layout, units, data type, + and name of the grid where it's defined. These information are condensed + in a single string, that can be used to uniquely identify a field, allowing + to distinguish between different version of the same field. 
+ The layout is stored in the `FieldLayout` data structure, which includes: + * the field tags: stored as a `std::vector`, they give context to the + field's extents. + * the field dims: stored both as a `std::vector`, as well as a 1d `Kokkos::View`. +* `FieldTracking`: stores information on the usage of the field, as well as its + possible connections to other fields. In particular, the tracked items are: + * the field time stamp: the time stamp when the field was last updated. + * the field accumulation start time: used for fields that are accumulated over + several time steps (or time step subcycles). For instance, it allows to + reconstruct fluxes from raw accumulations. + * the providers/customers: lists of atmosphere processes (see below) that + respectively require/compute the field in their calculations. + * the field groups: a list of field groups that this field belongs too. Field groups + are used to access a group of fields without explicit prior knowledge about the + number and/or names of the fields. +* `FieldAllocProp`: stores information about the allocation. While the field is not + yet allocated, users can request special allocations for the field, for instance + to accommodate packing (for SIMD), which may require padding. Upon allocation, + this information is then used by the Field structure to extract the actual data, + wrapped in a properly shaped `Kokkos::View`. The alloc props are also + responsible of tracking additional information in case the field is a "slice" of + a higher-dimensional one, a fact that can affect how the data is accessed. +* Extra data: stored as a `std::map`, allows to catch any + metadata that does not fit in the above structures. This is a last resort structure, + intended to accommodate the most peculiar corner cases, and should be used sparingly. 
diff --git a/components/eamxx/docs/developer/grid.md b/components/eamxx/docs/developer/grid.md index 8a61b97e0795..b4e1a1c8c033 100644 --- a/components/eamxx/docs/developer/grid.md +++ b/components/eamxx/docs/developer/grid.md @@ -1,22 +1,29 @@ -## Grids and Remappers +# Grids and Remappers -In EAMxx, the `AbstractGrid` is an interface used to access information regarding the horizontal and vertical -discretization. The most important information that the grid stores is: +In EAMxx, the `AbstractGrid` is an interface used to access information regarding +the horizontal and vertical discretization. The most important information that +the grid stores is: -* the number of local/global DOFs: these are the degrees of freedom of the horizontal grid only. Here, - local/global refers to the MPI partitioning. -* the DOFs global IDs (GIDs): a list of GIDs of the DOFs on the current MPI rank, stored as a Field -* the local IDs (LIDs) to index list: this list maps the LID of a DOF (that is, the position of the DOF - in the GID list) to a "native" indexing system for that DOF. For instance, a `PointGrid` (a class derived from - `AbstractGrid`) is a simple collection of points, so the "native" indexing system coincides with the LIDs. - However, for a `SEGrid` (a derived class, for spectral element grids), the "native" indexing is a triplet - `(ielem,igp,jgp)`, specifying the element index, and the two indices of the Gauss point within the element. -* geometry data: stored as a `std::map`, this represent any data that is intrinsically - linked to the grid (either along the horizontal or vertical direction), such as lat/lon coordinates, - vertical coordinates, area associated with the DOF. +* the number of local/global DOFs: these are the degrees of freedom of the + horizontal grid only. Here, local/global refers to the MPI partitioning. 
+* the DOFs global IDs (GIDs): a list of GIDs of the DOFs on the current MPI rank, + stored as a Field +* the local IDs (LIDs) to index list: this list maps the LID of a DOF (that is, + the position of the DOF in the GID list) to a "native" indexing system for that + DOF. For instance, a `PointGrid` (a class derived from `AbstractGrid`) is a + simple collection of points, so the "native" indexing system coincides with the + LIDs. However, for a `SEGrid` (a derived class, for spectral element grids), + the "native" indexing is a triplet `(ielem,igp,jgp)`, specifying the element + index, and the two indices of the Gauss point within the element. +* geometry data: stored as a `std::map`, this represent any + data that is intrinsically linked to the grid (either along the horizontal or + vertical direction), such as lat/lon coordinates, vertical coordinates, area + associated with the DOF. -Grids can also be used to retrieve the layout of a 2d/3d scalar/vector field, which allows certain downstream -classes to perform certain operations without assuming anything on the horizontal grid. +Grids can also be used to retrieve the layout of a 2d/3d scalar/vector field, +which allows certain downstream classes to perform certain operations without +assuming anything on the horizontal grid. -In general, grid objects are passed around the different parts of EAMxx as const objects (read-only). -The internal data can only be modified during construction, which usually is handled by a `GridsManager` object. +In general, grid objects are passed around the different parts of EAMxx as const +objects (read-only). The internal data can only be modified during construction, +which usually is handled by a `GridsManager` object. 
diff --git a/components/eamxx/docs/developer/index.md b/components/eamxx/docs/developer/index.md index 2d47bab65fe3..69673b12ebd5 100644 --- a/components/eamxx/docs/developer/index.md +++ b/components/eamxx/docs/developer/index.md @@ -1,3 +1 @@ # SCREAM Developer Guide - - diff --git a/components/eamxx/docs/developer/io.md b/components/eamxx/docs/developer/io.md index caf237010a33..0a4c7b2d8323 100644 --- a/components/eamxx/docs/developer/io.md +++ b/components/eamxx/docs/developer/io.md @@ -1,5 +1,5 @@ # Input-Output -In EAMxx, I/O is handled through the SCORPIO library, currently a submodule of E3SM. -The `scream_io` library within eamxx allows to interface the EAMxx infrastructure classes -with the SCORPIO library. +In EAMxx, I/O is handled through the SCORPIO library, currently a submodule of +E3SM. The `scream_io` library within eamxx allows to interface the EAMxx +infrastructure classes with the SCORPIO library. diff --git a/components/eamxx/docs/developer/kokkos_ekat.md b/components/eamxx/docs/developer/kokkos_ekat.md index 45827a11f839..2432290a67a0 100644 --- a/components/eamxx/docs/developer/kokkos_ekat.md +++ b/components/eamxx/docs/developer/kokkos_ekat.md @@ -2,99 +2,163 @@ ## Kokkos -EAMxx uses Kokkos for performance portable abstractions for parallel execution of code and data management to various HPC platforms, including OpenMP, Cuda, HIP, and SYCL. Here we give a brief overview of the important concepts for understanding Kokkos in EAMxx. For a more in depth description, see the [Kokkos wiki](https://kokkos.org/kokkos-core-wiki). +EAMxx uses Kokkos for performance portable abstractions for parallel execution +of code and data management to various HPC platforms, including OpenMP, Cuda, +HIP, and SYCL. Here we give a brief overview of the important concepts for +understanding Kokkos in EAMxx. For a more in depth description, see the +[Kokkos wiki](https://kokkos.org/kokkos-core-wiki). 
### Kokkos::Device -`Kokkos::Device` is a struct which contain the type definitions for two main Kokkos concepts: execution space (`Kokkos::Device::execution_space`), the place on-node where parallel operations (like for-loops, reductions, etc.) are executed, and the memory space (`Kokkos::Device::memory_space`), the memory location on-node where data is stored. Given your machine architecture, Kokkos defines a default "device" space, given by -``` -Kokkos::Device -``` -where all performance critical code should be executed (e.g., on an NVIDIA machine, this device would be the GPU accelerators) and a default "host" space, given by +`Kokkos::Device` is a struct which contain the type definitions for two main +Kokkos concepts: execution space (`Kokkos::Device::execution_space`), the place +on-node where parallel operations (like for-loops, reductions, etc.) are +executed, and the memory space (`Kokkos::Device::memory_space`), the memory +location on-node where data is stored. Given your machine architecture, Kokkos +defines a default "device" space, given by + +```cpp +Kokkos::Device ``` -Kokkos::Device + +where all performance critical code should be executed (e.g., on an NVIDIA +machine, this device would be the GPU accelerators) and a default "host" space, +given by + +```c++ +Kokkos::Device ``` -where data can be accessed by the CPU cores and is necessary for I/O interfacing, for example. Currently, these default spaces are the ones used by EAMxx. On CPU-only machines, host and device represent the same space. + +where data can be accessed by the CPU cores and is necessary for I/O +interfacing, for example. Currently, these default spaces are the ones used by +EAMxx. On CPU-only machines, host and device represent the same space. ### Kokkos Views -The main data struct provided by Kokkos used in EAMxx in the `Kokkos::View`. This is a multi-dimensional data array that can live on either device or host memory space. 
These Views are necessary when running on GPU architectures as data structures like `std::vector` and `std::array` will be unavailable on device. +The main data struct provided by Kokkos used in EAMxx in the `Kokkos::View`. +This is a multi-dimensional data array that can live on either device or host +memory space. These Views are necessary when running on GPU architectures as +data structures like `std::vector` and `std::array` will be unavailable on +device. -Views are constructed in EAMxx most commonly with the following template and input arguments -``` -Kokkos::View(const std::string& label, int dim0, int dim1, ...) +Views are constructed in EAMxx most commonly with the following template and +input arguments + +```cpp +Kokkos::View(const std::string& label, + int dim0, int dim1, ...) ``` + where - - `DataType`: scalar type of the view, given as `ScalarType`+`*`(x's number of run-time dimensions). E.g., a 2D view of doubles will have `DataType = double**`. There is also an ability to define compile-time dimensions by using `[]`, see [Kokkos wiki section on views](https://kokkos.org/kokkos-core-wiki/API/core/view/view.html). - - `LayoutType`: mapping of indices into the underlying 1D memory storage. Types are: - - `LayoutRight` (used in EAMxx): strides increase from the right most to the left most dimension, right-most dimension is contiguous - - `LayoutLeft`: strides increase from the left most to the right most dimension, left-most dimension is contiguous - - `LayoutStride`: strides can be arbitrary for each dimension - - `DeviceType`: provides space where data live, defaults to the default device +- `DataType`: scalar type of the view, given as `ScalarType`+`*`(x's number of + run-time dimensions). E.g., a 2D view of doubles will have `DataType = + double**`. There is also an ability to define compile-time dimensions by + using `[]`, see [Kokkos wiki section on views]( + wiki/API/core/view/view.html). 
+- `LayoutType`: mapping of indices into the underlying 1D memory storage. Types + are: + - `LayoutRight` (used in EAMxx): strides increase from the right most to the + left most dimension, right-most dimension is contiguous + - `LayoutLeft`: strides increase from the left most to the right most + dimension, left-most dimension is contiguous + - `LayoutStride`: strides can be arbitrary for each dimension +- `DeviceType`: provides space where data live, defaults to the default device -The following example defines a view "temperature" which has dimensions columns and levels: -``` -Kokkos::View temperature("temperature", ncols, nlevs); +The following example defines a view "temperature" which has dimensions columns +and levels: + +```cpp +Kokkos::View temperature( + "temperature", ncols, nlevs); ``` ### Deep Copy -Kokkos provides `Kokkos::deep_copy(dst, src)` which copies data between views of the same dimensions, or a scalar values into a view. Common uses -``` +Kokkos provides `Kokkos::deep_copy(dst, src)` which copies data between views +of the same dimensions, or a scalar values into a view. Common uses + +```cpp Kokkos::deep_copy(view0, view1); // Copy all data from view1 into view0 Kokkos::deep_copy(view0, 5); // Set all values of view0 to 5 ``` -As seen in the next section, we can use `deep_copy()` to copy data between host and device. + +As seen in the next section, we can use `deep_copy()` to copy data between host +and device. ### Mirror Views -We will often need to have memory allocation the resides on device (for computation), and then need that identical data on host (say, for output). Kokkos has a concept of mirror views, where data can be copied from host to device and vice versa. +We will often need to have memory allocation the resides on device (for +computation), and then need that identical data on host (say, for output). +Kokkos has a concept of mirror views, where data can be copied from host to +device and vice versa. 
Here is an example using the device view `temperature` from above -``` -// Allocate view on host that exactly mirrors the size of layout of the device view + +```cpp +// Allocate view on host that exactly mirrors the size of layout of the device +// view auto host_temperature = Kokkos::create_mirror_view(temperature); // Copy all data from device to host Kokkos::deep_copy(host_temperature, temperature); ``` + Kokkos also offers an all-in-one option -``` + +```cpp // Note: must hand the host device instance as first argument -auto host_temperature = Kokkos::create_mirror_view_and_copy(Kokkos::DefaultHostDevice(), temperature); +auto host_temperature = Kokkos::create_mirror_view_and_copy( + Kokkos::DefaultHostDevice(), temperature); ``` ### Parallel Execution -The most basic parallel execution pattern used by EAMxx is the `Kokkos::parallel_for` which defines a for-loop with completely independent iterations. The `parallel_for` takes in an optional label for debugging, an execution policy, which defines a range and location (host or device) for the code to be run, and a lambda describing the body of code to be executed. The following are execution policies used in EAMxx - - - `int count`: 1D iteration range `[0, count)` - - `RangePolicy(int beg, int end)`: 1D iteration range for indices `[beg, end)` - - `MDRangePolicy>(int[N] beg, int[N] end)`: multi-dimensional iteration range `[beg, end)` - - `TeamPolicy(int league_size, int team_size, int vector_size)`: 1D iteration over `league_size`, assigned to thread teams of size `team_size`, each with `vector_size` vector lanes. Both `team_size` and `vector_size` can be given `Kokkos::AUTO` as input for Kokkos to automatically compute. +The most basic parallel execution pattern used by EAMxx is the +`Kokkos::parallel_for` which defines a for-loop with completely independent +iterations. 
The `parallel_for` takes in an optional label for debugging, an +execution policy, which defines a range and location (host or device) for the +code to be run, and a lambda describing the body of code to be executed. The +following are execution policies used in EAMxx + +- `int count`: 1D iteration range `[0, count)` +- `RangePolicy(int beg, int end)`: 1D iteration range for indices + `[beg, end)` +- `MDRangePolicy>(int[N] beg, int[N] end)`: multi- + dimensional iteration range `[beg, end)` +- `TeamPolicy(int league_size, int team_size, int vector_size)`: 1D + iteration over `league_size`, assigned to thread teams of size `team_size`, + each with `vector_size` vector lanes. Both `team_size` and `vector_size` can + be given `Kokkos::AUTO` as input for Kokkos to automatically compute. If no `ExecSpace` template is given, the default execution space is used. -For lambda capture, use `KOKKOS_LAMBDA` macro which sets capture automatically based on architecture. +For lambda capture, use `KOKKOS_LAMBDA` macro which sets capture automatically +based on architecture. 
Example using `RangePolicy` to initialize a view -``` -Kokkos::View temperature("temperature", ncols, nlevs); + +```cpp +Kokkos::View temperature("temperature", ncols, + nlevs); Kokkos::parallel_for("Init_temp", - Kokkos::RangePolicy(0, ncols*nlevs), - KOKKOS_LAMBDA (const int idx) { + Kokkos::RangePolicy(0, ncols*nlevs), + KOKKOS_LAMBDA (const int idx) { int icol = idx/nlevs; int ilev = idx%nlevs; temperature(icol, ilev) = 0; }); ``` + Same example with `TeamPolicy` -``` + +```cpp Kokkos::parallel_for("Init_temp", - Kokkos::TeamPolicy(ncols*nlevs, Kokkos::AUTO, Kokkos::AUTO), - KOKKOS_LAMBDA (const TeamPolicy::member_type& team) { + Kokkos::TeamPolicy(ncols*nlevs, Kokkos::AUTO, Kokkos::AUTO), + KOKKOS_LAMBDA (const TeamPolicy::member_type& team) { // league_rank() gives the index for this team int icol = team.league_rank()/nlevs; int ilev = team.league_rank()%nlevs; @@ -105,32 +169,39 @@ Kokkos::parallel_for("Init_temp", ### Hierarchical Parallelism -Using `TeamPolicy`, we can have up to three nested levels of parallelism: team parallelism, thread parallelism, vector parallelism. These nested policies can be called within the lambda body using the following execution policies +Using `TeamPolicy`, we can have up to three nested levels of parallelism: team +parallelism, thread parallelism, vector parallelism. 
These nested policies can +be called within the lambda body using the following execution policies - - `TeamThreadRange(team, begin, end)`: execute over threads of a team - - `TeamVectorRange(team, begin, end)`: execute over threads and vector lanes of a team - - `ThreadVectorRange(team, begin, end)`: execute over vector lanes of a thread +- `TeamThreadRange(team, begin, end)`: execute over threads of a team +- `TeamVectorRange(team, begin, end)`: execute over threads and vector lanes of + a team +- `ThreadVectorRange(team, begin, end)`: execute over vector lanes of a thread An example of using these policies -``` + +```cpp Kokkos::View Q("tracers", ncols, ntracers, nlevs); Kokkos::parallel_for(Kokkos::TeamPolicy(ncols, Kokkos::AUTO), - KOKKOS_LAMBDA (TeamPolicy::member_type& team) { + KOKKOS_LAMBDA (TeamPolicy::member_type& team) { int icol = team.league_rank(); Kokkos::parallel_for(Kokkos::TeamVectorRange(team, nlevs), [&](int ilev) { - temperature(icol, ilev) = 0; + temperature(icol, ilev) = 0; }); Kokkos::parallel_for(Kokkos::TeamThreadRange(team, nlevs), [&](int ilev) { - Kokkos::parallel_for(Kokkos::ThreadVectorRange(team, ntracers), [&](int iq) { - Q(icol, iq, ilev) = 0; - }); + Kokkos::parallel_for(Kokkos::ThreadVectorRange(team, ntracers), [&](int iq) { + Q(icol, iq, ilev) = 0; + }); }); }); ``` -IMPORTANT! Nested policies cannot be used in arbitrary order. `ThreadVectorRange` must be used inside a `TeamThreadRange`, and `TeamVectorRange` must be the only level of nested parallelism. -``` +IMPORTANT! Nested policies cannot be used in arbitrary order. `ThreadVectorRange` +must be used inside a `TeamThreadRange`, and `TeamVectorRange` must be the only +level of nested parallelism. + +```cpp Kokkos::parallel_for(TeamPolicy(...), ... { // OK Kokkos::parallel_for(TeamThreadRange, ... { @@ -139,9 +210,9 @@ Kokkos::parallel_for(TeamPolicy(...), ... { // OK Kokkos::parallel_for(TeamThreadRange, ... { - Kokkos::parallel_for(ThreadVectorRange, ... 
{ + Kokkos::parallel_for(ThreadVectorRange, ... { - }); + }); }); // OK @@ -156,13 +227,15 @@ Kokkos::parallel_for(TeamPolicy(...), ... { // WRONG, a TeamVectorRange must be the only nested level Kokkos::parallel_for(TeamVectorRange, ...{ - Kokkos::parallel_for(ThreadVectorRange, ... { + Kokkos::parallel_for(ThreadVectorRange, ... { - }); + }); }); }); ``` -Using these incorrectly can be very tricky to debug as the code almost certainly will _not_ error out, but race conditions will exist among threads. + +Using these incorrectly can be very tricky to debug as the code almost certainly +will _not_ error out, but race conditions will exist among threads. ## EKAT @@ -175,6 +248,3 @@ Using these incorrectly can be very tricky to debug as the code almost certainly ### Scratch Memory: WorspaceManager ### Algorithms - - - diff --git a/components/eamxx/docs/developer/managers.md b/components/eamxx/docs/developer/managers.md index 676449a21845..fa98c8b1d720 100644 --- a/components/eamxx/docs/developer/managers.md +++ b/components/eamxx/docs/developer/managers.md @@ -1 +1 @@ -## FieldManager and GridsManager +# FieldManager and GridsManager diff --git a/components/eamxx/docs/developer/processes.md b/components/eamxx/docs/developer/processes.md index 9ad556a31835..adb90e2dfbcd 100644 --- a/components/eamxx/docs/developer/processes.md +++ b/components/eamxx/docs/developer/processes.md @@ -1,59 +1,77 @@ # Atmospheric Processes -In EAMxx, `AtmosphereProcess` (AP) is an abstract class representing a portion of the atmosphere timestep algorithm. -In simple terms, an AP is an object that given certain input fields performs some calculations to compute -some output fields. 
The concrete AP classes allow to create a buffer layer between particular packages (e.g., -dynamics dycore, physics parametrizations) and the atmosphere driver (AD), allowing separation of concerns, -so that the AD does not need to know details about the package, and the package does not need to know about -the EAMxx infrastructure. - -To enhance this separation of concerns, EAMxx implements two more classes for handling APs: - -- the concrete class `AtmosphereProcessGroup` (APG), which allows to group together a set of AP's, which can be seen from outside as a single process; -- the `AtmosphereProcessFactory` class, which allows an APG to create its internal processes without any knowledge of -what they are. - -This infrastructure allows the AD to view the whole atmosphere as a single APG, and to be completely agnostic to -what processes are run, and in which order. This design allows to have a code base that is cleaner, self-container, -and easy to test via a battery of targeted unit tests. - -In EAMxx, we already have a few concrete AP's, interfacing the AD to the Hommexx non-hydrostatic dycore as well as -some physics parametrizations (P3, SHOC, RRMTPG, etc). In the next section we describe the interfaces of an AP class, -and we show an example of how to write a new concrete AP class. +In EAMxx, `AtmosphereProcess` (AP) is an abstract class representing a portion +of the atmosphere timestep algorithm. In simple terms, an AP is an object that +given certain input fields performs some calculations to compute some output +fields. The concrete AP classes allow to create a buffer layer between +particular packages (e.g., dynamics dycore, physics parametrizations) and the +atmosphere driver (AD), allowing separation of concerns, so that the AD does +not need to know details about the package, and the package does not need to +know about the EAMxx infrastructure. 
+ +To enhance this separation of concerns, EAMxx implements two more classes for +handling APs: + +- the concrete class `AtmosphereProcessGroup` (APG), which allows to group + together a set of AP's, which can be seen from outside as a single process; +- the `AtmosphereProcessFactory` class, which allows an APG to create its + internal processes without any knowledge of what they are. + +This infrastructure allows the AD to view the whole atmosphere as a single APG, +and to be completely agnostic to what processes are run, and in which order. +This design allows to have a code base that is cleaner, self-contained, and +easy to test via a battery of targeted unit tests. + +In EAMxx, we already have a few concrete AP's, interfacing the AD to the +Hommexx non-hydrostatic dycore as well as some physics parametrizations (P3, +SHOC, RRTMGP, etc). In the next section we describe the interfaces of an AP +class, and we show an example of how to write a new concrete AP class. ## Atmosphere process interfaces An AP has several interfaces, which can be grouped into three categories: - - initialization: these interfaces are used to create the AP, as well as to initialize internal data structures; - - run: these interfaces are used to make the AP compute its output fields from its input fields; - - finalization: these interfaces are used to perform any clean up operation (e.g., release files) before the AP is - destroyed. - -Among the above, the initialization sequence is the most complex, and conists of several steps: - - - The AD creates the APG corresponding to the whole atmosphere. As mentioned above, this phase will make use of a factory, - which allows the AD to be agnostic to what is actually in the group. All AP's can start performing any initialization - work that they can, but at this point they are limited to use only an MPI communicator as well as a list of runtime - parameters (which were previously read from an input file).
- - The AD passes a `GridsManager` to the AP's, so that they can get information about the grids they need. At this point, - all AP's have all the information they need to establish the layout of the input and output fields they need, - and can store a list of these "requests" - - After creating all fields (based on AP's requests), the AD passes a copy of each input and output field to - the AP's. These fields will be divided in "required" and "computed", which differ in that the former are only - passed to the AP's as 'read-only' fields (see the [field](field.md#Field) documentation for more details) - - The AP's are queried for how much scratch memory they may need at run time. After all AP's communicate their needs, - the AD will provide a pointer to scratch memory to the AP's. This is memory that can be used to initialize - temporary views/fields or other internal data structures. All AP's are given the same pointer, which means no - data persistence should be expected at run time between one timestep and the next. - - The AD calls the 'initialize' method on each AP. At this point, all fields are set, and AP's can complete any - remaining initialization task - -While the base AP class provides an (empty) implementation for some methods, in case derived classes do not need a -feature, some methods are purely virtual, and concrete classes will have to override them. Looking at existing -concrete AP implementations is a good way to have a first idea of what a new AP class needs to implement. Here, -we show go over the possible implementation of these methods in a hypothetical AP class. 
The header file may -look something like this +- initialization: these interfaces are used to create the AP, as well as to + initialize internal data structures; +- run: these interfaces are used to make the AP compute its output fields from + its input fields; +- finalization: these interfaces are used to perform any clean up operation + (e.g., release files) before the AP is destroyed. + +Among the above, the initialization sequence is the most complex, and consists +of several steps: + +- The AD creates the APG corresponding to the whole atmosphere. As mentioned + above, this phase will make use of a factory, which allows the AD to be + agnostic to what is actually in the group. All AP's can start performing any + initialization work that they can, but at this point they are limited to use + only an MPI communicator as well as a list of runtime parameters (which were + previously read from an input file). +- The AD passes a `GridsManager` to the AP's, so that they can get information + about the grids they need. At this point, all AP's have all the information + they need to establish the layout of the input and output fields they need, + and can store a list of these "requests" +- After creating all fields (based on AP's requests), the AD passes a copy of + each input and output field to the AP's. These fields will be divided in + "required" and "computed", which differ in that the former are only passed + to the AP's as 'read-only' fields (see the [field](field.md#Field) + documentation for more details) +- The AP's are queried for how much scratch memory they may need at run time. + After all AP's communicate their needs, the AD will provide a pointer to + scratch memory to the AP's. This is memory that can be used to initialize + temporary views/fields or other internal data structures. All AP's are given + the same pointer, which means no data persistence should be expected at run + time between one timestep and the next. 
+- The AD calls the 'initialize' method on each AP. At this point, all fields + are set, and AP's can complete any remaining initialization task + +While the base AP class provides an (empty) implementation for some methods, in +case derived classes do not need a feature, some methods are purely virtual, +and concrete classes will have to override them. Looking at existing concrete +AP implementations is a good way to have a first idea of what a new AP class +needs to implement. Here, we go over the possible implementation of these +methods in a hypothetical AP class. The header file may look something like +this ```c++ #include @@ -86,21 +104,26 @@ protected: bool m_has_blah; }; ``` + A few comments: - - we added two views to the class, which are meant to be used to store intermediate results during calculations at -runtime; - - there are other methods that the class can override (such as additional operations when the AD sets a field in the - AP), but most AP's only need to override only these; - - we strongly encourage to add the keyword `override` when overriding a method; in case of small typos (e.g., missing - a `&` or a `const`, the compiler will be erroring out, since the signature will not match any virtual method in the - base class; - - `findalize_impl` is often empty; unless the AP is managing external resources, everything should be correctly released - during destruction; - - the two methods for buffers can be omitted if the AP does not need any scratch memory (and the default implementation - from the base class will be used).
- -Here is a possible implementation of the methods, with some inline comments to explain +- we added two views to the class, which are meant to be used to store + intermediate results during calculations at runtime; +- there are other methods that the class can override (such as additional + operations when the AD sets a field in the AP), but most AP's need to + override only these; +- we strongly encourage to add the keyword `override` when overriding a method; + in case of small typos (e.g., missing a `&` or a `const`), the compiler will + error out, since the signature will not match any virtual method in the + base class; +- `finalize_impl` is often empty; unless the AP is managing external resources, + everything should be correctly released during destruction; +- the two methods for buffers can be omitted if the AP does not need any + scratch memory (and the default implementation from the base class will be + used). + +Here is a possible implementation of the methods, with some inline comments to +explain ```c++ MyProcess::MyProcess (const ekat::Comm& comm, const ekat::ParameterList& pl) diff --git a/components/eamxx/docs/developer/source_tree.md b/components/eamxx/docs/developer/source_tree.md index 15c018cc8858..ed8270db635e 100644 --- a/components/eamxx/docs/developer/source_tree.md +++ b/components/eamxx/docs/developer/source_tree.md @@ -56,4 +56,3 @@ You'll also see some other files in the `src/` directory itself, such as + `scream_config.h.in`: A template for generating a C++ header file with EAMxx configuration information. - diff --git a/components/eamxx/docs/developer/standalone_testing.md b/components/eamxx/docs/developer/standalone_testing.md index e2bb5d625562..633dcc34dc1d 100644 --- a/components/eamxx/docs/developer/standalone_testing.md +++ b/components/eamxx/docs/developer/standalone_testing.md @@ -27,26 +27,30 @@ be made known to EAMxx by editing the eamxx/scripts/machines_specs.py files.
There are some instructions on what to do at the top of this file. `test-all-scream` has a good help dump -``` -% cd $scream_repo/components/eamxx -% ./scripts/test-all-scream -h + +```shell +cd $scream_repo/components/eamxx +./scripts/test-all-scream -h ``` If you are unsure of the cmake configuration for you development cycle, one trick you can use is to run `test-all-scream` for the `dbg` test and just copy the cmake command it prints (then ctrl-C the process). -``` -% cd $scream_repo/components/eamxx -% ./scripts/test-all-scream -t dbg -m $machine -* wait for a few seconds* -* Ctrl-C * -* Copy the contents of DCMAKE_COMMAND that was passed to ctest * -* Add "cmake" to beginning of contents and path to eamxx at the end. * + +```shell +cd $scream_repo/components/eamxx +./scripts/test-all-scream -t dbg -m $machine +# wait for a few seconds* +# Ctrl-C * +# Copy the contents of DCMAKE_COMMAND that was passed to ctest * +# Add "cmake" to beginning of contents and path to eamxx at the end. * ``` Considerations for using `test-all-scream`: + * Your machine must be known to our scripts, see above. -* If you try to run commands by-hand (outside of test-all-scream; cmake, make, ctest, etc), you'll need to remember to +* If you try to run commands by-hand (outside of test-all-scream; + cmake, make, ctest, etc), you'll need to remember to load the scream-env into your shell, which can be done like this: `cd eamxx/scripts; eval $(./scripts/scream-env-cmd $machine)` * test-all-scream expects to be run from a compute node if you @@ -63,7 +67,7 @@ Considerations for using `test-all-scream`: Before running the tests, generate a baseline file: -``` +```shell cd $RUN_ROOT_DIR make baseline ``` @@ -75,7 +79,7 @@ path has been provided. To run all of SCREAM's tests, make sure you're in `$RUN_ROOT_DIR` and type -``` +```shell ctest -VV ``` @@ -84,7 +88,7 @@ This runs everything and reports results in an extra-verbose (`-VV`) manner. You can also run subsets of the SCREAM tests. 
For example, to run only the P3 regression tests (again, from the `$RUN_ROOT_DIR` directory), use -``` +```shell ctest -R p3_regression ``` @@ -94,13 +98,13 @@ We can create groupings of tests by using **labels**. For example, we have a `driver` label that runs tests for SCREAM's standalone driver. You can see a list of available labels by typing -``` +```shell ctest --print-labels ``` To see which tests are associated with a given label (e.g. `driver`), use -``` +```shell ctest -L driver -N ``` @@ -117,4 +121,3 @@ on the C++/Kokkos implementation, you can invoke any new tests to the function If the reference Fortran implementation changes enough that a new baseline file is required, make sure to let other SCREAM team members know, in order to minimize disruptions. - diff --git a/components/eamxx/docs/developer/style_guide.md b/components/eamxx/docs/developer/style_guide.md index f43678330099..4f6f340cb66e 100644 --- a/components/eamxx/docs/developer/style_guide.md +++ b/components/eamxx/docs/developer/style_guide.md @@ -7,4 +7,3 @@ Here's our style guide. Let the holy wars begin! ## Functions and Methods ## Variables -