FMA parentheses draft PR #12

Open
wants to merge 78 commits into
base: FMA_rot_sym_ref

Hallberg-NOAA
Owner

This PR adds the parentheses that appear to be needed so that FMAs will exhibit rotational symmetry.

alperaltuntas and others added 18 commits November 6, 2023 14:10
* Enhancements for adding land block elimination to NUOPC cap:
 - Add sum_across_PEs_int4_2d to the sum_across_PEs interface
 - Allow mask_table file to be placed in run directory (now,
the first dir that is looked at).

* Enhance NUOPC cap to support MOM_mask_table.

- Determine masked blocks.
- Evenly distribute eliminated cells.
- Fill ESMF gindex array accordingly.
- During Export phase, set fields of eliminated cells to zero.

* set %label in register_netcdf_field and register_netcdf_axis

* first working version of an automated mask table generator

* While determining masked blocks, take reentrancy and tripolar stitch into account

* apply tripolar stitch fix in auto mask_table generation

* add AUTO_IO_LAYOUT_FAC parameter to control IO_LAYOUT when AUTO_MASKTABLE is on

* Miscellaneous auto masking fixes to address reviews:

- Dimensionalize topographic depth variables used to determine cell masks in auto masktable routine.
- Raise error if the user provided PE layout is inconsistent with auto masktable generation.
- Save the masktable parameter description to a string variable to avoid repetition.
- Fix typos, whitespaces, use modern array syntax.
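
The core idea behind the automated mask table generator described above (find the blocks that contain only land so that no PE needs to be assigned to them) can be sketched as follows. This is only a minimal illustration in Python, not the MOM6 implementation: the function name `find_land_blocks`, the 0/1 `wet` array and the assumption that the grid divides evenly into blocks are all hypothetical, and the sketch ignores the reentrancy and tripolar stitch handling mentioned in the commits.

```python
def find_land_blocks(wet, nx_blocks, ny_blocks):
    """Return the (ib, jb) indices of blocks that contain no wet cells.

    wet is a list of rows of 0 (land) or 1 (ocean) values.  This hypothetical
    helper assumes the grid divides evenly into nx_blocks by ny_blocks blocks.
    """
    ny, nx = len(wet), len(wet[0])
    bx, by = nx // nx_blocks, ny // ny_blocks
    masked = []
    for jb in range(ny_blocks):
        for ib in range(nx_blocks):
            rows = wet[jb * by:(jb + 1) * by]
            # A block can be eliminated only if every cell in it is land.
            if not any(any(row[ib * bx:(ib + 1) * bx]) for row in rows):
                masked.append((ib, jb))
    return masked


# A 4x4 toy grid split into a 2x2 block layout.
wet = [
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
masked_blocks = find_land_blocks(wet, 2, 2)
print(masked_blocks)
```

In this toy case the southwest and northeast blocks are all land and would be candidates for elimination.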

* Disable FPEs in MacOS testing

Due to poor handling of floating point in HDF5 1.14.3, it is currently
not possible to use floating point exceptions (FPEs) whenever this
version is present.

The GitHub Actions CI nodes would randomly select either 1.14.2 or
1.14.3, and would raise an FPE error if 1.14.3 was selected.
Additionally, the homebrew installation does not provide a clean method
for selecting a different version of HDF5.

Thus, for now we disable FPEs in the MacOS testing, and hope to catch
any legitimate FP errors in the Ubuntu version.  We will restore these
tests as soon as this has been fixed in an easily-accessible version of
HDF5.

As part of this PR, I have also moved the FCFLAGS configuration to the
platform specific Actions files, allowing for independent compiler
configuration for each platform.

---------

Co-authored-by: Marshall Ward
* Fix biharmonic Leith

Biharmonic Leith uses Del omega at is-1 and js-1. This unavoidably requires
u at js-3 and v at is-3, which are unavailable. It also requires Del omega
at Ieq+1 and Jeq+1, which requires v at Ieq+3 and u at Jeq+3, which are
unavailable. This necessitates a halo update.

Fixes several bugs in Leith+E.
- Fixes indexing when computing smoothed vorticity and its gradient
- Crucially, computes `vert_vort_mag` when using Leith+E
- Fixes some logic in the smoothing code
- Other minor indexing fixes

* Leith+E Logic Update

Ah is required at h and q points. The original code computed Ah at
h points, then packed into Ah_h, then applied upper bounds to Ah.
If Ah_h is in the diag_table or if debug is true, then the value of
Ah with upper bounds applied gets packed into Ah_h. Then, at q points the
code unpacks Ah_h. This update makes sure that the upper bound
gets applied to q points, not just h points.

* Leith+E halo updates

The main thing that this commit does is to perform smoothing of u and v
outside of the loop over layers. This swaps nz 2D blocking halo updates
for a single blocking 3D halo update.

* Leith+E smoothing

This commit adds a runtime flag, SMOOTH_AH. If True (default) then
`m_leithy` and `Ah` are both smoothed, which leads to many blocking
communications. If False then these fields are rougher, but there
is less communication.

* Leith+E eliminate pass-var

This commit removes one halo update in Leith+E. Achieving this
requires re-indexing two assignments. The values of Ah and Kh are
computed at h points, then re-used at q points. Without the halo
update it is necessary to offset the assignment at h and q
points, e.g. Kh(I,J) = Kh_h(i+1,j+1,k), to avoid accessing
values that have not been computed.

* Leith+E OBC

Adds code so that Leith+E works with OBC.

* Leith+E eliminate halo update

This commit eliminates one more halo update in Leith+E.

* *Correct rotational symmetry with USE_LEITHY

  This commit revises the smoothing code used when USE_LEITHY = True to give
answers that respect rotational symmetry and it also corrects some horizontal
indexing bugs and problems with the staggering in some halo update and smooth_x9
calls and reduces some loop ranges to their minimal required values.  The
specific changes include:

  1. Corrected a horizontal indexing bug when interpolating Kh_h and Ah_h to
     corner (q) points when USE_LEITHY = True.  These had previously been
     inappropriately copied from the thickness point to the southwest of the
     corner point.  This required symmetric-memory-mode calculations of the
     thickness point viscosities whenever USE_LEITHY is true, but to avoid adding
     complicated logic, the symmetric-memory loop bounds are used for the
     calculation of Kh.

  2. Revised smooth_x9 to give rotationally symmetric answers and split it into
     the two routines smooth_x9_h and smooth_x9_uv to reduce the memory used by
     this routine and reduce the use of optional arguments.

  3. Eliminated 4 unneeded halo update calls, and added error handling for the
     case where Leith options are used with insufficiently wide halos.

  4. Added new integers to indicate the loop ranges over which the viscosities
     and related variables should be calculated, depending on which options are
     active, and then adjusted 91 do-loop extents in the horizontal_viscosity code to
     reflect the loop ranges over which arrays are actually used.

  5. Added a new 2-d variable for the squared viscosity for smoothing that can
     be used for halo updates and to avoid having a variable with confusingly
     inconsistent dimensions at various points in the code.

  6. Corrected the position arguments on 2 smooth_x9 calls and 4 pass_var calls
     that are used when USE_LEITHY=.true. and SMOOTH_AH=.true.  As previously
     written, these smooth_x9 and pass_var calls would work when in non-symmetric
     memory mode but would give incorrect answers when in symmetric memory mode.

  These revisions change answers when USE_LEITHY is true, but answers are
bitwise identical in all other cases.

---------

Co-authored-by: Robert Hallberg
Minor version updates to multiple packages used in the generation of the
documentation introduce dependencies needing Sphinx >= 5.0, which breaks
the sphinx extensions we use in documenting the MOM6 APIs. I have added
versions for all the packages needed to keep things working with Sphinx4
for now, but we really do need to find a way to work with the newer
versions.

More pinning of requirements for readthedocs
This patch introduces two new macros, BUILD and WORK, to permit
relocation of the build/ and work/ directories.  It also makes the
following smaller changes:

* deps/ is now defined by the DEPS macro.  If unset, deps/ is placed in
  the BUILD directory.

* results/ is moved into WORK.

* Compiler flags which track directories now use $(abspath ...) to
  allow for arbitrary paths.

* GitHub CI paths were adjusted to support these new settings.

* DO_* flags are now used as on/off with ifdef testing, rather than
  checking for `true` values.

* mkmf macros have been removed from the coupled test config.

* The default FMS infra has been changed to FMS2 in all components,
  including the configure.ac outside of .testing.

This work will enable testing of multiple FMS libraries in our CI.
  Refactored set_viscous_BBL to separate out the routines setting the open
interface lengths used for the channel drag, shortening a 1070 line long routine
to 915 lines and reducing the scope of a number of temporary variables.  A
number of logical branch points have been moved outside of the innermost do
loops.  This refactoring will also make it easier to provide alternatives to
some of the solvers that do not use the trigonometric functions to solve for the
roots of a cubic expression, thereby avoiding the issues noted at
NOAA-GFDL/issues/483.  All answers are bitwise identical and public
interfaces are unchanged.
  Added the new routine find_L_open_concave_iterative to use iterative Newton's
method approaches with appropriate limits to solve the cubic equation for the
fractional open face lengths at interfaces that are used by the CHANNEL_DRAG
code.  These solutions are analogous to those given by the previous expressions
that are now in find_L_open_concave_trigonometric, and the two differ at close
to roundoff, but the new method is completely independent of the transcendental
function library, thereby addressing dev/gfdl MOM6 issue mom-ocean#483.  This new routine
is called when the new runtime parameter TRIG_CHANNEL_DRAG_WIDTHS is set to
false, but by default the previous answers are recovered.  By default all
answers are bitwise identical, but there is a new runtime parameter in some
MOM_parameter_doc files.
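
To illustrate why a bounded Newton iteration can stand in for a trigonometric root formula, here is a minimal Python sketch. The cubic used (t**3 - 3*t + 1 = 0) is a hypothetical stand-in chosen because it has three real roots; it is not the cubic solved in set_viscous_BBL, and the function names are invented for this example. The trigonometric branch depends on the math library's acos and cos, while the Newton branch uses only basic arithmetic, which is the property the commit message is after.

```python
import math


def f(t):
    """Hypothetical cubic with three real roots, one of them in [1, 2]."""
    return t**3 - 3.0 * t + 1.0


def root_trigonometric():
    """Largest root via the trigonometric form for t**3 + p*t + q = 0."""
    p, q = -3.0, 1.0
    angle = math.acos((3.0 * q / (2.0 * p)) * math.sqrt(-3.0 / p)) / 3.0
    return 2.0 * math.sqrt(-p / 3.0) * math.cos(angle)


def root_newton(lo=1.0, hi=2.0, tol=1.0e-15, max_iter=50):
    """Newton's method started from hi, with iterates limited to [lo, hi]."""
    t = hi
    for _ in range(max_iter):
        step = f(t) / (3.0 * t * t - 3.0)
        t_new = min(max(t - step, lo), hi)  # keep the iterate inside the bounds
        if abs(t_new - t) <= tol * abs(t_new):
            return t_new
        t = t_new
    return t


t_trig = root_trigonometric()
t_iter = root_newton()
print(t_trig, t_iter, abs(t_trig - t_iter))
```

The two roots agree to close to roundoff, but only the iterative one is independent of the transcendental function library.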
  Added the new debugging or testing subroutine test_L_open_concave along with
extra calls when DEBUG = True that can be used to demonstrate that the iterative
solver in find_L_open_concave_iterative is substantially more accurate but
mathematically equivalent to the solver in find_L_open_concave_trigonometric.
This extra code is only called in debugging mode, and it probably should be
deleted in a separate commit after find_L_open_concave_iterative has been
accepted onto the dev/gfdl branch of MOM6.  All answers are bitwise identical
and no output or input is changed.
  Carried out minor refactoring in set_viscous_BBL as suggested by the reviews
of this PR, including the elimination of some unnecessary error handling and the
replacement of C2pi_3 as an argument to find_L_open_concave_trigonometric with
an internal parameter.  All answers are bitwise identical and there are no
changes to publicly visible interfaces.
It was blowing up with "forrtl: error (65): floating invalid" when
accessing dz in the halo at the boundary, but just sometimes.  My
default layout is trouble while my testing layout of 48 cores is not.
  Revised the interfaces to the myStats routine in the horizontal_regridding
module to avoid segmentation faults due to inconsistent horizontal indices and
array extents in global indexing mode.  Rather than passing in absolute array
extents to work on, an ocean grid type is now passed as an argument to myStats,
with the new optional full_halo argument used to capture the case where the
tracer statistics are being taken over the full data domain.  The most
frequently encountered problems occurred when the hard-coded debug variable in
the horiz_interp_and_extrap_tracer routines is changed from false to true.
When global indexing is not used, this revised code works exactly as before, but when
it is used with global indexing, it avoids segmentation faults that were
preventing the model from running in some cases with all debugging enabled.
  Corrected bugs in the horizontal indexing in apply_topography_edits_from_file
that led to differing answers depending on the value of GLOBAL_INDEXING.  This
change gives identical results when GLOBAL_INDEXING is used, as can be seen by noting that G%idg_offset = G%isd_global + G%isd and that without global
indexing G%isd = 1, but with global indexing the new expressions give the same
answers as without them.  Because global indexing is not typically used,
answers are not changed for any cases in the MOM6-examples test suite.
@Hallberg-NOAA Hallberg-NOAA force-pushed the FMA_rotational_symmetry branch from f146fe3 to 6e4e2a8 Compare March 11, 2024 18:44
Hallberg-NOAA and others added 8 commits March 11, 2024 21:15
  Moved the post-initialization halo updates for thicknesses and temperatures
and salinities in initialize_MOM to occur immediately after the last point where
they are modified and before the remapped diagnostic grids are set up.  This
does not change any existing answers but it will enable the future use of the
thermo_var_ptr type in calls to thickness_to_dz when setting up remapped
diagnostics, and perhaps elsewhere in the initialization of other auxiliary
variables.  All answers are bitwise identical.
  Refactored MOM_diag_remap to work with global indices and to move logical
branches outside of do loops.  This was done by adding internal routines that
set the loop indices consistently with the NE c-grid convention used throughout
the MOM6 code and converting the optionally associated mask pointer into an
optional argument.  It also simplifies the logic of many of the expressions
within the remapping code.  There is also a new element, Z_based_coord, in the
diag_remap_ctrl type, to indicate whether the remapping is working in thickness
or height units, but for now it is always set to false.

  The function set_h_neglect or set_dz_neglect is used to set the negligible
thicknesses used for remapping in diag_remap_update and diag_remap_do_remap,
depending on whether the remapping is being done in thickness or vertical height
coordinates.  Diag_remap_init has a new vertical_grid_type argument, and
diag_remap_do_remap has a new unit_scale_type argument.

  For REMAPPING_ANSWER_DATES later than 20240201, diag_remap_update does an
explicit sum to determine the total water column thickness, rather than using
the sum function, for which the order of the sums is indeterminate.  For some
compilers, this could change the vertical grids used for remapping diagnostics
at roundoff, but no such change was detected for any of the compilers used with
the MOM6 regression test suite.
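
A minimal Python sketch of why the order of a floating-point sum matters, and hence why an explicit loop with a fixed order is reproducible where an intrinsic sum need not be. The function name `column_thickness` and the layer values are made up for illustration; they are chosen only so that the roundoff difference is easy to see.

```python
def column_thickness(h):
    """Sum layer thicknesses one at a time, in the order given."""
    total = 0.0
    for hk in h:
        total = total + hk
    return total


# One thick layer and two layers that are each half an ulp of the thick one.
h = [1.0, 2.0**-53, 2.0**-53]

top_down = column_thickness(h)
bottom_up = column_thickness(list(reversed(h)))
print(top_down == bottom_up)
```

Summing from the top loses both thin layers to rounding, while summing from the bottom combines them first and retains their contribution, so the two totals differ in the last bit.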

  All answers and diagnostics in cases that worked before are bitwise identical,
but there are new arguments to two publicly visible interfaces.
  Do remapping in Z-space for some remapped diagnostics, depending on which
coordinate is used.  The subroutine thickness_to_dz is used with the
thermo_vars_type to do the rescaling properly in non-Boussinesq mode.

  A new thermo_var_ptrs argument was added to diag_mediator_init, replacing
three other arguments that had the same information, and its call from MOM.F90
was modified accordingly.

  The various calls to the diag_remap routines from post_data_3d and
diag_update_remap_grids were modified depending on whether a z-unit or h-unit
vertical grid is being remapped to.

  All answers and diagnostics are identical in Boussinesq mode, but some
remapped diagnostics are changed (by not using expressions that depend on the
Boussinesq reference density) in the non-Boussinesq mode.  There are altered
or augmented public arguments to two publicly visible routines.
  Corrected indexing problems in downsample_mask that would cause masked
reduced-resolution diagnostics to work improperly when the model is in symmetric
memory mode or when global indexing is used.  This involves passing the starting
index in memory of the native-grid field being downsampled in all calls to
downsample_mask.  In global-indexing mode, this change avoids a series of
segmentation faults that stopped the model runs when compiled for debugging.
All solutions are bitwise identical in all cases but some down-scaled
diagnostics will change in some memory modes, hopefully becoming consistent
across all memory modes (although this has yet to be tested in all modes).
  Replaced the IIm1 and JJm1 variables in Update_Stokes_Drift with 'I-1' and
'J-1'.  The previous expressions could give incorrect
solutions at the southern and western edges of the global domain with global
indexing and serve no purpose when global indexing is not used.  In addition,
the i-, j- and k- index variables in Update_Surface_Waves, Update_Stokes_Drift,
get_Langmuir_Number and Get_SL_Average_Prof were changed from ii to i, jj to j,
and kk to k to follow the patterns used elsewhere in the MOM_wave_interface
module and throughout the rest of the MOM6 code.  All answers are bitwise
identical in cases that do not use global indexing.
  Added or amended comments to document the mostly arbitrary units of about 360
variables in 13 low-level remapping modules.  Only comments are changed and
all answers are bitwise identical.
This patch contains several bugfixes associated with the rotational grid
testing.

The following checksums are declared as scalar pairs, rather than
vectors:

* eta_[uv] (in porous barrier)
* por_face_area[UV]
* por_layer_width[UV]
* Ray_[uv] (Rayleigh drag velocity)

Fluxes and surface fields are now permitted to contain tracer fluxes
(tr_fluxes) when rotation is enabled.  The fields are retained in their
unrotated form, since these are accessed and handled outside of MOM6.

The rotated p_surf_SSH pointer in forces now correctly points to either
p_surf or p_surf_full.

read_netcdf_nc() now correctly uses the unrotated horizontal index
struct HI, used to access the contents of the file.  Previously, it was
using the model HI, which may be rotated.

Reading chlorophyll with time_interp_external now uses rotation to
correctly fetch its output.

NOTE: This could be cleaned up so that the rotation details are hidden
from users, but there are some unresolved issues around how to approach
this.
@Hallberg-NOAA Hallberg-NOAA force-pushed the FMA_rotational_symmetry branch from 6e4e2a8 to 720b81f Compare March 18, 2024 18:47
@codecov-commenter

codecov-commenter commented Mar 18, 2024

Codecov Report

Attention: Patch coverage is 25.34483%, with 433 lines in your changes missing coverage. Please review.

Project coverage is 37.21%. Comparing base (d0e9c25) to head (720b81f).

| Files | Patch % | Lines |
|---|---|---|
| src/core/MOM_density_integrals.F90 | 9.33% | 130 Missing and 6 partials ⚠️ |
| src/equation_of_state/MOM_EOS_Wright_full.F90 | 0.00% | 36 Missing ⚠️ |
| src/equation_of_state/MOM_EOS_Wright_red.F90 | 0.00% | 36 Missing ⚠️ |
| src/parameterizations/lateral/MOM_hor_visc.F90 | 40.00% | 27 Missing ⚠️ |
| src/equation_of_state/MOM_EOS_linear.F90 | 7.14% | 26 Missing ⚠️ |
| ...c/parameterizations/lateral/MOM_internal_tides.F90 | 0.00% | 25 Missing ⚠️ |
| src/equation_of_state/MOM_EOS_Wright.F90 | 33.33% | 24 Missing ⚠️ |
| src/core/MOM_open_boundary.F90 | 0.00% | 16 Missing ⚠️ |
| ...ig_src/drivers/solo_driver/MOM_surface_forcing.F90 | 11.76% | 15 Missing ⚠️ |
| src/core/MOM_CoriolisAdv.F90 | 33.33% | 14 Missing ⚠️ |
| ... and 27 more | | |

Additional details and impacted files
@@               Coverage Diff                @@
##           FMA_rot_sym_ref      #12   +/-   ##
================================================
  Coverage            37.20%   37.21%           
================================================
  Files                  271      271           
  Lines                80472    80482   +10     
  Branches             15008    15008           
================================================
+ Hits                 29943    29950    +7     
- Misses               44957    44960    +3     
  Partials              5572     5572           


  This commit makes the unit_scale_type argument US to MOM_domains_init and
gen_auto_mask_table optional and moves it to the end of the argument list, so
that coupled or ice-ocean models using SIS2 will compile with the proposed
updates to the main branch of MOM6 from dev/ncar.  Because MOM6 and SIS2 use
some common framework code but are managed in separate github repositories, we
need to use optional arguments to allow a single version of SIS2 to work across
changes to MOM6 interfaces.  Because the TOPO_CONFIG parameter as used in SIS2
has a default value, there is an alternative call to get_param for TOPO_CONFIG
with a default when MOM_domains_init is called with a domain_name argument.
Also added missing scale arguments to get_param calls for MINIMUM_DEPTH and
MASKING_DEPTH.  This commit also adds or corrects units in the comments
describing 4 recently added or modified variables.  All answers are bitwise
identical in any cases that worked before (noting that some cases using SIS2
would not even compile).
  Added parentheses to expressions taking the squares of the wind stress
components in 70 lines in 7 files so that these expressions will be rotationally
invariant when fused-multiply-adds are enabled.  All answers are bitwise
identical in cases without FMAs, but answers could change with FMAs.
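
A minimal Python sketch of the problem these parentheses address. It emulates a fused multiply-add (a product and a sum with a single rounding) using exact rational arithmetic, because a hardware FMA is not assumed to be available from Python; the helper names `fma`, `mag2_fused` and `mag2_parenthesized` and the sample values are hypothetical. With FMAs a compiler may round one square and fuse the other into the addition, so exchanging the roles of the x- and y-components, as a 90 degree rotation does, can change the last bit. Parentheses around each product force both to be rounded first.

```python
from fractions import Fraction


def fma(a, b, c):
    """Emulated fused multiply-add: a*b + c evaluated exactly, rounded once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def mag2_fused(x, y):
    # One way `x*x + y*y` may be evaluated with FMAs enabled:
    # y*y is rounded on its own, then x*x is fused into the addition.
    return fma(x, x, y * y)


def mag2_parenthesized(x, y):
    # `(x*x) + (y*y)`: both products are rounded before they are added.
    return (x * x) + (y * y)


# Sample values chosen so that the roundoff difference is visible.
taux = 1.0 + 2.0**-27
tauy = 1.25 * 2.0**-27

fused_differs = mag2_fused(taux, tauy) != mag2_fused(tauy, taux)
paren_matches = mag2_parenthesized(taux, tauy) == mag2_parenthesized(tauy, taux)
print(fused_differs, paren_matches)
```

The parenthesized form is symmetric for any inputs because floating-point addition of two already-rounded values is commutative; the fused form treats the two components differently.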
  Added parentheses to 9 expressions like `curv_3 = h_W(i) + h_E(i) - 2.0*h(i)`
in PPM_limit_pos, zonal_flux_layer, zonal_flux_thickness, merid_flux_layer and
merid_flux_thickness, changing them to `curv_3 = (h_W(i) + h_E(i)) - 2.0*h(i)`.
This change enforces the order of arithmetic that is required to give rotational
symmetry, but it also is the order that the Intel, GNU, and Nvidia compilers
were all already using in these expressions.  Moreover, had the order of
arithmetic ever been anything else, this would have led to failures in our
rotational consistency and redundant point consistency testing, and almost
certainly would have been detected before. However, there is a remote chance
that adding these parentheses could change answers for some compiler or
compiler settings we have never tested before.
This change should not impact any FMA-enabled calculations.  All answers are
bitwise identical in the MOM6-examples regression suite as run on Gaea.
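
The effect of the grouping in the curvature expression can be seen in a small Python sketch. Python always evaluates `a + b - c` from left to right, so the regrouped form is written out explicitly here to mimic a mathematically equivalent reordering that a Fortran compiler is understood to be permitted to make when no parentheses are present. The function names and the (physically unrealistic) sample values are chosen only for illustration.

```python
def curv_grouped(h_W, h_E, h):
    # (h_W + h_E) - 2.0*h: the two neighbours are added first, so swapping
    # them cannot change the result.
    return (h_W + h_E) - 2.0 * h


def curv_regrouped(h_W, h_E, h):
    # h_W + (h_E - 2.0*h): the same value in exact arithmetic, but the two
    # neighbours are no longer treated equivalently.
    return h_W + (h_E - 2.0 * h)


a, b, h = 1.0, 1.5 * 2.0**-53, 2.0**-55

grouped_symmetric = curv_grouped(a, b, h) == curv_grouped(b, a, h)
regrouped_symmetric = curv_regrouped(a, b, h) == curv_regrouped(b, a, h)
print(grouped_symmetric, regrouped_symmetric)
```

Swapping the west and east values corresponds to reflecting or rotating the grid, so only the grouped form gives the same answer in both orientations.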
  Added parentheses to 16 expressions setting the grad_gradient arrays with
oblique_grad open boundary conditions and setting cff_new with all kinds of
oblique boundary conditions so that they will be rotationally invariant when
fused-multiply-adds are enabled.  All answers are bitwise identical in cases
without FMAs, but answers with certain types of open boundary conditions could
change with FMAs.
  Added parentheses to 150 lines in the 5 generic density integral routines
(int_density_dz_generic_pcm, int_density_dz_generic_plm,
int_density_dz_generic_ppm, int_spec_vol_dp_generic_pcm and
int_spec_vol_dp_generic_plm) in the MOM_density_integrals module so that they
will be rotationally invariant when fused-multiply-adds are enabled.  All
answers are bitwise identical in cases without FMAs, but answers could change
with FMAs.
  Added parentheses to 140 lines in 8 int_density_dz and int_spec_vol_dp
routines for the linear, Wright, Wright_full and Wright_red equations of state
so that they will be rotationally invariant when fused-multiply-adds are
enabled.  All answers are bitwise identical in cases without FMAs, but answers
could change with FMAs.
  Removed recently added parentheses around expressions like '+ (hL*hR)' in 110
lines in MOM_density_integrals and 4 equation of state modules to reflect that
these parentheses are not necessary for rotational symmetry in
FMAs.  All answers are bitwise identical in cases without FMAs, but
answers could change with FMAs.
  Added parentheses to 4 lines in PressureForce_Mont_nonBouss and
PressureForce_Mont_Bouss so that they will be rotationally invariant when
fused-multiply-adds are enabled.  All answers are bitwise identical in cases
without FMAs, but answers could change with FMAs in cases that use the
Montgomery potential form of the pressure gradient accelerations.
  Added parentheses to 4 lines in calc_isoneutral_slopes so that they will be
rotationally invariant when fused-multiply-adds are enabled.  All answers are
bitwise identical in cases without FMAs, but answers could change with FMAs.
  Added parentheses to 2 lines in MOM_calc_varT so that they will be
rotationally invariant when fused-multiply-adds are enabled.  In this case, FMAs
can still be applied to the impacted lines, exploiting that the masks are always
0 or 1.  Also added parentheses to 2 other lines used to generate the stochastic
pattern for rotational symmetry with FMAs.  All answers are bitwise identical in
cases without FMAs, but answers could change with FMAs.
  Added parentheses to the calculation of the diffusive temperature changes in
tracer_hordiff so that it will be rotationally invariant when
fused-multiply-adds are enabled.  All answers are bitwise identical in cases
without FMAs, but answers could change with FMAs.
  Added parentheses to the calculation of the iceberg contribution to the
fractional area of ice shelves in iceberg_forces so that it will be rotationally
invariant when fused-multiply-adds are enabled.  All answers are bitwise
identical in cases without FMAs, but answers could change with FMAs enabled in
cases with tabular icebergs.
  Added parentheses to the calculation of the Stokes-drift Coriolis velocity
increments in CoriolisStokes so that it will be rotationally invariant when
fused-multiply-adds are enabled.  All answers are bitwise identical because
CoriolisStokes is still under development and is never called, with a fatal
error occurring if anyone tries to use it.  Also added parentheses to two
expressions calculating the magnitude of the Stokes velocity in
get_Langmuir_Number.  Answers could change for some cases that use Langmuir
turbulence parameterizations with FMAs enabled.
  Added the new element Coriolis2Bu to the ocean_grid_type and the
dyn_horgrid_type to hold the square of the Coriolis parameter, and use this
array in 10 routines (including btstep, set_dtbt, calculate_diagnostic_fields,
VarMix_init, propagate_int_tide, Calculate_kappa_shear, Calc_kappa_shear_vertex
and add_MLrad_diffusivity) that had been calculating and averaging the square of
the Coriolis parameter.  This could change some answers with FMAs enabled
because the compilers were previously free to split up some of the squares
when averaging the squared Coriolis parameter, but without FMAs all answers are
bitwise identical.  This commit does add a new element to two transparent
types.
  Added parentheses to 17 expressions in thickness_diffuse_full,
thickness_diffuse and thickness_diffuse_init to give rotationally consistent
solutions when fused-multiply-adds are enabled.  One comment was also added to
note that the calculation of PE_release_h does not exhibit rotational
symmetry when MEKE_GM_SRC_ALT is set to true.  All answers are bitwise identical
in cases without FMAs, but answers could change when FMAs are enabled.
  Added parentheses to 2 expressions in the Zanna_Bolton code and rearranged
another line so that the u- and v-discretizations introduce terms in the same
order so that the Zanna_Bolton code will exhibit rotationally consistent
solutions when fused-multiply-adds are enabled.  All answers are bitwise
identical in cases without FMAs, but answers could change with FMAs enabled in
cases that use the Zanna-Bolton parameterization.
  Added parentheses to 19 expressions in the MOM_internal_tides propagation code
to exhibit rotationally consistent solutions when fused-multiply-adds are
enabled.  All answers are bitwise identical in cases without FMAs, but answers
could change when FMAs are enabled in models that use the ray-tracing based
internal tides code.
  Added parentheses to 10 expressions in find_uv_at_h to exhibit rotationally
consistent solutions and treat the velocities at both edges of a tracer cell
equivalently when fused-multiply-adds are enabled. All answers are bitwise
identical in cases without FMAs, but answers could change when FMAs are enabled.
  Added parentheses to 19 expressions in set_viscous_ML, set_u_at_v and
set_v_at_u to treat the velocities at both edges of a tracer cell equivalently
when fused-multiply-adds are enabled, and thereby to exhibit
rotationally consistent solutions.  Also swapped the order of the u- and
v-components in the u-point calculation of Uh2 to mirror the order of the
corresponding v-point calculation for the same purpose.  All answers are bitwise
identical in cases without FMAs, but answers could change when FMAs are enabled.
  Added mathematically equivalent rearrangements of the code in
calc_kappa_shear_vertex that interpolates velocities, temperatures and
salinities to the vertices to expose the mask variables while ensuring that the
other multiplications occur within parentheses so that they will exhibit
rotational symmetry when fused-multiply-adds are enabled.  FMAs can still occur,
but it will be multiplication by the 0-or-1 masks that are fused with an
addition.  Also added parentheses to 3 expressions calculating the squared shear
in calculate_projected_state for rotational symmetry with FMAs.  All answers are
bitwise identical in cases without FMAs, but answers could change when FMAs are
enabled.
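
The reason an FMA involving a 0-or-1 mask is harmless can be checked with a short Python sketch. As an assumption for illustration, the fused multiply-add is emulated with exact rational arithmetic and a single final rounding; the helper name `fma` and the sample values are hypothetical. Multiplying by 0 or 1 is exact, so there is no intermediate rounding for the FMA to skip, and the fused and unfused results are bitwise identical.

```python
from fractions import Fraction


def fma(a, b, c):
    """Emulated fused multiply-add: a*b + c evaluated exactly, rounded once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Each x stands for an already-rounded product such as (u*h); y is the term
# it is added to.
samples = [
    (0.1 * 0.3, 0.7 * 1.1),
    (1.0 + 2.0**-27, 2.0**-30),
    (-3.7, 1.0e-9),
]

mask_fma_is_exact = all(
    fma(mask, x, y) == mask * x + y
    for mask in (0.0, 1.0)
    for x, y in samples
)
print(mask_fma_is_exact)
```

This is why the rearranged code keeps the other multiplications inside parentheses and leaves only the mask multiplication available for fusing.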
  Added parentheses to 4 expressions in add_drag_diffusivity, set_BBL_TKE and
add_LOTW_BBL_diffusivity setting the bottom-drag contributions to TKE and
friction velocity so that they will exhibit rotationally consistent solutions
when fused-multiply-adds are enabled.  All answers are bitwise identical in
cases without FMAs, but answers could change when FMAs are enabled.
  Added parentheses to 20 expressions in CorAdCalc and one in gradKE to exhibit
rotationally consistent solutions when fused-multiply-adds are enabled.  All
answers are bitwise identical in cases without FMAs, but answers could change
when FMAs are enabled.
  Added parentheses to 18 expressions in btstep, and one more each in set_dtbt
and barotropic_init to exhibit rotationally consistent solutions when
fused-multiply-adds are enabled.  All answers are bitwise identical in cases
without FMAs, but answers could change when FMAs are enabled.
  Added parentheses to 19 expressions in 5 routines (calc_Visbeck_coeffs_old,
calc_Eady_growth_rate_2D, calc_slope_functions_using_just_e,
calc_QG_Leith_viscosity and VarMix_init) in MOM_lateral_mixing_coeffs.F90 to give
rotationally consistent solutions when fused-multiply-adds are enabled.  Also
reordered terms in a sum in the calculation of beta_dx2_u to mirror that of
beta_dx2_v, also for rotational symmetry with FMAs.  All answers are bitwise
identical in cases without FMAs, but answers could change for some parameter
settings when FMAs are enabled.
  Added parentheses to 40 expressions in horizontal_viscosity and another 14
expressions in hor_visc_init and 3 more in align_aniso_tensor_to_grid to give
rotationally consistent solutions when fused-multiply-adds are enabled.  Also
swapped the order of two terms in the expression for Del2u to mirror the order
of the corresponding terms in Del2v for rotational symmetry with FMAs.  All
answers are bitwise identical in cases without FMAs, but answers could change
when FMAs are enabled.
  Added parentheses to 20 sums of squares of x- and y- distances or velocity
components used for initialization in 8 modules to give rotationally consistent
solutions when fused-multiply-adds are enabled.  All answers are bitwise
identical in cases without FMAs, but answers could change when FMAs are enabled.
  Added parentheses to 29 sums of squares of velocity or other vector components
used in parameterizations in 9 modules to give rotationally consistent solutions
when fused-multiply-adds are enabled.  All answers are bitwise identical in
cases without FMAs, but answers could change when FMAs are enabled.
  Added parentheses to 9 diagnostics of Coriolis accelerations or expressions
used in the kinetic energy budgets to give rotationally consistent solutions
when fused-multiply-adds are enabled.  All answers are bitwise identical in
cases without FMAs, but answers could change when FMAs are enabled.
  Added parentheses to 4 tracer edge value calculations used with PPM tracer
advection to give rotationally consistent solutions when fused-multiply-adds are
enabled.  Although these lines may not appear to need parentheses, some
compilers appear to be putting these expressions directly into others, where the
direction of the flow seems to determine which multiplications are incorporated
into FMAs.  All answers are bitwise identical in cases without FMAs, but answers
could change when FMAs are enabled.
@Hallberg-NOAA Hallberg-NOAA force-pushed the FMA_rotational_symmetry branch from 720b81f to 20003f7 Compare May 5, 2024 03:24
Hallberg-NOAA pushed a commit that referenced this pull request May 15, 2024
Replace db array default values with real literals
10 participants