forked from mom-ocean/MOM6
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FMA parentheses draft PR #12
Open
Hallberg-NOAA
wants to merge
78
commits into
FMA_rot_sym_ref
Choose a base branch
from
FMA_rotational_symmetry
base: FMA_rot_sym_ref
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
* Enhancements for adding land block elimination to NUOPC cap: - Add sum_across_PEs_int4_2d to the sum_across_PEs interface - Allow mask_table file to be placed in run directory (now, the first dir that is looked at). * Enhance NUOPC cap to support MOM_mask_table. - Determine masked blocks. - Evenly distribute eliminated cells. - Fill ESMF gindex array accordingly. - During Export phase, set fields of eliminated cells to zero. * set %label in register_netcdf_field and register_netcdf_axis * first working version of an automated mask table generator * While determining masked blocks, take reentrancy and tripolar stitch into account * apply tripolar stitch fix in auto mask_table generation * add AUTO_IO_LAYOUT_FAC parameter to control IO_LOAYUT when AUTO_MASKTABLE is on * Miscellaneous auto masking fixes to address reviews: - Dimensionalize topographic depth variables used to determine cell masks in auto masktable routine. - Raise error if the user provided PE layout is inconsistent with auto masktable generation. - Save the masktable parameter description to a string variable to avoid repetition. - Fix typos, whitespaces, use modern array syntax. * Disable FPEs in MacOS testing Due to poor handling of floating point in HDF5 1.14.3, it is currently not possible to use floating point exceptions (FPEs) whenever this version is present. The GitHub Actions CI nodes would randomly select either 1.14.2 or 1.14.3, and would raise an FPE error if 1.14.3 was selected. Additionally, the homebrew installation does not provide a clean method for selecting a different version of HDF5. Thus, for now we disable FPEs in the MacOS testing, and hope to catch any legitimate FP errors in the Ubuntu version. We will restore these tests as soon as this has been fixed in an easily-accessible version of HDF5. As part of this PR, I have also moved the FCFLAGS configuration to the platform specific Actions files, allowing for independent compiler configuration for each platform. --------- Co-authored-by: Marshall Ward <[email protected]>
* Fix biharmonic Leith Biharmonic Leith uses Del omega at is-1 and js-1. This unavoidably requires u at js-3 and v at is-3, which are unavailable. It also requires Del omega at Ieq+1 and Jeq+1, which requires v at Ieq+3 and u at Jeq+3, which are unavailable. This necessitates a halo update. Fixes several bugs in Leith+E. - Fixes indexing when computing smoothed vorticity and its gradient - Crucially, computes `vert_vort_mag` when using Leith+E - Fixes some logic in the smoothing code - Other minor indexing fixes * Leith+E Logic Update Ah is required at h and q points. The original code computed Ah at h points, then packed into Ah_h, then applied upper bounds to Ah. If Ah_h is in the diag_table or if debug is true, then the value of Ah with upper bounds get packed into Ah_h. Then, at q points the code unpacks Ah_h. This update makes sure that the upper bound gets applied to q points, not just h points. * Leith+E halo updates The main thing that this commit does is to perform smoothing of u and v outside of the loop over layers. This swaps nz 2D blocking halo updates for a single blocking 3D halo update. * Leith+E smoothing This commit adds a runtime flag, SMOOTH_AH. If True (default) then `m_leithy` and `Ah` are both smoothed, which leads to many blocking communications. If False then these fields are rougher, but there is less communication. * Leith+E eliminate pass-var This commit removes one halo update in Leith+E. To achieve this requires re-indexing two assignments. The value of Ah and Kh are computed at h points, then re-used at q points. Without the halo update it is necessary to offset the assignment at h and q points, e.g. Kh(I,J) = Kh_h(i+1,j+1,k), to avoid accessing values that have not been computed. * Leith+E OBC Adds code so that Leith+E works with OBC. * Leith+E eliminate halo update This commit eliminates one more halo update in Leith+E. * *Correct rotational symmetry with USE_LEITHY This commit revises the smoothing code used when USE_LEITHY = True to give answers that respect rotational symmetry and it also corrects some horizontal indexing bugs and problems with the staggering in some halo update and smooth_x9 calls and reduces some loop ranges to their minimal required values. The specific changes include: 1. Corrected a horizontal indexing bug when interpolating Kh_h and Ah_h to corner (q) points when USE_LEITHY = True. These had previously been inappropriately copied from the thickness point to the southwest of the corner point. This required symmetric-memory-mode calculations of the thickness point viscosities whenever USE_LEITHY is true, but to avoid adding complicated logic, the symmetric-memory loop bounds are used for the calculation of Kh. 2. Revised smooth_x9 to give rotationally symmetric answers and split it into the two routines smooth_x9_h and smooth_x9_uv to reduce the memory used by this routine and reduce the use of optional arguments. 3. Eliminated 4 unneeded halo update calls, and added error handling for the case where Leith options are used with insufficiently wide halos. 4. Added new integers to indicate the loop ranges over which the viscosities and related variables should be calculated, depending on which options are active, and then adjusted 91 do-loop extents horizontal_viscosity code to reflect the loop ranges over which arrays are actually used. 5. Added a new 2-d variable for the squared viscosity for smoothing that can be used for halo updates and to avoid having a variable with confusingly inconsistent dimensions at various points in the code. 6. Corrected the position arguments on 2 smooth_x9 calls and 4 pass_var calls that are used when USE_LEITHY=.true. and SMOOTH_AH=.true. As previously written, these smooth_x9 and pass_var calls would work when in non-symmetric memory mode but would give incorrect answers when in symmetric memory mode. These revisions change answers when USE_LEITHY is true, but answers are bitwise identical in all other cases. --------- Co-authored-by: Robert Hallberg <[email protected]>
Merge latest main
Minor version updates to multiple packages used in the generation of the documentation introduce dependencies needing Sphinx >= 5.0, which breaks the sphinx extensions we use in documenting the MOM6 APIs. I have added versions for all the packages needed to keep things working with Sphinx4 for now, but we really do need to find a way to work with the newer versions. More pinningaof of requirements for readthedocs More pinning of requirements for readthedocs
This patch introduces two new macros, BUILD and WORK, to permit relocation of the build/ and work/ directories. It also makes the following smaller changes: * deps/ is now defined by the DEPS macro. If unset, deps/ is placed in the BUILD directory. * results/ is moved into WORK. * Compiler flags which track directories now use $(abspath ...) to allow for arbitrary paths. * GitHub CI paths were adjusted to support these new settings. * DO_* flags are now used as on/off with ifdef testing, rather than checking for `true` values. * mkmf macros have been removed from the coupled test config. * The default FMS infra has been changed to FMS2 in all components, including the configure.ac outside of .testing. This work will enable testing of multiple FMS libraries in our CI.
Merge latest main (Feb 28, 2024)
Refactored set_viscous_BBL to separate out the routines setting the open interface lengths used for the channel drag, shortening a 1070 line long routine to 915 lines and reducing the scope of a number of temporary variables. A number of logical branch points have been moved outside of the innermost do loops. This refactoring will also make it easier to provide alternatives to some of the solvers that do not use the trigonometric functions to solve for the roots of a cubic expression and avoiding the issues noted at NOAA-GFDL/issues/483. All answers are bitwise identical and public interfaces are unchanged.
Added the new routine find_L_open_concave_iterative to use iterative Newton's method approaches with appropriate limits to solve the cubic equation for the fractional open face lengths at interfaces that are used by the CHANNEL_DRAG code. These solutions are analogous to those given by the previous expressions that are now in find_L_open_concave_trigonometric, and the two differ at close to roundoff, but the new method is completely independent of the transcendental function library, thereby addressing dev/gfdl MOM6 issue mom-ocean#483. This new routine is called when the new runtime parameter TRIG_CHANNEL_DRAG_WIDTHS is set to false, but by default the previous answers are recovered. By default all answers are bitwise identical, but there is a new runtime parameter in some MOM_parameter_doc files.
Added the new debugging or testing subroutine test_L_open_concave along with extra calls when DEBUG = True that can be used to demonstrate that the iterative solver in find_L_open_concave_iterative is substantially more accurate but mathematically equivalent to the solver in find_L_open_concave_trigonometric. This extra code is only called in debugging mode, and it probably should be deleted in a separate commit after find_L_open_concave_iterative has been accepted onto the dev/gfdl branch of MOM6. All answers are bitwise identical and no output or input is changed.
Carried out minor refactoring in set_viscous_BBL as suggested by the reviews of this PR, including the elimination of some unnecessary error handling and the replacement of C2pi_3 as an argument to find_L_open_concave_trigonometric with an internal parameter. All answers are bitwise identical and there are no changes to publicly visible interfaces.
It was blowing up with "forrtl: error (65): floating invalid" when accessing dz in the halo at the boundary, but just sometimes. My default layout is trouble while my testing layout of 48 cores is not.
Revised the interfaces to the myStats routine in the horizontal_regridding module to avoid segmentation faults due to inconsistent horizontal indices and array extents in global indexing mode. Rather than passing in absolute array extents to work on, an ocean grid type is now passed as an argument to myStats, with the new optional full_halo argument used to capture the case where the tracer statistics are being taken over the full data domain. The most frequently encountered problems occurred when the hard-coded debug variable in the horiz_interp_and_extrap_tracer routines are changed from false to true. When global indexing is not used, this revised work exactly as before, but when it is used with global indexing, it avoids segmentation faults that were preventing the model from running in some cases with all debugging enabled.
Corrected bugs in the horizontal indexing in apply_topography_edits_from_file that led to differing answers depending on the value of GLOBAL_INDEXING. This change gives identical results when GLOBAL_INDEXING is used, as can be seen by noting that G%idg_offset = G%isd_global + G%isd and that without global indexing G%isd = 1, but with global indexing the new expressions give the same answers as without them. Because global indexing is not typically used, answers are not changed for any cases in the MOM6-examples test suite.
Hallberg-NOAA
force-pushed
the
FMA_rotational_symmetry
branch
from
March 11, 2024 18:44
f146fe3
to
6e4e2a8
Compare
Moved the post-initialization halo updates for thicknesses and temperatures and salinities in initialize_MOM to occur immediately after the last point where they are modified and before the remapped diagnostic grids are set up. This does not change any existing answers but it will enable the future use of the thermo_var_ptr type in calls to thickness_to_dz when setting up remapped diagnostics, and perhaps elsewhere in the initialization of other auxiliary variables. All answers are bitwise identical.
Refactored MOM_diag_remap to work with global indices and to move logical branches outside of do loops. This was done by adding internal routines that set the loop indices consistently with the NE c-grid convention used throughout the MOM6 code and converting the optionally associated mask pointer into an optional argument. It also simplifies the logic of many of the expressions within the remapping code. There is also a new element, Z_based_coord, in the diag_remap_ctrl type, to indicate whether the remapping is working in thickness or height units, but for now it is always set to false. The function set_h_neglect or set_dz_neglect is used to set the negligible thicknesses used for remapping in diag_remap_update and diag_remap_do_remap, depending on whether the remapping is being done in thickness or vertical height coordinates. Diag_remap_init has a new vertical_grid_type argument, and diag_remap_do_remap has a new unit_scale_type argument. For REMAPPING_ANSWER_DATES later than 20240201, diag_remap_updated does an explicit sum to determine the total water column thickness, rather than using sum function, which is indeterminate of the order of the sums. For some compilers, this could change the vertical grids used for remapping diagnostics at roundoff, but no such change was detected for any of the compilers used with the MOM6 regression test suite. All answers and diagnostics in cases that worked before are bitwise identical, but there are new arguments to two publicly visible interfaces.
Do remapping in Z-space for some remapped diagnostics, depending on which coordinate is used. The subroutine thickness_to_dz is used with the thermo_vars_type to do the rescaling properly in non-Boussinesq mode. A new thermo_var_ptrs argument was added to diag_mediator_init, replacing three other arguments that had the same information, and its call from MOM.F90 was modified accordingly. The various calls to the diag_remap routines from post_data_3d and diag_update_remap_grids were modified depending on whether a z-unit or h-unit vertical grid is being remapped to. All answers and diagnostics are identical in Boussinesq mode, but some remapped diagnostics are changed (by not using expressions that depend on the Boussinesq reference density) in the non-Boussinesq mode. There are altered or augmented public arguments to two publicly visible routines.
Corrected indexing problems in downsample_mask that would cause masked reduced-resolution diagnostics to work improperly when the model is in symmetric memory mode or when global indexing is used. This involves passing the starting index in memory of the native-grid field being downsampled in all calls to downsample_mask. In global-indexing mode, this change avoids a series of segmentation faults that stopped the model runs when compiled for debugging. All solutions are bitwise identical in all cases but some down-scaled diagnostics will change in some memory modes, hopefully becoming consistent across all memory modes (although this has yet to be tested in all modes).
Replaced the IIm1 and JJm1 variables in Update_Stokes_Drift with 'I-1' and 'J-1' in Update_Stokes_Drift. The previous expressions could give incorrect solutions at the southern and western edges of the global domain with global indexing and serve no purpose when global indexing is not used. In addition, the i-, j- and k- index variables in Update_Surface_Waves, Update_Stokes_Drift, get_Langmuir_Number and Get_SL_Average_Prof were changed from ii to i, jj to j, and kk to k to follow the patterns used elsewhere in the MOM_wave_interface module and throughout the rest of the MOM6 code. All answers are bitwise identical in cases that do not use global indexing.
Added or amended comments to document the mostly arbitrary units of about 360 variables in 13 low-level remapping modules. Only comments are changed and all answers are bitwise identical.
This patch contains several bugfixes associated with the rotational grid testing. The following checksums are declared as scalar pairs, rather than vectors: * eta_[uv] (in porous barrier) * por_face_area[UV] * por_layer_width[UV] * Ray_[uv] (Rayleigh drag velocity) Fluxes and surface fields are now permitted to contain tracer fluxes (tr_fluxes) when rotation is enabled. The fields are retained in their unrotated form, since these are accessed and handled outside of MOM6. The rotated p_surf_SSH pointer in forces now correctly points to either p_surf or p_surf_full. read_netcdf_nc() now correctly uses the unrotated horizontal index struct HI, used to access the contents of the file. Previously, it was using the model HI, which may be rotated. Reading chlorophyll with time_interp_external now uses rotation to correctly fetch its output. NOTE: This could be cleaned up so that the rotation details are hidden from users, but there are some unresolved issues around how to approach this.
Hallberg-NOAA
force-pushed
the
FMA_rotational_symmetry
branch
from
March 18, 2024 18:47
6e4e2a8
to
720b81f
Compare
Codecov ReportAttention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## FMA_rot_sym_ref #12 +/- ##
================================================
Coverage 37.20% 37.21%
================================================
Files 271 271
Lines 80472 80482 +10
Branches 15008 15008
================================================
+ Hits 29943 29950 +7
- Misses 44957 44960 +3
Partials 5572 5572 ☔ View full report in Codecov by Sentry. |
This commit makes the unit_scale_type argument US to MOM_domains_init and gen_auto_mask_table optional and moves it to the end of the argument list, so that coupled or ice-ocean models using SIS2 will compile with the proposed updates to the main branch of MOM6 from dev/ncar. Because MOM6 and SIS2 use some common framework code but are managed in separate github repositories, we need to use optional argument to allow a single version of SIS2 to work across changes to MOM6 interfaces. Because the TOPO_CONFIG parameter as used in SIS2 has a default value, there is an alternative call to get_param for TOPO_CONFIG with a default when MOM_domains_init is called with a domain_name argument. Also added missing scale arguments to get_param calls for MINIMUM_DEPTH and MASKING_DEPTH. This commit also adds or corrects units in the comments describing 4 recently added or modified variables. All answers are bitwise identical in any cases that worked before (noting that some cases using SIS2 would not even compile).
Added parentheses to expressions taking the squares of the wind stress components in 70 lines in 7 files so that these expressions will be rotationally invariant when fused-multiply-adds are enabled. All answers are bitwise identical in cases without FMAs, but answers could change with FMAs.
Added parentheses to 9 expressions like `curv_3 = h_W(i) + h_E(i) - 2.0*h(i)` in PPM_limit_pos, zonal_flux_layer, zonal_flux_thickness, merid_flux_layer and merid_flux_thickness, changing them to `curv_3 = (h_W(i) + h_E(i)) - 2.0*h(i)`. This change enforces the order of arithmetic that is required to give rotational symmetry, but it also is the order that the Intel, GNU, and Nvidia compliers were all already using in these expressions. Moreover, had the order of arithmetic ever been anything else, this would have led to failures in our rotational consistency and redundant point consistency testing, and almost certainly would have been detected before. However, by adding these parentheses, there is a remote chance that the addition of these parentheses could change answers for some compiler or compiler settings we have never tested before. This change should not impact any FMA-enabled calculations. All answers are bitwise identical in the MOM6-examples regression suite as run on Gaea.
Added parentheses to 16 expressions setting the grad_gradient arrays with oblique_grad open boundary conditions and setting cff_new with all kinds of oblique boundary conditions so that they will be rotationally invariant when fused-multiply-adds are enabled. All answers are bitwise identical in cases without FMAs, but answers with certain types of open boundary conditions could change with FMAs.
Added parentheses to 150 lines in the 5 generic density integral routines (int_density_dz_generic_pcm, int_density_dz_generic_plm, int_density_dz_generic_ppm, int_spec_vol_dp_generic_pcm and int_spec_vol_dp_generic_plm) in the MOM_density_integrals module so that they will be rotationally invariant when fused-multiply-adds are enabled. All answers are bitwise identical in cases without FMAs, but answers could change with FMAs.
Added parentheses to 140 lines in 8 int_density_dz and int_spec_vol_dp routines for the linear, Wright, Wright_full and Wright_red equations of state so that they will be rotationally invariant when fused-multiply-adds are enabled. All answers are bitwise identical in cases without FMAs, but answers could change with FMAs.
Removed recently added parentheses around expressions like '+ (hL*hR)' in 110 lines in MOM_density_integrals and 4 equation of state module to reflect that these parentheses are not necessary for rotational symmetry in FMAs. All answers are bitwise identical in cases without FMAs, but answers could change with FMAs.
Added parentheses to 4 lines in PressureForce_Mont_nonBouss and PressureForce_Mont_Bouss so that they will be rotationally invariant when fused-multiply-adds are enabled. All answers are bitwise identical in cases without FMAs, but answers could change with FMAs in cases that use the Montgomery potential form of the pressure gradient accelerations.
Added parentheses to 4 lines in calc_isoneutral_slopes so that they will be rotationally invariant when fused-multiply-adds are enabled. All answers are bitwise identical in cases without FMAs, but answers could change with FMAs.
Added parentheses to 2 lines in MOM_calc_varT so that they will be rotationally invariant when fused-multiply-adds are enabled. In this case, FMAs can still be applied to the impacted lines, exploiting that the masks are always 0 or 1. Also added parentheses to 2 other lines used to generate the stochastic pattern for rotational symmetry with FMAs. All answers are bitwise identical in cases without FMAs, but answers could change with FMAs.
Added parentheses to the calculation of the diffusive temperature changes in tracer_hordiff so that it will be rotationally invariant when fused-multiply-adds are enabled. All answers are bitwise identical in cases without FMAs, but answers could change with FMAs.
Added parentheses to the calculation of the iceberg contribution to the fractional area of ice shelves in iceberg_forces so that it will be rotationally invariant when fused-multiply-adds are enabled. All answers are bitwise identical in cases without FMAs, but answers could change with FMAs enabled in cases with tabular icebergs.
Added parentheses to the calculation of the Stokes-drift Coriolis velocity increments in CoriolisStokes so that it will be rotationally invariant when fused-multiply-adds are enabled. All answers are bitwise identical because CoriolisStokes is still under development and is never called, with a fatal error occurring if anyone tries to use it. Also added parentheses to two expressions calculating the magnitude of the Stokes velocity in get_Langmuir_Number. Answers could change for some cases that use Langmuir turbulence parameterizations with FMAs enabled.
Added the new element Coriolis2Bu to the ocean_grid_type and the dyn_horgrid_type to hold the square of the Coriolis parameter, and use this array in 10 routines (including btstep, set_dtbt, calculate_diagnostic_fields, VarMix_init, propagate_int_tide, Calculate_kappa_shear, Calc_kappa_shear_vertex and add_MLrad_diffusivity) that had been calculating and averaging the square of the Coriolis parameter. This could change some answers with FMAs enabled because the compilers were previously free to split up some of the squares when averaging the squared Coriolis parameter, but without FMAs all answers are bitwise identical. This commit does add a new element to two transparent types.
Added parentheses to 17 expressions in thickness_diffuse_full, thickness_diffuse and thickness_diffuse_init to give rotationally consistent solutions when fused-multiply-adds are enabled. One comment was also added to note that the calculation of PE_release_h is does not exhibit rotational symmetry when MEKE_GM_SRC_ALT is set to true. All answers are bitwise identical in cases without FMAs, but answers could change when FMAs are enabled.
Added parentheses to 2 expressions in the Zanna_Bolton code and rearranged another line so that the u- and v-discretizations introduce terms in the same order so that the Zanna_Bolton code will exhibit rotationally consistent solutions when fused-multiply-adds are enabled. All answers are bitwise identical in cases without FMAs, but answers could change with FMAs enabled in cases that use the Zanna-Bolton parameterization.
Added parentheses to 19 expressions in the MOM_internal_tides propagation code to exhibit rotationally consistent solutions when fused-multiply-adds are enabled. All answers are bitwise identical in cases without FMAs, but answers could change when FMAs are enabled in models that use the ray-tracing based internal tides code.
Added parentheses to 10 expressions in find_uv_at_h to exhibit rotationally consistent solutions and treat the velocities at both edges of a tracer cell equivalently when fused-multiply-adds are enabled. All answers are bitwise identical in cases without FMAs, but answers could change when FMAs are enabled.
Added parentheses to 19 expressions in set_viscous_ML, set_u_at_v and set_v_at_u to treat the velocities at both edges of a tracer cell equivalently when fused-multiply-adds are enabled, and thereby to exhibit exhibit rotationally consistent solutions. Also swapped the order of the u- and v-components in the u-point calculation of Uh2 to mirror the order of the corresponging v-point calculation for the same purpose. All answers are bitwise identical in cases without FMAs, but answers could change when FMAs are enabled.
Added mathematically equivalent rearrangements of the code in calc_kappa_shear_vertex that interpolates velocities, temperatures and salinities to the vertices to expose the mask variables while ensuring that the other multiplications occur within parentheses so that they will exhibit rotational symmetry when fused-multiply-adds are enabled. FMAs can still occur, but it will be multiplication by the 0-or-1 masks that are fused with an addition. Also added parentheses to 3 expressions calculating the squared shear in calculate_projected_state for rotational symmetry with FMAs. All answers are bitwise identical in cases without FMAs, but answers could change when FMAs are enabled.
Added parentheses to 4 expressions in add_drag_diffusivity, set_BBL_TKE and add_LOTW_BBL_diffusivity setting the bottom-drag contributions to TKE and friction velocity so that they will exhibit rotationally consistent solutions when fused-multiply-adds are enabled. All answers are bitwise identical in cases without FMAs, but answers could change when FMAs are enabled.
Added parentheses to 20 expressions in CorAdCalc and one in gradKE to exhibit rotationally consistent solutions when fused-multiply-adds are enabled. All answers are bitwise identical in cases without FMAs, but answers could change when FMAs are enabled.
Added parentheses to 18 expressions in btstep, and one more each in set_dtbt and barotropic_init to exhibit rotationally consistent solutions when fused-multiply-adds are enabled. All answers are bitwise identical in cases without FMAs, but answers could change when FMAs are enabled.
Added parentheses to 19 expressions in 5 routines (calc_Visbeck_coeffs_old, calc_Eady_growth_rate_2D, calc_slope_functions_using_just_e, calc_QG_Leith_viscosity VarMix_init) in MOM_lateral_mixing_coeffs.F90 to give rotationally consistent solutions when fused-multiply-adds are enabled. Also reordered terms in a sum in the calculation of beta_dx2_u to mirror that of beta_dx2_v, also for rotational symmetry with FMAs. All answers are bitwise identical in cases without FMAs, but answers could change for some parameter settings when FMAs are enabled.
Added parentheses to 40 expressions horizontal_viscosity and another 14 expressions in in hor_visc_init and 3 more in align_aniso_tensor_to_grid to give rotationally consistent solutions when fused-multiply-adds are enabled. Also swapped the order of two terms in the expression for Del2u to mirror the order of the corresponding terms in Del2v for rotational symmetry with FMAs. All answers are bitwise identical in cases without FMAs, but answers could change when FMAs are enabled.
Added parentheses to 20 sums of squares of x- and y- distances or velocity components used for initialization in 8 modules to give rotationally consistent solutions when fused-multiply-adds are enabled. All answers are bitwise identical in cases without FMAs, but answers could change when FMAs are enabled.
Added parentheses to 29 sums of squares of velocity or other vector components used in parameterizations in 9 modules to give rotationally consistent solutions when fused-multiply-adds are enabled. All answers are bitwise identical in cases without FMAs, but answers could change when FMAs are enabled.
Added parentheses to 9 diagnostics of Coriolis accelerations or expressions used in the kinetic energy budgets to give rotationally consistent solutions when fused-multiply-adds are enabled. All answers are bitwise identical in cases without FMAs, but answers could change when FMAs are enabled.
Added parentheses to 4 tracer edge value calculations used with PPM tracer advection to give rotationally consistent solutions when fused-multiply-adds are enabled. Although these lines may not appear to need parentheses, some compliers appear to be putting these expressions directly into others, where the direction of the flow seems to determine which multiplications are incorporated into FMAs. All answers are bitwise identical in cases without FMAs, but answers could change when FMAs are enabled.
Hallberg-NOAA
force-pushed
the
FMA_rotational_symmetry
branch
from
May 5, 2024 03:24
720b81f
to
20003f7
Compare
Hallberg-NOAA
pushed a commit
that referenced
this pull request
May 15, 2024
Replace db array default values with real literals
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR adds the parentheses that appear to be needed so that FMAs will exhibit rotational symmetry.