Move bottomDepthEdge calculation to single loop over all edges #5356

mark-petersen · 2022-12-07T22:21:04Z

After #5195 was merged, the MPAS-Ocean standalone test ocean/baroclinic_channel/10km/decomp_test failed to match between 4 and 8 partitions, but only for intel optimized. All compass nightly suite tests passed for gnu debug, gnu optimized, and intel debug.

This PR solves the problem by merging the computation of bottomDepthEdge into a single edge loop. Previously it was split into two loops, 1:nEdgesOwned (with many other calculations) and another from nEdgesOwned+1:nEdgesArray(4). The intel optimized compiler must have changed order-of-operations in these two loops for different partitions.
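The restructuring described above can be sketched in a few lines. This is only an illustration in Python, not the Fortran in mpas_ocn_time_integration_split.F: the function names, the `cells_on_edge` layout, and the assumption that the edge value is the mean of the two adjacent cell values are all hypothetical.

```python
def bottom_depth_edge_split(bottom_depth, cells_on_edge, n_edges_owned):
    """Two-loop layout: owned edges first, then the remaining (halo) edges."""
    n_edges = len(cells_on_edge)
    out = [0.0] * n_edges
    # Loop 1: owned edges (in the real code this loop also does other work).
    for i in range(n_edges_owned):
        c1, c2 = cells_on_edge[i]
        # Assumed formula, for illustration only.
        out[i] = 0.5 * (bottom_depth[c1] + bottom_depth[c2])
    # Loop 2: halo edges, computed separately.
    for i in range(n_edges_owned, n_edges):
        c1, c2 = cells_on_edge[i]
        out[i] = 0.5 * (bottom_depth[c1] + bottom_depth[c2])
    return out


def bottom_depth_edge_merged(bottom_depth, cells_on_edge):
    """Single-loop layout: every edge goes through the same loop body."""
    out = [0.0] * len(cells_on_edge)
    for i, (c1, c2) in enumerate(cells_on_edge):
        out[i] = 0.5 * (bottom_depth[c1] + bottom_depth[c2])
    return out


# Hypothetical toy mesh: 4 cells, 3 edges, the first 2 edges owned.
depths = [100.0, 250.0, 300.0, 75.5]
edges = [(0, 1), (1, 2), (2, 3)]
split = bottom_depth_edge_split(depths, edges, n_edges_owned=2)
merged = bottom_depth_edge_merged(depths, edges)
```

In Python the two variants return identical values, so the sketch shows only the loop structure. The concern in the PR is that an optimizing Fortran compiler may generate different instruction sequences for the two separate loops, so an edge that is owned on one partition and in the halo on another could be computed with a different order of operations.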

Fixes #5219
[BFB]

mark-petersen · 2022-12-07T22:32:02Z

Testing on cori, the compass nightly suite is BFB-identical for all tests using gnu debug, gnu optimized, and intel debug. For intel optimized, the following tests fail the BFB comparison with master for this PR (90fb60d to 582899e):

00:06 FAIL ocean_baroclinic_channel_10km_default
00:10 FAIL ocean_baroclinic_channel_10km_threads_test
00:13 FAIL ocean_baroclinic_channel_10km_decomp_test
00:06 FAIL ocean_baroclinic_channel_10km_restart_test
01:41 PASS ocean_global_ocean_QU240_mesh
00:51 PASS ocean_global_ocean_QU240_PHC_init
00:48 PASS ocean_global_ocean_QU240_PHC_performance_test
01:33 PASS ocean_global_ocean_QU240_PHC_restart_test
01:31 PASS ocean_global_ocean_QU240_PHC_decomp_test
01:31 PASS ocean_global_ocean_QU240_PHC_threads_test
01:08 PASS ocean_global_ocean_QU240_PHC_analysis_test
00:42 PASS ocean_global_ocean_QU240_PHC_RK4_performance_test
01:25 PASS ocean_global_ocean_QU240_PHC_RK4_restart_test
01:29 PASS ocean_global_ocean_QU240_PHC_RK4_decomp_test
01:29 PASS ocean_global_ocean_QU240_PHC_RK4_threads_test
00:00 PASS ocean_global_ocean_QUwISC240_mesh
00:00 PASS ocean_global_ocean_QUwISC240_PHC_init
00:46 PASS ocean_global_ocean_QUwISC240_PHC_performance_test
00:14 FAIL ocean_ice_shelf_2d_5km_z-star_restart_test
00:24 FAIL ocean_ice_shelf_2d_5km_z-level_restart_test
00:16 FAIL ocean_ziso_20km_default
00:18 FAIL ocean_ziso_20km_with_frazil

All differences are 1e-12 or smaller. Considering that we changed order of operations for intel optimized, this is not surprising, and is acceptable for this PR.
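As a generic reminder of why a change in the order of operations alone can break a bit-for-bit match, floating-point addition is not associative. The snippet below is plain Python with arbitrary values and is not taken from the model code.

```python
# Same three numbers, summed with two different groupings.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # round after a + b, then add c
right = a + (b + c)  # round after b + c, then add a

# The two results differ in the last bits of the mantissa.
differs = left != right
gap = abs(left - right)
```

Here `differs` is true and `gap` is on the order of 1e-16. Round-off differences of this kind can grow over many time steps, which is one plausible way to arrive at differences of the size reported above.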

xylar · 2022-12-08T08:03:35Z

@mark-petersen, thanks for this fix! I will test it in both compass and E3SM. I see you have the BFB flag but I wouldn't expect it to necessarily be BFB in E3SM, based on your compass testing. Let's see what happens.

xylar · 2022-12-08T13:46:45Z

Testing

compass

I tested with the compass pr suite on Chrysalis with Intel and OpenMPI (optimized), using 90fb60d as a baseline. With the baseline code, I saw validation failures for:

ocean/baroclinic_channel/10km/decomp_test
  * step: initial_state
  * step: 4proc
  * step: 8proc
  test execution:      SUCCESS
  test validation:     FAIL
  see: case_outputs/ocean_baroclinic_channel_10km_decomp_test.log
  test runtime:        00:03
...
ocean/global_ocean/QU240/PHC/decomp_test
  * step: 4proc
  * step: 8proc
  test execution:      SUCCESS
  test validation:     FAIL
  see: case_outputs/ocean_global_ocean_QU240_PHC_decomp_test.log
  test runtime:        00:54

(This is interesting because ocean/global_ocean/QU240/PHC/decomp_test had passed in my previous testing with Intel and Intel-MPI, so the use of OpenMPI--what E3SM also uses on Chrysalis--may actually make this problem worse.)

With this branch, I see all tests passing execution and validation but all split-explicit tests are failing baseline comparison, as @mark-petersen saw and as we expected given the change of order of operations. As @mark-petersen found, I'm seeing differences on the order of 1e-12, so small but not quite machine precision.
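The distinction between a bit-for-bit match and a difference that is "small but not quite machine precision" can be made concrete with a short check. This is a hypothetical helper with made-up sample values, not the comparison code that compass uses.

```python
import struct
import sys


def compare_fields(baseline, candidate):
    """Return (is_bit_for_bit, max_abs_diff) for two equal-length sequences."""
    if len(baseline) != len(candidate):
        raise ValueError("fields must have the same length")
    # Bit-for-bit means the 8-byte IEEE representations are identical.
    bfb = all(
        struct.pack("<d", a) == struct.pack("<d", b)
        for a, b in zip(baseline, candidate)
    )
    max_abs = max((abs(a - b) for a, b in zip(baseline, candidate)), default=0.0)
    return bfb, max_abs


# Made-up values: one entry perturbed by roughly 1e-12.
base = [1.0, 2.5, -3.0]
test = [1.0, 2.5 + 1e-12, -3.0]

bfb, max_abs = compare_fields(base, test)
eps = sys.float_info.epsilon  # about 2.2e-16 for double precision
```

With these values `bfb` is false and `max_abs` is about 1e-12, several thousand times larger than `eps`, which matches the sense in which the reported differences are small without being at machine precision.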

E3SM

I ran PEM_Ln9.ne30pg2_EC30to60E2r2.WCYCL1850.chrysalis_intel, again using 90fb60d as a baseline. Results were bit-for-bit, but I'll retest once @philipwjones's concerns are addressed.

components/mpas-ocean/src/mode_forward/mpas_ocn_time_integration_split.F

mark-petersen · 2022-12-09T16:38:35Z

After the last commit, I retested the nightly suite on chrysalis. Intel debug matches bfb between master and this PR, and passes all tests. Intel optimized fails bfb comparison to master, same as above.

xylar · 2022-12-12T20:36:30Z

I'll retest this tomorrow.

xylar · 2022-12-13T14:58:05Z

Both the pr test suite in compass (with Intel and OpenMPI in optimized mode) and the E3SM test PEM_Ln9.ne30pg2_EC30to60E2r2.WCYCL1850.chrysalis_intel passed as before. The pr test suite was non-BFB for the same tests as before (using the same baseline), whereas PEM_Ln9.ne30pg2_EC30to60E2r2.WCYCL1850.chrysalis_intel was BFB with the same baseline as before.

xylar

Approving based on my testing and @mark-petersen's.

philipwjones

Approve based on code inspection and others' testing. Previous comments were satisfactorily addressed.

Still getting tripped up by the fact that config_num_halos is not the actual halo width for edges, so this looks like we're out of bounds. But that's another discussion for omega...

jonbob · 2022-12-13T22:25:15Z

passes:

ERS_Ld5.T62_oQU120.CMPASO-NYF.chrysalis_intel
SMS_D_Ld3.T62_oQU120.CMPASO-IAF.chrysalis_intel
PET_Ln9_PS.ne30pg2_EC30to60E2r2.WCYCL1850.chrysalis_intel.allactive-mach-pet
SMS_P12x2.ne4_oQU240.WCYCL1850NS.chrysalis_intel.allactive-mach_mods
SMS_D_Ld1.ne30pg2_EC30to60E2r2.WCYCL1850.chrysalis_intel.allactive-wcprod
PEM.ne30pg2_EC30to60E2r2.WCYCL1850.chrysalis_intel.allactive-wcprod
ERS_Ld5.T62_oQU120.CMPASO-NYF.compy_pgi

merged to next

jonbob · 2022-12-14T19:34:11Z

merged to master

This merge updates the E3SM-Project submodule from [569ed6b730](https://github.com/E3SM-Project/E3SM/tree/569ed6b730) to [0273cfad9d](https://github.com/E3SM-Project/E3SM/tree/0273cfad9d).

This update includes the following MPAS-Ocean and MPAS-Frameworks PRs (check mark indicates bit-for-bit with previous PR in the list):

- [ ] (ocn) E3SM-Project/E3SM#5306
- [ ] (fwk) E3SM-Project/E3SM#5303
- [ ] (ocn) E3SM-Project/E3SM#5325
- [ ] (fwk) E3SM-Project/E3SM#5337
- [ ] (fwk) E3SM-Project/E3SM#5123
- [ ] (fwk) E3SM-Project/E3SM#5281
- [ ] (ocn) E3SM-Project/E3SM#5356

Move bottomDepthEdge calculation to single loop over all edges

582899e

mark-petersen added the mpas-ocean, bug fix PR, and BFB (PR leaves answers BFB) labels Dec 7, 2022

mark-petersen requested a review from xylar December 7, 2022 22:21

mark-petersen assigned jonbob Dec 7, 2022

philipwjones reviewed Dec 8, 2022

components/mpas-ocean/src/mode_forward/mpas_ocn_time_integration_split.F

mark-petersen mentioned this pull request Dec 8, 2022

Update E3SM-Project submodule MPAS-Dev/compass#461

Change nEdgesArray to nEdgesHalo

824850e

xylar approved these changes Dec 13, 2022

xylar requested review from philipwjones and removed request for philipwjones December 13, 2022 16:28

philipwjones approved these changes Dec 13, 2022

jonbob merged commit 0614e7b into E3SM-Project:master Dec 14, 2022

xylar mentioned this pull request Dec 15, 2022

Update E3SM-Project submodule MPAS-Dev/compass#480
