Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move bottomDepthEdge calculation to single loop over all edges #5356

Conversation

mark-petersen
Copy link
Contributor

@mark-petersen mark-petersen commented Dec 7, 2022

After #5195 was merged, the MPAS-Ocean standalone test ocean/baroclinic_channel/10km/decomp_test failed to match between 4 and 8 partitions, but only for intel optimized. All compass nightly suite tests passed for gnu debug, gnu optimized, intel debug.

This PR solves the problem by merging the computation of bottomDepthEdge into a single edge loop. Previously it was split into two loops, 1:nEdgesOwned (with many other calculations) and another from nEdgesOwned+1:nEdgesArray(4). The intel optimized compiler must have changed order-of-operations in these two loops for different partitions.

Fixes #5219
[BFB]

@mark-petersen
Copy link
Contributor Author

Testing on cori, the compass nightly suite are bfb identical for all tests using gnu debug, gnu optimized, and intel debug. For intel optimized the following tests fail bfb match with master for this PR (90fb60d to 582899e):

00:06 FAIL ocean_baroclinic_channel_10km_default
00:10 FAIL ocean_baroclinic_channel_10km_threads_test
00:13 FAIL ocean_baroclinic_channel_10km_decomp_test
00:06 FAIL ocean_baroclinic_channel_10km_restart_test
01:41 PASS ocean_global_ocean_QU240_mesh
00:51 PASS ocean_global_ocean_QU240_PHC_init
00:48 PASS ocean_global_ocean_QU240_PHC_performance_test
01:33 PASS ocean_global_ocean_QU240_PHC_restart_test
01:31 PASS ocean_global_ocean_QU240_PHC_decomp_test
01:31 PASS ocean_global_ocean_QU240_PHC_threads_test
01:08 PASS ocean_global_ocean_QU240_PHC_analysis_test
00:42 PASS ocean_global_ocean_QU240_PHC_RK4_performance_test
01:25 PASS ocean_global_ocean_QU240_PHC_RK4_restart_test
01:29 PASS ocean_global_ocean_QU240_PHC_RK4_decomp_test
01:29 PASS ocean_global_ocean_QU240_PHC_RK4_threads_test
00:00 PASS ocean_global_ocean_QUwISC240_mesh
00:00 PASS ocean_global_ocean_QUwISC240_PHC_init
00:46 PASS ocean_global_ocean_QUwISC240_PHC_performance_test
00:14 FAIL ocean_ice_shelf_2d_5km_z-star_restart_test
00:24 FAIL ocean_ice_shelf_2d_5km_z-level_restart_test
00:16 FAIL ocean_ziso_20km_default
00:18 FAIL ocean_ziso_20km_with_frazil

All differences are 1e-12 or smaller. Considering that we changed order of operations for intel optimized, this is not surprising, and is acceptable for this PR.

@xylar
Copy link
Contributor

xylar commented Dec 8, 2022

@mark-petersen, thanks for this fix! I will test it in both compass and E3SM. I see you have the BFB flag but I wouldn't expect it to necessarily be BFB in E3SM, based on your compass testing. Let's see what happens.

@xylar
Copy link
Contributor

xylar commented Dec 8, 2022

Testing

compass

I tested with the compass pr suite on Chrysalis with Intel and OpenMPI (optimized). I used 90fb60d as a baseline. With the baseline, I saw validation failures with the baseline for:

ocean/baroclinic_channel/10km/decomp_test
  * step: initial_state
  * step: 4proc
  * step: 8proc
  test execution:      SUCCESS
  test validation:     FAIL
  see: case_outputs/ocean_baroclinic_channel_10km_decomp_test.log
  test runtime:        00:03
...
ocean/global_ocean/QU240/PHC/decomp_test
  * step: 4proc
  * step: 8proc
  test execution:      SUCCESS
  test validation:     FAIL
  see: case_outputs/ocean_global_ocean_QU240_PHC_decomp_test.log
  test runtime:        00:54

(This is interesting because ocean/global_ocean/QU240/PHC/decomp_test had passed in my previous testing with Intel and Intel-MPI, so the use of OpenMPI--what E3SM also uses on Chrysalis--may actually make this problem worse.)

With this branch, I see all tests passing execution and validation but all split-explicit tests are failing baseline comparison, as @mark-petersen saw and as we expected given the change of order of operations. As @mark-petersen found, I'm seeing differences on the order of 1e-12, so small but not quite machine precision.

E3SM

I am ran PEM_Ln9.ne30pg2_EC30to60E2r2.WCYCL1850.chrysalis_intel, again using 90fb60d as a baseline. Results were bit-for-bit but I'll retest once @philipwjones's concerns are addressed.

@mark-petersen
Copy link
Contributor Author

After the last commit, I retested the nightly suite on chrysalis. Intel debug matches bfb between master and this PR, and passes all tests. Intel optimized fails bfb comparison to master, same as above.

@xylar
Copy link
Contributor

xylar commented Dec 12, 2022

I'll retest this tomorrow.

@xylar
Copy link
Contributor

xylar commented Dec 13, 2022

Both the pr test suite in compass (with Intel and OpenMPI in optimized mode) and the E3SM PEM_Ln9.ne30pg2_EC30to60E2r2.WCYCL1850.chrysalis_intel passed as before. Teh pr test suite was non-BFB for the same tests as before (and using the same baseline), whereas PEM_Ln9.ne30pg2_EC30to60E2r2.WCYCL1850.chrysalis_intel was BFB with the same baseline as before.

Copy link
Contributor

@xylar xylar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving based on my testing and @mark-petersen's.

@xylar xylar requested review from philipwjones and removed request for philipwjones December 13, 2022 16:28
Copy link
Contributor

@philipwjones philipwjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve based on code inspection and others' testing. Previous comments were satisfactorily addressed.

Still getting tripped up by the fact that config_num_halos is not the actual halo width for edges, so this looks like we're out of bounds. But that's another discussion for omega...

jonbob added a commit that referenced this pull request Dec 13, 2022
…5356)

Move bottomDepthEdge calculation to single loop over all edges

After #5195 was merged, the MPAS-Ocean standalone test
ocean/baroclinic_channel/10km/decomp_test failed to match between 4 and
8 partitions, but only for intel optimized. All compass nightly suite
tests passed for gnu debug, gnu optimized, intel debug.

This PR solves the problem by merging the computation of bottomDepthEdge
into a single edge loop. Previously it was split into two loops,
1:nEdgesOwned (with many other calculations) and another from
nEdgesOwned+1:nEdgesArray(4). The intel optimized compiler must have
changed order-of-operations in these two loops for different partitions.

Fixes #5219
[BFB]
@jonbob
Copy link
Contributor

jonbob commented Dec 13, 2022

passes:

  • ERS_Ld5.T62_oQU120.CMPASO-NYF.chrysalis_intel
  • SMS_D_Ld3.T62_oQU120.CMPASO-IAF.chrysalis_intel
  • PET_Ln9_PS.ne30pg2_EC30to60E2r2.WCYCL1850.chrysalis_intel.allactive-mach-pet
  • SMS_P12x2.ne4_oQU240.WCYCL1850NS.chrysalis_intel.allactive-mach_mods
  • SMS_D_Ld1.ne30pg2_EC30to60E2r2.WCYCL1850.chrysalis_intel.allactive-wcprod
  • PEM.ne30pg2_EC30to60E2r2.WCYCL1850.chrysalis_intel.allactive-wcprod
  • ERS_Ld5.T62_oQU120.CMPASO-NYF.compy_pgi

merged to next

@jonbob jonbob merged commit 0614e7b into E3SM-Project:master Dec 14, 2022
@jonbob
Copy link
Contributor

jonbob commented Dec 14, 2022

merged to master

xylar added a commit to xylar/compass that referenced this pull request Dec 15, 2022
This merge updates the E3SM-Project submodule from [569ed6b730](https://github.com/E3SM-Project/E3SM/tree/569ed6b730) to [0273cfad9d](https://github.com/E3SM-Project/E3SM/tree/0273cfad9d).

This update includes the following MPAS-Ocean and MPAS-Frameworks PRs (check mark indicates bit-for-bit with previous PR in the list):
- [ ]  (ocn) E3SM-Project/E3SM#5306
- [ ]  (fwk) E3SM-Project/E3SM#5303
- [ ]  (ocn) E3SM-Project/E3SM#5325
- [ ]  (fwk) E3SM-Project/E3SM#5337
- [ ]  (fwk) E3SM-Project/E3SM#5123
- [ ]  (fwk) E3SM-Project/E3SM#5281
- [ ]  (ocn) E3SM-Project/E3SM#5356
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BFB PR leaves answers BFB bug fix PR mpas-ocean
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ocean fails stand-alone decomp test, intel optimized
4 participants