Move bottomDepthEdge calculation to single loop over all edges #5356
Testing on cori, the compass nightly suite is bfb identical for all tests using gnu debug, gnu optimized, and intel debug. For intel optimized, the following tests fail the bfb match with master for this PR (90fb60d to 582899e):
All differences are 1e-12 or smaller. Considering that we changed the order of operations for intel optimized, this is not surprising, and is acceptable for this PR.
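For readers wondering why a reordering alone can break a bfb match: floating-point addition is not associative, so the same terms summed with a different grouping can round differently. A minimal Python sketch, using illustrative values that are not taken from the MPAS-Ocean code:

```python
# Illustrative values only; not taken from MPAS-Ocean.
a, b, c = 0.1, 0.2, 0.3

left_grouping = (a + b) + c    # one possible evaluation order
right_grouping = a + (b + c)   # same terms, different grouping

# A bit-for-bit (bfb) check requires exact equality, which fails here.
bfb_match = left_grouping == right_grouping

# The relative difference is nonetheless at round-off level.
rel_diff = abs(left_grouping - right_grouping) / abs(right_grouping)

print(bfb_match, rel_diff)
```

A single step differs only in the last bit. Over many time steps such differences can grow, which is consistent with (though not proof of) the 1e-12 differences reported above.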
@mark-petersen, thanks for this fix! I will test it in both compass and E3SM. I see you have the BFB flag but I wouldn't expect it to necessarily be BFB in E3SM, based on your compass testing. Let's see what happens.
Testing

compass

I tested with the compass [...]

(This is interesting because [...])

With this branch, I see all tests passing execution and validation, but all split-explicit tests are failing baseline comparison, as @mark-petersen saw and as we expected given the change in order of operations. As @mark-petersen found, I'm seeing differences on the order of 1e-12, so small but not quite machine precision.

E3SM

I ran [...]
components/mpas-ocean/src/mode_forward/mpas_ocn_time_integration_split.F
After the last commit, I retested the nightly suite on chrysalis. Intel debug matches bfb between master and this PR, and passes all tests. Intel optimized fails bfb comparison to master, same as above.
I'll retest this tomorrow.
Both the [...]
Approving based on my testing and @mark-petersen's.
Approve based on code inspection and others' testing. Previous comments were satisfactorily addressed.
Still getting tripped up by the fact that config_num_halos is not the actual halo width for edges, so this looks like we're out of bounds. But that's another discussion for omega...
…5356) Move bottomDepthEdge calculation to single loop over all edges After #5195 was merged, the MPAS-Ocean standalone test ocean/baroclinic_channel/10km/decomp_test failed to match between 4 and 8 partitions, but only for intel optimized. All compass nightly suite tests passed for gnu debug, gnu optimized, intel debug. This PR solves the problem by merging the computation of bottomDepthEdge into a single edge loop. Previously it was split into two loops, 1:nEdgesOwned (with many other calculations) and another from nEdgesOwned+1:nEdgesArray(4). The intel optimized compiler must have changed order-of-operations in these two loops for different partitions. Fixes #5219 [BFB]
passes:
merged to next
merged to master
This merge updates the E3SM-Project submodule from [569ed6b730](https://github.com/E3SM-Project/E3SM/tree/569ed6b730) to [0273cfad9d](https://github.com/E3SM-Project/E3SM/tree/0273cfad9d).

This update includes the following MPAS-Ocean and MPAS-Frameworks PRs (check mark indicates bit-for-bit with previous PR in the list):
- [ ] (ocn) E3SM-Project/E3SM#5306
- [ ] (fwk) E3SM-Project/E3SM#5303
- [ ] (ocn) E3SM-Project/E3SM#5325
- [ ] (fwk) E3SM-Project/E3SM#5337
- [ ] (fwk) E3SM-Project/E3SM#5123
- [ ] (fwk) E3SM-Project/E3SM#5281
- [ ] (ocn) E3SM-Project/E3SM#5356
After #5195 was merged, the MPAS-Ocean standalone test `ocean/baroclinic_channel/10km/decomp_test` failed to match between 4 and 8 partitions, but only for intel optimized. All compass nightly suite tests passed for gnu debug, gnu optimized, intel debug.

This PR solves the problem by merging the computation of `bottomDepthEdge` into a single edge loop. Previously it was split into two loops, `1:nEdgesOwned` (with many other calculations) and another from `nEdgesOwned+1:nEdgesArray(4)`. The intel optimized compiler must have changed order-of-operations in these two loops for different partitions.

Fixes #5219

[BFB]
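The restructuring described above can be sketched roughly as follows. This is a Python illustration of the loop shapes only, not the Fortran in `mpas_ocn_time_integration_split.F`. The mesh data, the counts, and the `edge_value` formula (taking the shallower of the two neighboring cells) are all assumptions made for the example; the PR text does not state how `bottomDepthEdge` is computed.

```python
# Hypothetical stand-in data: two cells per edge and a bottom depth per cell.
n_edges_owned = 3   # assumed number of owned edges (nEdgesOwned)
n_edges_all = 5     # assumed owned + halo edges (nEdgesArray(4) in the PR text)
cells_on_edge = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
bottom_depth = [100.0, 250.0, 80.0, 400.0, 150.0]


def edge_value(i_edge):
    """Assumed formula, for illustration only: the shallower neighboring cell."""
    cell1, cell2 = cells_on_edge[i_edge]
    return min(bottom_depth[cell1], bottom_depth[cell2])


def split_loops():
    """Old shape: owned edges in one loop, halo edges in a second loop."""
    out = [0.0] * n_edges_all
    for i_edge in range(n_edges_owned):               # 1:nEdgesOwned
        out[i_edge] = edge_value(i_edge)
    for i_edge in range(n_edges_owned, n_edges_all):  # nEdgesOwned+1:nEdgesArray(4)
        out[i_edge] = edge_value(i_edge)
    return out


def single_loop():
    """New shape: every edge goes through the same loop body."""
    return [edge_value(i_edge) for i_edge in range(n_edges_all)]


print(split_loops() == single_loop())
```

In Python the two shapes agree exactly. The point of the PR is that whether an edge is owned or in the halo depends on the partition, so if an optimizing compiler generates different code for the two loops, the same edge can be computed differently on 4 and 8 partitions. With a single loop every edge takes the same code path.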