Ocean fails stand-alone decomp test, intel optimized #5219
Earlier this week, I thought it failed with intel optimized, but I don't see that on badger with 4dcc8db now. Maybe I'm mixing it up, though.
Tested the very first SMEP merge: bb84429 Merge branch 'vanroekel/ocean/add-submesoscale-eddies' (PR #5099), with intel optimized on badger. This is confusing.
I'm trying to use bisection to find the cause of this and I'm just seeing hanging when I try to test on c63cce2. This may have more to do with a bad node on Anvil or something but it's certainly not helping me debug...
@mark-petersen, I agree that #5099 is responsible and that this is probably already fixed in #5216. I'll make sure.
Sorry, I wasn't clear in my mind. We're looking for a decomposition problem, not a threading problem. Much trickier in some ways!
I believe this was introduced by #5183 and not by #5099. At least that is what I'm seeing in testing on Chrysalis with Intel and Intel-MPI. I'm seeing test execution passing for #5170 (the previous ocean-related commit merge) but failing for #5183. No PRs were merged between these two, so it seems like #5183 is likely responsible, though why is not at all clear at this point.
After rerunning
@mark-petersen and @dengwirda, I now believe this issue was introduced by #5195. I ran into sporadic execution failures along the way, so I'm not at all confident about this, but the most likely culprit to my eyes is this new loop: E3SM/components/mpas-ocean/src/mode_forward/mpas_ocn_time_integration_split.F, lines 798 to 813 in 21ffb4d.
It seems like the OpenMP directives may not cover all the variables they need to? Some were fixed in #5226, but maybe some are still missing? I'm still investigating.
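For illustration, here is a minimal Fortran/OpenMP sketch of the kind of bug being suspected here; it is not the loop from mpas_ocn_time_integration_split.F, and all names in it are made up. A scalar work variable left off the private list is shared by default, so different thread counts or schedules can change the answer.

```fortran
! Hypothetical sketch only -- not MPAS code.
subroutine edge_column_sum(nEdgesOwned, nVertLevels, layerThickEdge, colSum)
   integer, intent(in) :: nEdgesOwned, nVertLevels
   real(kind=8), intent(in)  :: layerThickEdge(nVertLevels, nEdgesOwned)
   real(kind=8), intent(out) :: colSum(nEdgesOwned)
   integer :: iEdge, k
   real(kind=8) :: tmp

   ! BUG: tmp should also appear in private(); as written it is shared,
   ! so threads race on it and colSum depends on the schedule.
   !$omp parallel do schedule(runtime) private(k)
   do iEdge = 1, nEdgesOwned
      tmp = 0.0_8
      do k = 1, nVertLevels
         tmp = tmp + layerThickEdge(k, iEdge)
      end do
      colSum(iEdge) = tmp
   end do
   !$omp end parallel do
end subroutine edge_column_sum
```

Note, though, that the symptom here is a 4-vs-8 partition mismatch, so a missing private variable of this kind would at best explain it indirectly.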
Another possibility is this line:
It could be that 4 is out of range for nEdgesArray. If so, this would not be the only place that it is indexed out of bounds: the split implicit solver also indexes to config_num_halos + 1, which defaults to 4. I couldn't find any other code that indexes to this halo, so it could be that it isn't guaranteed to exist, and doesn't exist for some reason (e.g. small mesh size?) in the baroclinic channel test case.
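To make that concern concrete, here is an illustrative Fortran sketch with assumed declarations (the real MPAS code is organized differently): if nEdgesArray has one entry per halo layer that actually exists on a partition, then a hard-coded nEdgesArray(4), or nEdgesArray(config_num_halos + 1) with the default config_num_halos = 3, silently assumes a fourth layer is present.

```fortran
! Illustrative only; not the real MPAS declarations.
integer function last_edge_in_halo(nEdgesArray, haloLayer)
   integer, intent(in) :: nEdgesArray(:)   ! one entry per halo layer present
   integer, intent(in) :: haloLayer        ! e.g. 4, or config_num_halos + 1

   if (haloLayer <= size(nEdgesArray)) then
      last_edge_in_halo = nEdgesArray(haloLayer)
   else
      ! On a small mesh or partition the requested layer may not exist;
      ! fall back to the outermost layer instead of reading out of bounds.
      last_edge_in_halo = nEdgesArray(size(nEdgesArray))
   end if
end function last_edge_in_halo
```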
I'm going to quickly try rerunning that test case with an index of
Still fails with
Well, I spent some time on this and did not figure it out. But it appears that
then I get a decomp test match for the baroclinic channel. This is not a solution, of course, because it overwrites the actual values in the array. But I tried some other things, like an extra halo update and rounding that array, but those didn't fix it.
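As a rough picture of the experiment described above, this is the kind of diagnostic overwrite that forces a decomp match by construction; the field name bottomDepthEdge is inferred from the fix discussed further down, and the exact placement in the code is assumed.

```fortran
! Diagnostic only, not a fix: clobber the suspect field with a constant
! so the 4- and 8-partition runs no longer depend on how it was computed.
subroutine clobber_bottom_depth_edge(bottomDepthEdge)
   real(kind=8), intent(inout) :: bottomDepthEdge(:)
   integer :: iEdge

   do iEdge = 1, size(bottomDepthEdge)
      bottomDepthEdge(iEdge) = 100.0_8
   end do
end subroutine clobber_bottom_depth_edge
```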
I finally found it. The computation of bottomDepthEdge was split across two edge loops, and the intel optimized compiler evaluates them differently for different partitions.
Whew! On a side note, the array bound in question includes all edges within the halo, but not the outside edges of the last halo layer.
Very nice detective work, @mark-petersen! I agree that we should not hard-code the halo size, so I'm very happy with your recommended solution.
…5356) Move bottomDepthEdge calculation to single loop over all edges

After #5195 was merged, the MPAS-Ocean standalone test ocean/baroclinic_channel/10km/decomp_test failed to match between 4 and 8 partitions, but only for intel optimized. All compass nightly suite tests passed for gnu debug, gnu optimized, and intel debug. This PR solves the problem by merging the computation of bottomDepthEdge into a single edge loop. Previously it was split into two loops, 1:nEdgesOwned (with many other calculations) and another from nEdgesOwned+1:nEdgesArray(4). The intel optimized compiler must have changed order-of-operations in these two loops for different partitions. Fixes #5219. [BFB]
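As a sketch of the restructuring this commit describes: the field is now computed in one loop over owned plus halo edges instead of two loops that the compiler could optimize differently. Argument names here are assumed, and the expression inside the loop is only a placeholder, not the actual MPAS formula for bottomDepthEdge.

```fortran
! Sketch only: a single edge loop replacing the former split into
! 1:nEdgesOwned and nEdgesOwned+1:nEdgesArray(4).
subroutine compute_bottom_depth_edge(nEdgesAll, cellsOnEdge, bottomDepth, &
                                     bottomDepthEdge)
   integer, intent(in) :: nEdgesAll
   integer, intent(in) :: cellsOnEdge(2, nEdgesAll)
   real(kind=8), intent(in)  :: bottomDepth(:)
   real(kind=8), intent(out) :: bottomDepthEdge(nEdgesAll)
   integer :: iEdge

   do iEdge = 1, nEdgesAll   ! owned + halo edges in one pass
      ! Placeholder edge value derived from the edge's two cells.
      bottomDepthEdge(iEdge) = min(bottomDepth(cellsOnEdge(1, iEdge)), &
                                   bottomDepth(cellsOnEdge(2, iEdge)))
   end do
end subroutine compute_bottom_depth_edge
```

With every rank evaluating the field in the same single loop, the order of operations no longer depends on how edges are split between owned and halo ranges, which is what the commit message credits for restoring the bit-for-bit match.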
MPAS-Ocean nightly on master 4dcc8db fails with the intel optimized compiler and OpenMP. Differences between 4 processor and 8 processor runs are max 1e-13.