Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix split-explicit barotropic velocity accumulator #6041

Merged
merged 1 commit into from
Dec 5, 2023

Conversation

cbegeman
Copy link
Contributor

@cbegeman cbegeman commented Oct 31, 2023

This PR includes a bugfix for the loop limits on the barotropic velocity (normalBarotropicVelocityNew) when its values are accumulated over barotropic subcycles. The loop should be over nEdgesOwned for consistency with the loop which divides the barotropic velocity by the number of subcycles.

Fixes #6040

[BFB] for all current E3SM tests, changes to a non-default option

@cbegeman cbegeman added mpas-ocean bug fix PR NCC Larger-then-roundoff diffs but not believed climate changing labels Oct 31, 2023
@cbegeman
Copy link
Contributor Author

cbegeman commented Nov 1, 2023

We need to run tests to evaluate if this fix is in fact non-climate changing. Labels will be updated when we know.

@mark-petersen
Copy link
Contributor

@cbegeman thanks for this fix. I agree that the extents on the two loops you identify in #6040 should be the same. Also, it is best to use the smaller extents when possible (Owned rather than All).

What I don't understand is why the original problem doesn't make the partition test fail. Perhaps it is wetting and drying that reveals the problem, and with WD off it never caused trouble because the halo was not used before a halo update.

I'm testing this now.

@mark-petersen
Copy link
Contributor

Actually, this fix doesn't make sense to me. If you are using the halo edges with wetting/drying for normalBarotropicVelocityNew, then the solution is to make the second loop at #6040 go to nEdgesAll. If you are not using that variable on the halo, then the original extents to nEdgesAll on the first loop wastes computation, but shouldn't cause a problem.

The fact that the incorrect value on the halo causes a problem won't be solved by the current fix, because the value on the halo is still incorrect.

@cbegeman
Copy link
Contributor Author

cbegeman commented Nov 1, 2023

@mark-petersen Thanks for taking a look and for those thoughts. What doesn't make sense to me is that I would think that if those values on the halo were not being used, then the solution would be identical whether the extent of the first loop was nEdgesAll or nEdgesOwned, but I do find that the solution is different (between this branch and master).

@mark-petersen
Copy link
Contributor

@cbegeman, could you do a partition test with W/D on before and after this change? The partition test will tell you if there is any halo mismatch. If the compass partition tests are not right to produce W/D behavior, you can just do your own with 4 vs 8 or whatever with a W/D test. Thanks.

I still suspect that changing the second loop to nEdgesAll is the correct fix, but we should always make the halos as tight as possible to save flops.

@cbegeman
Copy link
Contributor Author

cbegeman commented Nov 1, 2023

@mark-petersen The baroclinic_channel/10km/decomp_test passes for master and this branch, but the solution differs between the two.

I've created another branch where I change the second loop to nEdgesAll and the solution is different from master and the same as this branch (for baroclinic_channel default, decomp, threads).

All of these tests are without wetting and drying. I don't think it's necessary to bring in that added level of complexity.

@mark-petersen
Copy link
Contributor

@cbegeman I think you changed the wrong loop. You meant to change this one, right? Not line 861.

do iEdge = 1, nEdgesAll
normalBarotropicVelocityNew(iEdge) = &
normalBarotropicVelocityNew(iEdge) + &
normalBarotropicVelocitySubcycleNew(iEdge)
end do ! iEdge

@cbegeman
Copy link
Contributor Author

cbegeman commented Nov 1, 2023

@mark-petersen Thanks for catching that. Yes, I had the loop lines right in the issue. I will push the change now and retest.

@cbegeman cbegeman force-pushed the ocn-fixup-btr-accumulator branch from 2fa6d8c to ff06b0d Compare November 1, 2023 21:35
@cbegeman
Copy link
Contributor Author

cbegeman commented Nov 1, 2023

@mark-petersen For me, the comments above still hold with the updated branch. It would be good to have you confirm though.

Copy link
Contributor

@mark-petersen mark-petersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running the nightly suite with gnu debug on chrysalis, I was able to confirm @cbegeman's results. With this PR, all decomp tests pass, and this is bfb identical between this PR and if both loops are changed to nEdgesAll in lines 1484 and 1512. The bfb comparison show that there is no harm done to the solution by tightening the halo computation to nEdgesOwned, and there is a small benefit to computing fewer flops.

@cbegeman
Copy link
Contributor Author

cbegeman commented Nov 2, 2023

@mark-petersen Thank you so much for your review and testing!

@jonbob This is now ready for your testing.

@mark-petersen mark-petersen added the non-BFB PR makes roundoff changes to answers. label Nov 2, 2023
Copy link
Contributor

@xylar xylar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving by code inspection and based on @cbegeman and @mark-petersen's testing.

@rljacob
Copy link
Member

rljacob commented Nov 17, 2023

@vanroekel should this be in v3?

@xylar
Copy link
Contributor

xylar commented Nov 17, 2023

I certainly think it should be.

@mark-petersen
Copy link
Contributor

Yes, this should be in V3.

@vanroekel
Copy link
Contributor

vanroekel commented Nov 20, 2023

Sorry for not responding sooner. @cbegeman and @mark-petersen could you summarize what the expected results here are? The PR is labeled non-BFB, but @mark-petersen approval comment seem to suggest there is no difference in the loop limit? I think it is something that needs to be included, but would appreciate a bit of clarity to alleviate my confusion.

Related, any expectation on the climate impact? How far from nonBFB are the changes?

@cbegeman related question does AB2 not have this issue? (Never Mind - I see that the change is there already)

Sorry for all the questions.

But from what I understand from this discussion and issue this is a bug that should be fixed for v3.

@cbegeman
Copy link
Contributor Author

@vanroekel No problem. Here's an overview of the results of the standalone testing:

Loop 1 is the accumulator loop

do iEdge = 1, nEdgesAll
normalBarotropicVelocityNew(iEdge) = &
normalBarotropicVelocityNew(iEdge) + &
normalBarotropicVelocitySubcycleNew(iEdge)
end do ! iEdge

Loop 2 is the division loop

do iEdge = 1, nEdgesOwned
barotropicThicknessFlux(iEdge) = &
barotropicThicknessFlux(iEdge) &
/(nBtrSubcycles*config_btr_subcycle_loop_factor)
normalBarotropicVelocityNew(iEdge) = &
normalBarotropicVelocityNew(iEdge) &
/ (nBtrSubcycles*config_btr_subcycle_loop_factor + 1)
end do

A. Master: Loop 1 is nEdgesAll, loop 2 is nEdgesOwned
B. This branch: Loop 1 is nEdgesOwned, loop 2 is nEdgesOwned
C. Alternate: Loop 1 is nEdgesAll, loop 2 is nEdgesAll

A is non-BFB with B and C.
B and C are BFB.

Here we go with B, the least computations. AB2 goes with C, the more conservative approach https://github.com/E3SM-Project/E3SM/pull/5989/files#r1380567559.

@vanroekel
Copy link
Contributor

vanroekel commented Nov 20, 2023

Very helpful, thanks @cbegeman! I think I understand things now. @rljacob this is a definite bug with a fix and it should be in v3.

@jonbob
Copy link
Contributor

jonbob commented Dec 1, 2023

@cbegeman -- I ran a 10-year B-case baseline, as well as a comparison test with this PR. They are apparently BFB. Does this surprise you? I was running with the ECwISC30to60E3r2 mesh and had

 config_time_integrator = 'split_explicit'

in user_nl_mpaso

@cbegeman
Copy link
Contributor Author

cbegeman commented Dec 1, 2023

@jonbob, Thanks for running that test. I think it's possible that these changes are compiler specific. @mark-petersen and I were actually surprised that the changes were non-BFB in any configuration, so it's totally fine that they are BFB.

@jonbob jonbob added BFB PR leaves answers BFB and removed non-BFB PR makes roundoff changes to answers. labels Dec 4, 2023
jonbob added a commit that referenced this pull request Dec 4, 2023
Fix split-explicit barotropic velocity accumulator

This PR includes a bugfix for the loop limits on the barotropic velocity
(normalBarotropicVelocityNew) when its values are accumulated over
barotropic subcycles. The loop should be over nEdgesOwned for
consistency with the loop which divides the barotropic velocity by the
number of subcycles.

Fixes #6040

[BFB] for all current E3SM tests, changes to a non-default option
@jonbob
Copy link
Contributor

jonbob commented Dec 4, 2023

passes:

  • PEM_Ln9.ne30pg2_EC30to60E2r2.WCYCL1850.chrysalis_intel

merged to next

@jonbob jonbob merged commit 795cc1c into E3SM-Project:master Dec 5, 2023
2 checks passed
@jonbob
Copy link
Contributor

jonbob commented Dec 5, 2023

merged to master

@cbegeman
Copy link
Contributor Author

cbegeman commented Dec 5, 2023

Thanks @jonbob!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BFB PR leaves answers BFB bug fix PR mpas-ocean NCC Larger-then-roundoff diffs but not believed climate changing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug in split-explicit barotropic velocities
6 participants