Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add LTS option to dam_break and parabolic_bowl test cases #738

Merged
merged 10 commits into from
Jan 30, 2024

Conversation

gcapodag
Copy link
Collaborator

This PR enhances the dam_break and parabolic_bowl test cases with the option to run using the local time-stepping (LTS) time integrator. Note that LTS only works in a single layer setting. Please also see the companion PR E3SM-Project/E3SM#6074

Checklist

  • User's Guide has been updated
  • Developer's Guide has been updated
  • API documentation in the Developer's Guide (api.rst) has any new or modified class, method and/or functions listed
  • Documentation has been built locally and changes look as expected
  • The E3SM-Project submodule has been updated with relevant E3SM changes
  • The MALI-Dev submodule has been updated with relevant MALI changes
  • Document (in a comment titled Testing in this PR) any testing that was used to verify the changes
  • New tests have been added to a test suite

@gcapodag
Copy link
Collaborator Author

This is dam_break with LTS, 120cm and default.

dam_break_comparison

@gcapodag
Copy link
Collaborator Author

This is dam_break with RK4, 120cm and default.

dam_break_comparison

@gcapodag
Copy link
Collaborator Author

gcapodag commented Nov 29, 2023

While LTS seems to work better than RK4 with the coarser mesh (120 cm) it does not show the same improved behavior at finer meshes (40 cm) as the pictures below will show. It seems to be a thing though for the W/D scheme to not improve consistently with finer meshes as observed, for instance, here: #670 and here: #716

@gcapodag
Copy link
Collaborator Author

This is dam_break with LTS, 40cm and default.

dam_break_comparison

@gcapodag
Copy link
Collaborator Author

This is dam_break with RK4, 40cm and default.
dam_break_comparison

@gcapodag
Copy link
Collaborator Author

I am not showing the ramp case for dam_break and LTS because the results are qualitatively the same.

@gcapodag
Copy link
Collaborator Author

gcapodag commented Nov 29, 2023

An interesting thing for the dam_break test is that RK4 shows two wave fronts whereas LTS shows only one. I looked into the paper from which the dam_break test was taken: https://www.sciencedirect.com/science/article/abs/pii/S0098300413001362?via%3Dihub and it looks like after 2 seconds the wave has already left the domain between x in [0,4] and y in [0,2]. That also happens with LTS, but it is not clear if it happens with RK4 as well. See pictures below.

@gcapodag
Copy link
Collaborator Author

gcapodag commented Nov 29, 2023

Notice how the wave has moved away from the dark grey region in the picture from https://www.sciencedirect.com/science/article/abs/pii/S0098300413001362?via%3Dihub
Screen Shot 2023-11-17 at 11 21 59 AM

This is LTS, and there is no wave front anymore in [10,13]x[11,13]:
Screen Shot 2023-11-17 at 11 20 41 AM

A second wave front seems to have barely just left the region [10,13]x[11,13] with RK4, I cannot tell for sure if the second wave front has left that region or not:
Screen Shot 2023-11-17 at 11 20 28 AM

@gcapodag
Copy link
Collaborator Author

This is the solution at the final time for parabolic_bowl with LTS and ramp

solution_360

@gcapodag
Copy link
Collaborator Author

This is also LTS with parabolic_bowl and ramp

points

@gcapodag
Copy link
Collaborator Author

gcapodag commented Nov 29, 2023

These are the errors for LTS with parabolic_bowl for both no ramp and ramp (compare to #670). The reference curve seems to be inverted with respect to the others but it should not be due to this PR and @sbrus89 is already looking into this

error

for icell in range(1, len(lines)):
if (lts_rgn[icell - 1] == 1 or lts_rgn[icell - 1] == 5): # fine

# newf+= "0 1 0 " + lines[icell].strip() + "\n"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you intend to remove these lines?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, why?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, just that they can be deleted rather than commented out.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though I do not anticipate using them in the future, I would rather have them there as a record in case we will need to use that old METIS partitioning for some reason. If I remove them then I will not remember how they were :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Why don't you leave a comment that mentions that they refer to the old METIS partitioning so others know why they are there?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, done!

Copy link
Collaborator

@cbegeman cbegeman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gcapodag I haven't yet tried running with this branch, but I hope to get to it along with the E3SM PR by the end of the week. Just saw a few things that you might want to clean up when you have a chance.

compass/ocean/tests/dam_break/lts/lts_regions.py Outdated Show resolved Hide resolved
compass/ocean/tests/dam_break/namelist.forward Outdated Show resolved Hide resolved
@@ -1,92 +0,0 @@
from mpas_tools.io import write_netcdf
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to remove this file? Just looking at the github page it seems to be missing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops sorry, let me restore it

@gcapodag
Copy link
Collaborator Author

@cbegeman thanks for your comments, sorry it took me a minute to address them. I just did. When you get a chance, please test this out using the companion LTS/WD PR in E3SM. Also, do you have any suggestions on how to fix the failures for CI? Thanks.

@mark-petersen
Copy link
Collaborator

On the head of this PR, I try:

./conda/configure_compass_env.py --conda ~/miconda3 --compiler intel
source load_dev_compass_1.2.0-alpha.6_chrysalis_intel_openmpi.sh
compass list

and I get the error

    raise FileNotFoundError(f'Config file does not exist: {filename}')
FileNotFoundError: Config file does not exist: 
/gpfs/fs1/home/ac.mpetersen/repos/compass/pr/compass/ocean/tests/parabolic_bowl/parabolic_bowl.cfg

@gcapodag I suspect that the file

compass/ocean/tests/parabolic_bowl/parabolic_bowl.cfg

exists locally for you but was not added to the repo. git status will show you all the uncommitted files. I've made that mistake too!

@gcapodag
Copy link
Collaborator Author

gcapodag commented Jan 4, 2024

Thanks @mark-petersen. I created this PR copying over files from another branch where there are extra files that should not be included here. It looks like that parabolic_bowl.cfg file did not make it with the other ones when I copied them. I added it now and pushed.

@cbegeman
Copy link
Collaborator

@gcapodag All of the dam break tests ran fine for me on Chrys with intel, impi. I checked the viz and LTS looks visually identical to RK4 time integrator.

@cbegeman
Copy link
Collaborator

cbegeman commented Jan 11, 2024

@gcapodag I'm getting time-outs on my LTS parabolic bowl jobs with the automatically-generated job script (1 node, 2 hours) on chyrsalis (intel, impi) with the 5km-resolution forward step. Do you have a sense for whether we should increase the duration or node-count? We should set it to a safe value before merging this PR.

Update: I got a timeout at 2h on 1 node for just the ramp_lts/forward_5km step. It reached 0001-01-02_14:07:12.

@gcapodag
Copy link
Collaborator Author

gcapodag commented Jan 11, 2024

Thanks @cbegeman . Unfortunately I do not have access to Chrysalis and can only test on Perlmutter. On Perlmutter I ran the test ocean/parabolic_bowl/standard/ramp_lts first doing salloc --nodes 1 --qos interactive --time 01:00:00 --constraint cpu and then compass run and got this (the build was with DEBUG=FALSE):

test execution: SUCCESS
test runtime: 31:53
Test Runtimes:
31:53 PASS ocean_parabolic_bowl_standard_ramp_lts
Total runtime 31:59
PASS: All passed successfully!

All the resolutions were run one after the other and the times above have been outputted right after the viz step had completed, which happens after the 5km forward step is done.
I do not have a sense of what should be on Chrysalis since it looks like a 1 hr allocation is enough on Perlmutter. Considering that the run duration is 3 days and it only did 1 day and 14 minutes for the 5km case maybe we could start by requesting 2 nodes and if that is not enough also increase the time allocation. What do you think?

@cbegeman
Copy link
Collaborator

@gcapodag I got a time out with debug on at Doing timestep 0001-01-02_17:12:44 with 2 nodes and 2 hours of run time. This suggests to me that the code is hanging on chrysalis. It's a shame you don't have access to chrysalis. @mark-petersen Do you have thoughts on the best way to proceed?

@gcapodag
Copy link
Collaborator Author

@cbegeman can you please try again on Chrysalis to see if it still hangs? I realized that I had not modified the forward namelist to set ssh_gradient as the pressure gradient type, which is the only one currently supported by LTS. Doing that, the plots at the different stations for the dam break test now come out the same as RK4.

@cbegeman
Copy link
Collaborator

@gcapodag Great! That namelist fix did the trick for me on Chyrsalis. Approving now.

@gcapodag
Copy link
Collaborator Author

Awesome! Thank you so much for working with me on this @cbegeman , you've been a huge help!

@xylar
Copy link
Collaborator

xylar commented Jan 22, 2024

@cbegeman, I'm assigning this to you to merge once it's been reviewed. Does that work for you?

@cbegeman
Copy link
Collaborator

Yes, happy to take care of the merge. @mark-petersen and @sbrus89 It appears that we are still waiting on your reviews.

Copy link
Collaborator

@sbrus89 sbrus89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran these cases successfully on Perlmutter with debug flags and everything ran successfully and looks good! My only comment is that maybe the documentation for these cases should be updated to reflect the addition of LTS configs?

@gcapodag
Copy link
Collaborator Author

Thank you @sbrus89 ! That is a good observation. I do not have a preference so I am going to defer to you and the others. One thing I would like to mention is that these tests serve as a check that W/D works with LTS and do not leverage any "LTS power" in terms of running faster, from this point of view, having LTS on is just a matter of using a different time integrator.

@xylar
Copy link
Collaborator

xylar commented Jan 23, 2024

@gcapodag, it seems worth creating an lts.txt suite in:
https://github.com/MPAS-Dev/compass/tree/main/compass/ocean/suites
and adding these test cases to it. It might make sense to add the tests to the wetdry.txt suite as well but that's up to you, @cbegeman and @sbrus89 to decide.

I would still add something about LTS support to the docs for these tests wouldn't hurt. You can say that it's just for regression testing and doesn't provide a performance improvement.

@gcapodag
Copy link
Collaborator Author

Sounds good everyone. I just created an lts.txt suite and also added the lts tests to the wetdry.txt suite. I noticed that the parabolic bowl tests were not part of the wetdry.txt suite so I added those in as well, please let me know if not having them there was intentional and I will go ahead and remove them. I also updated the documentation with a little blurb on lts. Thanks!

Copy link
Collaborator

@sbrus89 sbrus89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gcapodag, this looks great to me. Nice job on getting LTS working with wetting/drying!

@cbegeman
Copy link
Collaborator

These are the errors for LTS with parabolic_bowl for both no ramp and ramp (compare to #670). The reference curve seems to be inverted with respect to the others but it should not be due to this PR and @sbrus89 is already looking into this

@sbrus89 @gcapodag Can one of you post an issue for this when you have a chance so we don't lose track of it? Thanks!

@gcapodag
Copy link
Collaborator Author

@sbrus89 did you see that issue with the plot in your latest tests or is it only on my end?

@sbrus89
Copy link
Collaborator

sbrus89 commented Jan 26, 2024

@gcapodag, I saw it on my end too, but haven't had a chance to fix it.

@cbegeman, I'll make an issue.

@mark-petersen
Copy link
Collaborator

Tested by rebasing this pr on the head of main, then on chicoma

compass list | grep dam_break
 107: ocean/dam_break/40cm/default_lts
 108: ocean/dam_break/40cm/ramp_lts
 109: ocean/dam_break/40cm/default
 110: ocean/dam_break/40cm/ramp
 111: ocean/dam_break/120cm/default_lts
 112: ocean/dam_break/120cm/ramp_lts
 113: ocean/dam_break/120cm/default
 114: ocean/dam_break/120cm/ramp
compass setup -p ~/repos/E3SM/wettingdrying_lts_gcapodag/components/mpas-ocean -w $n/dambreak_wd_lts -n 111 112 113 114

and using the head of E3SM-Project/E3SM#6074 compiled with gnu optimized. This proceeds through the steps, and I can see the same images as above. Beautiful!

dam_break_comparison

I then tested:

compass list | grep bowl
 368: ocean/parabolic_bowl/standard/ramp_lts
 369: ocean/parabolic_bowl/standard/ramp
 370: ocean/parabolic_bowl/standard/noramp_lts
 371: ocean/parabolic_bowl/standard/noramp
compass setup -p ~/repos/E3SM/wettingdrying_lts_gcapodag/components/mpas-ocean -w $n/bowl_lts -n 368 369 370 371

and it also runs fine. Thanks!

Copy link
Collaborator

@mark-petersen mark-petersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gcapodag thank you for the addition of these insightful test cases to show that wetting and drying work with LTS. This is a great example of coordinated code and testing PRs. The documentation *.rst pages also look great.

@mark-petersen
Copy link
Collaborator

Also tested both dam_break and parabolic_bowl on chrysalis without any trouble.

@gcapodag
Copy link
Collaborator Author

Great, thanks @mark-petersen !

@cbegeman cbegeman merged commit 633a562 into MPAS-Dev:main Jan 30, 2024
4 checks passed
@cbegeman
Copy link
Collaborator

Merged! Thanks for all your work on this @gcapodag!

@cbegeman
Copy link
Collaborator

cbegeman commented Jan 30, 2024

Oh, I see this should have had an "E3SM PR required" label. I think since these tests are not included in any regular test suites it will be ok until the E3SM PR is merged, but I can revert if need-be.

@gcapodag
Copy link
Collaborator Author

Thanks @cbegeman ! I see that the E3SM PR received all the necessary approvals yesterday so it should be merged any day now I believe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants