Add convergence tasks for space and time only #236

Merged: 9 commits into E3SM-Project:main from enhance-convergence-tasks
Nov 25, 2024

Conversation

cbegeman
Collaborator

This PR extends the convergence test framework to accommodate convergence in space, time, or both. It also adds tests for space-only and time-only convergence for the cosine_bell, geostrophic, manufactured_solution, and inertial_gravity_wave groups. The convergence_time cases have not been optimized (in terms of resolution and time steps), nor have separate convergence thresholds been specified for them.
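As a rough illustration of the three refinement types, here is a minimal sketch of how a list of refinement factors could map to (resolution, time step) pairs. The function `plan_steps` and its arguments are hypothetical and are not the Polaris API; the sample values are chosen to resemble the inertial_gravity_wave step names reported later in this thread.

```python
def plan_steps(base_resolution_km, base_dt_s, factors, refinement):
    """Return (resolution_km, dt_s) pairs for one convergence task.

    Hypothetical helper, not part of Polaris: factors scale the base
    resolution for 'space', the base time step for 'time', and both
    quantities together for 'both'.
    """
    if refinement not in ("space", "time", "both"):
        raise ValueError(f"unknown refinement type: {refinement}")
    steps = []
    for factor in factors:
        refine_space = refinement in ("space", "both")
        refine_time = refinement in ("time", "both")
        resolution = base_resolution_km * factor if refine_space else base_resolution_km
        dt = base_dt_s * factor if refine_time else base_dt_s
        steps.append((resolution, dt))
    return steps


factors = [2.0, 1.0, 0.5, 0.25]
for kind in ("space", "time", "both"):
    print(kind, plan_steps(100.0, 300.0, factors, kind))
```

With these assumed inputs, the 'both' case pairs 200 km with 600 s and 25 km with 75 s, while 'space' holds the time step at 300 s and 'time' holds the resolution at 100 km.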

Checklist

  • User's Guide has been updated
  • Developer's Guide has been updated
  • API documentation in the Developer's Guide (api.md) has any new or modified class, method and/or functions listed
  • Documentation has been built locally and changes look as expected
  • Testing comment in the PR documents testing used to verify the changes
  • New tests have been added to a test suite

@cbegeman cbegeman self-assigned this Oct 11, 2024
@cbegeman
Collaborator Author

cbegeman commented Oct 11, 2024

Testing

I have tested all of the convergence tests on chrysalis with intel and impi.

@cbegeman cbegeman added enhancement New feature or request ocean Related to ocean tests or analysis labels Oct 11, 2024
@cbegeman cbegeman force-pushed the enhance-convergence-tasks branch 2 times, most recently from 9d40944 to 8c4d8f5 Compare October 16, 2024 21:13
@cbegeman cbegeman requested a review from xylar October 16, 2024 21:13
@cbegeman
Collaborator Author

@sbrus89 If you want to review, feel free!

@mark-petersen mark-petersen self-requested a review October 28, 2024 15:29
@xylar
Collaborator

xylar commented Oct 28, 2024

@cbegeman, sorry for dropping the ball on this one. I don't seem to get notified when I get marked as a reviewer so I just hadn't noticed.

Collaborator

@xylar xylar left a comment

I really like this approach! I also really appreciate the amount of effort that went into it.

I'm seeing a crash in icos/geostrophic/convergence_time/forward/60km_960s. I think this is too large a time step to be safe for RK4 at this resolution.

This leads to a more general comment. I think different options like refinement_factor are appropriate for space and time. First, there is no clear reason that the number of forward steps in time needs to match the number in space (though it's fine if it does). But more importantly, one should presumably always be reducing dt from the base value, whereas it is common to increase the resolution from the base value. So for geostrophic, I could imagine:

icos_base_resolution = 60.0
icos_space_refinement_factors = 8., 4., 2., 1.
icos_time_refinement_factors = 1., 0.5, 0.25, 0.125
rk4_dt_per_km = 2.0

We might decide that we want to have a cheaper base resolution, we might feel that we could get away with a larger rk4_dt_per_km, or we might feel we could get away with larger values for icos_time_refinement_factors but it seems clear that 8 is too large.
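To make the arithmetic behind this suggestion concrete, here is a small sketch. It assumes the time step is the product of rk4_dt_per_km, the base resolution, and the refinement factor; that formula is an assumption for illustration and is not confirmed anywhere in this thread, and `time_steps_s` is a hypothetical helper.

```python
def time_steps_s(base_resolution_km, dt_per_km, refinement_factors):
    """Time step in seconds for each refinement factor.

    Assumed formula (not taken from the Polaris source):
    dt = dt_per_km * base_resolution_km * factor
    """
    return [dt_per_km * base_resolution_km * factor
            for factor in refinement_factors]


# Space-style factors reused for time: the largest factor gives 960 s at 60 km.
reused = time_steps_s(60.0, 2.0, [8.0, 4.0, 2.0, 1.0])
# Time factors <= 1, as proposed above: dt only shrinks from the base value.
proposed = time_steps_s(60.0, 2.0, [1.0, 0.5, 0.25, 0.125])

print(reused)    # [960.0, 480.0, 240.0, 120.0]
print(proposed)  # [120.0, 60.0, 30.0, 15.0]
```

Under that assumption, a factor of 8 reproduces the 960 s step that crashed, while factors of at most 1 never exceed the 120 s base time step.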

A small side note for the future. In a future PR, we might want to add different convergence_thresh values for space, time and both. We expect 4th order convergence in time only (eventually) with RK4, right? We can cross that bridge when we come to it, though.

polaris/ocean/tasks/cosine_bell/__init__.py (outdated, resolved)
@xylar
Collaborator

xylar commented Oct 28, 2024

Hmm, I'm not happy with my suggested icos_time_refinement_factors because it only applies in convergence_time, not in convergence_both. I'll have to think about that more.

@xylar
Collaborator

xylar commented Oct 28, 2024

I suppose another approach would be to automatically renormalize *refinement_factor for convergence_space tests to be <= 1, so that 8, 4, 2, 1 and 4, 2, 1, 0.5 would both automatically become 1, 0.5, 0.25, 0.125 for space-only.
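A minimal sketch of that renormalization, assuming it simply divides every factor by the largest one (`renormalize` is a hypothetical name, not a Polaris function):

```python
def renormalize(factors):
    """Scale refinement factors so the largest one becomes 1.

    Hypothetical helper illustrating the idea above; it assumes all
    factors are positive.
    """
    largest = max(factors)
    return [factor / largest for factor in factors]


print(renormalize([8.0, 4.0, 2.0, 1.0]))  # [1.0, 0.5, 0.25, 0.125]
print(renormalize([4.0, 2.0, 1.0, 0.5]))  # [1.0, 0.5, 0.25, 0.125]
```

Both example lists from the comment collapse to the same normalized list, which is the behavior described.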

@cbegeman
Collaborator Author

I did intend that eventually we would set time refinement factors differently from space refinement factors with different config options, but it would require a lot of testing for me to figure out what those should be for each case so I thought that could be a follow-on PR. Is that ok with you?

@cbegeman cbegeman force-pushed the enhance-convergence-tasks branch from 8c4d8f5 to 62eaaff Compare November 13, 2024 19:02
@cbegeman
Collaborator Author

@xylar Ready for re-review when you have a chance

@mark-petersen

Thank you @cbegeman, this is beautiful!

I tested on chrysalis and chicoma. I compiled MPAS-Ocean standalone in a separate directory.

instructions:

cd $HOME/repos/polaris/pr
git reset --hard cbegeman/enhance-convergence-tasks
HEAD is now at 62eaaffe7 Separate refinement factors config options by refinement type
./configure_polaris_envs.py   --conda ${HOMEDIR}/miniforge3  --compiler gnu
source load_dev_polaris_0.4.0-alpha.2_chrysalis_gnu_openmpi.sh

polaris list|grep conver
  27: ocean/planar/inertial_gravity_wave/convergence_space
  28: ocean/planar/inertial_gravity_wave/convergence_time
  29: ocean/planar/inertial_gravity_wave/convergence_both
  58: ocean/planar/manufactured_solution/convergence_space
  59: ocean/planar/manufactured_solution/convergence_time
  60: ocean/planar/manufactured_solution/convergence_both
  63: ocean/spherical/icos/cosine_bell/convergence_space
  64: ocean/spherical/icos/cosine_bell/convergence_time
  65: ocean/spherical/icos/cosine_bell/convergence_both
  66: ocean/spherical/icos/cosine_bell/convergence_space/with_viz
  67: ocean/spherical/icos/cosine_bell/convergence_time/with_viz
  68: ocean/spherical/icos/cosine_bell/convergence_both/with_viz
  69: ocean/spherical/qu/cosine_bell/convergence_space
  70: ocean/spherical/qu/cosine_bell/convergence_time
  71: ocean/spherical/qu/cosine_bell/convergence_both
  72: ocean/spherical/qu/cosine_bell/convergence_space/with_viz
  73: ocean/spherical/qu/cosine_bell/convergence_time/with_viz
  74: ocean/spherical/qu/cosine_bell/convergence_both/with_viz
  75: ocean/spherical/icos/geostrophic/convergence_space
  76: ocean/spherical/icos/geostrophic/convergence_time
  77: ocean/spherical/icos/geostrophic/convergence_both
  78: ocean/spherical/icos/geostrophic/convergence_space/with_viz
  79: ocean/spherical/icos/geostrophic/convergence_time/with_viz
  80: ocean/spherical/icos/geostrophic/convergence_both/with_viz
  81: ocean/spherical/qu/geostrophic/convergence_space
  82: ocean/spherical/qu/geostrophic/convergence_time
  83: ocean/spherical/qu/geostrophic/convergence_both
  84: ocean/spherical/qu/geostrophic/convergence_space/with_viz
  85: ocean/spherical/qu/geostrophic/convergence_time/with_viz
  86: ocean/spherical/qu/geostrophic/convergence_both/with_viz


polaris setup -p ${HOME}/repos/E3SM/master/components/mpas-ocean -w $r/241121_polaris_conv -n   27  28  29  58  59  60  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86

# chrysalis:
srun -p debug -N 1 -t 1:00:00 --pty bash
# chicoma:
salloc -N 1 -t 2:0:0 --qos=debug --reservation=debug --account=t24_coastal_ocean

cd $HOME/repos/polaris/pr
source load_dev_polaris_0.4.0-alpha.2_chrysalis_gnu_openmpi.sh
cd $r/241121_polaris_conv
polaris serial

The init and forward steps work on all, but several fail in the analysis step, like this:

ocean/planar/manufactured_solution/convergence_time
  * step: init_50km
          already completed
  * step: forward_50km_150s
          already completed
  * step: forward_50km_75s
          already completed
  * step: forward_50km_38s
          already completed
  * step: analysis
          execution:        ERROR
  task execution:   ERROR
  see: case_outputs/ocean_planar_manufactured_solution_convergence_time.log
  task runtime:     0:00:01

This is due to a convergence rate that is lower than the cut-off. For example,

tail -n 20 case_outputs/ocean_planar_manufactured_solution_convergence_time.log  
in /usr/projects/climate/mpeterse/repos/polaris/pr/polaris/ocean/convergence/analysis.py

Order of convergence for SSH: 1.25
Error: order of convergence for SSH
  1.25 < min tolerance 1.8
          execution:        ERROR
Exception raised while running the steps of the task
Traceback (most recent call last):
  File "/usr/projects/climate/mpeterse/repos/polaris/pr/polaris/run/serial.py", line 324, in _log_and_run_task
    baselines_passed = _run_task(task, available_resources)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/projects/climate/mpeterse/repos/polaris/pr/polaris/run/serial.py", line 403, in _run_task
    _run_step(task, step, task.new_step_log_file,
  File "/usr/projects/climate/mpeterse/repos/polaris/pr/polaris/run/serial.py", line 502, in _run_step
    step.run()
  File "/usr/projects/climate/mpeterse/repos/polaris/pr/polaris/ocean/convergence/analysis.py", line 153, in run
    self.plot_convergence(
  File "/usr/projects/climate/mpeterse/repos/polaris/pr/polaris/ocean/convergence/analysis.py", line 284, in plot_convergence
    raise ValueError('Convergence rate below minimum tolerance.')
ValueError: Convergence rate below minimum tolerance.
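The failure above is a threshold check on a fitted convergence order. The code in polaris/ocean/convergence/analysis.py is not shown in this thread, so the following is only a sketch of one common way to do such a check: fit the slope of log(error) against log(refinement) and compare it with a minimum tolerance. The names `order_of_convergence` and `check_order` are hypothetical, the sample errors are synthetic, and only the error message is copied from the traceback.

```python
import math


def order_of_convergence(refinements, errors):
    """Least-squares slope of log(error) against log(refinement)."""
    x = [math.log(value) for value in refinements]
    y = [math.log(value) for value in errors]
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    covariance = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    variance = sum((xi - x_mean) ** 2 for xi in x)
    return covariance / variance


def check_order(order, min_tolerance):
    """Raise if the fitted order falls below the minimum tolerance."""
    if order < min_tolerance:
        raise ValueError('Convergence rate below minimum tolerance.')


# Synthetic second-order errors for three time steps (seconds).
time_steps = [150.0, 75.0, 37.5]
synthetic_errors = [1.0e-8 * dt ** 2 for dt in time_steps]
fitted = order_of_convergence(time_steps, synthetic_errors)
check_order(fitted, 1.8)  # passes: the fitted order is close to 2
```

With the values from the log, a fitted order of 1.25 against a minimum tolerance of 1.8 would raise the ValueError seen in the traceback.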

Is that expected? These are the four cases that fail:

grep 'Error: order of convergence' -B 1 case_outputs/*log
case_outputs/ocean_planar_inertial_gravity_wave_convergence_time.log-Order of convergence for SSH: 0.0
case_outputs/ocean_planar_inertial_gravity_wave_convergence_time.log:Error: order of convergence for SSH
--
case_outputs/ocean_planar_manufactured_solution_convergence_both.log-Order of convergence for SSH: 0.831
case_outputs/ocean_planar_manufactured_solution_convergence_both.log:Error: order of convergence for SSH
--
case_outputs/ocean_planar_manufactured_solution_convergence_space.log-Order of convergence for SSH: 0.023
case_outputs/ocean_planar_manufactured_solution_convergence_space.log:Error: order of convergence for SSH
--
case_outputs/ocean_planar_manufactured_solution_convergence_time.log-Order of convergence for SSH: 1.25
case_outputs/ocean_planar_manufactured_solution_convergence_time.log:Error: order of convergence for SSH

or is the correct convergence waiting on another PR? Thanks.

Here is the full output:

ocean/planar/inertial_gravity_wave/convergence_space
  * step: init_200km
          already completed
  * step: forward_200km_300s
          already completed
  * step: init_100km
          already completed
  * step: forward_100km_300s
          already completed
  * step: init_50km
          already completed
  * step: forward_50km_300s
          already completed
  * step: init_25km
          already completed
  * step: forward_25km_300s
          already completed
  * step: analysis
          already completed
  task execution:   SUCCESS
  task runtime:     0:00:00
ocean/planar/inertial_gravity_wave/convergence_time
  * step: init_100km
          already completed
  * step: forward_100km_300s
          already completed
  * step: forward_100km_150s
          already completed
  * step: forward_100km_75s
          already completed
  * step: analysis
          execution:        ERROR
  task execution:   ERROR
  see: case_outputs/ocean_planar_inertial_gravity_wave_convergence_time.log
  task runtime:     0:00:01
ocean/planar/inertial_gravity_wave/convergence_both
  * step: init_200km
          already completed
  * step: forward_200km_600s
          already completed
  * step: init_100km
          already completed
  * step: forward_100km_300s
          already completed
  * step: init_50km
          already completed
  * step: forward_50km_150s
          already completed
  * step: init_25km
          already completed
  * step: forward_25km_75s
          already completed
  * step: analysis
          already completed
  task execution:   SUCCESS
  task runtime:     0:00:00
ocean/planar/manufactured_solution/convergence_space
  * step: init_200km
          already completed
  * step: forward_200km_150s
          already completed
  * step: init_100km
          already completed
  * step: forward_100km_150s
          already completed
  * step: init_50km
          already completed
  * step: forward_50km_150s
          already completed
  * step: init_25km
          already completed
  * step: forward_25km_150s
          already completed
  * step: analysis
          execution:        ERROR
  task execution:   ERROR
  see: case_outputs/ocean_planar_manufactured_solution_convergence_space.log
  task runtime:     0:00:01
ocean/planar/manufactured_solution/convergence_time
  * step: init_50km
          already completed
  * step: forward_50km_150s
          already completed
  * step: forward_50km_75s
          already completed
  * step: forward_50km_38s
          already completed
  * step: analysis
          execution:        ERROR
  task execution:   ERROR
  see: case_outputs/ocean_planar_manufactured_solution_convergence_time.log
  task runtime:     0:00:01
ocean/planar/manufactured_solution/convergence_both
  * step: init_200km
          already completed
  * step: forward_200km_600s
          already completed
  * step: init_100km
          already completed
  * step: forward_100km_300s
          already completed
  * step: init_50km
          already completed
  * step: forward_50km_150s
          already completed
  * step: init_25km
          already completed
  * step: forward_25km_75s
          execution:        SUCCESS
          runtime:          0:00:42
  * step: analysis
          execution:        ERROR
  task execution:   ERROR
  see: case_outputs/ocean_planar_manufactured_solution_convergence_both.log
  task runtime:     0:00:43
ocean/spherical/icos/cosine_bell/convergence_space
  * step: icos_base_mesh_480km
          execution:        SUCCESS
          runtime:          0:00:28
  * step: icos_init_480km
          execution:        SUCCESS
          runtime:          0:00:00
  * step: icos_forward_480km_180s
          execution:        SUCCESS
          runtime:          0:01:47
  * step: icos_base_mesh_240km
          execution:        SUCCESS
          runtime:          0:00:33
  * step: icos_init_240km
          execution:        SUCCESS
          runtime:          0:00:00
  * step: icos_forward_240km_180s
          execution:        SUCCESS
          runtime:          0:01:49
  * step: icos_base_mesh_120km
          execution:        SUCCESS
          runtime:          0:00:40
  * step: icos_init_120km
          execution:        SUCCESS
          runtime:          0:00:00
  * step: icos_forward_120km_180s
          execution:        SUCCESS
          runtime:          0:04:49
  * step: icos_base_mesh_60km
          execution:        SUCCESS
          runtime:          0:01:22
  * step: icos_init_60km
          execution:        SUCCESS
          runtime:          0:00:00
  * step: icos_forward_60km_180s
          execution:        SUCCESS
          runtime:          0:15:22
  * step: analysis
          execution:        SUCCESS
          runtime:          0:00:02
  task execution:   SUCCESS
  task runtime:     0:26:53
ocean/spherical/icos/cosine_bell/convergence_time
  * step: icos_base_mesh_60km
          already completed
  * step: icos_init_60km
          already completed
  * step: icos_forward_60km_180s
          already completed
  * step: icos_forward_60km_90s (still waiting on this one)

and a sample plot from one of the failed tests:

pwd
/lcrc/group/e3sm/ac.mpetersen/scratch/runs/241121_polaris_conv
cd ocean/planar/manufactured_solution/convergence_time/analysis/
ls -lh *png
-rw-r--r-- 1 ac.mpetersen E3SM 32K Nov 21 14:09 convergence_ssh.png

[Figure: convergence_ssh.png]

Something does look wrong with the case ocean/planar/inertial_gravity_wave/convergence_time/analysis/, which has constant error. Perhaps the time step is not actually changing?

[Figure: convergence_ssh]

Also, my test timed out on icos_forward_60km_90s on chrysalis. Perhaps I just need to run with more nodes for that one to finish. It did finish on chicoma.

@mark-petersen

This one also fails:

ocean/spherical/icos/geostrophic/convergence_time
details:

ocean/spherical/icos/geostrophic/convergence_time
  * step: icos_base_mesh_60km
          already completed
  * step: icos_init_60km
          already completed
  * step: icos_forward_60km_120s
          already completed
  * step: icos_forward_60km_60s
          execution:        SUCCESS
          runtime:          0:01:45
  * step: icos_forward_60km_30s
          execution:        SUCCESS
          runtime:          0:03:11
  * step: analysis
          execution:        ERROR
  task execution:   ERROR
  see: case_outputs/ocean_spherical_icos_geostrophic_convergence_time.log

tail -n 20 case_outputs/ocean_spherical_icos_geostrophic_convergence_time.log
  in /gpfs/fs1/home/ac.mpetersen/repos/polaris/pr/polaris/ocean/convergence/analysis.py

Order of convergence for water-column thickness: -0.073
Error: order of convergence for water-column thickness
  -0.073 < min tolerance 0.4
          execution:        ERROR
Exception raised while running the steps of the task
Traceback (most recent call last):
  File "/gpfs/fs1/home/ac.mpetersen/repos/polaris/pr/polaris/run/serial.py", line 324, in _log_and_run_task
    baselines_passed = _run_task(task, available_resources)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/fs1/home/ac.mpetersen/repos/polaris/pr/polaris/run/serial.py", line 403, in _run_task
    _run_step(task, step, task.new_step_log_file,
  File "/gpfs/fs1/home/ac.mpetersen/repos/polaris/pr/polaris/run/serial.py", line 502, in _run_step
    step.run()
  File "/gpfs/fs1/home/ac.mpetersen/repos/polaris/pr/polaris/ocean/convergence/analysis.py", line 153, in run
    self.plot_convergence(
  File "/gpfs/fs1/home/ac.mpetersen/repos/polaris/pr/polaris/ocean/convergence/analysis.py", line 284, in plot_convergence
    raise ValueError('Convergence rate below minimum tolerance.')
ValueError: Convergence rate below minimum tolerance.

[Figure: convergence_h]

Note, ocean/spherical/icos/geostrophic/convergence_space works beautifully:
[Figure: convergence_h]

[Figure: convergence_normalVelocity]

@cbegeman
Collaborator Author

@mark-petersen Thank you for your testing!

The manufactured solution tests will fail unless you use @sbrus89's RK4 time fix branch. Let me check on the IGW. I didn't pay too much attention to whether the time steps were chosen appropriately so maybe the spatial errors are dominating.
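To illustrate why a dominant spatial error would flatten a time-only convergence curve, here is a toy model. It assumes the total error is a second-order spatial term plus a fourth-order temporal term; the functional form and the constants `c_space` and `c_time` are made up for illustration and are not measured from any Polaris run.

```python
import math


def model_error(dx_km, dt_s, c_space=1.0e-6, c_time=1.0e-14):
    """Toy error model: second order in space plus fourth order in time.

    The form and both constants are hypothetical.
    """
    return c_space * dx_km ** 2 + c_time * dt_s ** 4


def observed_order(time_steps, errors):
    """Convergence order estimated from the first and last samples."""
    return (math.log(errors[0] / errors[-1])
            / math.log(time_steps[0] / time_steps[-1]))


time_steps = [300.0, 150.0, 75.0]

# Fixed 100 km resolution: the spatial term swamps the temporal term.
mixed = [model_error(100.0, dt) for dt in time_steps]
# Same time steps with the spatial term switched off.
time_only = [model_error(100.0, dt, c_space=0.0) for dt in time_steps]

print(observed_order(time_steps, mixed))
print(observed_order(time_steps, time_only))
```

In this toy setup the first estimate is close to zero even though the time integrator is fourth order, which is the kind of flat curve reported for the inertial gravity wave case. It does not rule out the other explanation raised above, that the time step is not actually changing.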

@mark-petersen

Also ocean/spherical/qu/geostrophic/convergence_time, same as above. These are likely all the same issue.

Order of convergence for water-column thickness: -0.116
Error: order of convergence for water-column thickness
  -0.116 < min tolerance 0.4

[Figure: convergence_h]

@cbegeman
Collaborator Author

I did intend that eventually we would set time refinement factors differently from space refinement factors with different config options, but it would require a lot of testing for me to figure out what those should be for each case so I thought that could be a follow-on PR. Is that ok with you?

As I noted above, I didn't have time to optimally choose all of the time steps so I didn't add time convergence tests to any suites.

@mark-petersen

Thanks for pointing that out. If they converge for the 'both' cases, that is the standard test and enough for me. It is nice to have the machinery to test in space and time separately, so thanks for setting it up.

@cbegeman
Collaborator Author

@mark-petersen I just took a look at Sid's paper to see what he chose for the IGW time steps. He has it written in number of time steps, but I'm having trouble finding the simulation duration to back out the time step.

@cbegeman
Collaborator Author

Thanks for reviewing, @mark-petersen!

Collaborator

@xylar xylar left a comment

@cbegeman, this looks great! Based on @mark-petersen's testing, I agree that we can dig into the individual convergence tests in space or time as we feel the need.

I did see some changes that I think are needed in the docs to match recent changes in the code. And there's one unneeded import that the linter is complaining about. All this should be quick to fix, so I'm approving based on the assumption that those small things will be addressed.

It also might be good to rebase onto main and at least make sure CI passes after that.

polaris/ocean/tasks/cosine_bell/__init__.py (outdated, resolved)
Comment on lines 242 to +250
# a list of icosahedral mesh resolutions (km) to test
icos_resolutions = 60, 120, 240, 480
icos_refinement_factors = 8., 4., 2., 1.

# The base resolution for the quasi-uniform mesh to which the refinement
# factors are applied
qu_base_resolution = 120.

# a list of quasi-uniform mesh resolutions (km) to test
qu_resolutions = 60, 90, 120, 150, 180, 210, 240
qu_refinement_factors = 0.5, 0.75, 1., 1.25, 1.5, 1.75, 2.
Collaborator

I think maybe these config options need to be updated to match recent code changes.

Comment on lines +268 to +270
# refinement factors for a planar mesh applied to either space or time
# refinement factors for a spherical mesh given in section spherical_convergence
refinement_factors = 4., 2., 1., 0.5
Collaborator

I think this has now been updated to separate lists for time and space in the code.

Comment on lines 1 to 2
import xarray as xr

Collaborator

Suggested change (remove this import):
import xarray as xr

According to the linter, looks like this isn't actually used.

@cbegeman cbegeman force-pushed the enhance-convergence-tasks branch from 62eaaff to 9fe597a Compare November 25, 2024 16:36
@cbegeman cbegeman merged commit a44d4b8 into E3SM-Project:main Nov 25, 2024
5 checks passed
@cbegeman cbegeman deleted the enhance-convergence-tasks branch November 25, 2024 17:02
Labels
enhancement New feature or request ocean Related to ocean tests or analysis
3 participants