Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check CICE4 restart file dates #539

Merged
merged 15 commits into from
Dec 13, 2024

Conversation

blimlim
Copy link
Contributor

@blimlim blimlim commented Dec 3, 2024

Note: I've started this pull request as a draft as it currently has no unit tests. However, I had quite a bit of trouble working out the best way to structure these changes (which drivers to put them in etc), and hence would appreciate any comments/feedback/ideas/suggestions on the structure while I'm working on the unit tests.

This draft PR addresses #538, introducing two guards against CICE4 using a restart file whose date clashes with the date in the restart calendar file. This should prevent:

  1. Payu from picking up the wrong restart file
  2. Users from mistakenly changing the name of an iced.YYYYMMDD restart file.

These are both handled by the cice4_make_restart_pointer() function. This checks that an iced.YYYYMMDD file exists with a name matching the start date calculated from the calendar restart file (restart_date.nml). It then checks that the time in the iced file's header matches the previous runtime calculated from the calendar restart file, before writing the new pointer file.

This function is called during the setup stage of the access driver.

            if model.model_type == 'cice':
                # Set up and check the cice restart files.
                cice4_make_restart_pointer(model,
                                           run_start_date,
                                           previous_runtime)

I think it would have made sense to place these steps in the cice driver, and include necessary overrides in the cice5 driver, rather than putting them in the access driver as I've done here. The challenge was that for CICE4 in ESM1.5, the access driver calculates the start date and run time used by the model, and hence it contains the data needed to correctly identify the iced.YYYYMMDD file and check its header.

Meanwhile the start date and prior runtime calculated in the cice driver (based on the cice_in.nml files) appear to be mostly (but not completely) unused by the model, are incorrect (i.e. in the PI configuration's cice.py calendar has always been 100 years off the actual date used by the model), and use the wrong type of calendar. Because of this, I thought it was simplest to implement the changes in the access driver where the available information is correct.

I've included a blank make_restart_ptr() method in the cice driver, and overridden it in the cice5 driver so that CICE5 is unaffected by these changes.

I'll get started on some testing, but it would be great to hear if you have any feedback on this overall structure. It's clearly not the cleanest or most intuitive, and so any ideas would be welcome.

@pep8speaks
Copy link

pep8speaks commented Dec 3, 2024

Hello @blimlim! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2024-12-12 05:06:02 UTC

Copy link
Contributor

@anton-seaice anton-seaice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @blimlim - if its feasible to check the value of dumpfreq in the cice namelist that would be great.

I think whatever we do will be unavoidably messy, i made some suggestions, not sure if they are useful or not!

payu/models/access.py Outdated Show resolved Hide resolved
payu/models/access.py Outdated Show resolved Hide resolved
payu/models/cice.py Outdated Show resolved Hide resolved
@blimlim blimlim marked this pull request as ready for review December 11, 2024 04:06
@coveralls
Copy link

coveralls commented Dec 11, 2024

Coverage Status

coverage: 58.97% (+0.5%) from 58.509%
when pulling be2e46b on ACCESS-NRI:538-check-cice4-restart-dates
into da04782 on payu-org:master.

Copy link
Contributor

@anton-seaice anton-seaice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran this and checked it makes the expected error, ill finish the review tomorrow :)

test/models/test_access.py Show resolved Hide resolved
payu/models/cice.py Outdated Show resolved Hide resolved
test/models/test_cice.py Outdated Show resolved Hide resolved
test/models/test_cice.py Outdated Show resolved Hide resolved
test/models/test_cice.py Show resolved Hide resolved
@anton-seaice
Copy link
Contributor

I was thinking about this yesterday - it leaves the config in a kind of weird state. e.g.

The model runs & completes (the archive step succeeds) however there is no restart saved to restart the model. The user only gets an error once they try to restart and is it clear they need to go back one model run to fix this ? I don't think payu has an easy way of going back one run ? ( I guess deleting the output & restart folder is the way to do that?)

Obviously once the check on the dump dates is done at the start of the run, then the issue goes away. I guess we could move the check in this PR to the end of a run, e.g. the archive step fails if the cice restart file is at the "wrong" time?. For a release, we can put this in the known issues and provide advice on the workaround as needed.

@blimlim
Copy link
Contributor Author

blimlim commented Dec 11, 2024

That's a good point!
Since we'll still be writing the restart pointer during setup, I'm wondering if it's worth keeping the check during setup to make sure that what we're writing is correct, and adding an additional one during archive. What would you think of adding a separate small check to the archive stage to see if the expected iced.YYYYMM file exists, so that we have:

Setup stage:

  • Check iced file matching run start date exists, and if so write it to the pointer file
  • Check the header to make sure the previous runtime is as expected

Archive stage:

  • Check for an iced file whose name matches the run end date

And once we add the dumpfreq checks we could remove/consolidate the different checks.

@anton-seaice
Copy link
Contributor

I am ok with all the checks being there :-). If you have time, then go ahead, if you don't and want to get a release out - then what we have prevents already in this PR prevents the initial error, its just messy to recover from .

@blimlim
Copy link
Contributor Author

blimlim commented Dec 12, 2024

Sweet, I've added an additional check during the archive stage in access.py here

I've incorporated a test of this into the existing test_access_cice_1year_runtimes. This test runs both setup and archive stages, and checks that no errors are raised when the expected iced file is present during archive. I think setting up an independent test might have required redoing much of the same setup, and so I thought it was easiest/made sense to include it there.
https://github.com/ACCESS-NRI/payu/blob/9135f89bdd5973e76d270dc595de1ffcadeb7841/test/models/test_access.py#L382-L394

There's also an additional test, checking that an error is raised when the correct iced file is not present.
https://github.com/ACCESS-NRI/payu/blob/9135f89bdd5973e76d270dc595de1ffcadeb7841/test/models/test_access.py#L404-L478

payu/models/access.py Outdated Show resolved Hide resolved
anton-seaice
anton-seaice previously approved these changes Dec 12, 2024
Copy link
Contributor

@anton-seaice anton-seaice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did adhoc tests for :

  • Trying to restart with a cice restart with the wrong date - payu raised error
  • Two consecutive months with correct restarts - ran ok

@anton-seaice
Copy link
Contributor

Thanks @blimlim this is neat piece of work :)

@blimlim
Copy link
Contributor Author

blimlim commented Dec 12, 2024

Awesome, thanks @anton-seaice for your help with this :) I'll go make that issue about the additional tests

I've just made one more tiny commit fixing a typo, and undoing an unnecessary change I'd made earlier.

@aidanheerdegen if you have any time to look over this, that would be much appreciated, apologies for cutting it so close time wise with this!

@blimlim blimlim merged commit 0c2a837 into payu-org:master Dec 13, 2024
8 checks passed
@blimlim blimlim deleted the 538-check-cice4-restart-dates branch December 13, 2024 04:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants