ne3np4 to namelist_defaults_ctsm.xml and Makefile for PTS mode #2835

Open
wants to merge 14 commits into
base: cesm3_0_beta04_changes


slevis-lmwg
Contributor

@slevis-lmwg slevis-lmwg commented Oct 16, 2024

Description of changes

Same as title.

Specific notes

Contributors other than yourself, if any:
@jtruesdal

CTSM Issues Fixed (include github issue #):
Fixes #2768

Are answers expected to change (and if so in what way)?
No.

Any User Interface Changes (namelist or namelist defaults changes)?
New fsurdat/landuse files added to the defaults.

Does this create a need to change or add documentation? Did you do so?
No.

Testing performed, if any:
On derecho
PASS Test the addition of the new fsurdat/landuse files to the Makefile with

make crop-global-SSP2-4.5-ne3
make crop-global-1850-low-res
make crop-global-present-low-res

PASS ./build-namelist_test.pl (before and after adding ne3np4 tests)
PASS python testing
PASS clm_pymods

@slevis-lmwg slevis-lmwg self-assigned this Oct 16, 2024
@slevis-lmwg slevis-lmwg changed the base branch from master to cesm3_0_beta04_changes October 23, 2024 21:12
@slevis-lmwg slevis-lmwg marked this pull request as ready for review October 23, 2024 21:12
@slevis-lmwg
Contributor Author

slevis-lmwg commented Oct 24, 2024

List of the tests that I changed from f45 to ne3:

SMS_D_Ld1_Mmpi-serial.ne3_ne3_mg37.I2000Clm50SpRs.izumi_gnu.clm-ptsROA
SMS_D_Ld1_Mmpi-serial.ne3_ne3_mg37.I2000Clm50SpRs.izumi_gnu.clm-ptsRLA
SMS_D_Ld1_Mmpi-serial.ne3_ne3_mg37.I2000Clm50SpRs.izumi_nag.clm-ptsRLA
SMS_D_Ld1_Mmpi-serial.ne3_ne3_mg37.I2000Clm50SpRs.derecho_intel.clm-ptsRLA
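For readers unfamiliar with these names, the following is a minimal sketch of how such a test name breaks into dot-separated fields (test type and modifiers, grid, compset, machine_compiler, testmods) and how the grid field can be swapped. The helper names `parse_test_name` and `with_grid` are hypothetical and are not part of CIME or CTSM; the sketch assumes none of the fields contains a dot.

```python
from typing import List, NamedTuple, Optional


class ParsedName(NamedTuple):
    test_type: str                    # e.g. "SMS"
    modifiers: List[str]              # e.g. ["D", "Ld1", "Mmpi-serial"]
    grid: str                         # e.g. "ne3_ne3_mg37"
    compset: str                      # e.g. "I2000Clm50SpRs"
    machine_compiler: Optional[str]   # e.g. "izumi_gnu", or None if absent
    testmods: Optional[str]           # e.g. "clm-ptsRLA", or None if absent


def parse_test_name(name: str) -> ParsedName:
    """Split a test name into its dot-separated fields (hypothetical helper)."""
    fields = name.split(".")
    if len(fields) < 3:
        raise ValueError(f"expected at least TYPE.GRID.COMPSET, got {name!r}")
    # The first field holds the test type followed by underscore-separated modifiers.
    test_type, *modifiers = fields[0].split("_")
    return ParsedName(
        test_type=test_type,
        modifiers=modifiers,
        grid=fields[1],
        compset=fields[2],
        machine_compiler=fields[3] if len(fields) > 3 else None,
        testmods=fields[4] if len(fields) > 4 else None,
    )


def with_grid(name: str, new_grid: str) -> str:
    """Return the same test name with only the grid field replaced."""
    fields = name.split(".")
    if len(fields) < 3:
        raise ValueError(f"expected at least TYPE.GRID.COMPSET, got {name!r}")
    fields[1] = new_grid
    return ".".join(fields)


old = "SMS_D_Ld1_Mmpi-serial.f45_f45_mg37.I2000Clm50SpRs.derecho_intel.clm-ptsRLA"
print(with_grid(old, "ne3_ne3_mg37"))
print(parse_test_name(old).testmods)
```

A name that stops after the machine_compiler field parses with `testmods` set to `None`, which makes a missing `clm-ptsRLA` suffix easy to spot before submitting a test.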

BUT
./create_test SMS_D_Ld1_Mmpi-serial.ne3_ne3_mg37.I2000Clm50SpRs.derecho_intel.clm-ptsRLA
fails with

ERROR: surface dataset vs. land domain lon/lat mismatch error
   89.9999999999996       3.552713678800501E-014

Does this mean (UPDATE: the correct answer is 2, documented here)

  1. I picked the wrong grid alias for ne3np4? The only other one is ne3pg3_ne3pg3_mg37, and we do not want that.
  2. I did not generate the fsurdat file correctly? See how I generated the fsurdat in my post in the issue.
  3. A problem with the domain file DIN_LOC_ROOT/share/domains/domain.lnd.ne3np4_gx3v7.230718.nc?
  4. A shortcoming with mksurfdata_esmf, as I think this comment suggests?
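As a rough illustration of the kind of consistency check behind this error, here is a minimal sketch that compares two coordinate lists element by element within a tolerance. It is not CTSM's Fortran check: the function name `first_mismatch` and the tolerance are hypothetical, and treating the two printed numbers as a surface-dataset value and a domain value for the same grid cell is an assumption.

```python
def first_mismatch(surf_coords, domain_coords, tol=1e-10):
    """Return (index, surf_value, domain_value) for the first pair that differs
    by more than tol, or None if every pair agrees (hypothetical helper)."""
    if len(surf_coords) != len(domain_coords):
        raise ValueError("coordinate lists differ in length")
    for i, (s, d) in enumerate(zip(surf_coords, domain_coords)):
        if abs(s - d) > tol:
            return i, s, d
    return None


# Round-off-sized differences pass the check ...
print(first_mismatch([0.0, 90.0], [3.5e-14, 90.0]))
# ... but the pair printed in the error message does not
# (ASSUMPTION: the two numbers belong to the same grid cell in the two files).
print(first_mismatch([89.9999999999996], [3.552713678800501e-14]))
```

Under that assumption, a difference of roughly 90 degrees is far beyond round-off, which would be consistent with the two files ordering or defining their grid cells differently rather than with a precision problem.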

I am generating new fsurdat/landuse files now.

  • Reminder to update the nag test in expected test failures.

…efile

slevis resolved conflicts:
bld/unit_testers/build-namelist_test.pl
@slevis-lmwg
Contributor Author

slevis-lmwg commented Oct 25, 2024

  • Rename files to the original names to reduce proliferation of files and to avoid anyone using the bad files.
  • Try the same test to confirm it works after this commit: 87afe5c
  • Copy to /inputdata and ./rimport

@slevis-lmwg

This comment was marked as resolved.

@slevis-lmwg
Contributor Author

@jtruesdal the test worked this time. Thank you for your help with this.

@slevis-lmwg
Contributor Author

From meeting with @ekluzek
I should finalize testing and merge this PR.

@slevis-lmwg
Contributor Author

slevis-lmwg commented Nov 8, 2024

derecho tests:
PASS ./build-namelist_test.pl
PASS python testing
OK ./run_sys_tests -s aux_clm -c ctsm5.3.009 --skip-generate
in /glade/derecho/scratch/slevis/tests_1108-123104de

izumi tests:
OK ./run_sys_tests -s aux_clm -c ctsm5.3.009 --skip-generate
originally failed (but see update on 2024/11/13)
in /fs/cgd/data0/slevis/git_people/jtruesdal/ne3np4_to_defaults_and_makefile/tests_1108-124238iz
but all failures involve Mmpi-serial tests, which relates to this post.
One of the Mmpi-serial failures looks different but the same test on derecho-intel passes:
SMS_D_Ld1_Mmpi-serial.ne3_ne3_mg37.I2000Clm50SpRs.izumi_gnu.clm-ptsRLA
Given izumi's test-flakiness since the recent upgrade, I will not spend time on this, though it may turn out to be a legitimate problem. If we decide to look into it:
/fs/cgd/data0/slevis/git_people/jtruesdal/ne3np4_to_defaults_and_makefile/tests_1108-124238iz/SMS_D_Ld1_Mmpi-serial.ne3_ne3_mg37.I2000Clm50SpRs.izumi_gnu.clm-ptsRLA.C.1108-124238iz_gnu/run

@slevis-lmwg
Contributor Author

@jtruesdal
I ran complete ctsm testing (details in my previous post, directly above) and one ptsRLA test failed on izumi-gnu, while the same test passed on derecho-intel.

The FAIL on izumi: /fs/cgd/data0/slevis/git_people/jtruesdal/ne3np4_to_defaults_and_makefile/tests_1108-124238iz/SMS_D_Ld1_Mmpi-serial.ne3_ne3_mg37.I2000Clm50SpRs.izumi_gnu.clm-ptsRLA.C.1108-124238iz_gnu

The PASS on derecho: /glade/derecho/scratch/slevis/tests_1108-123104de/SMS_D_Ld1_Mmpi-serial.ne3_ne3_mg37.I2000Clm50SpRs.derecho_intel.clm-ptsRLA.C.1108-123104de_int

Izumi was upgraded recently. This has caused flakiness in our Mmpi-serial tests. Still, this particular error looks different, so I wonder whether it's real, despite not failing on derecho.

If you feel we can disregard this failure on izumi, then I would ask you to review/approve my PR, so that we may merge it. Again, thank you for your help, John.

@jtruesdal
Contributor

If you've run it a few times and it fails in the same way, I think I'll have to look into it more. It appears to be dying in the land and could be related to my mods.

@slevis-lmwg
Contributor Author

@jtruesdal the test's /run directory shows identical failure results from three attempts. (CLM's izumi test suite runs multiple attempts automatically to get past machine flakiness.)

@jtruesdal
Contributor

@ekluzek I'm looking at the test in question but am unsure how the land PTS mode test is supposed to work. I always thought the test was set up with boundary data containing just a point, but this isn't the case, as it's using full-resolution fsurdat etc. Did this test run with previous versions of the SE grid? If you can point me to a working test, I can see what is going wrong with this new SE grid.

@slevis-lmwg
Contributor Author

First, in case this was a false failure due to the recent izumi upgrade, I decided to manually submit
./create_test SMS_D_Ld1_Mmpi-serial.ne3_ne3_mg37.I2000Clm50SpRs.izumi_gnu
and SURPRISE the test now PASSED.

I apologize for the false alarm @jtruesdal.

@slevis-lmwg
Contributor Author

slevis-lmwg commented Nov 13, 2024

I take it back, I submitted the wrong test. Let me try again with
./create_test SMS_D_Ld1_Mmpi-serial.ne3_ne3_mg37.I2000Clm50SpRs.izumi_gnu.clm-ptsRLA

@slevis-lmwg
Contributor Author

slevis-lmwg commented Nov 13, 2024

The test still fails. Latest one is here on izumi:
/scratch/cluster/slevis/SMS_D_Ld1_Mmpi-serial.ne3_ne3_mg37.I2000Clm50SpRs.izumi_gnu.clm-ptsRLA.20241113_163638_5fyem2

For a test that works, how about the derecho equivalent of the failing one:
/glade/derecho/scratch/slevis/tests_1108-123104de/SMS_D_Ld1_Mmpi-serial.ne3_ne3_mg37.I2000Clm50SpRs.derecho_intel.clm-ptsRLA.C.1108-123104de_int

I don't have other such tests using a SE grid. Prior to using the ne3 grid, we ran these tests with f45. For example:
/glade/derecho/scratch/slevis/tests_1113-122038de/SMS_D_Ld1_Mmpi-serial.f45_f45_mg37.I2000Clm50SpRs.derecho_intel.clm-ptsRLA.GC.1113-122038de_int

@ekluzek
Collaborator

ekluzek commented Nov 14, 2024

Actually, @jtruesdal and @slevis-lmwg, I think this is likely something we may need to talk about in a meeting. In the past, PTS_MODE for CTSM was meant to run CLM the way it would run within SCAM, but in an I case. Its primary purpose is to make sure we don't screw up CLM with our changes, but it's also something that could be useful for CLM I cases on their own.

So it was set up to read in full 2D grids and pick a point out of that 2D grid. But I think it might have assumed that the 2D grid is a regular grid and not in vector format (like the SE grids). If so, I'm thinking that PTS_MODE might not work for SE grids. But is SCAM updated on the CAM side to use unstructured grids, as for the SE dycore? If so, we might need to do the same thing in CTSM.
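The distinction can be sketched as follows. This is only an illustration of the general idea, not CTSM's PTS_MODE code; the function names and sample coordinates are hypothetical. On a regular grid, latitude and longitude are independent 1D axes, so the nearest index can be found on each axis separately. On an unstructured (vector-format) grid, each column carries its own (lat, lon) pair, so the search has to run over columns.

```python
import math


def lon_diff(a, b):
    """Absolute longitude difference in degrees, accounting for wrap-around."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_on_regular_grid(lats, lons, target_lat, target_lon):
    """Regular grid: pick the nearest index on each 1D axis independently."""
    j = min(range(len(lats)), key=lambda k: abs(lats[k] - target_lat))
    i = min(range(len(lons)), key=lambda k: lon_diff(lons[k], target_lon))
    return j, i


def nearest_on_unstructured_grid(col_lats, col_lons, target_lat, target_lon):
    """Unstructured grid: one (lat, lon) per column; search by great-circle angle."""
    def central_angle(k):
        lat1, lat2 = math.radians(target_lat), math.radians(col_lats[k])
        dlon = math.radians(col_lons[k] - target_lon)
        a = (math.sin((lat2 - lat1) / 2.0) ** 2
             + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2.0) ** 2)
        return 2.0 * math.asin(min(1.0, math.sqrt(a)))
    return min(range(len(col_lats)), key=central_angle)


# Hypothetical column coordinates in vector format (not real ne3np4 data).
col_lats = [0.0, 45.0, 40.0, -45.0]
col_lons = [0.0, 90.0, 270.0, 100.0]

# Axis-wise search applied to column vectors returns two unrelated indices.
print(nearest_on_regular_grid(col_lats, col_lons, 40.0, 100.0))
# Column-wise search returns a single column index.
print(nearest_on_unstructured_grid(col_lats, col_lons, 40.0, 100.0))
```

With these sample values, the axis-wise search picks the latitude from one column and the longitude from another, and neither is the column that is actually closest to the target point.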

In any case we might need to talk more about this to make sure we know what we need to support SCAM on the CLM side of things. So we'll need to know how SCAM currently works within CAM and how you need it to work.

@jtruesdal
Contributor

@slevis-lmwg @ekluzek I was thinking along the same lines as Erik, as the code I looked at doesn't appear to have the logic to handle an unstructured grid in PTS mode. The derecho test has warning messages in the log about not being able to find a proper lat/lon column. I think it is running but giving wrong results. This I test also looks like it is trying to interpolate an f09 initial condition for a single column, which would require my warm start mods PR. I can retest this case after a test merge of those mods to see if they fix the issue. My guess is that a few more minor mods might be needed to get it running, but that shouldn't be hard. I can do that work, but it will take a few days for me to get to it.

@slevis-lmwg slevis-lmwg changed the title from "ne3np4 to namelist_defaults_ctsm.xml and Makefile" to "ne3np4 to namelist_defaults_ctsm.xml and Makefile for PTS mode" Nov 25, 2024
@slevis-lmwg
Contributor Author

slevis-lmwg commented Nov 26, 2024

@jtruesdal
Heads up that @ekluzek suggested the three of us meet some time next week to discuss this. I will find a time that works for us.

UPDATE
Notes from 2024/12/4 meeting:
https://docs.google.com/document/d/1v7RUJ3u8ALtIJl9EFeLdMvOnGAxYJ1RxvkDCLhA7BPI/edit?usp=sharing

@slevis-lmwg
Contributor Author

@jtruesdal I confirmed that the last commit in this PR
a389e6d
is the same as what "git log" shows in my local branch here:
/glade/work/slevis/git_people/jtruesdal/ne3np4_to_defaults_and_makefile

I will wait until after you finish your part of this to update to the latest cesm3_0_beta04_changes branch.

@jtruesdal
Contributor

Thanks @slevis-lmwg, I created a PR against your branch.

@slevis-lmwg
Contributor Author

Sounds good @jtruesdal
When it's ready for me to merge back to this PR, please let me know the link to your PR.
You should also have appropriate permissions to merge it back here yourself, if you prefer to do that.

Projects
Status: Stalled
Development

Successfully merging this pull request may close these issues.

Add ne3np4 to list of surface dataset resolutions
3 participants