Add Antarctic Ice Sheet meshes for MALI #6440

Merged: 6 commits merged into master on Aug 14, 2024

@matthewhoffman matthewhoffman commented May 22, 2024

This PR introduces two new meshes of the Antarctic Ice Sheet for MALI:

  • high-res production mesh: mpas.ais4to20km
  • low-res testing mesh: mpas.ais8to30km

Associated with these MALI meshes are 5 new E3SM model_grids:

  • TL319_oQU240wLI_ais8to30 - ultra low res mesh for testing GG cases with JRA forcing
  • ne30pg2_r05_IcoswISC30E3r5_ais8to30 - v3 low res mesh for BG and IG cases with low res Antarctica
  • TL319_IcoswISC30E3r5_ais8to30 - v3 low res mesh for GG cases with JRA forcing with low res Antarctica
  • ne30pg2_r05_IcoswISC30E3r5_ais4to20 - v3 low res mesh for BG and IG cases with high res Antarctica
  • TL319_IcoswISC30E3r5_ais4to20 - v3 low res mesh for GG with JRA forcing cases with high res Antarctica

[BFB] for all currently tested configurations

@matthewhoffman

This PR was previously discussed at E3SM-Ocean-Discussion#97.

github-actions bot commented May 22, 2024

PR Preview Action v1.4.7
🚀 Deployed preview to https://E3SM-Project.github.io/E3SM/pr-preview/pr-6440/
on branch gh-pages at 2024-08-13 17:21 UTC

xylar commented May 28, 2024

@matthewhoffman and @jonbob, I rebased onto master, fixed up conflicts, and tried to run:

./create_test --wait --walltime 1:00:00 SMS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_intel.mpaso-ocn_glcshelf

and am seeing:

    Errors were:
        env_batch.xml appears to have changed, regenerating batch scripts
        manual edits to these file will be lost!
        
        wget failed with output:  and errput --2024-05-28 12:36:09--  https://web.lcrc.anl.gov/public/e3sm/inputdata/glc/mpasli/mpas.ais8to30km/ais8to30km.20231222.nc
        Resolving web.lcrc.anl.gov (web.lcrc.anl.gov)... 140.221.70.30
        Connecting to web.lcrc.anl.gov (web.lcrc.anl.gov)|140.221.70.30|:443... connected.
        HTTP request sent, awaiting response... 404 Not Found
        2024-05-28 12:36:10 ERROR 404: Not Found.
        
        ERROR: Could not find all inputdata on any server

I believe the issue is that the file is called ais_8to30km_20231222.nc instead of ais8to30km.20231222.nc. I'm not sure if we want to fix the filename or the branch.

xylar commented May 28, 2024

The same problem exists with the 4to20km mesh:

$ ls /lcrc/group/e3sm/public_html/inputdata/glc/mpasli/mpas.ais4to20km/
ais_4to20km_20230105.nc
...

This should be ais4to20km.20230105.nc.

xylar commented May 28, 2024

> I'm not sure if we want to fix the filename or the branch.

I think we need to rename the files because I don't think there's support for having the prefix and datestamp be separated by an underscore.

xylar commented May 28, 2024

@matthewhoffman, I think you need to make all the files in inputdata group read/write:

$ pwd
/lcrc/group/e3sm/public_html/inputdata/glc/mpasli/mpas.ais8to30km
$ ls -lah
total 177M
drwxr-sr-x  2 ac.mhoffman E3SM 4.0K May  7 10:54 .
drwxrwsr-x 10 jacob       E3SM 4.0K May  4 08:32 ..
-rw-r-----  1 ac.mhoffman E3SM 148M May  4 08:34 ais_8to30km_20231222.nc
-rw-r-----  1 ac.mhoffman E3SM 6.1M May  4 08:34 ais_8to30km_20231222.regionMask_ismip6.nc
-rw-r-----  1 ac.mhoffman E3SM  17M May  4 08:34 ais_8to30km_20231222.scrip.nc
-rw-r-----  1 ac.mhoffman E3SM 3.4M May  4 08:34 mpasli.graph.info.240507
-rw-r--r--  1 ac.mhoffman E3SM 377K May  7 10:51 mpasli.graph.info.240507.part.1024
-rw-r--r--  1 ac.mhoffman E3SM 302K May  4 08:38 mpasli.graph.info.240507.part.128
-rw-r--r--  1 ac.mhoffman E3SM 425K May  4 08:38 mpasli.graph.info.240507.part.1920
-rw-r--r--  1 ac.mhoffman E3SM 341K May  4 08:38 mpasli.graph.info.240507.part.240
-rw-r--r--  1 ac.mhoffman E3SM 343K May  4 08:38 mpasli.graph.info.240507.part.256
-rw-r--r--  1 ac.mhoffman E3SM 453K May  4 08:38 mpasli.graph.info.240507.part.3840
-rw-r--r--  1 ac.mhoffman E3SM 363K May  4 08:38 mpasli.graph.info.240507.part.480
-rw-r--r--  1 ac.mhoffman E3SM 364K May  4 08:38 mpasli.graph.info.240507.part.512
-rw-r--r--  1 ac.mhoffman E3SM 274K May  4 08:37 mpasli.graph.info.240507.part.64
-rw-r--r--  1 ac.mhoffman E3SM 374K May  4 08:38 mpasli.graph.info.240507.part.960

You also need to make everything world readable so it can be downloaded from the inputdata server.

Please run:

cd /lcrc/group/e3sm/public_html/inputdata/glc/mpasli
chmod -R ug+rwX mpas.ais8to30km mpas.ais4to20km
chmod -R o+rX mpas.ais8to30km mpas.ais4to20km
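
As a rough illustration of what those two chmod commands are meant to guarantee, here is a minimal Python sketch that checks a permission mode for group read/write and world read access. The helper name `permission_problems` is hypothetical, and the sketch works on a numeric mode rather than on the real inputdata directory.

```python
import stat


def permission_problems(mode, is_dir=False):
    """List the ways a mode falls short of `ug+rwX` plus `o+rX` (hypothetical helper)."""
    problems = []
    if not mode & stat.S_IRGRP:
        problems.append("not group-readable")
    if not mode & stat.S_IWGRP:
        problems.append("not group-writable")
    if not mode & stat.S_IROTH:
        problems.append("not world-readable")
    # Directories also need the execute (search) bit so they can be traversed.
    if is_dir and not mode & stat.S_IXGRP:
        problems.append("not group-searchable")
    if is_dir and not mode & stat.S_IXOTH:
        problems.append("not world-searchable")
    return problems


# -rw-r----- is the mode shown for the mesh files in the listing above.
print(permission_problems(0o640))
```

For a real file, the mode could be obtained with `os.stat(path).st_mode` before passing it to the helper.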

jonbob commented May 28, 2024

@xylar -- thanks for taking on the initial testing. There are a bunch of mapping files I need to make before any of the new resolutions can work, but also maybe there will be issues with file permissions

xylar commented May 28, 2024

@jonbob, I think everything is readable so go ahead with mapping files. I don't think you need to add anything to the read-only directory for that purpose, so you should be good until @matthewhoffman is able to change permissions.

xylar commented Jun 17, 2024

@matthewhoffman, the Slack bot is bugging me about this one. Have you had a chance to change permissions and fix the issues I pointed out above?

@matthewhoffman

@xylar and @jonbob , sorry about the permissions issue. I have updated the permissions on that directory, so try again and let me know if you have any further problems.

xylar commented Jun 21, 2024

@matthewhoffman, it doesn't look like the filenames have been fixed, see #6440 (comment) and #6440 (comment) above. Could you take care of that, too?

xylar commented Jun 21, 2024

@matthewhoffman, also, could you rebase onto master to resolve conflicts?

xylar commented Jun 21, 2024

@matthewhoffman, I think something you did to update this branch (probably a rebase) also took out an earlier commit that added the ocn_glcshelf test.

Can you make sure you can run the following on Chrysalis and ping me to re-review after that works for you?

./create_test --wait --walltime 1:00:00 SMS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_intel.mpaso-ocn_glcshelf
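
For readers unfamiliar with the test name used throughout this thread, the following Python sketch splits it into its dot-separated parts. It assumes a full five-part name (test case with options, grid, compset, machine_compiler, test modifier); the function name `parse_test_name` is hypothetical and this is not how `create_test` itself parses names.

```python
def parse_test_name(name):
    """Split a full test name into its dot-separated parts (illustrative only)."""
    parts = name.split(".")
    # The first part holds the test type followed by options such as Ld5 or D.
    testcase, *options = parts[0].split("_")
    parsed = {
        "testcase": testcase,
        "options": options,
        "grid": parts[1],
        "compset": parts[2],
    }
    # Assumption: when present, part 4 is machine_compiler and part 5 the test modifier.
    if len(parts) > 3:
        parsed["machine_compiler"] = parts[3]
    if len(parts) > 4:
        parsed["testmods"] = parts[4]
    return parsed


name = ("SMS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5."
        "chrysalis_intel.mpaso-ocn_glcshelf")
for key, value in parse_test_name(name).items():
    print(key, "=", value)
```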

@matthewhoffman matthewhoffman force-pushed the matthewhoffman/mali/ais-meshes branch from 88e0d16 to ef6171c on June 26, 2024 18:03
@matthewhoffman

@xylar , sorry about all the little issues in this PR and for letting it sit for so long. I've looked through it all and made the following changes:

  • the ais mesh naming convention has been fixed and now follows that of the gis meshes. This required both a change to the filenames on the input server (replacing the _ between mesh name and date with a .) and a change in this PR to include an underscore between 'ais' and the resolution
  • the ocn_glcshelf test commit was actually in another PR that has already been merged (https://github.com/E3SM-Project/E3SM/pull/6437/commits). I have rebased this PR so that commit is available, which also resolves the conflicts with master.
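
The filename change described in the first bullet can be sketched in Python as follows. This is only an illustration of the convention (a dot between mesh name and datestamp); the function name `normalize_mesh_filename` is hypothetical and the sketch does not touch any files on the inputdata server.

```python
import re


def normalize_mesh_filename(name):
    """Replace the underscore in front of an 8-digit datestamp with a dot."""
    # Only the first "_YYYYMMDD" that is directly followed by "." is changed.
    return re.sub(r"_(\d{8})(?=\.)", r".\1", name, count=1)


for old in ["ais_8to30km_20231222.nc", "ais_8to30km_20231222.scrip.nc"]:
    print(old, "->", normalize_mesh_filename(old))
```

Names that already follow the convention, or that carry no 8-digit datestamp (such as the graph partition files), are returned unchanged.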

With these changes, I ran ./create_test --wait --walltime 1:00:00 SMS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_intel.mpaso-ocn_glcshelf and got this error:

    Errors were:
        Building test for SMS in directory /lcrc/group/e3sm/ac.mhoffman/scratch/chrys/SMS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_intel.mpaso-ocn_glcshelf.20240626_125417_w8meoq
        WARNING: Should be running with salinity restoring on!
                 But no file available for this grid.
        ERROR: /gpfs/fs1/home/ac.mhoffman/e3sm-gis/E3SM-ais-meshes/share/build/buildlib.csm_share FAILED, cat /lcrc/group/e3sm/ac.mhoffman/scratch/chrys/SMS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_intel.mpaso-ocn_glcshelf.20240626_125417_w8meoq/bld/csm_share.bldlog.240626-125429

Do salinity restoring files need to be created for the oQU240wLI mesh?

In the meantime, @jonbob , you could proceed to generate the mapping files that will be needed for this PR.

xylar commented Jun 26, 2024

@matthewhoffman, I think there must be other errors besides salinity restoring. You will get complaints but the test should run.

@matthewhoffman

@xylar , you are right, sorry about that. The error seems to be related to directives in building PIO-related code, which is not something we are touching in this PR.

/gpfs/fs1/home/ac.mhoffman/e3sm-gis/E3SM-ais-meshes/share/util/mct_mod.F90(825): remark #5140: Unrecognized directive
!DIR$ PREFERVECTOR
------------------^
/gpfs/fs1/home/ac.mhoffman/e3sm-gis/E3SM-ais-meshes/share/util/shr_pio_mod.F90(734): error #6404: This name does not have a type, and must have an explicit type.   [PIO_REARR_ANY]
        pio_rearranger .ne. PIO_REARR_ANY) then
----------------------------^
compilation aborted for /gpfs/fs1/home/ac.mhoffman/e3sm-gis/E3SM-ais-meshes/share/util/shr_pio_mod.F90 (code 1)
gmake[2]: *** [CMakeFiles/csm_share.dir/build.make:376: CMakeFiles/csm_share.dir/util/shr_pio_mod.F90.o] Error 1

I'll experiment with a few other test definitions to see if this is a pervasive error.

xylar commented Jun 27, 2024

@matthewhoffman, any chance this is a submodule issue? Maybe try a fresh clone or worktree?

xylar commented Jun 27, 2024

I was able to build successfully but I'm getting a segfault at runtime:

  5: ==== backtrace (tid:2007194) ====
  5:  0 0x0000000000012cf0 __funlockfile()  :0
  5:  1 0x00000000020512ff mpas_rbf_interpolation_mp_mpas_rbf_interp_func_3d_plane_vec_const_dir_comp_coeffs_.A()  /lcrc/group/e3sm/ac.xylar/scratch/chrys/SMS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_intel.mpaso-ocn_glcshelf.20240627_022044_tr9lrp/bld/cmake-bld/operators/mpas_rbf_interpolation.f90:1700
  5:  2 0x000000000206edd0 mpas_vector_reconstruction_mp_mpas_init_reconstruct_.A()  /lcrc/group/e3sm/ac.xylar/scratch/chrys/SMS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_intel.mpaso-ocn_glcshelf.20240627_022044_tr9lrp/bld/cmake-bld/operators/mpas_vector_reconstruction.f90:176
  5:  3 0x0000000001c829a4 li_core_mp_li_core_init_.A()  /lcrc/group/e3sm/ac.xylar/scratch/chrys/SMS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_intel.mpaso-ocn_glcshelf.20240627_022044_tr9lrp/bld/cmake-bld/core_landice/mode_forward/mpas_li_core.f90:1030
  5:  4 0x0000000001c58b4a glc_comp_mct_mp_glc_init_mct_()  /lcrc/group/e3sm/ac.xylar/scratch/chrys/SMS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_intel.mpaso-ocn_glcshelf.20240627_022044_tr9lrp/mpas-albany-landice/driver/glc_comp_mct.f90:531
  5:  5 0x0000000000451e54 component_mod_mp_component_init_cc_()  /gpfs/fs1/home/ac.xylar/e3sm_work/E3SM/matthewhoffman/mali/ais-meshes/driver-mct/main/component_mod.F90:257
  5:  6 0x000000000043e74c cime_comp_mod_mp_cime_init_()  /gpfs/fs1/home/ac.xylar/e3sm_work/E3SM/matthewhoffman/mali/ais-meshes/driver-mct/main/cime_comp_mod.F90:1518
  5:  7 0x000000000044eb4a MAIN__()  /gpfs/fs1/home/ac.xylar/e3sm_work/E3SM/matthewhoffman/mali/ais-meshes/driver-mct/main/cime_driver.F90:122
  5:  8 0x000000000041b1a2 main()  ???:0
  5:  9 0x000000000003ad85 __libc_start_main()  ???:0
  5: 10 0x000000000041b0ae _start()  ???:0

See

/lcrc/group/e3sm/ac.xasay-davis/scratch/chrys/SMS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_intel.mpaso-ocn_glcshelf.20240627_022044_tr9lrp/

I'll try rerunning in debug mode to see if that provides any helpful info.

xylar commented Jun 27, 2024

In debug mode:

/lcrc/group/e3sm/ac.xasay-davis/scratch/chrys/SMS_D_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_intel.mpaso-ocn_glcshelf.20240627_033510_umnvw1

I'm seeing:

 56: forrtl: severe (408): fort: (3): Subscript #2 of the array VERTICESONEDGE has value 0 which is less than the lower bound of 1
 56:
 56: Image              PC                Routine            Line        Source
 56: libpnetcdf.so.3.0  000015554B9391A2  for_emit_diagnost     Unknown  Unknown
 56: e3sm.exe           00000000065FF55F  li_mesh_mp_meshsi         584  mpas_li_mesh.f90
 56: e3sm.exe           00000000065FC22C  li_mesh_mp_li_mes         416  mpas_li_mesh.f90
 56: e3sm.exe           0000000006097154  li_core_mp_li_cor         226  mpas_li_core.f90
 56: e3sm.exe           000000000601F4BF  glc_comp_mct_mp_g         531  glc_comp_mct.f90
 56: e3sm.exe           000000000048BDDF  component_mod_mp_         257  component_mod.F90
 56: e3sm.exe           000000000042EC9B  cime_comp_mod_mp_        1518  cime_comp_mod.F90
 56: e3sm.exe           000000000048286C  MAIN__                    122  cime_driver.F90
 56: e3sm.exe           000000000041ABA2  Unknown               Unknown  Unknown
 56: libc-2.28.so       0000155545281D85  __libc_start_main     Unknown  Unknown
 56: e3sm.exe           000000000041AAAE  Unknown               Unknown  Unknown

Presumably, this error is getting caught before the other error so it will need to be handled before we get to the one in RBF reconstruction. It seems like verticesOnEdge is being used where it is invalid and a check is needed?
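
To make the kind of check being suggested concrete, here is a minimal Python sketch that flags edges whose vertex indices fall outside the valid 1-based range. It operates on plain lists rather than on a MALI mesh, the function name is hypothetical, and it is not the check that MALI implements or would implement.

```python
def edges_with_invalid_vertices(vertices_on_edge, n_vertices):
    """Return 0-based positions of edges that reference an out-of-range vertex."""
    bad_edges = []
    for edge_position, vertex_ids in enumerate(vertices_on_edge):
        # Fortran-style indices are valid only in the range 1..n_vertices.
        if any(v < 1 or v > n_vertices for v in vertex_ids):
            bad_edges.append(edge_position)
    return bad_edges


# Toy connectivity: the second edge carries a 0, like the value in the error above.
print(edges_with_invalid_vertices([[1, 2], [0, 3], [3, 4]], n_vertices=4))
```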

@matthewhoffman

@xylar , yes, that makes sense - I had forgotten to update the submodules after rebasing. I can look at these MALI errors. I'm confused why they are showing up in this test and haven't shown up in other tests given this is not introducing any new functionality in MALI.

xylar commented Jul 1, 2024

@matthewhoffman, I can't speak to the vector reconstruction error but regarding the out-of-bounds indexing, do you maybe not have any tests that are compiled in debug mode? If not, maybe that one has just gone uncaught?

jonbob commented Jul 3, 2024

I added the mapping files to this branch and have staged them in their corresponding locations on the lcrc local inputdata location for testing

jonbob commented Jul 3, 2024

@xylar -- your test may have failed because there were no mapping files specified in config_grids. I'm running something similar right now

xylar commented Jul 3, 2024

Aaaah! It would be nice if E3SM gave an error that hinted a bit more in that direction but that certainly does sound like a good reason I was having problems!

jonbob commented Jul 5, 2024

@xylar -- ah, I think E3SM would throw an error if a mapping file had been defined, but those mapping file entries were defined but blank so no file to even look for... But I was incorrect, that's not why your test failed. I ran something similar and saw the same error in the e3sm log. But MALI was throwing errors about not having xtime=0.0 in the file, and somehow that's the error e3sm ended up with? Anyway, @matthewhoffman -- I think in general we remove xtime from these files. I tested with that and got a new error about nEdgesOnCell being greater than maxEdges. I checked values in the initial file and saw this:

ncks -H -d nCells,1 -v nEdgesOnCell ais_8to30km.20231222-no-xtime.nc | more
netcdf ais_8to30km.20231222-no-xtime {
  dimensions:
    nCells = 1 ;

  variables:
    int nEdgesOnCell(nCells) ;

  data:
    nEdgesOnCell = 1072693248 ;

} // group /

so something is very wrong with ais_8to30km.20231222.nc
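
A simple range check would catch a value like the one shown above. The Python sketch below works on a plain list, and the `max_edges` value in the example is a placeholder, not the value from the mesh file. It also prints one observation that was not made in the thread: 1072693248 equals the upper 32 bits of the double-precision value 1.0, which may hint at a data type mix-up but does not establish the cause.

```python
import struct


def out_of_range_cells(n_edges_on_cell, max_edges):
    """Return positions of cells whose edge count is not within 1..max_edges."""
    return [i for i, n in enumerate(n_edges_on_cell) if not 1 <= n <= max_edges]


# max_edges=10 is a placeholder chosen for illustration only.
print(out_of_range_cells([6, 1072693248, 5], max_edges=10))

# Upper 32 bits of the IEEE 754 double 1.0, read as a big-endian signed integer.
high_word_of_one = struct.unpack(">i", struct.pack(">d", 1.0)[:4])[0]
print(high_word_of_one == 1072693248)
```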

@matthewhoffman

@jonbob and @xylar , thanks for bringing these issues to my attention. As discussed, @trhille and I are working on an updated 4km initial condition. I'll update the PR with that file and investigate the 8km file issue after returning from the SciDAC meeting next week (or possibly while I'm there).

@matthewhoffman

I’ve re-evaluated this PR carefully, and I ended up finding issues with both the 8km and 4km mesh files.

For the 8km mesh, I realized I had introduced a mesh that had not completed QAQC testing, so I rolled back to the previous version of the mesh that we had used extensively for ISMIP6-2300. The ice thickness initial condition makes it overly prone to Thwaites Glacier retreat, but given the purpose of this mesh is testing, that’s not really a concern (and maybe it’s actually useful).

For the 4km mesh, two issues have been fixed:


  • Trevor, Xylar, and I identified an issue with artifacts in ice-shelf thickness in the Amundsen Sea sector due to an error in how we had stitched some datasets together. This does not affect the ability of MALI to run, but it does introduce unnecessary complications into generating consistent MPAS-Ocean ice-shelf cavities. Given that is an ongoing goal of introducing these meshes, we decided to resolve that in this PR, and the 4km mesh file has now been updated with a version that eliminates the problem.
  • I also discovered that the decomposition files for the 4km mesh were somehow incorrect, so I’ve updated those on the inputdata server.

With these updates, I ran tests for all the new meshes:

./create_test --wait --walltime 1:00:00 SMS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_intel.mpaso-ocn_glcshelf
PASSED

./create_test --wait --walltime 1:00:00 SMS_Ld5.TL319_IcoswISC30E3r5_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_intel.mpaso-ocn_glcshelf
PASSED

./create_test --wait --walltime 1:00:00 SMS_Ld5.TL319_IcoswISC30E3r5_ais4to20.MPAS_LISIO_JRA1p5.chrysalis_intel.mpaso-ocn_glcshelf
FAILED - MALI CFL ERROR

The last test fails because it attempts to use the SIA solver at relatively high resolution, including ice shelves, which is not a good idea and is likely to give unrealistic velocities that would trigger a CFL error. It got far enough (into the first GLC coupling interval) that I think everything about the mesh is working fine and I just need to come up with a better test. I also need to come up with tests for the two model_grids being added for B-cases. I will coordinate with @jonbob on both of these issues.
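
To illustrate why unrealistically large velocities lead to a CFL error, here is a minimal Python sketch of the generic advective CFL limit, dt <= dx / |u|. It is not MALI's implementation, and the cell size and speeds are illustrative numbers rather than values from these runs.

```python
def max_stable_timestep(cell_size_m, speed_m_per_yr, cfl_fraction=1.0):
    """Largest time step (in years) allowed by the advective CFL condition."""
    if speed_m_per_yr <= 0:
        raise ValueError("speed must be positive")
    return cfl_fraction * cell_size_m / speed_m_per_yr


# Illustrative only: a 4 km cell with a plausible and an unrealistically large speed.
for speed in (1000.0, 100000.0):
    print(speed, "m/yr ->", max_stable_timestep(4000.0, speed), "yr")
```

The faster the ice is predicted to move, the shorter the time step must be, so a solver that produces spuriously high speeds can push the allowed step below what the run is using.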

@matthewhoffman

With the addition of a FOLISIO compset, I was now able to successfully test the third grid that had failed in my previous testing:

./create_test --wait --walltime 1:00:00 SMS_Ld5.TL319_IcoswISC30E3r5_ais4to20.MPAS_FOLISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf
PASSED

The remaining task is to demonstrate successful tests of the two remaining grids added in this PR, which will require B-cases.

@matthewhoffman

I've successfully tested the two fully-coupled grid specifications:

./create_test --wait --walltime 1:00:00 SMS_Ld5.ne30pg2_r05_IcoswISC30E3r5_ais8to30.BGWCYCL1850.chrysalis_gnu
PASS

./create_test --wait --walltime 1:00:00 SMS_Ld5.ne30pg2_r05_IcoswISC30E3r5_ais4to20.BGWCYCL1850.chrysalis_gnu
PASS

with a couple caveats:

But this confirms that these new AIS meshes work in a B-case and the mapping files are correct.

(thanks, @jonbob , for helping me sort through all these details!)

@xylar , do you think we should deal with enabling ocn_glcshelf coupling for B-cases in this PR? My preference is to leave it out, as there will likely be other adjustments we need to make to support B-cases with iceshelf coupling, but I'm open to addressing it here. If the answer is no, then I have completed all necessary updates and testing and this PR is ready for re-review from @jonbob and @xylar .

xylar commented Jul 25, 2024

> @xylar , do you think we should deal with enabling ocn_glcshelf coupling for B-cases in this PR?

@matthewhoffman, no, I agree that we need to leave that for another PR at a later time.

Update: I think the existing BG test cases would already test melt fluxes in coupled mode if you were to run with PISMF, see below. That's the case for CRYO configurations other than with IcoswISC30E3r5 but not for WCYCL.

xylar commented Jul 25, 2024

> I could not enable the ocn-glcshelf coupling in these tests because the testMod that supports that also includes modifications specific to a JRA G-case

@matthewhoffman I think all cases with PISMF and MALI have melt-fluxes in coupled mode, see:

<value compset="_MPASO%.*PISMF.*_MALI">coupled</value>

So I don't think ocn_glcshelf is needed or useful in any cases but JRA.
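
The effect of that `compset` attribute can be sketched in Python, assuming it is treated as a regular expression searched within the compset long name. The two long names below are made-up placeholders for illustration, not compsets defined in E3SM, and the function name is hypothetical.

```python
import re

# Pattern copied from the XML entry quoted above.
PATTERN = r"_MPASO%.*PISMF.*_MALI"


def melt_fluxes_coupled(compset_longname):
    """Return True if the long name matches the PISMF-with-MALI pattern."""
    return re.search(PATTERN, compset_longname) is not None


# Hypothetical long names, invented for this example.
with_pismf = "1850_XATM_XLND_MPASSI_MPASO%EXAMPLEPISMF_XROF_MALI_SWAV"
without_pismf = "1850_XATM_XLND_MPASSI_MPASO%EXAMPLE_XROF_MALI_SWAV"
print(melt_fluxes_coupled(with_pismf), melt_fluxes_coupled(without_pismf))
```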

@xylar xylar left a comment

I did a test merge of this branch with master. Then, I ran a test with a baseline from master to make sure things are BFB:

SMS_D_P480_Ld1.ne30pg2_r05_IcoswISC30E3r5.WCYCL1850.chrysalis_intel

and, as expected, they are.

Then, I successfully ran:

SMS_D_Ld5.TL319_oQU240wLI_ais8to30.MPAS_FOLISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf
SMS_D_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_intel.mpaso-ocn_glcshelf
SMS_D_Ld5.TL319_IcoswISC30E3r5_ais4to20.MPAS_FOLISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf

(I tried the intel versions of the FOLISIO tests but Trilinos isn't available for that compiler on Chrysalis.)

I didn't test any BG cases.

I looked through the code changes and they look good to me. I have a couple of questions about the existing LISIO test but I'm approving regardless of the answers there.

Review comment on cime_config/allactive/config_compsets.xml (resolved)
@matthewhoffman

> @matthewhoffman I think all cases with PISMF and MALI have melt-fluxes in coupled mode

Oh, I see, thanks for pointing that out. I had missed that. As far as I can tell, we do not yet have any compsets with PISMF and MALI, but that is the goal you and I are working toward, and there likely are other issues needing attention before we introduce that.

jonbob commented Jul 30, 2024

@matthewhoffman -- I ran the new test that this PR brings in,

ERS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.mpaso-ocn_glcshelf

and it failed on chrysalis with:

PASS ERS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_intel.mpaso-ocn_glcshelf CREATE_NEWCASE
PASS ERS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_intel.mpaso-ocn_glcshelf XML
PASS ERS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_intel.mpaso-ocn_glcshelf SETUP
PASS ERS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_intel.mpaso-ocn_glcshelf SHAREDLIB_BUILD time=182
PASS ERS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_intel.mpaso-ocn_glcshelf MODEL_BUILD time=1045
PASS ERS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_intel.mpaso-ocn_glcshelf SUBMIT
PASS ERS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_intel.mpaso-ocn_glcshelf RUN time=87
FAIL ERS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_intel.mpaso-ocn_glcshelf COMPARE_base_rest

Can you think of any reason this resolution wouldn't restart exactly? I tested the one it replaces and that does pass, though I didn't see anything in mali_in to explain the difference
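
When scanning output like the TestStatus listing above, a small helper can pull out the first failing phase. The Python sketch below assumes each line has the form "STATUS test_name PHASE [extra]", as in the listing; the function name `first_failed_phase` is hypothetical and this is not part of the CIME tooling.

```python
def first_failed_phase(test_status_text):
    """Return the phase name of the first FAIL line, or None if nothing failed."""
    for line in test_status_text.splitlines():
        fields = line.split()
        # Expected layout: STATUS, test name, phase, then optional extras like time=87.
        if len(fields) >= 3 and fields[0] == "FAIL":
            return fields[2]
    return None


name = "ERS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_intel.mpaso-ocn_glcshelf"
status = f"PASS {name} RUN time=87\nFAIL {name} COMPARE_base_rest\n"
print(first_failed_phase(status))
```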

matthewhoffman and others added 6 commits August 11, 2024 23:22
  • This commit adds the mpas.ais4to20km and mpas.ais8to30km meshes for MALI. Mapping file entries are added but the mapping files themselves don't exist yet.
  • This will provide a low res testing configuration. Mapping files are listed but the filenames not yet added.
  • The updated mpas.ais4to20km mesh correctly handles merging bedmap2 for Amundsen Sea Sector into the BedMachine geometry elsewhere. The previous version had an error in how that was done. The updated mpas.ais8to30km mesh rolls back from an inadequately tested 8km mesh to the version that was used in ISMIP6-AIS-2300.
  • This commit adds a FOLISIO compset (First-Order Land Ice, Sea Ice, Ocean) that uses the Albany First-Order velocity solver in MALI instead of the standard shallow-ice approximation solver. The SIA solver is inappropriate for Antarctica and was causing CFL violations when used with the new mesh, making it not practical to use even for smoketesting.
@matthewhoffman matthewhoffman force-pushed the matthewhoffman/mali/ais-meshes branch from 2696b1d to ed1992a on August 12, 2024 06:56
@matthewhoffman

@jonbob , thanks for catching that. I've finally had a chance to look into it myself, and I found that the same test with the existing mesh is also failing when I run it manually:

./create_test --wait --walltime 1:00:00 ERS_Ld5.TL319_oQU240wLI_ais20.MPAS_LISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf

Contents of TestStatus:

PASS ERS_Ld5.TL319_oQU240wLI_ais20.MPAS_LISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf CREATE_NEWCASE
PASS ERS_Ld5.TL319_oQU240wLI_ais20.MPAS_LISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf XML
PASS ERS_Ld5.TL319_oQU240wLI_ais20.MPAS_LISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf SETUP
PASS ERS_Ld5.TL319_oQU240wLI_ais20.MPAS_LISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf SHAREDLIB_BUILD time=88
PASS ERS_Ld5.TL319_oQU240wLI_ais20.MPAS_LISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf MODEL_BUILD time=168
PASS ERS_Ld5.TL319_oQU240wLI_ais20.MPAS_LISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf SUBMIT
FAIL ERS_Ld5.TL319_oQU240wLI_ais20.MPAS_LISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf RUN time=52

When I look at e3sm.log for that run I see this:

  0:  ERROR: (seq_infodata_Check) : invalid continue restart case name =

Looking at the code, that message should be printing out infodata%rest_case_name, which apparently has no value. I dug a bit further into the code, and it looks like that field gets populated by the value of the variable seq_infodata_case_name in the restart file. When I look at the contents of the restart file, e.g.:

ncdump -v seq_infodata_case_name  ERS_Ld5.TL319_oQU240wLI_ais20.MPAS_LISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf.20240811_223654_ujmx4r.cpl.r.0001-01-04-00000.nc

I see that seq_infodata_case_name is empty.

However it looks like the case name is set correctly in env_case.xml:

    <entry id="CASE" value="ERS_Ld5.TL319_oQU240wLI_ais20.MPAS_LISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf.20240811_223654_ujmx4r">
      <type>char</type>
      <desc>case name</desc>
    </entry>

After rebasing onto master, though, the error with ERS_Ld5.TL319_oQU240wLI_ais20.MPAS_LISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf goes away, so I've pushed the rebased branch.

Unfortunately, an error using the new mesh persists:

PASS ERS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf CREATE_NEWCASE
PASS ERS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf XML
PASS ERS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf SETUP
PASS ERS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf SHAREDLIB_BUILD time=80
PASS ERS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf MODEL_BUILD time=174
PASS ERS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf SUBMIT
PASS ERS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf RUN time=77
FAIL ERS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf COMPARE_base_rest
PASS ERS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf MEMLEAK insufficient data for memleak test
PASS ERS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf SHORT_TERM_ARCHIVER

with this in TestStatus.log:

    ERS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf.20240811_233154_v8benx.cpl.hi.0001-01-06-00000.nc.base did NOT match ERS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf.20240811_233154_v8benx.cpl.hi.0001-01-06-00000.nc.rest
    cat /lcrc/group/e3sm/ac.mhoffman/scratch/chrys/ERS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf.20240811_233154_v8benx/run/ERS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf.20240811_233154_v8benx.cpl.hi.0001-01-06-00000.nc.base.cprnc.out
FAIL
 ---------------------------------------------------
2024-08-11 23:39:05: compared suffixes suffix1 'base' suffix2 'rest'

tail -n20 /lcrc/group/e3sm/ac.mhoffman/scratch/chrys/ERS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf.20240811_233154_v8benx/run/ERS_Ld5.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chrysalis_gnu.mpaso-ocn_glcshelf.20240811_233154_v8benx.cpl.hi.0001-01-06-00000.nc.base.cprnc.out

               98341  ( 73106,     1,     1) (     1,     1,     1)
          avg abs field values:    7.243164371509058E-01    rms diff: 3.1E-01   avg rel diff(npos):  4.8E-05
                                   7.216826573162387E-01                        avg decimal digits(ndif):  0.2 worst:  0.2
 RMS x2g_Fogx_qicehi                  3.1332E-01            NORMALIZED  4.3335E-01

************************************************************************************************************************************

SUMMARY of cprnc:
 A total number of    416 fields were compared
          of which     12 had non-zero differences
               and      0 had differences in fill patterns
               and      0 had different dimension sizes
               and      0 had different data types
 A total number of      0 fields could not be analyzed
 A total number of      0 time-varying fields on file 1 were not found on file 2.
 A total number of      0 time-constant fields on file 1 were not found on file 2.
 A total number of      0 time-varying fields on file 2 were not found on file 1.
 A total number of      0 time-constant fields on file 2 were not found on file 1.
  diff_test: the two files seem to be DIFFERENT

It's not obvious to me where the problem is coming from, but I'll look into it further. My suspicion is it's an issue with the ocn-glcshelf coupling that the old mesh didn't expose, but I'm not sure yet. If so, the best strategy might be to keep the test using the old mesh for now. I'll follow up.
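
For quick triage of cprnc output like the summary above, the counts of compared and differing fields can be extracted with a short Python sketch. It relies only on the wording of the two summary lines shown in this comment; the function name `cprnc_counts` is hypothetical.

```python
import re


def cprnc_counts(summary_text):
    """Return (fields compared, fields with non-zero differences), or None."""
    compared = re.search(r"A total number of\s+(\d+) fields were compared", summary_text)
    differing = re.search(r"of which\s+(\d+) had non-zero differences", summary_text)
    if compared is None or differing is None:
        return None
    return int(compared.group(1)), int(differing.group(1))


summary = (
    " A total number of    416 fields were compared\n"
    "          of which     12 had non-zero differences\n"
)
print(cprnc_counts(summary))
```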

@matthewhoffman

I think I understand the test error. The previous ais20 mesh had a constant temperature field of value 0 across the entire mesh, even where there is no ice. The new ais8to30 mesh has a realistic temperature field where there is ice and values of 0 where there is no ice. Currently we have temperature evolution turned off in MALI (but we need to get that enabled soon), which means if the ice extent advances, it will advance into a cell with T=0, which could be problematic. I think that constant temperature field in the ais20 mesh was masking this issue.

I ran a test where I overwrote the temperature field in the ais8to30 mesh with a constant T=0 value, as the old mesh has. When I do that, the test passes. Given that a more careful evaluation of the issue is required, and the likely fix will include enabling the temperature solver and likely other changes, I propose to remove the test change from this PR and add it to a new PR that fixes these temperature issues.

@matthewhoffman matthewhoffman force-pushed the matthewhoffman/mali/ais-meshes branch from ed1992a to 0a92e25 on August 13, 2024 17:19
@jonbob jonbob added the BFB PR leaves answers BFB label Aug 13, 2024
@jonbob jonbob left a comment

Approved based on visual inspection and testing

@stephenprice

If this is simply a request / need to update the default namelist so that the thermal solver is active by default I think we should just do that. Most realistic configs will need this anyway. I am happy to help out w/ re-running any tests once there's a PR in place for this change.

jonbob added a commit that referenced this pull request Aug 13, 2024
jonbob commented Aug 13, 2024

Passes:

  • e3sm_landice_developer using gnu on chrysalis
  • ERS_Ld5.TL319_oQU240wLI_ais20.MPAS_LISIO_JRA1p5.chrysalis_intel.mpaso-ocn_glcshelf
  • ERP_Ld3.ne30pg2_r05_IcoswISC30E3r5.WCYCL1850.chrysalis_intel.allactive-pioroot1

merged to next

@jonbob jonbob merged commit ebc39ea into master Aug 14, 2024
21 checks passed
@jonbob jonbob deleted the matthewhoffman/mali/ais-meshes branch August 14, 2024 15:46
jonbob commented Aug 14, 2024

merged to master

Labels
BFB PR leaves answers BFB mpas-albany-landice