Update chicoma-cpu modules #112

Closed
1 change: 1 addition & 0 deletions cime_config/machines/cmake_macros/gnu_chicoma-cpu.cmake
@@ -5,6 +5,7 @@ endif()
set(PIO_FILESYSTEM_HINTS "lustre")
string(APPEND CMAKE_C_FLAGS_RELEASE " -O2 -g")
string(APPEND CMAKE_Fortran_FLAGS_RELEASE " -O2 -g")
string(APPEND CMAKE_EXE_LINKER_FLAGS " -Wl,--enable-new-dtags")
set(MPICC "cc")
set(MPICXX "CC")
set(MPIFC "ftn")
61 changes: 38 additions & 23 deletions cime_config/machines/config_machines.xml
@@ -3937,7 +3937,7 @@ commented out until "*** No rule to make target '.../libadios2pio-nm-lib.a'" iss
<DIN_LOC_ROOT_CLMFORC>/usr/projects/e3sm/inputdata/atm/datm7</DIN_LOC_ROOT_CLMFORC>
<DOUT_S_ROOT>/lustre/scratch5/$ENV{USER}/E3SM/archive/$CASE</DOUT_S_ROOT>
<BASELINE_ROOT>/lustre/scratch5/$ENV{USER}/E3SM/input_data/ccsm_baselines/$COMPILER</BASELINE_ROOT>
<CCSM_CPRNC>/usr/projects/climate/SHARED_CLIMATE/software/badger/cprnc</CCSM_CPRNC>
<CCSM_CPRNC>/usr/projects/e3sm/software/chicoma-cpu/cprnc</CCSM_CPRNC>
Collaborator Author

Oh, I bet I know what happened here. I deleted this thinking that it was old and no longer used. In my defense, it has badger in the path...

Collaborator

And it no longer exists! Not just the machine, but that file too. But I think if that line is missing, each test is forced to try to build cprnc itself.

Collaborator Author

> And it no longer exists!

That's what I was saying. I think I deleted it trying to free up space in /usr/projects/climate because I couldn't imagine we were still using software built for Badger.

<GMAKE_J>10</GMAKE_J>
<TESTS>e3sm_developer</TESTS>
<NTEST_PARALLEL_JOBS>4</NTEST_PARALLEL_JOBS>
@@ -3957,11 +3957,11 @@ commented out until "*** No rule to make target '.../libadios2pio-nm-lib.a'" iss
</arguments>
</mpirun>
<module_system type="module" allow_error="true">
<init_path lang="perl">/usr/share/lmod/8.3.1/init/perl</init_path>
<init_path lang="perl">/usr/share/lmod/lmod/init/perl</init_path>
<!-- does not exist -->
<init_path lang="python">/usr/share/lmod/8.3.1/init/python</init_path>
<init_path lang="sh">/usr/share/lmod/8.3.1/init/sh</init_path>
<init_path lang="csh">/usr/share/lmod/8.3.1/init/csh</init_path>
<init_path lang="python">/usr/share/lmod/lmod/init/python</init_path>
<init_path lang="sh">/usr/share/lmod/lmod/init/sh</init_path>
<init_path lang="csh">/usr/share/lmod/lmod/init/csh</init_path>
<cmd_path lang="perl">/usr/share/lmod/lmod/libexec/lmod perl</cmd_path>
<cmd_path lang="python">/usr/share/lmod/lmod/libexec/lmod python</cmd_path>
<cmd_path lang="sh">module</cmd_path>
@@ -3973,39 +3973,40 @@ commented out until "*** No rule to make target '.../libadios2pio-nm-lib.a'" iss
<command name="unload">cray-parallel-netcdf</command>
<command name="unload">cray-netcdf</command>
<command name="unload">cray-hdf5</command>
Comment on lines 3973 to 3975
Collaborator Author

I tried removing cpe like Noel did here:
https://github.com/E3SM-Project/E3SM/blob/bdcc2f551cfae2fca53bd8aa4ec604601ddf1c68/cime_config/machines/config_machines.xml#L193
But I got a nasty error like:

        Lmod has detected the following error: These module(s) or extension(s) exist
        but cannot be loaded as requested: "git", "cmake/3.27.7"
           Try: "module spider git cmake/3.27.7" to see how to load the module(s).

<command name="unload">PrgEnv-gnu</command>
<command name="unload">PrgEnv-intel</command>
<command name="unload">PrgEnv-nvidia</command>
<command name="unload">PrgEnv-cray</command>
<command name="unload">PrgEnv-aocc</command>
<command name="unload">intel</command>
<command name="unload">intel-oneapi</command>
<command name="unload">nvidia</command>
<command name="unload">aocc</command>
<command name="unload">cudatoolkit</command>
<command name="unload">climate-utils</command>
<command name="unload">cray-libsci</command>
<command name="unload">craype-accel-nvidia80</command>
<command name="unload">craype-accel-host</command>
<command name="unload">perftools-base</command>
<command name="unload">perftools</command>
<command name="unload">darshan</command>
<command name="unload">PrgEnv-gnu</command>
<command name="unload">PrgEnv-intel</command>
<command name="unload">PrgEnv-nvidia</command>
<command name="unload">PrgEnv-cray</command>
<command name="unload">PrgEnv-aocc</command>
Comment on lines +3990 to +3994
Collaborator Author

I found that these needed to be unloaded after their corresponding compiler modules or there would be an error about an undefined environment variable name.
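
The ordering constraint described in this comment can be sketched as a dry run that only prints the unload commands. This is an illustration, not part of the PR: the module names are copied from the diff, `print_unload_order` is a hypothetical helper, and the other unloads that sit between the two groups in the XML are left out for brevity.

```shell
# Compiler modules come first, then the PrgEnv-* modules (order taken from the diff).
COMPILER_MODULES="intel intel-oneapi nvidia aocc"
PRGENV_MODULES="PrgEnv-gnu PrgEnv-intel PrgEnv-nvidia PrgEnv-cray PrgEnv-aocc"

# Hypothetical helper: print, but do not run, the unload commands in order.
print_unload_order() {
    for mod in $COMPILER_MODULES $PRGENV_MODULES; do
        echo "module unload $mod"
    done
}

print_unload_order
```

Printing rather than executing keeps the sketch usable on a machine without Lmod; on Chicoma the printed lines could be reviewed before being run by hand.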

</modules>

<modules compiler="gnu">
<command name="load">PrgEnv-gnu/8.4.0</command>
<command name="load">PrgEnv-gnu/8.5.0</command>
<command name="load">gcc/12.2.0</command>
<command name="load">cray-libsci/23.05.1.4</command>
</modules>

<modules compiler="nvidia">
<command name="load">PrgEnv-nvidia/8.4.0</command>
<command name="load">nvidia/22.7</command>
<command name="load">PrgEnv-nvidia/8.5.0</command>
<command name="load">nvidia/24.7</command>
Comment on lines +4004 to +4005
Collaborator Author

I successfully tested these updated modules as well.

<command name="load">cray-libsci/23.05.1.4</command>
</modules>

<modules compiler="intel">
<command name="load">PrgEnv-intel/8.4.0</command>
<command name="load">intel-classic/2023.2.0</command>
<command name="load">PrgEnv-intel/8.5.0</command>
<command name="load">intel/2023.2.0</command>
</modules>

<modules compiler="amdclang">
@@ -4014,15 +4015,26 @@ commented out until "*** No rule to make target '.../libadios2pio-nm-lib.a'" iss
<command name="load">cray-libsci/23.05.1.4</command>
</modules>

<modules>
<modules compiler="intel">
<command name="load">craype-accel-host</command>
<command name="load">craype/2.7.30</command>
<command name="load">cray-mpich/8.1.28</command>
<command name="load">cray-hdf5-parallel/1.12.2.9</command>
<command name="load">cray-netcdf-hdf5parallel/4.9.0.9</command>
<command name="load">cray-parallel-netcdf/1.12.3.9</command>
</modules>

<modules compiler="!intel">
<command name="load">craype-accel-host</command>
<command name="load">craype/2.7.21</command>
<command name="load">cray-mpich/8.1.26</command>
<command name="load">libfabric/1.15.2.0</command>
<command name="load">cray-hdf5-parallel/1.12.2.3</command>
<command name="load">cray-netcdf-hdf5parallel/4.9.0.3</command>
<command name="load">cray-parallel-netcdf/1.12.3.3</command>
<command name="load">cmake/3.25.1</command>
</modules>

<modules>
<command name="load">cmake/3.27.7</command>
</modules>
</module_system>

@@ -4044,6 +4056,9 @@ commented out until "*** No rule to make target '.../libadios2pio-nm-lib.a'" iss
<env name="NETCDF_PATH">$ENV{CRAY_NETCDF_HDF5PARALLEL_PREFIX}</env>
<env name="PNETCDF_PATH">$ENV{CRAY_PARALLEL_NETCDF_PREFIX}</env>
</environment_variables>
<environment_variables compiler="gnu">
<env name="LD_LIBRARY_PATH">/opt/cray/pe/gcc/12.2.0/snos/lib64:$ENV{LD_LIBRARY_PATH}</env>
</environment_variables>
<resource_limits>
<resource name="RLIMIT_STACK">-1</resource>
</resource_limits>
@@ -4085,11 +4100,11 @@ AMD EPYC 7713 64-Core (Milan) (256GB) and 4 nvidia A100'</DESC>
</arguments>
</mpirun>
<module_system type="module" allow_error="true">
<init_path lang="perl">/usr/share/lmod/8.3.1/init/perl</init_path>
<init_path lang="perl">/usr/share/lmod/lmod/init/perl</init_path>
<!-- does not exist -->
<init_path lang="python">/usr/share/lmod/8.3.1/init/python</init_path>
<init_path lang="sh">/usr/share/lmod/8.3.1/init/sh</init_path>
<init_path lang="csh">/usr/share/lmod/8.3.1/init/csh</init_path>
<init_path lang="python">/usr/share/lmod/lmod/init/python</init_path>
<init_path lang="sh">/usr/share/lmod/lmod/init/sh</init_path>
<init_path lang="csh">/usr/share/lmod/lmod/init/csh</init_path>
<cmd_path lang="perl">/usr/share/lmod/lmod/libexec/lmod perl</cmd_path>
<cmd_path lang="python">/usr/share/lmod/lmod/libexec/lmod python</cmd_path>
<cmd_path lang="sh">module</cmd_path>
xylar marked this conversation as resolved.
@@ -4156,7 +4171,7 @@ AMD EPYC 7713 64-Core (Milan) (256GB) and 4 nvidia A100'</DESC>
<command name="load">cray-hdf5-parallel/1.12.2.3</command>
<command name="load">cray-netcdf-hdf5parallel/4.9.0.3</command>
<command name="load">cray-parallel-netcdf/1.12.3.3</command>
<command name="load">cmake/3.25.1</command>
<command name="load">cmake/3.27.7</command>
</modules>
</module_system>

4 changes: 2 additions & 2 deletions components/mpas-framework/Makefile
@@ -396,11 +396,11 @@ gnu-cray:
"FFLAGS_OPT = -O3 -m64 -ffree-line-length-none -fconvert=big-endian -ffree-form -ffpe-summary=none $${EXTRA_FFLAGS}" \
"CFLAGS_OPT = -O3 -m64" \
"CXXFLAGS_OPT = -O3 -m64" \
"LDFLAGS_OPT = -O3 -m64" \
"LDFLAGS_OPT = -O3 -m64 $(GNU_CRAY_LDFLAGS)" \
Collaborator Author

@matthewhoffman, this environment variable (or argument to make) needs to be set to:

export GNU_CRAY_LDFLAGS="-Wl,--enable-new-dtags"

on Chicoma for now. I'll make sure Compass and Polaris do this. If someone is building for Chicoma outside of Compass or Polaris (good luck!), they would need to set this manually.

Are you okay with this fix? I don't want to put anything into the Makefile that tries to detect the machine or anything crazy like that.

Collaborator

@xylar, this seems like the best solution given the circumstances.
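
For anyone building by hand, one way to confirm that the flag took effect is to inspect the dynamic section of the resulting executable. With GNU ld, `--enable-new-dtags` records library search paths as `DT_RUNPATH` rather than `DT_RPATH`, and the dynamic loader consults `LD_LIBRARY_PATH` before `DT_RUNPATH`. The sketch below is an illustration only and not part of this PR: `classify_dtags` is a hypothetical helper that reads the text output of `readelf -d` on stdin, and the executable name in the usage comment is a placeholder.

```shell
# Hypothetical helper: classify `readelf -d` output read from stdin.
# Prints "runpath" if a DT_RUNPATH entry is present, "rpath" if only
# DT_RPATH is present, and "none" otherwise.
classify_dtags() {
    dyn=$(cat)
    case "$dyn" in
        *"(RUNPATH)"*) echo "runpath" ;;
        *"(RPATH)"*)   echo "rpath" ;;
        *)             echo "none" ;;
    esac
}

# Assumed usage after a build with GNU_CRAY_LDFLAGS set as described above
# (the executable name is a placeholder):
#   readelf -d ./ocean_model | classify_dtags
```

A result of `runpath` would suggest the linker flag reached the link step; `rpath` would suggest the variable was not set when `make` ran.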

"FFLAGS_DEBUG = -g -m64 -ffree-line-length-none -fconvert=big-endian -ffree-form -fbounds-check -fbacktrace -ffpe-trap=invalid,zero,overflow -ffpe-summary=none $${EXTRA_FFLAGS}" \
"CFLAGS_DEBUG = -g -m64" \
"CXXFLAGS_DEBUG = -g -m64" \
"LDFLAGS_DEBUG = -g -m64" \
"LDFLAGS_DEBUG = -g -m64 $(GNU_CRAY_LDFLAGS)" \
"FFLAGS_OMP = -fopenmp" \
"CFLAGS_OMP = -fopenmp" \
"BUILD_TARGET = $(@)" \