On pm-cpu minor pelayout adjustment for hcru_hcru.I20TRGSWCNPRDCTCBC #6484

Merged
ndkeen merged 1 commit into master from ndk/machinefiles/pm-cpu-pelayout-minor-adjustment
Jun 27, 2024

Conversation

ndkeen
Contributor

@ndkeen ndkeen commented Jun 21, 2024

On pm-cpu, use 192 tasks instead of 128 for the l%360x720cru grid.
The ERS.hcru_hcru.I20TRGSWCNPRDCTCBC.pm-cpu_intel.elm-erosion test passes, but has been taking over 30 minutes.
It currently runs on 1 node (128 tasks); with 192 tasks the overall test time drops from about 30 min to 19 min.

The SMS test also works with 256 tasks (2 full nodes), but the ERS test fails with that layout.

BFB except for NML changes.
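
For reference, a rough sketch of the node arithmetic behind these task counts. It assumes 128 tasks per pm-cpu node (as stated above) and one task per slot with no threading, which is an assumption, not something this PR specifies. The helper names are hypothetical and the timings are the approximate values quoted above.

```python
import math

# From the PR text: 1 pm-cpu node holds 128 tasks.
TASKS_PER_NODE = 128


def nodes_needed(ntasks, tasks_per_node=TASKS_PER_NODE):
    """Number of whole nodes required to place ntasks (hypothetical helper)."""
    return math.ceil(ntasks / tasks_per_node)


def layout_summary(ntasks, tasks_per_node=TASKS_PER_NODE):
    """Node count and unused task slots for a given task count."""
    nodes = nodes_needed(ntasks, tasks_per_node)
    idle = nodes * tasks_per_node - ntasks
    return {"ntasks": ntasks, "nodes": nodes, "idle_slots": idle}


# Layouts discussed in this PR: old (128), new (192), and 2 full nodes (256).
for n in (128, 192, 256):
    print(layout_summary(n))

# Approximate whole-test speedup from the quoted timings (about 30 min -> 19 min),
# compared with the increase in task count (128 -> 192).
speedup = 30 / 19
task_ratio = 192 / 128
print(f"speedup ~{speedup:.2f}x for {task_ratio:.2f}x the tasks")
```

Under these assumptions, 192 tasks already spans 2 nodes but leaves part of the second node unused, which is why the 256-task (2 full nodes) layout is the natural follow-up question.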

@ndkeen self-assigned this Jun 21, 2024
@ndkeen added the labels Machine Files, BFB (PR leaves answers BFB), and pm-cpu (Perlmutter at NERSC, CPU-only nodes) Jun 21, 2024

PR Preview Action v1.4.7
🚀 Deployed preview to https://E3SM-Project.github.io/E3SM/pr-preview/pr-6484/
on branch gh-pages at 2024-06-21 21:19 UTC

@ndkeen
Contributor Author

ndkeen commented Jun 25, 2024

The tests that were timing out have been passing for the last couple of days.
Should I merge this PR? Or open an issue about why we can't use 2 full nodes (256 tasks)?

@rljacob
Member

rljacob commented Jun 25, 2024

Do both I guess.

@ndkeen
Contributor Author

ndkeen commented Jun 25, 2024

I was afraid you would say that

ndkeen added a commit that referenced this pull request Jun 25, 2024
… next (PR #6484)

@ndkeen
Contributor Author

ndkeen commented Jun 25, 2024

merged to next

@ndkeen
Contributor Author

ndkeen commented Jun 27, 2024

Last night's cdash runs look OK. I will bless the 3 tests that have NML diffs (those using hcru_hcru).

@ndkeen ndkeen merged commit 07f842a into master Jun 27, 2024
21 checks passed
@ndkeen ndkeen deleted the ndk/machinefiles/pm-cpu-pelayout-minor-adjustment branch June 27, 2024 17:16