add gh/ci eamxx standalone testing #6637
FYI, the eamxx build failures are due to #6635. They won't affect the standalone testing because it is "sp" (homme isn't supported there). That is part of the reason I am starting with sp only. The other part is the complexity of building and executing the other (bigger) test suites, which hit a weird (unexpected) failure in atm_proc. I will handle those at a later time.
782baa7 to cdaa5c4
Status update: This PR is very close to being ready; in fact, it may pass all tests, but I'd like to revise the container again before we merge. The rationale: the standalone tests push the container very close to its disk-space limit, resulting in unpredictable failures. Plan A: reduce the space inside the container itself first (we have 13+ GB of data, but we need only ~2 GB for the standalone tests). Plan B: reduce the space in the runner and redesign how the container is run.
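To see how close a job gets to the limit before picking between the two plans, a diagnostic step along these lines could be added to the workflow. This is only a sketch and not part of this PR: the step name and the /storage data location are assumptions.

```yaml
# Hypothetical diagnostic step; not part of this PR.
- name: Report disk usage
  if: ${{ always() }}   # run even when an earlier step failed
  run: |
    # Free space on the root filesystem as seen from inside the job
    df -h /
    # /storage is an assumed location for the container's input data
    du -sh /storage 2>/dev/null || true
```

Running it with `always()` means the numbers are still printed when a test step fails for lack of space, which is the case of interest here.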
What do you mean by "reduce the space in the runner"?
The runner image (ubuntu-latest) comes preloaded with all sorts of things that we can delete. Nominally, they guarantee ~15 GB of space, but in reality there's more than 120 GB on these runners; it's just that a lot of complex stuff is preinstalled. You can view a list of what they have on these in docs like this: https://github.com/actions/runner-images/blob/ubuntu22/20240922.1/images/ubuntu/Ubuntu2204-Readme.md. See below for the runner image vs. the custom image. I propose to make space in the custom image first. If that doesn't work, I will make space outside of it, but that will likely mean I cannot use the custom image as written below...

    ci:
      runs-on: ubuntu-latest   # <--- runner image
      strategy:
        fail-fast: false
        matrix:
          test:
            - sp
            - opt
      container:
        image: ghcr.io/e3sm-project/containers-ghci:ghci-0.1.2   # <--- custom image
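For reference, Plan B could look roughly like the sketch below. With a `container:` key, the job's steps run inside the custom image, so they cannot clean up the runner's own filesystem; dropping that key and calling `docker run` by hand is one way around it. The deleted directories are commonly removed preinstalled toolchains and are assumptions here, and `run-standalone-tests.sh` is a hypothetical placeholder for whatever actually drives the tests.

```yaml
# Sketch of Plan B under the assumptions above; not what this PR does.
jobs:
  ci:
    runs-on: ubuntu-latest
    steps:
      - name: Free space on the runner
        run: |
          df -h /
          # Assumed locations of large preinstalled toolchains
          sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc
          df -h /
      - uses: actions/checkout@v4
      - name: Run tests inside the custom image
        run: |
          docker run --rm \
            -v "${GITHUB_WORKSPACE}:/work" -w /work \
            ghcr.io/e3sm-project/containers-ghci:ghci-0.1.2 \
            ./run-standalone-tests.sh   # hypothetical script name
```

The trade-off is the one mentioned above: the workflow no longer uses the custom image "as written", and each step that needs the container has to invoke it explicitly.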
Aren't the limits set on the container image? Why do we care about the runner?
I am not sure what you mean... The limit comes from the hardware (bare-metal), which hosts the runner image (ubuntu-latest), which hosts the container image (ghci-container). In other words, ghci-container ⊆ ubuntu-latest ⊆ bare-metal. Inside the ghci-container, you can use as much as you want (i.e., you can run a multi-TB sim if there is space...)
Ah, so we are not fitting on the actual machine where things end up running? I see. Re: container image size, I was thinking GitHub has a size limit for images that one can upload, and that's it. Since the runner is not ours, I thought "there's no limit on our end". But yes, while we are not responsible for the runner size, the limit is hardware based. Wow, 150GB for the runner, that's just nonsense. Why do they make them this big? Have you considered asking for alpine instead? We may have to install something at startup though, which may take some time... Alternatively, maybe a basic debian?
Is this still a draft?
Yes, I want to rework the container a bit so that the new tests can pass more reliably. The issue is running out of disk space (sometimes).
@rljacob, this is ready now (see notes below). @bartgol, could you take another look? Some notes:
Just a minor comment, but not really road-blocking.
uses: actions/upload-artifact@v4
if: ${{ always() }}
with:
  name: ${{ matrix.test }}
May I suggest a name that is a bit longer, so the user knows what they will be downloading? Something like ctest-logs-${{ matrix.test }}?
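Spelled out, the suggested rename would look something like the step below. The step name and the `path` value are placeholders, since the actual log location is not shown in this thread.

```yaml
# Sketch of the suggested artifact name; step name and path are hypothetical.
- name: Upload ctest logs
  uses: actions/upload-artifact@v4
  if: ${{ always() }}
  with:
    name: ctest-logs-${{ matrix.test }}   # e.g. ctest-logs-sp
    path: path/to/ctest/logs/             # placeholder, not from the PR
```

Because the matrix value is part of the name, each matrix entry still uploads a distinct artifact, which `actions/upload-artifact@v4` requires within a single workflow run.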
Let's touch this up in the next iteration of edits. We need to establish some standards, I think :) I can explain why I chose short names everywhere (e.g., gh/ci(...) and gh-standalone/ci(opt)): mainly so they are easy to see on the actions boards... but maybe that's just a minor personal preference...
I am OK with short names for action names, since they appear at the bottom of the PR page, where long names are problematic. But artifacts are on the action page, and there's plenty of space for a longer name. That said, I think it's very subjective, hence the approval regardless.
url = [email protected]:CFMIP/COSPv2.0.git
branch = CESM_v2.1.4
url = [email protected]:bartgol/COSPv2.0.git
branch = bartgol/fix-cosp_optical_inputs
I have to change this. That branch got merged into the main cosp branch...
Is this ready, or does it still need a change?
I think it is ready because the change hasn't been in SCREAM yet. If you prefer, I can make the change in both places?
@rljacob for reference, this is scream master right now https://github.com/E3SM-Project/scream/blob/a269ef91a3da65ee4e23f4e33736684366046b4d/.gitmodules#L23-L26 (https://github.com/E3SM-Project/scream/blob/master/.gitmodules#L23-L26)
If it's not really necessary for this PR, then there's no need to change it.
Adds support for running eamxx standalone testing in the gh/ci containers, for now only single-precision tests due to a mix of issues. Addresses two issues that have already been fixed in the scream fork. Adds flexibility to build cprnc in eamxx. [BFB]