This mostly works, but it is not production-ready. Key things:
podman now won't allow you (by default, at least) to use unqualified image names. Fixed by changing to fully-qualified names (and fixing the arcus registry to mirror docker.io).
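The renaming can be sketched in Python as below. This sketch assumes Docker Hub's usual normalization rules: a first path component containing `.` or `:`, or equal to `localhost`, is treated as a registry host, and single-component names live under `library/` on docker.io. `qualify_image` is a hypothetical helper for illustration, not something in the roles touched by this PR.

```python
def qualify_image(name: str) -> str:
    """Turn a short image name into a fully-qualified docker.io reference.

    Hypothetical helper; assumes Docker Hub normalization rules.
    """
    first, sep, _rest = name.partition("/")
    # A first component that looks like a host (has '.' or ':', or is
    # 'localhost') means the name already includes a registry: leave it alone.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return name
    # Single-component names (e.g. "centos:7") are official images,
    # which Docker Hub keeps under the "library/" namespace.
    if not sep:
        return f"docker.io/library/{name}"
    # Otherwise it is "user/repo[:tag]" on Docker Hub.
    return f"docker.io/{name}"
```

For example, `qualify_image("centos:7")` gives `docker.io/library/centos:7`, while `quay.io/...` names pass through unchanged.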
couldn't reset the podman database after adding the tmp directory config (see `roles/podman/config.yml`); fixed by simply not doing these tasks, though I haven't convinced myself this is OK. The original code was specifying `tmp_dir`, which containers.conf documents as having to be on a tmpfs, but that isn't mentioned in the podman docs. The rootless tutorial doesn't mention it either, but states that "`$XDG_RUNTIME_DIR` defaults on most systems to `/run/user/$UID`", which doesn't exist (with the containers up) for the `podman` user (and wouldn't be a tmpfs). `podman info` shows … both of which are mounted on `/`.
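To check whether a candidate `tmp_dir` (or any other path podman reports) actually sits on a tmpfs, a minimal Python sketch follows. It assumes a Linux host exposing `/proc/mounts` and picks the longest mount point that contains the path; octal escapes such as `\040` in mount paths are not decoded. `mount_for` and `is_on_tmpfs` are hypothetical helpers, not part of podman or of these roles.

```python
import os


def mount_for(path: str, mounts_text: str) -> tuple[str, str]:
    """Return (mount_point, fs_type) for the mount that contains `path`.

    `mounts_text` uses the /proc/mounts layout: device, mount point,
    fs type, options, ... (escaped characters are not decoded here).
    """
    path = os.path.normpath(path)
    best = ("", "")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mnt, fstype = fields[1], fields[2]
        prefix = mnt.rstrip("/") + "/"
        # The mount covers `path` if it is the mount point or lies beneath it.
        # Using >= means a later mount on the same point (an over-mount) wins.
        if (path == mnt or path.startswith(prefix)) and len(mnt) >= len(best[0]):
            best = (mnt, fstype)
    return best


def is_on_tmpfs(path: str) -> bool:
    """True if `path` resolves to a tmpfs mount on this (Linux) host."""
    with open("/proc/mounts") as f:
        return mount_for(os.path.realpath(path), f.read())[1] == "tmpfs"
```

Running `is_on_tmpfs` as the `podman` user against each directory `podman info` reports would show directly whether the tmpfs requirement in containers.conf is met.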
podman systemd units complain:
the openhpc role has its own PR: Support RockyLinux 9 ansible-role-openhpc#164. There is some incomplete stuff there (e.g. this PR won't work on RL8), but it also needs the "generic slurm" PR merging so we can define cgroups.conf properly, which appears to be necessary. Really I'd like to move the plugin defaults to use cgroups too, but this doesn't work in a container (although see OpenHPC slack for a possible workaround).
monitoring.yml fails b/c there's no `prometheus-slurm-exporter` build for RL9. This is our repo, so it'd presumably be an easy fix.