DO NOT MERGE: PoC for RL9 #323

Closed
wants to merge 9 commits into from

@sjpb sjpb commented Oct 25, 2023

This mostly works, but it is not production-ready. Key things:

  • podman now won't allow you (by default at least) to use unqualified image names. Fixed by changing to fully-qualified names (and fixing the arcus registry to mirror docker.io)

  • couldn't reset the podman database after adding the tmp directory config (see roles/podman/config.yml). Fixed by simply not running these tasks, though I haven't convinced myself this is OK. The original code specified tmp_dir, which containers.conf documents as having to be on a tmpfs, although the podman docs don't mention that. The rootless tutorial doesn't mention it either, but states that "$XDG_RUNTIME_DIR defaults on most systems to /run/user/$UID", which doesn't exist (with the containers up) for the podman user (and wouldn't be a tmpfs).

    podman info shows

      graphRoot: /var/lib/podman/.local/share/containers/storage
    ...
      runRoot: /tmp/containers-user-1001/containers
    

both of which are mounted on /

  • podman systemd units complain:

    time="2023-10-13T13:57:06Z" level=warning msg="The cgroupv2 manager is set to systemd but there is no systemd user session available"
    time="2023-10-13T13:57:06Z" level=warning msg="For using systemd, you may need to login using an user session"
    time="2023-10-13T13:57:06Z" level=warning msg="Alternatively, you can enable lingering with: `loginctl enable-linger 1001` (possibly as root)"
    time="2023-10-13T13:57:06Z" level=warning msg="Falling back to --cgroup-manager=cgroupfs"
    time="2023-10-13T13:57:06Z" level=error msg="unlinkat /run/podman/libpod/tmp: permission denied"
    
  • the openhpc role has its own PR: "Support RockyLinux 9" (ansible-role-openhpc#164). There is some incomplete stuff here (e.g. this PR won't work on RL8), but it also needs the "generic slurm" PR merged so we can define cgroups.conf properly, which appears to be necessary. Really I'd like to move the plugin defaults to use cgroups too, but this doesn't work in a container (although see OpenHPC slack for a possible workaround).

  • monitoring.yml fails b/c there's no prometheus-slurm-exporter build for RL9. This is our repo, so it'd presumably be an easy fix.
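
A minimal sketch of the first and third points above as Ansible tasks, untested against this PR. The image name and tag are placeholders, the `containers.podman` collection is assumed to be installed, and the account is assumed to be named `podman`. Enabling lingering is only what the warning in the log suggests; it has not been confirmed as a fix here.

```yaml
# Hypothetical tasks; task names, image and tag are illustrative only.
- name: Pull an image using a fully-qualified name
  containers.podman.podman_image:
    name: docker.io/library/example-image   # registry/namespace/name, not just "example-image"
    tag: "1.0"

- name: Enable lingering so the podman user gets a systemd user session
  ansible.builtin.command:
    cmd: loginctl enable-linger podman
    creates: /var/lib/systemd/linger/podman   # marker file written by logind
  become: true
```

The `creates` guard keeps the second task idempotent, so re-running the play does not call `loginctl` again once lingering is enabled.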

    pip:
    -   name: pymysql
    +   name:
    +     - pymysql
Member

feels like we should pin these versions somewhat?
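
One way the pinning suggested here could look, assuming the task keeps using the `pip` module with a list of names. The version number is a placeholder, not a tested choice.

```yaml
# Hypothetical task; the pinned version is illustrative only.
- name: Install pinned Python packages
  ansible.builtin.pip:
    name:
      - pymysql==1.1.0   # placeholder version; pick one validated on RL9
```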

@sjpb sjpb mentioned this pull request Jan 24, 2024

sjpb commented Jan 24, 2024

Replaced by #353

@sjpb sjpb closed this Jan 24, 2024