Error upon invoking container image (failed with rc=-1) #135
Comments
Could you look at the
May I ask whether that command should be run from the login node or from a compute node? [siavoa01@bigpurple-ln2 ~]$ srun -p superpod -t 00:60:00 --mem=50G --container-image ubuntu:22.04 hostname — The /cm/local/apps/cmd/scripts/taskprolog exists and is accessible.
In the slurmd log on the node, not much extra info is available: [2024-03-11T14:12:35.028] [34335866.extern] task/cgroup: _memcg_initialize: job: alloc=51200MB mem.limit=51200MB memsw.limit=51200MB job_swappiness=18446744073709551614
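A side note on that slurmd log line: the enormous job_swappiness value is not a memory misconfiguration. It is the bit pattern of -2 reinterpreted as an unsigned 64-bit integer, which typically indicates an "unset" sentinel rather than a real swappiness setting. A quick check (an illustrative sketch, not from the thread):

```python
# job_swappiness value seen in the slurmd log above
val = 18446744073709551614

# It equals 2**64 - 2, i.e. -2 stored in an unsigned 64-bit field.
print(val == 2**64 - 2)  # True

# Reinterpret as a signed 64-bit integer via two's complement.
signed = val - 2**64 if val >= 2**63 else val
print(signed)  # -2
```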
Is this NVIDIA Base Command Manager?
Thanks, I will keep you updated.
Hi,
when invoking the container image (enroot 3.4.1), for instance:
[siavoa01@bigpurple-ln3 superpod]$ srun --container-image ./ubuntu.sqsh -t 00:60:00 --cpus-per-task=20 --tasks-per-node=1 --gpus-per-task=8 --mem=100G --pty bash
srun: job 34262460 queued and waiting for resources
srun: job 34262460 has been allocated resources
slurmstepd-sp-0016: error: pyxis: couldn't start container
slurmstepd-sp-0016: error: spank: required plugin spank_pyxis.so: task_init() failed with rc=-1
slurmstepd-sp-0016: error: Failed to invoke spank plugin stack
srun: error: sp-0016: task 0: Exited with exit code 1
Would you please look into this issue? Thank you.
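For what it's worth, a common way to narrow down a pyxis "task_init() failed with rc=-1" error is to exercise the same image through enroot directly on the compute node, bypassing Slurm and pyxis entirely; pyxis delegates container setup to enroot, so if enroot itself fails, the srun error is just a wrapper around that failure. A hedged debugging sketch (the container name and image path are illustrative):

```shell
# Run these on the affected compute node (e.g. sp-0016), not the login node.
# If any step fails, the problem lies in the enroot/runtime configuration
# rather than in the pyxis plugin itself.
enroot version                                   # confirm enroot is on PATH and working
enroot create --name ubuntu-test ./ubuntu.sqsh   # unpack the squashfs image into a container root
enroot start ubuntu-test hostname                # start the container and run a trivial command
enroot remove ubuntu-test                        # clean up the unpacked container
```

If these all succeed, the next place to look is the slurmd log on the node at the time of the failure, which usually carries the underlying pyxis/enroot error message.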