SLURM: specifying extra arguments for GPU binding #436
Comments
So it doesn't look like there's great handling of GPUs in the SLURM adapter at the moment, despite there being a hook for adding the gpus=.. bit to the header, which I think passes through the step's 'gpus:' key alongside nodes/procs/etc. It looks like the only extra one explicitly supported is 'cores per task'. Also note that these are decoupled a bit in the script adapters: the header applies to the entire batch job (along with the batch block keys), while many of the keys attached to the step get applied independently to each srun when using the $(LAUNCHER) syntax, which has some limited support for specifying procs/nodes per launcher invocation.

And just to better understand the final use case: are you also looking to have, say, 4 different tasks (or sruns) inside this step, one per GPU, or would you prefer to keep each one separate and pack the allocation with many jobs using an embedded Flux instance? Either way, it looks like we'll need to wire up some extra hooks/pass-through for these GPU-related args in the SLURM adapter. I think we could also add some 'c' and 'g' flags to the new launcher syntax if you want more independent control of multiple $(LAUNCHER) tokens in a step (see this new style launcher).
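For reference, a rough sketch of how those pieces fit together in a spec, assuming the usual batch-block plus step-key layout; the host, bank, and application names are placeholders, and the step-level 'gpus:' key is the pass-through being discussed above:

```yaml
batch:
  type: slurm
  host: mycluster       # placeholder machine
  bank: mybank          # placeholder bank/account
  queue: pbatch

study:
  - name: run-mpi-gpu
    description: Single MPI job; step keys feed the header and each $(LAUNCHER) srun.
    run:
      cmd: |
        $(LAUNCHER) ./my_mpi_app    # placeholder application
      nodes: 2
      procs: 8
      gpus: 8                       # step-level key the gpus=.. header hook reads
      walltime: "00:30:00"
```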
The use case for this job step is just a single MPI job across 1+ nodes. The somewhat nonstandard options are just there to get the right mapping of NUMA domains to GPUs, due to the weird topology on this system, plus a workaround to avoid cgroup resource isolation being applied to the GPUs (since that prevents CUDA IPC from working between MPI ranks). Using … might accomplish the same binding, but I haven't tested that yet. Is there a built-in way to specify this alternative set of SLURM options?
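(For illustration only, since the original snippet isn't shown in this thread: binding controls of this kind are usually expressed through standard srun options. The flags and application name below are placeholders, not the options from the issue.)

```yaml
# Hypothetical step cmd built around an explicit srun call with binding flags:
cmd: |
  srun --ntasks-per-node=4 \
       --cpu-bind=cores \
       --gpu-bind=closest \
       ./my_mpi_app             # placeholder application
```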
No, it doesn't look like there's a better built-in way to set extra/unknown sbatch options than what you're currently doing by putting them at the top of your step cmd. We'll have to look into exposing more of these options/bindings across the script adapters. Other than the options in your initial snippet, are there any other of the numerous sbatch/srun options you'd be interested in?
Not that I can think of. The above examples should cover it.
Is there a recommended way to specify extra SLURM options for GPU bindings? I tried using the 'args:' batch block key (https://maestrowf.readthedocs.io/en/latest/Maestro/scheduling.html), but the options did not get propagated to the *.sh job script. Following #340, the workaround I've used so far is to specify these options as part of the run command so that they get copied into the job script:
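(The snippet that followed didn't survive the page extraction. Below is a rough sketch of the workaround being described, assuming the step cmd is emitted directly below the generated #SBATCH header so that leading #SBATCH lines are honored; the specific flags and the application name are placeholders, not the ones from the issue.)

```yaml
study:
  - name: run-mpi-gpu
    description: Extra SLURM options injected at the top of the step cmd.
    run:
      cmd: |
        #SBATCH --gres=gpu:4        # placeholder for the GPU-related options
        #SBATCH --exclusive         # placeholder for any other extra sbatch flag
        $(LAUNCHER) ./my_mpi_app    # placeholder application
      nodes: 2
      procs: 8
      walltime: "00:30:00"
```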