SLURM: specifying extra arguments for GPU binding #436

BenWibking · 2024-01-29T20:51:48Z

Is there a recommended way to specify extra SLURM options for GPU bindings?

I tried using the args: batch block key (https://maestrowf.readthedocs.io/en/latest/Maestro/scheduling.html), but the options did not get propagated to the *.sh job script.

Following #340, the workaround I've used so far is to specify these options as part of the run command so that they get copied into the job script:

    - name: run-sim
      description: Run the simulation.
      run:
          cmd: |
              #SBATCH --mem=0
              #SBATCH --constraint="scratch"
              #SBATCH --ntasks-per-node=4
              #SBATCH --cpus-per-task=16
              #SBATCH --gpus-per-task=1
              #SBATCH --gpu-bind=none

              srun bash -c "
                  export CUDA_VISIBLE_DEVICES=\$((3-SLURM_LOCALID));
                  $(BINARY_PATH) -i $(generate-infile.workspace)/params.in" > logfile.txt
          depends: [generate-infile]
          nodes: 1
          exclusive: True
          walltime: "00:10:00"

The text was updated successfully, but these errors were encountered:

jwhite242 · 2024-01-29T23:31:50Z

So it doesn't look like there's great handling of gpu's on the slurm adapter at the moment, despite there being a hook for adding the gpus=.. bit to the header which I think passes through on the steps' 'gpus: ' key along side nodes/procs/etc. Looks like the only extra one explicitly supported is 'cores per task'. Also note these are decoupled a bit in the script adapters: the header applies to the entire batch job (along with with the batch block keys), while many of the keys attached to the step get applied independently to each srun when using the $(LAUNCHER) syntax which has some limited support for independently specifying procs/nodes per launcher invocation.

And just to better understand the final use case, are you also looking for having say 4 different tasks (or srun's) inside this step, one per gpu, or perhaps preferring to keep each one separate and pack the allocation with many jobs using an embedded flux instance? Either way it looks like we'll need to wire up some extra hooks/pass through for these gpu related args in the slurm adapter. Think we could also add some 'c', and 'g' flags to the new launcher syntax if you want more independent control of multiple $(LAUNCHER) tokens in a step (see this new style launcher)

BenWibking · 2024-01-30T00:09:25Z

The use case for this job step is just a single MPI job across 1+ nodes. (Other ~~job steps~~ workflow steps are CPU-only, so they need to have a different binding/ntasks-per-node, but for now, that's a separate issue.)

The somewhat nonstandard options are just to get the right mapping of NUMA domains to GPUs due to the weird topology on this system, plus a workaround to avoid cgroup resource isolation being applied to the GPUs (since that prevents CUDA IPC from working between MPI ranks).

Using

#SBATCH --gpu-bind=none
#SBATCH --ntasks-per-node=4
#SBATCH --gpus-per-node=4

might accomplish the same binding, but I haven't tested that yet. Is there a built-in way to specify this alternative set of SLURM options?

jwhite242 · 2024-01-30T01:21:29Z

No, it doesn't look like there's a better way built in to set any extra/un-known sbatch options than what you're currently doing by putting them at the top of your step cmd.

Will have to look into exposing more of these options/bindings across the script adapters. Other than the options in your initial snippet, are there any other of the numerous sbatch/srun options you would be interested in?

BenWibking · 2024-01-30T01:51:37Z

No, it doesn't look like there's a better way built in to set any extra/un-known sbatch options than what you're currently doing by putting them at the top of your step cmd.

Will have to look into exposing more of these options/bindings across the script adapters. Other than the options in your initial snippet, are there any other of the numerous sbatch/srun options you would be interested in?

Not that I can think of. The above examples should cover it.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SLURM: specifying extra arguments for GPU binding #436

SLURM: specifying extra arguments for GPU binding #436

BenWibking commented Jan 29, 2024

jwhite242 commented Jan 29, 2024

BenWibking commented Jan 30, 2024 •

edited

Loading

jwhite242 commented Jan 30, 2024

BenWibking commented Jan 30, 2024

SLURM: specifying extra arguments for GPU binding #436

SLURM: specifying extra arguments for GPU binding #436

Comments

BenWibking commented Jan 29, 2024

jwhite242 commented Jan 29, 2024

BenWibking commented Jan 30, 2024 • edited Loading

jwhite242 commented Jan 30, 2024

BenWibking commented Jan 30, 2024

BenWibking commented Jan 30, 2024 •

edited

Loading