Merge pull request #94 from gkaf89/refactor/jobs-gpu
[REFACTOR] Use long form flags in the GPU job example script
gkaf89 authored Nov 26, 2024
2 parents 55671bb + 48858be commit 3aca179
Showing 1 changed file with 20 additions and 15 deletions.
35 changes: 20 additions & 15 deletions docs/jobs/gpu.md
Because of the hardware organization, you **MUST** follow the recommendations below:

1. **Do not run jobs on GPU nodes if you have no use for the GPU accelerators**, _i.e._ if you are not using any of the software compiled against the `{foss,intel}cuda` toolchains.
2. Avoid using more than 4 GPUs, ideally within the same node.
3. Dedicate 1/4 of the available CPU cores to the management of each reserved GPU card.

Thus, your typical GPU launcher would match the [AI/DL launcher](../slurm/launchers.md#specialized-bigdatagpu-launchers) example:

```bash
#!/usr/bin/bash --login

#SBATCH --job-name=gpu_example
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.out

### Request one GPU task for 4 hours - dedicate 1/4 of the available cores to its management
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=7
#SBATCH --gpus-per-task=1
#SBATCH --time=0-04:00:00

### Submit to the `gpu` partition of Iris
#SBATCH --partition=gpu
#SBATCH --qos=normal

print_error_and_exit() { echo "***ERROR*** $*"; exit 1; }
module purge || print_error_and_exit "No 'module' command"
module load numlib/cuDNN # Example with cuDNN

module purge || print_error_and_exit "No 'module' command available"
module load numlib/cuDNN # Example using the cuDNN module

[...]
```
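
As a quick usage note (a minimal sketch, assuming the launcher above is saved as `gpu_example.sh` — the filename is illustrative):

```bash
# Submit the launcher to Slurm; sbatch prints the job ID on success
sbatch gpu_example.sh

# Check the job state in the queue
squeue -u $USER

# Inside the job itself (e.g. after the module loads in the script),
# nvidia-smi lists the GPU(s) actually allocated to the task
nvidia-smi
```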

!!! info "Interactive jobs"
    On the UL HPC systems you can use `si-gpu`, a wrapper around the `salloc` command that allocates an [interactive job](../jobs/interactive.md) on a GPU node with sensible default options.

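For illustration, a minimal interactive session could look like the sketch below (the wrapper's exact defaults are those documented on the linked interactive jobs page; the module shown is just the example from the launcher above):

```bash
# Request an interactive shell on a GPU node using the wrapper's defaults
si-gpu

# Once the allocation is granted, work exactly as in the batch launcher
module purge
module load numlib/cuDNN   # example module, as above
nvidia-smi                 # confirm the allocated GPU is visible
```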