Install and configure a Slurm cluster on RHEL/CentOS or Debian/Ubuntu servers
To configure a custom Debian repository, set `slurm_configure_repos: true`, then define the APT repository and the URL of its GPG key:

```yaml
# Example APT repository
slurm_apt_repository: "deb [trusted=yes] http://127.0.0.1/ubuntu/22.04/amd64/ ./"
# Example GPG key
slurm_gpg_key: 'http://127.0.0.1/ubuntu/22.04/amd64/GPG-KEY-slurm'
```

Optionally, define `slurm_apt_priority` to pin the priority of the repository (APT only):

```yaml
slurm_apt_priority: 900
```
All variables are optional. If nothing is set, the role will install the Slurm client programs, munge, and create a `slurm.conf` with a single `localhost` node and a `debug` partition. See the defaults and example playbooks for examples.
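As a rough sketch, that default is equivalent to setting the node and partition variables (described below) to something like the following; the exact values live in the role's defaults and templates, so treat this as illustrative only:

```yaml
# Illustrative approximation of the defaults, not the role's literal values
slurm_nodes:
  - name: localhost
slurm_partitions:
  - name: debug
    Default: YES
    Nodes: localhost
```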
For the various roles a Slurm node can play, you can either set group names or add values to a list, `slurm_roles` (see the inventory sketch after this list):

- group slurmservers or `slurm_roles: ['controller']`
- group slurmexechosts or `slurm_roles: ['exec']`
- group slurmdbdservers or `slurm_roles: ['dbd']`
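For example, a YAML inventory could assign these roles purely through group membership; the host names below are hypothetical:

```yaml
# Hypothetical inventory.yml: each host picks up its Slurm role from its group,
# so no slurm_roles variable is needed.
all:
  children:
    slurmservers:
      hosts:
        slurmctl01:
    slurmdbdservers:
      hosts:
        slurmdbd01:
    slurmexechosts:
      hosts:
        node[01:10]:
```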
General config options for slurm.conf go in `slurm_config`, a hash. Keys are Slurm config option names.

Partitions and nodes go in `slurm_partitions` and `slurm_nodes`, lists of hashes. The only required key in each hash is `name`, which becomes the `PartitionName` or `NodeName` for that line. All other keys/values are placed onto the line of that partition or node.
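As a small sketch of that mapping (the values here are arbitrary), the entries below would render lines along the lines of `NodeName=localhost CPUs=4` and `PartitionName=debug Default=YES Nodes=localhost`:

```yaml
slurm_config:
  SlurmctldHost: localhost   # plain slurm.conf options, keyed by option name
slurm_nodes:
  - name: localhost          # becomes NodeName=localhost on the node line
    CPUs: 4                  # every other key/value is appended to that line
slurm_partitions:
  - name: debug              # becomes PartitionName=debug on the partition line
    Default: YES
    Nodes: localhost
```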
Options for the additional configuration files acct_gather.conf, cgroup.conf, gres.conf, and job_container.conf may be specified in `slurm_acct_gather_config`, `slurm_cgroup_config` (both hashes), `slurm_gres_config` (a list of hashes), and `slurm_job_container_config` (a hash), respectively.
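`slurm_cgroup_config` and `slurm_gres_config` appear in the extended example below; the other two follow the same pattern. A hedged sketch, with option names taken from the upstream Slurm documentation for acct_gather.conf and job_container.conf rather than from this role (check them against your Slurm version):

```yaml
slurm_acct_gather_config:
  EnergyIPMIFrequency: 30             # acct_gather.conf option (assumed valid for your setup)
slurm_job_container_config:
  AutoBasePath: true                  # job_container.conf (tmpfs plugin) options
  BasePath: /var/run/slurm/containers
```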
Set `slurm_upgrade` to true to upgrade the installed Slurm packages.
You can use `slurm_user` (a hash) and `slurm_create_user` (a bool) to pre-create a Slurm user so that UIDs match.
Note that this role requires root access, so enable `become` either globally in your playbook, on the command line, or just for the role, as shown below.
Dependencies: None.
Minimal setup, all services on one node:
```yaml
- name: Slurm all in One
  hosts: all
  vars:
    slurm_roles: ['controller', 'exec', 'dbd']
  roles:
    - role: galaxyproject.slurm
      become: True
```
More extensive example:
```yaml
- name: Slurm execution hosts
  hosts: all
  roles:
    - role: galaxyproject.slurm
      become: True
  vars:
    slurm_cgroup_config:
      CgroupMountpoint: "/sys/fs/cgroup"
      CgroupAutomount: yes
      ConstrainCores: yes
      TaskAffinity: no
      ConstrainRAMSpace: yes
      ConstrainSwapSpace: no
      ConstrainDevices: no
      AllowedRamSpace: 100
      AllowedSwapSpace: 0
      MaxRAMPercent: 100
      MaxSwapPercent: 100
      MinRAMSpace: 30
    slurm_config:
      AccountingStorageType: "accounting_storage/none"
      ClusterName: cluster
      GresTypes: gpu
      JobAcctGatherType: "jobacct_gather/none"
      MpiDefault: none
      ProctrackType: "proctrack/cgroup"
      ReturnToService: 1
      SchedulerType: "sched/backfill"
      SelectType: "select/cons_res"
      SelectTypeParameters: "CR_Core"
      SlurmctldHost: "slurmctl"
      # Use a list to configure master and backup slurmctld hosts
      # SlurmctldHost: ['slurmctl1', 'slurmctl2']
      SlurmctldLogFile: "/var/log/slurm/slurmctld.log"
      SlurmctldPidFile: "/var/run/slurmctld.pid"
      SlurmdLogFile: "/var/log/slurm/slurmd.log"
      SlurmdPidFile: "/var/run/slurmd.pid"
      SlurmdSpoolDir: "/var/spool/slurmd"
      StateSaveLocation: "/var/spool/slurmctld"
      SwitchType: "switch/none"
      TaskPlugin: "task/affinity,task/cgroup"
      TaskPluginParam: Sched
    slurm_create_user: yes
    slurm_gres_config:
      - File: /dev/nvidia[0-3]
        Name: gpu
        NodeName: gpu[01-10]
        Type: tesla
    slurm_munge_key: "../../../munge.key"
    slurm_nodes:
      - name: "gpu[01-10]"
        CoresPerSocket: 18
        Gres: "gpu:tesla:4"
        Sockets: 2
        ThreadsPerCore: 2
    slurm_partitions:
      - name: gpu
        Default: YES
        MaxTime: UNLIMITED
        Nodes: "gpu[01-10]"
    slurm_roles: ['exec']
    slurm_user:
      comment: "Slurm Workload Manager"
      gid: 888
      group: slurm
      home: "/var/lib/slurm"
      name: slurm
      shell: "/usr/sbin/nologin"
      uid: 888
```
License: MIT