mila-iqia/ansible-slurm

Ansible role for installing and managing the Slurm Workload Manager

Slurm

Install and configure a Slurm cluster on RHEL/CentOS or Debian/Ubuntu servers.

To configure a custom Debian repository, define slurm_configure_repos: true.

Then define the APT repository and the URL of its GPG key.

# Example apt repository
slurm_apt_repository: "deb [trusted=yes] http://127.0.0.1/ubuntu/22.04/amd64/  ./"
# Example GPG key
slurm_gpg_key: 'http://127.0.0.1/ubuntu/22.04/amd64/GPG-KEY-slurm'

Optionally, define slurm_apt_priority to pin the repository's priority (APT only).

slurm_apt_priority: 900

Role Variables

All variables are optional. If nothing is set, the role will install the Slurm client programs, munge, and create a slurm.conf with a single localhost node and debug partition. See the defaults and example playbooks for examples.

For the various roles a Slurm node can play, you can either place hosts in the corresponding inventory groups or add values to a list variable, slurm_roles.

  • group slurmservers or slurm_roles: ['controller']
  • group slurmexechosts or slurm_roles: ['exec']
  • group slurmdbdservers or slurm_roles: ['dbd']
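
For example, the group-based approach can be expressed in a YAML inventory like the following sketch (the host names are placeholders, not something the role expects):

all:
  children:
    slurmservers:
      hosts:
        head01:
    slurmexechosts:
      hosts:
        node01:
        node02:
    slurmdbdservers:
      hosts:
        head01: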

General config options for slurm.conf go in slurm_config, a hash. Keys are Slurm config option names.
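
A minimal sketch (the values here are illustrative only); each key/value pair is written as the corresponding option in the generated slurm.conf, roughly ClusterName=cluster and so on:

slurm_config:
  ClusterName: cluster
  SlurmctldHost: localhost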

Partitions and nodes go in slurm_partitions and slurm_nodes, lists of hashes. The only required key in each hash is name, which becomes the PartitionName or NodeName for that line. All other keys/values are placed onto the line of that partition or node.
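
For instance, with the illustrative values below, the node entry ends up as roughly a NodeName=localhost CPUs=4 State=UNKNOWN line and the partition entry as a PartitionName=debug Default=YES Nodes=localhost line:

slurm_nodes:
  - name: localhost
    CPUs: 4
    State: UNKNOWN
slurm_partitions:
  - name: debug
    Default: YES
    Nodes: localhost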

Options for the additional configuration files acct_gather.conf, cgroup.conf, gres.conf, and job_container.conf may be specified in slurm_acct_gather_config and slurm_cgroup_config (both hashes), slurm_gres_config (a list of hashes), and slurm_job_container_config (a hash), respectively.
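
As an illustration for the two files not shown in the extensive playbook below (the option names are standard acct_gather.conf and job_container.conf options, but the values and paths are placeholders):

slurm_acct_gather_config:
  EnergyIPMIFrequency: 10
  ProfileHDF5Dir: "/var/spool/slurm/profile"
slurm_job_container_config:
  AutoBasePath: true
  BasePath: "/var/spool/slurm/containers"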

Set slurm_upgrade to true to upgrade the installed Slurm packages.

You can use slurm_user (a hash) and slurm_create_user (a bool) to pre-create a Slurm user so that uids match.
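
For example, to enable upgrades and pre-create the user with pinned IDs (a sketch; the uid/gid values are placeholders, pick ones that are consistent across your hosts):

slurm_upgrade: true
slurm_create_user: true
slurm_user:
  name: slurm
  group: slurm
  uid: 888
  gid: 888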

Note that this role requires root access, so enable become either globally in your playbook, on the command line, or just for the role as shown below.

Dependencies

None.

Example Playbooks

Minimal setup, all services on one node:

- name: Slurm all in One
  hosts: all
  vars:
    slurm_roles: ['controller', 'exec', 'dbd']
  roles:
    - role: galaxyproject.slurm
      become: True

More extensive example:

- name: Slurm execution hosts
  hosts: all
  roles:
    - role: galaxyproject.slurm
      become: True
  vars:
    slurm_cgroup_config:
      CgroupMountpoint: "/sys/fs/cgroup"
      CgroupAutomount: yes
      ConstrainCores: yes
      TaskAffinity: no
      ConstrainRAMSpace: yes
      ConstrainSwapSpace: no
      ConstrainDevices: no
      AllowedRamSpace: 100
      AllowedSwapSpace: 0
      MaxRAMPercent: 100
      MaxSwapPercent: 100
      MinRAMSpace: 30
    slurm_config:
      AccountingStorageType: "accounting_storage/none"
      ClusterName: cluster
      GresTypes: gpu
      JobAcctGatherType: "jobacct_gather/none"
      MpiDefault: none
      ProctrackType: "proctrack/cgroup"
      ReturnToService: 1
      SchedulerType: "sched/backfill"
      SelectType: "select/cons_res"
      SelectTypeParameters: "CR_Core"
      SlurmctldHost: "slurmctl"
      # Use a list to configure master and backups Slurmctld hosts
      # SlurmctldHost: ['slurmctl1', 'slurmctl2']
      SlurmctldLogFile: "/var/log/slurm/slurmctld.log"
      SlurmctldPidFile: "/var/run/slurmctld.pid"
      SlurmdLogFile: "/var/log/slurm/slurmd.log"
      SlurmdPidFile: "/var/run/slurmd.pid"
      SlurmdSpoolDir: "/var/spool/slurmd"
      StateSaveLocation: "/var/spool/slurmctld"
      SwitchType: "switch/none"
      TaskPlugin: "task/affinity,task/cgroup"
      TaskPluginParam: Sched
    slurm_create_user: yes
    slurm_gres_config:
      - File: /dev/nvidia[0-3]
        Name: gpu
        NodeName: gpu[01-10]
        Type: tesla
    slurm_munge_key: "../../../munge.key"
    slurm_nodes:
      - name: "gpu[01-10]"
        CoresPerSocket: 18
        Gres: "gpu:tesla:4"
        Sockets: 2
        ThreadsPerCore: 2
    slurm_partitions:
      - name: gpu
        Default: YES
        MaxTime: UNLIMITED
        Nodes: "gpu[01-10]"
    slurm_roles: ['exec']
    slurm_user:
      comment: "Slurm Workload Manager"
      gid: 888
      group: slurm
      home: "/var/lib/slurm"
      name: slurm
      shell: "/usr/sbin/nologin"
      uid: 888

License

MIT

Author Information

View contributors on GitHub
