mila-iqia/ansible-slurm

Ansible role for installing and managing the Slurm Workload Manager

Slurm

Install and configure a Slurm cluster on RHEL/CentOS or Debian/Ubuntu servers.

To configure a custom Debian repository, define slurm_configure_repos: true.

Then define the APT repository and the URL of its GPG key.

# Example apt repository
slurm_apt_repository: "deb [trusted=yes] http://127.0.0.1/ubuntu/22.04/amd64/  ./"
# Example GPG key
slurm_gpg_key: 'http://127.0.0.1/ubuntu/22.04/amd64/GPG-KEY-slurm'

Optionally, define slurm_apt_priority to pin the repository's priority (APT only).

slurm_apt_priority: 900

Role Variables

All variables are optional. If nothing is set, the role will install the Slurm client programs, munge, and create a slurm.conf with a single localhost node and debug partition. See the defaults and example playbooks for examples.

For the various roles a Slurm node can play, you can either place hosts in the corresponding inventory groups or add values to a list variable, slurm_roles.

  • group slurmservers or slurm_roles: ['controller']
  • group slurmexechosts or slurm_roles: ['exec']
  • group slurmdbdservers or slurm_roles: ['dbd']
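
For example, the group-based approach can be expressed in a YAML inventory like the following sketch (the host names are placeholders, not something the role expects):

all:
  children:
    slurmservers:
      hosts:
        head01:
    slurmexechosts:
      hosts:
        node01:
        node02:
    slurmdbdservers:
      hosts:
        head01: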

General config options for slurm.conf go in slurm_config, a hash. Keys are Slurm config option names.
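
A minimal sketch (the values here are illustrative only); each key/value pair is written as the corresponding option in the generated slurm.conf, roughly ClusterName=cluster and so on:

slurm_config:
  ClusterName: cluster
  SlurmctldHost: localhost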

Partitions and nodes go in slurm_partitions and slurm_nodes, lists of hashes. The only required key in each hash is name, which becomes the PartitionName or NodeName for that line. All other keys/values are placed onto the line of that partition or node.
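
For instance, with the illustrative values below, the node entry ends up as roughly a NodeName=localhost CPUs=4 State=UNKNOWN line and the partition entry as a PartitionName=debug Default=YES Nodes=localhost line:

slurm_nodes:
  - name: localhost
    CPUs: 4
    State: UNKNOWN
slurm_partitions:
  - name: debug
    Default: YES
    Nodes: localhost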

Options for the additional configuration files acct_gather.conf, cgroup.conf, gres.conf, and job_container.conf may be specified in slurm_acct_gather_config and slurm_cgroup_config (both hashes), slurm_gres_config (a list of hashes), and slurm_job_container_config (a hash), respectively.
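
As an illustration for the two files not shown in the extensive playbook below (the option names are standard acct_gather.conf and job_container.conf options, but the values and paths are placeholders):

slurm_acct_gather_config:
  EnergyIPMIFrequency: 10
  ProfileHDF5Dir: "/var/spool/slurm/profile"
slurm_job_container_config:
  AutoBasePath: true
  BasePath: "/var/spool/slurm/containers"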

Set slurm_upgrade to true to upgrade the installed Slurm packages.

You can use slurm_user (a hash) and slurm_create_user (a bool) to pre-create a Slurm user so that uids match.
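
For example, to enable upgrades and pre-create the user with pinned IDs (a sketch; the uid/gid values are placeholders, pick ones that are consistent across your hosts):

slurm_upgrade: true
slurm_create_user: true
slurm_user:
  name: slurm
  group: slurm
  uid: 888
  gid: 888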

Note that this role requires root access, so enable become either globally in your playbook, on the command line, or just for the role as shown below.

Dependencies

None.

Example Playbooks

Minimal setup, all services on one node:

- name: Slurm all in One
  hosts: all
  vars:
    slurm_roles: ['controller', 'exec', 'dbd']
  roles:
    - role: galaxyproject.slurm
      become: True

More extensive example:

- name: Slurm execution hosts
  hosts: all
  roles:
    - role: galaxyproject.slurm
      become: True
  vars:
    slurm_cgroup_config:
      CgroupMountpoint: "/sys/fs/cgroup"
      CgroupAutomount: yes
      ConstrainCores: yes
      TaskAffinity: no
      ConstrainRAMSpace: yes
      ConstrainSwapSpace: no
      ConstrainDevices: no
      AllowedRamSpace: 100
      AllowedSwapSpace: 0
      MaxRAMPercent: 100
      MaxSwapPercent: 100
      MinRAMSpace: 30
    slurm_config:
      AccountingStorageType: "accounting_storage/none"
      ClusterName: cluster
      GresTypes: gpu
      JobAcctGatherType: "jobacct_gather/none"
      MpiDefault: none
      ProctrackType: "proctrack/cgroup"
      ReturnToService: 1
      SchedulerType: "sched/backfill"
      SelectType: "select/cons_res"
      SelectTypeParameters: "CR_Core"
      SlurmctldHost: "slurmctl"
      # Use a list to configure master and backups Slurmctld hosts
      # SlurmctldHost: ['slurmctl1', 'slurmctl2']
      SlurmctldLogFile: "/var/log/slurm/slurmctld.log"
      SlurmctldPidFile: "/var/run/slurmctld.pid"
      SlurmdLogFile: "/var/log/slurm/slurmd.log"
      SlurmdPidFile: "/var/run/slurmd.pid"
      SlurmdSpoolDir: "/var/spool/slurmd"
      StateSaveLocation: "/var/spool/slurmctld"
      SwitchType: "switch/none"
      TaskPlugin: "task/affinity,task/cgroup"
      TaskPluginParam: Sched
    slurm_create_user: yes
    slurm_gres_config:
      - File: /dev/nvidia[0-3]
        Name: gpu
        NodeName: gpu[01-10]
        Type: tesla
    slurm_munge_key: "../../../munge.key"
    slurm_nodes:
      - name: "gpu[01-10]"
        CoresPerSocket: 18
        Gres: "gpu:tesla:4"
        Sockets: 2
        ThreadsPerCore: 2
    slurm_partitions:
      - name: gpu
        Default: YES
        MaxTime: UNLIMITED
        Nodes: "gpu[01-10]"
    slurm_roles: ['exec']
    slurm_user:
      comment: "Slurm Workload Manager"
      gid: 888
      group: slurm
      home: "/var/lib/slurm"
      name: slurm
      shell: "/usr/sbin/nologin"
      uid: 888

License

MIT

Author Information

View contributors on GitHub
