An Ansible role to install the NVIDIA driver from the NVIDIA CUDA repositories.
In the process of installing the NVIDIA driver, this role will reboot the nodes where it runs.
Because of this, we strongly recommend that you run `ansible-playbook`
from a node other than the GPU nodes where you are installing the driver.
If you attempt to run Ansible on the same node where you are installing the driver, this role will either:
- Refuse to proceed with an error like `Running reboot with local connection would reboot the control node` (if running with the `local` connection)
- Reboot the node you're running on, interrupting the playbook execution! (if running with an `ssh` connection against localhost)
This role can be installed using Ansible Galaxy:
$ ansible-galaxy install nvidia.nvidia_driver
| Variable | Default value | Description |
|---|---|---|
| `nvidia_driver_package_state` | `"present"` | Package state for NVIDIA driver packages |
| `nvidia_driver_package_version` | `""` | Package version to install. Note that this should match the actual version of the deb or RPM package to be installed. |
| `nvidia_driver_persistence_mode_on` | `yes` | Whether to enable persistence mode (boolean) |
| `nvidia_driver_skip_reboot` | `no` | Whether to skip rebooting the node during the install |
| `nvidia_driver_module_file` | `"/etc/modprobe.d/nvidia.conf"` | Filename to use for NVIDIA driver parameters |
| `nvidia_driver_module_params` | `""` | Parameters to pass to the NVIDIA driver |

| Variable | Default value | Description |
|---|---|---|
| `epel_package` | `"https://dl.fedoraproject.org/pub/epel/epel-release-latest-{{ ansible_distribution_major_version }}.noarch.rpm"` | Package to install to enable EPEL |
| `nvidia_driver_rhel_cuda_repo_baseurl` | `"https://developer.download.nvidia.com/compute/cuda/repos/{{ _rhel_repo_dir }}/"` | Base URL to use for CUDA repo |
| `nvidia_driver_rhel_cuda_repo_gpgkey` | `"https://developer.download.nvidia.com/compute/cuda/repos/{{ _rhel_repo_dir }}/7fa2af80.pub"` | GPG key for the CUDA repo |
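
As an illustration of how the variables above might be overridden, the sketch below sets two of them at the play level. The `gpu_nodes` group name follows the example playbook in this README; the chosen values are assumptions for illustration, not recommendations.

```yaml
# Sketch only: override selected role defaults for every host in the play.
- hosts: gpu_nodes
  vars:
    # Do not reboot during the install; the operator reboots later,
    # for example during a maintenance window.
    nvidia_driver_skip_reboot: yes
    # Keep persistence mode enabled (this is already the documented default).
    nvidia_driver_persistence_mode_on: yes
  roles:
    - nvidia.nvidia_driver
```

With `nvidia_driver_skip_reboot: yes`, the node will most likely need a manual reboot before the newly installed driver is loaded.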
For Ubuntu installs, you can install from either the Canonical repositories or the NVIDIA CUDA repositories.
By default, the Canonical repositories will be used, and the driver installed will be the headless server driver.
| Variable | Default value | Description |
|---|---|---|
| `nvidia_driver_ubuntu_install_from_cuda_repo` | `no` | Flag whether to use the CUDA repo |
| `nvidia_driver_ubuntu_branch` | `450` | Driver branch to use for the install |
| `nvidia_driver_ubuntu_packages` | `["nvidia-headless-450-server", "nvidia-headless-450-utils"]` | Package names to install from Canonical repo |
| `nvidia_driver_ubuntu_cuda_repo_baseurl` | `"http://developer.download.nvidia.com/compute/cuda/repos/{{ _ubuntu_repo_dir }}"` | Base URL to use for CUDA repo |
| `nvidia_driver_ubuntu_cuda_repo_gpgkey_url` | `"https://developer.download.nvidia.com/compute/cuda/repos/{{ _ubuntu_repo_dir }}/7fa2af80.pub"` | GPG key for the CUDA repo |
| `nvidia_driver_ubuntu_cuda_repo_gpgkey_id` | `"7fa2af80"` | GPG key ID for the CUDA repo |
| `nvidia_driver_ubuntu_cuda_package` | `"cuda-drivers"` | Package name to install from CUDA repo |
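
To show how the repository choice might be switched on Ubuntu, here is a minimal sketch that opts in to the NVIDIA CUDA repositories instead of the default Canonical ones. It only uses variables from the table above; the `gpu_nodes` group name is taken from the example playbook in this README.

```yaml
# Sketch only: install the driver on Ubuntu from the NVIDIA CUDA repositories.
- hosts: gpu_nodes
  vars:
    # Switch from the Canonical repositories (the default) to the CUDA repo.
    nvidia_driver_ubuntu_install_from_cuda_repo: yes
    # Package to install from the CUDA repo (shown here with its default value).
    nvidia_driver_ubuntu_cuda_package: "cuda-drivers"
  roles:
    - nvidia.nvidia_driver
```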
Edit the inventory.ini to add the GPU nodes.
- hosts: gpu_nodes
  roles:
    - nvidia.nvidia_driver
Save this in play.yml and use the following command to run the playbook:
ansible-playbook -i env/inventory.ini -u root play.yml
Run this play to test the installation:
ansible-playbook -i env/inventory.ini -u root tests/playbook.yml
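
As an additional, informal check after the role has run, a small play along the following lines could confirm that the driver responds on each GPU node. This is a hypothetical sketch, not the contents of `tests/playbook.yml`; it assumes the installed driver packages provide `nvidia-smi` on the `PATH` and that the node has already been rebooted into the new driver.

```yaml
# Sketch only: query the driver on each GPU node and print the result.
- hosts: gpu_nodes
  gather_facts: no
  tasks:
    - name: Run nvidia-smi (fails the play if the driver is not loaded)
      command: nvidia-smi
      register: nvidia_smi_result
      changed_when: false   # read-only query, never reports a change

    - name: Show nvidia-smi output
      debug:
        var: nvidia_smi_result.stdout_lines
```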
Currently, this role supports the following Linux distributions:
- NVIDIA DGX OS 4
- NVIDIA DGX OS 5
- Ubuntu 18.04 LTS
- Ubuntu 20.04 LTS
- CentOS 7
- Red Hat Enterprise Linux 7