NVIDIA Driver Management (CUDA Toolkit 12.3 Update 1 and higher)

Clean up old CUDA & NVIDIA driver (preferred)

Check for hardware failure (e.g. a lost GPU) by listing all GPUs:
```
lspci | grep -i nvidia
```
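To compare the result against the number of GPUs you expect, the listing can be reduced to a count. This is a minimal sketch, assuming GPUs show up in the `lspci` output as `VGA compatible controller` or `3D controller` entries; `count_nvidia_gpus` is a hypothetical helper name, not part of any NVIDIA tool:

```shell
# Hypothetical helper: count NVIDIA GPU entries in lspci-style text on stdin.
# Audio functions of the same card are skipped because they are not
# VGA/3D controller entries.
count_nvidia_gpus() {
  grep -i 'nvidia' | grep -ciE 'VGA compatible controller|3D controller'
}

# Usage: lspci | count_nvidia_gpus
```

If the count is lower than the number of cards installed, a GPU has probably dropped off the bus and purging drivers will not bring it back.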

Purge CUDA & nvidia driver:

sudo apt --purge remove "*cublas*" "cuda*" "nsight*" "*cudnn*" "libnvidia*" -y
sudo apt remove --purge '^nvidia-.*' -y
# (Optional, preferred) Remove all CUDA installations to avoid possible conflicts with the new driver
sudo rm -rf /usr/local/cuda*
sudo apt --purge autoremove -y
sudo apt autoclean

Check for leftover packages and remove them:

dpkg -l | grep -i nvidia
dpkg -l | grep nvidia-driver
sudo apt --purge remove {some-pkg}
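The leftover check can also be turned into a plain list of package names to review. This is a sketch only, assuming the usual `dpkg -l` layout (status in the first column, package name in the second); `list_leftover_pkgs` is a hypothetical helper:

```shell
# Hypothetical helper: print names of packages in dpkg -l style text (stdin)
# whose name mentions nvidia, cuda or cudnn. Header lines are skipped
# because their first field is not a two- or three-character status
# such as "ii" or "rc".
list_leftover_pkgs() {
  awk '$1 ~ /^[a-z][a-zA-Z][A-Z]?$/ && tolower($2) ~ /nvidia|cuda|cudnn/ { print $2 }'
}

# Usage (review the list before purging anything):
#   dpkg -l | list_leftover_pkgs
```

Packages in the `rc` state are already removed but still have configuration files, which is why they are listed as well.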

Tip

Besides /usr/local/cuda*, make sure to check $CUDA_HOME, $PATH and $LD_LIBRARY_PATH for non-default CUDA installation paths.
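One way to inspect those variables is to print only the entries that mention CUDA. This is a minimal sketch; `cuda_entries` is a hypothetical helper, and it assumes non-default installations have "cuda" somewhere in their path:

```shell
# Hypothetical helper: print every entry of a colon-separated list
# (such as $PATH) that mentions "cuda", one per line.
cuda_entries() {
  printf '%s\n' "$1" | tr ':' '\n' | grep -i 'cuda' || true
}

# Usage:
#   echo "CUDA_HOME=${CUDA_HOME:-<unset>}"
#   cuda_entries "$PATH"
#   cuda_entries "${LD_LIBRARY_PATH:-}"
```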

Install CUDA Toolkit, nvidia driver and cuDNN

Important

Read this official blog first to check which driver flavor your GPU needs.

Check the CUDA Toolkit Archive List to find your preferred version and follow the instructions.
- If you want a version lower than 12.3, check this doc.
Check the cuDNN Archive List to find your preferred version and follow the instructions.

Example: installing CUDA 12.5 Update 1 on Ubuntu 24.04 (x86_64) with the deb (network) installer:

Base Installer:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install cuda-toolkit-12-5 -y
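The repository directory and package name above follow a visible pattern (`ubuntu2404` for Ubuntu 24.04, `cuda-toolkit-12-5` for CUDA 12.5). A small sketch that derives both for another release, assuming the same naming pattern holds there (verify against the archive page before installing); both function names are hypothetical:

```shell
# Hypothetical helper: map a CUDA release given as MAJOR.MINOR (e.g. "12.5")
# to the apt package name pattern used in the example above.
cuda_toolkit_pkg() {
  printf 'cuda-toolkit-%s\n' "$(printf '%s' "$1" | tr '.' '-')"
}

# Hypothetical helper: map an Ubuntu VERSION_ID (e.g. "24.04") to the
# repository directory name pattern used in the example above.
ubuntu_repo_tag() {
  printf 'ubuntu%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Usage:
#   . /etc/os-release
#   echo "repo dir: $(ubuntu_repo_tag "$VERSION_ID"), package: $(cuda_toolkit_pkg 12.5)"
```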

Driver Installer, open kernel module flavor:

sudo apt install nvidia-driver-555-open -y
sudo apt install cuda-drivers-555 -y

Driver Installer, legacy kernel module flavor:

sudo apt install cuda-drivers -y

Add CUDA to PATH and LD_LIBRARY_PATH in your ~/.bashrc or ~/.zshrc:

# set cuda path if nvidia-smi works
if command -v nvidia-smi &>/dev/null; then
  [[ ":$PATH:" == *":/usr/local/cuda/bin:"* ]] ||
          export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
  [[ ":$LD_LIBRARY_PATH:" == *":/usr/local/cuda/lib64:"* ]] ||
          export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
fi
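The snippet above guards each export so that re-sourcing the file does not add the same directory twice. The same idea can be written as a reusable function; this is a sketch in POSIX shell, and `prepend_once` is a hypothetical name:

```shell
# Hypothetical helper: print LIST ($2) with DIR ($1) prepended,
# unless DIR is already one of its colon-separated entries.
prepend_once() {
  case ":$2:" in
    *":$1:"*) printf '%s\n' "$2" ;;
    *) printf '%s\n' "$1${2:+:$2}" ;;
  esac
}

# Usage:
#   PATH=$(prepend_once /usr/local/cuda/bin "$PATH"); export PATH
#   LD_LIBRARY_PATH=$(prepend_once /usr/local/cuda/lib64 "${LD_LIBRARY_PATH:-}")
#   export LD_LIBRARY_PATH
```

Wrapping the candidate and the list in colons makes the match exact, so `/usr/local/cuda/bin` is not mistaken for a prefix of a longer entry.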

Reload the shell configuration:
```
source ~/.bashrc  # or ~/.zshrc if you use zsh
```

Reboot and check GPU info with nvidia-smi (preferred).
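After the reboot, it can help to confirm that the loaded driver belongs to the branch you installed (555 in the example above). A minimal sketch that extracts the driver version from the `nvidia-smi` banner, assuming the banner contains a `Driver Version: X.Y.Z` field; `driver_version_from_smi` is a hypothetical helper:

```shell
# Hypothetical helper: read nvidia-smi output on stdin and print the
# first "Driver Version" value found (empty output if there is none).
driver_version_from_smi() {
  sed -n 's/.*Driver Version: \([0-9.][0-9.]*\).*/\1/p' | head -n 1
}

# Usage:
#   ver=$(nvidia-smi | driver_version_from_smi)
#   echo "driver ${ver}, branch ${ver%%.*}"
```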

To reload the GPU kernel modules without rebooting the machine, use the following commands.

# Change to CLI only mode
sudo systemctl isolate multi-user.target
# Kill processes using nvidia devices if any
sudo lsof /dev/nvidia*
sudo lsof -t /dev/nvidia* | xargs sudo kill -9
# Remove nvidia module
# If "rmmod" fails because other modules depend on nvidia, it names them; remove those first, one by one, with "sudo rmmod <module>"
sudo rmmod nvidia
# Reload nvidia module
sudo modprobe nvidia
# Set to default target
sudo systemctl default
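Before running `rmmod`, it may be useful to see which nvidia modules are loaded and which other modules hold them. This is a sketch that filters `lsmod` output, assuming its usual column layout (module, size, use count, users); `nvidia_module_users` is a hypothetical helper:

```shell
# Hypothetical helper: read lsmod-style text on stdin and print each
# nvidia* module with the modules that use it ("-" if none are listed).
nvidia_module_users() {
  awk '$1 ~ /^nvidia/ { printf "%s used_by=%s\n", $1, ($4 == "" ? "-" : $4) }'
}

# Usage: lsmod | nvidia_module_users
```

Modules shown with `used_by=-` have no dependent modules listed and are the ones to remove first; the base `nvidia` module is typically last.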

If nvidia-modprobe is broken or missing, fix it with the following commands:

# Install nvidia-modprobe
sudo apt install nvidia-modprobe
# Check nvidia-modprobe version
nvidia-modprobe -v

Caution

Manually reloading the nvidia module may seriously slow down GPU processes until you reboot the machine.

Post-installation Actions

Note

Persistence Daemon is preferred over Persistence Mode and has been enabled by default since NVIDIA driver R319.

Note

You might need to install third-party libraries to compile cuda-samples; check here for how to install them.

Check if Persistence Daemon is active:

sudo systemctl status nvidia-persistenced.service

If it's active, the output should look like this:

● nvidia-persistenced.service - NVIDIA Persistence Daemon
   Loaded: loaded (/usr/lib/systemd/system/nvidia-persistenced.service; static)
   Active: active (running) since Sun 2024-07-21 07:31:57 UTC; 5h 53min ago
 Main PID: 1331 (nvidia-persiste)
    Tasks: 1 (limit: 38220)
   Memory: 368.0K (peak: 844.0K)
      CPU: 1ms
   CGroup: /system.slice/nvidia-persistenced.service
           └─1331 /usr/bin/nvidia-persistenced --user nvidia-persistenced --no-persistence-mode --verbose

Jul 21 07:31:57 {hostname} systemd[1]: Starting nvidia-persistenced.service - NVIDIA Persistence Daemon...
Jul 21 07:31:57 {hostname} nvidia-persistenced[1331]: Verbose syslog connection opened
Jul 21 07:31:57 {hostname} nvidia-persistenced[1331]: Now running with user ID 116 and group ID 120
Jul 21 07:31:57 {hostname} nvidia-persistenced[1331]: Started (1331)
Jul 21 07:31:57 {hostname} nvidia-persistenced[1331]: device 0000:01:00.0 - registered
Jul 21 07:31:57 {hostname} nvidia-persistenced[1331]: device 0000:02:00.0 - registered
Jul 21 07:31:57 {hostname} nvidia-persistenced[1331]: Local RPC services initialized
Jul 21 07:31:57 {hostname} systemd[1]: Started nvidia-persistenced.service - NVIDIA Persistence Daemon.

If it's not active, enable the daemon via:

sudo systemctl enable nvidia-persistenced.service
sudo systemctl start nvidia-persistenced.service
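For scripted setups, the check and the fix can be tied together through the one-word state that `systemctl is-active` prints. This is a sketch only; `persistenced_action` and its return strings are hypothetical, and the mapping of states to actions is an assumption you may want to adjust:

```shell
# Hypothetical helper: map a state string printed by `systemctl is-active`
# to a suggested next step.
persistenced_action() {
  case "$1" in
    active) echo "ok" ;;
    inactive|failed) echo "enable-and-start" ;;
    *) echo "investigate" ;;
  esac
}

# Usage:
#   state=$(systemctl is-active nvidia-persistenced.service)
#   if [ "$(persistenced_action "$state")" = "enable-and-start" ]; then
#     sudo systemctl enable --now nvidia-persistenced.service
#   fi
```

`systemctl enable --now` combines the enable and start steps shown above into one command.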

Verify the installation:
- For CUDA Toolkit, follow the steps for cuda-samples.
- For cuDNN, follow the steps in the doc.
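As a quick pre-check before building cuda-samples, you can confirm that the compiler on your PATH reports the toolkit release you installed. A minimal sketch, assuming `nvcc` is on PATH and that `nvcc --version` prints a line containing `release X.Y`; `nvcc_release` is a hypothetical helper:

```shell
# Hypothetical helper: read `nvcc --version` output on stdin and print
# the MAJOR.MINOR release it reports (empty output if none is found).
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage:
#   nvcc --version | nvcc_release
```

A release that differs from the one you installed usually points to an older CUDA directory earlier in PATH.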

Tips, Tricks and Tools

References

CUDA Toolkit
cuDNN
GitHub repo
- NVIDIA/cuda-samples
- NVIDIA/nvidia-persistenced
Docs and Blogs
- GPU Management and Deployment
  - Driver Persistence
- NVIDIA Transitions Fully Towards Open-Source GPU Kernel Modules
