Nvidia Driver Management (CUDA Toolkit 12.3 Update 1 and higher)

Clean up old CUDA & Nvidia driver (preferred)

  • Check for hardware failure (e.g. a lost GPU) by listing all GPUs:

    lspci | grep -i nvidia
  • Purge CUDA & nvidia driver:

    sudo apt --purge remove "*cublas*" "cuda*" "nsight*" "*cudnn*" "libnvidia*" -y
    sudo apt remove --purge '^nvidia-.*' -y
    # (Optional, preferred) Remove all CUDA installations to avoid possible conflicts with the new driver
    sudo rm -rf /usr/local/cuda*
    sudo apt --purge autoremove -y
    sudo apt autoclean
  • Check for leftover packages and remove them:

    dpkg -l | grep -i nvidia
    dpkg -l | grep nvidia-driver
    sudo apt --purge remove {some-pkg}
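
The leftover check above can be narrowed to package names only, so that unrelated packages whose descriptions merely mention NVIDIA are not listed. The following is a minimal sketch, assuming a POSIX awk; `leftover_nvidia_pkgs` is a hypothetical helper name, and its keyword list simply mirrors the purge patterns used earlier, so it may miss or over-match packages on your system.

```shell
# Hypothetical helper (not part of any NVIDIA tooling): read `dpkg -l` output
# on stdin and print the names of packages that still look NVIDIA/CUDA related.
# Packages in any dpkg state are reported, including "rc" (removed, config left).
leftover_nvidia_pkgs() {
  awk '
    # Package rows start with a 2-3 letter state such as "ii" or "rc";
    # the header rows of `dpkg -l` do not match this pattern.
    $1 ~ /^[uihrp][ncHUFWti]R?$/ &&
    tolower($2) ~ /nvidia|cuda|cudnn|cublas|nsight/ { print $2 }
  '
}

# Usage: only runs where dpkg is available.
if command -v dpkg >/dev/null 2>&1; then
  dpkg -l | leftover_nvidia_pkgs
fi
```

Review the printed list before passing any name to `sudo apt --purge remove`.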

Tip

Besides /usr/local/cuda*, make sure to check $CUDA_HOME, $PATH and $LD_LIBRARY_PATH for non-default CUDA installation paths.
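
To make the tip above concrete, here is a small sketch that prints every entry of those three variables that mentions "cuda". The helper name `cuda_entries` is hypothetical, and the case-insensitive match on "cuda" is an assumption about how non-default installs are named, so a custom prefix without that word would be missed.

```shell
# Hypothetical helper: print each entry of a colon-separated list ($1)
# that contains "cuda" (case-insensitive). Prints nothing if none match.
cuda_entries() {
  printf '%s\n' "$1" | tr ':' '\n' | grep -i 'cuda' || true
}

# Report CUDA-looking entries for each variable named in the tip.
for name in CUDA_HOME PATH LD_LIBRARY_PATH; do
  eval "value=\${$name:-}"          # indirect lookup that works in sh, bash and zsh
  cuda_entries "$value" | sed "s|^|$name: |"
done
```

Any path reported outside /usr/local/cuda* is a candidate for manual cleanup.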

Install CUDA Toolkit, nvidia driver and cuDNN

Important

First, read this official blog to check which driver flavor your GPU needs.

  • Check the CUDA Toolkit Archive List to find your preferred version and follow the instructions.

    • For versions lower than 12.3, check this doc.
  • Check the cuDNN Archive List to find your preferred version and follow the instructions.

  • Example: installing CUDA 12.5 Update 1 on Ubuntu 24.04 (x86_64) with the deb (network) installer:

    • Base Installer:
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
    sudo dpkg -i cuda-keyring_1.1-1_all.deb
    sudo apt update
    sudo apt install cuda-toolkit-12-5 -y
    • Driver Installer, open kernel module flavor:
    sudo apt install nvidia-driver-555-open -y
    sudo apt install cuda-drivers-555 -y
    • Driver Installer, legacy kernel module flavor:
    sudo apt install cuda-drivers -y
  • Add the CUDA PATH and LD_LIBRARY_PATH settings to your ~/.bashrc or ~/.zshrc:

    # set cuda path if nvidia-smi works
    if command -v nvidia-smi &>/dev/null; then
      [[ ":$PATH:" == *":/usr/local/cuda/bin:"* ]] ||
              export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
      [[ ":$LD_LIBRARY_PATH:" == *":/usr/local/cuda/lib64:"* ]] ||
              export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
    fi
  • Reload the shell configuration:

    source ~/.bashrc   # or: source ~/.zshrc
  • Reboot and check GPU info with nvidia-smi. (preferred)

    • If you want to reload the GPU kernel modules without rebooting, use the following commands.
    # Change to CLI only mode
    sudo systemctl isolate multi-user.target
    # Kill processes using nvidia devices if any
    sudo lsof /dev/nvidia*
    sudo lsof -t /dev/nvidia* | xargs -r sudo kill -9
    # Remove nvidia module
    # If "rmmod" reports dependent modules (typically nvidia_uvm, nvidia_drm, nvidia_modeset), remove them first, one by one, with "sudo rmmod <module>"
    sudo rmmod nvidia
    # Reload nvidia module
    sudo modprobe nvidia
    # Set to default target
    sudo systemctl default
    • If nvidia-modprobe is broken or missing, fix it with the following commands:
    # Install nvidia-modprobe
    sudo apt install nvidia-modprobe
    # Check nvidia-modprobe version
    nvidia-modprobe -v
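
The PATH and LD_LIBRARY_PATH step earlier in this list applies the same "prepend only if absent" pattern twice. As a sketch of how that could be factored out, the hypothetical helper `prepend_once` below does the same for any colon-separated variable. It uses `eval` for the indirect lookup so that it runs in sh, bash and zsh alike; the variable name passed in is assumed to be a trusted literal.

```shell
# Hypothetical helper: prepend directory $2 to the colon-separated variable
# named $1, unless that directory is already one of its entries.
prepend_once() {
  var=$1
  dir=$2
  eval "cur=\${$var:-}"                 # current value, empty if unset
  case ":$cur:" in
    *":$dir:"*) ;;                      # already present: leave it unchanged
    *) eval "export $var=\"\$dir\${cur:+:\$cur}\"" ;;
  esac
}

# Same guard as the snippet in the list: only touch the paths if nvidia-smi works.
if command -v nvidia-smi >/dev/null 2>&1; then
  prepend_once PATH /usr/local/cuda/bin
  prepend_once LD_LIBRARY_PATH /usr/local/cuda/lib64
fi
```

Because the helper is idempotent, sourcing the rc file repeatedly does not grow either variable.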

Caution

However, manually reloading the nvidia module may seriously slow down GPU workloads until you reboot the machine.
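
If you still go the manual route, it can help to see which nvidia modules are loaded and what holds them before calling rmmod. The sketch below summarizes `lsmod` output; `nvidia_module_report` is a hypothetical helper name, and it assumes the usual `lsmod` column layout (module, size, reference count, optional "used by" list).

```shell
# Hypothetical helper: read `lsmod` output on stdin and print, for every module
# whose name starts with "nvidia", its reference count and dependent modules.
nvidia_module_report() {
  awk '$1 ~ /^nvidia/ {
    printf "%s refs=%s used_by=%s\n", $1, $3, (NF >= 4 ? $4 : "-")
  }'
}

# Usage: only runs where lsmod is available.
if command -v lsmod >/dev/null 2>&1; then
  lsmod | nvidia_module_report
fi
```

Modules shown with `used_by=-` have no dependent modules and are the ones to unload first; a non-zero `refs` value on such a module means a process or another holder still has it open.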

Post-installation Actions

Note

Persistence Daemon is preferred over Persistence Mode and has been enabled by default since Nvidia driver R319.

Note

You might need to install third-party libraries to compile cuda-samples; check here to install them.

  • Check if Persistence Daemon is active:

    sudo systemctl status nvidia-persistenced.service
    If it's active, the output should look like this:
    ● nvidia-persistenced.service - NVIDIA Persistence Daemon
       Loaded: loaded (/usr/lib/systemd/system/nvidia-persistenced.service; static)
       Active: active (running) since Sun 2024-07-21 07:31:57 UTC; 5h 53min ago
     Main PID: 1331 (nvidia-persiste)
        Tasks: 1 (limit: 38220)
       Memory: 368.0K (peak: 844.0K)
          CPU: 1ms
       CGroup: /system.slice/nvidia-persistenced.service
               └─1331 /usr/bin/nvidia-persistenced --user nvidia-persistenced --no-persistence-mode --verbose
    
    Jul 21 07:31:57 {hostname} systemd[1]: Starting nvidia-persistenced.service - NVIDIA Persistence Daemon...
    Jul 21 07:31:57 {hostname} nvidia-persistenced[1331]: Verbose syslog connection opened
    Jul 21 07:31:57 {hostname} nvidia-persistenced[1331]: Now running with user ID 116 and group ID 120
    Jul 21 07:31:57 {hostname} nvidia-persistenced[1331]: Started (1331)
    Jul 21 07:31:57 {hostname} nvidia-persistenced[1331]: device 0000:01:00.0 - registered
    Jul 21 07:31:57 {hostname} nvidia-persistenced[1331]: device 0000:02:00.0 - registered
    Jul 21 07:31:57 {hostname} nvidia-persistenced[1331]: Local RPC services initialized
    Jul 21 07:31:57 {hostname} systemd[1]: Started nvidia-persistenced.service - NVIDIA Persistence Daemon.
    
    • If it's not active, enable the daemon via:
    sudo systemctl enable nvidia-persistenced.service
    sudo systemctl start nvidia-persistenced.service
  • Verify the installation:
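
One possible check, offered as a sketch rather than as this guide's own procedure, is to compare the toolkit release reported by `nvcc --version` with the "CUDA Version" field in the `nvidia-smi` header, which is the highest CUDA version the installed driver supports. The helper names below are hypothetical, the parsing assumes the usual output wording of both tools, and `sort -V` assumes GNU coreutils.

```shell
# Hypothetical helper: extract e.g. "12.5" from `nvcc --version` output on stdin.
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: extract the "CUDA Version" field from `nvidia-smi` output on stdin.
smi_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeeds when version $1 is lower than or equal to version $2.
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage: only runs when both tools are installed.
if command -v nvidia-smi >/dev/null 2>&1 && command -v nvcc >/dev/null 2>&1; then
  toolkit=$(nvcc --version | nvcc_release)
  driver=$(nvidia-smi | smi_cuda_version)
  if version_le "$toolkit" "$driver"; then
    echo "OK: toolkit $toolkit is covered by the driver (supports up to $driver)"
  else
    echo "Warning: toolkit $toolkit is newer than the driver supports ($driver)"
  fi
fi
```

Compiling and running a cuda-samples program, as mentioned in the note above, is a stronger end-to-end test than this version comparison.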

Tips, Tricks and Tools

References