nvidia-smi segmentation fault in wsl2 but not in Windows #11277

Open
1 of 2 tasks
themizzi opened this issue Mar 9, 2024 · 83 comments

@themizzi

themizzi commented Mar 9, 2024

Windows Version

10.0.22631.3235

WSL Version

2.1.4.0

Are you using WSL 1 or WSL 2?

  • WSL 2
  • WSL 1

Kernel Version

5.15.146.1-2

Distro Version

Ubuntu 22.04

Other Software

GeForce GTX 1650 Ti with GeForce Game Ready Driver version 551.76

Repro Steps

Run nvidia-smi in Windows and get the following:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.76                 Driver Version: 551.76         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1650 Ti   WDDM  |   00000000:01:00.0 Off |                  N/A |
| N/A   63C    P8              3W /   50W |     163MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      3268    C+G   ...ekyb3d8bbwe\WsaClient\WsaClient.exe      N/A      |
|    0   N/A  N/A     18112    C+G   ...ience\NVIDIA GeForce Experience.exe      N/A      |
+-----------------------------------------------------------------------------------------+

Run nvidia-smi in WSL2 Ubuntu and get the following:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 551.76       CUDA Version: 12.4     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
[1]    2058 segmentation fault  nvidia-smi

Expected Behavior

I am expecting no segmentation fault and successful output in WSL 2.

Actual Behavior

I get a segmentation fault in WSL2 as described above.

Diagnostic Logs

No response

@themizzi
Author

themizzi commented Mar 9, 2024

Interestingly, glxinfo does not report my nvidia GPU:

❯ glxinfo | grep "Device"
Device: D3D12 (Intel(R) UHD Graphics) (0xffffffff)

@terrificobjects

I am also seeing this issue. Interestingly, the nvidia-smi version is different on mine:

NVIDIA-SMI 545.29.06 Driver Version: 551.61 CUDA Version: 12.4

and glxinfo returns the same output as the poster above.

@jaubourg

I have the exact same issue.

Environment

WSL version: 2.1.4.0
Kernel version: 5.15.146.1-2
WSLg version: 1.0.60
MSRDC version: 1.2.5105
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22631.3155

Distro is Ubuntu-22.04.

nvidia-smi in Windows gives this:

Tue Mar 12 02:47:51 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.76                 Driver Version: 551.76         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1650 Ti   WDDM  |   00000000:01:00.0 Off |                  N/A |
| N/A   60C    P0             15W /   50W |       0MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 2070 ...  WDDM  |   00000000:40:00.0 Off |                  N/A |
|  0%   52C    P8             23W /  215W |     713MiB /   8192MiB |      7%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    1   N/A  N/A      2360    C+G   ...8.0_x64__cv1g1gvanyjgm\WhatsApp.exe      N/A      |
|    1   N/A  N/A      5168    C+G   ...CBS_cw5n1h2txyewy\TextInputHost.exe      N/A      |
|    1   N/A  N/A      8560    C+G   ...2txyewy\StartMenuExperienceHost.exe      N/A      |
|    1   N/A  N/A      8980    C+G   ...nt.CBS_cw5n1h2txyewy\SearchHost.exe      N/A      |
|    1   N/A  N/A     13284    C+G   ...__8wekyb3d8bbwe\WindowsTerminal.exe      N/A      |
|    1   N/A  N/A     15352    C+G   ...Brave-Browser\Application\brave.exe      N/A      |
+-----------------------------------------------------------------------------------------+

But inside Ubuntu under WSL2:

Tue Mar 12 02:47:41 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.60.01              Driver Version: 551.76         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
Segmentation fault

Everything was working fine last week. I didn't install anything new in Ubuntu; I only updated the NVIDIA drivers in Windows, and I highly suspect that's the problem. Sadly, I don't remember which driver version I had before, since I hadn't updated in quite some time (and I don't know how to downgrade drivers to test my theory).

This is highly blocking; I need CUDA for my daily work.

@jaubourg

jaubourg commented Mar 12, 2024

Also, lspci returns this:

3a32:00:00.0 3D controller: Microsoft Corporation Device 008e
3aca:00:00.0 3D controller: Microsoft Corporation Device 008e
b0ae:00:00.0 3D controller: Microsoft Corporation Device 008e
c7e3:00:00.0 SCSI storage controller: Red Hat, Inc. Virtio console (rev 01)
cec6:00:00.0 System peripheral: Red Hat, Inc. Virtio file system (rev 01)

I don't know if this is normal, or if the two NVIDIA cards should actually be reported as NVIDIA devices.

@terrificobjects

Just to update: I thought this segmentation fault was causing my issue, but I am using a Tesla P4 with data center drivers. I was unable to use PyTorch or anything else without rebooting into safe mode, running DDU, and clean-installing my drivers. I still had issues after reinstalling the drivers, so I went into WSL and removed all NVIDIA and CUDA packages, then rebooted/DDU/clean-reinstalled one more time, and now I can use CUDA as normal. I still see regular nvidia-smi output in PowerShell but a segmentation fault in WSL, yet I can still run all my applications.

Just in case someone misidentifies this as a different issue they are having, like I did.
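
For reference, the WSL-side package cleanup described above looks roughly like this (a rough sketch, assuming Ubuntu/apt; exact package names vary by install):

# remove any Linux-side NVIDIA/CUDA packages that may shadow the WSL-mounted driver libs
sudo apt-get remove --purge '^nvidia-.*' '^libnvidia-.*' '^cuda.*'
sudo apt-get autoremove --purge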

@jaubourg

I use CUDA inside Docker images launched within WSL2's Ubuntu, and the graphics cards are no longer found even though this worked before, so in my case the issue is clearly not limited to nvidia-smi. Just to be extra precise: it worked flawlessly before, and I didn't change anything inside Ubuntu.

@zcobol

zcobol commented Mar 13, 2024

@themizzi if glxinfo shows an Intel device and you want the NVIDIA one, set the MESA_D3D12_DEFAULT_ADAPTER_NAME=nvidia environment variable and run the command again. Or just run MESA_D3D12_DEFAULT_ADAPTER_NAME=nvidia glxinfo -B if you don't want to make it permanent.

glxinfo output using default settings:

Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Microsoft Corporation (0xffffffff)
    Device: D3D12 (Intel(R) UHD Graphics 750) (0xffffffff)

and with MESA_D3D12_DEFAULT_ADAPTER_NAME=nvidia set:

Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Microsoft Corporation (0xffffffff)
    Device: D3D12 (NVIDIA GeForce RTX 4070 SUPER) (0xffffffff)
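
To make that selection stick across shells (a minimal sketch, assuming bash; the variable only affects Mesa's D3D12 adapter choice, not nvidia-smi):

echo 'export MESA_D3D12_DEFAULT_ADAPTER_NAME=nvidia' >> ~/.bashrc
source ~/.bashrc
glxinfo -B | grep -E 'Vendor|Device'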

Looks like nvidia-smi crashes when using the Game Ready (GDR) driver. It doesn't trigger when using the Studio Driver (SD):

Tue Mar 12 17:24:45 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.10              Driver Version: 551.61         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070 ...    On  |   00000000:01:00.0  On |                  N/A |
|  0%   33C    P8              3W /  220W |     540MiB /  12282MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

@GitHubUserC

I'm running in exactly the same environment and experiencing the same problems as @themizzi.

@mwkldeveloper

(screenshot attached)
The same here: segmentation fault in WSL2 but not in Windows.

@elsaco

elsaco commented Mar 13, 2024

@themizzi which nvidia-smi are you running inside WSL? The correct one is part of the Nvidia Windows driver installation and is mounted in WSL under /usr/lib/wsl/lib. What is the output of command -v nvidia-smi in WSL?
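
A quick way to check which binary is being picked up (a sketch, assuming the usual WSL driver mount):

command -v nvidia-smi     # expected: /usr/lib/wsl/lib/nvidia-smi
ls -l /usr/lib/wsl/lib/   # driver libraries mounted from the Windows installation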

Output using Game Ready Driver ver. 551.67:

Wed Mar 13 09:48:09 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.60.01              Driver Version: 551.76         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4080 ...    On  |   00000000:01:00.0  On |                  N/A |
|  0%   32C    P8              3W /  220W |     431MiB /  12282MiB |      3%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Notice the nvidia-smi version: it's 550.60.01.

OneBlue added the GPU label Mar 13, 2024
@Rui-K

Rui-K commented Mar 14, 2024

Same issue, but with even less information in WSL2: nothing is printed other than Segmentation fault, not even the header with the version.

@nocturneatfiftyhz

Same problem here.
nvidia-smi works fine on Win11 but gives a segmentation fault on WSL2.


C:\Users\X>nvidia-smi
Fri Mar 15 01:12:09 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.76                 Driver Version: 551.76         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1650      WDDM  |   00000000:01:00.0 Off |                  N/A |
| N/A   47C    P8              4W /   50W |       0MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

C:\Users\X>wsl.exe
x@DESKTOP-NHNBGBN:/mnt/c/Users/X$ nvidia-smi
Fri Mar 15 01:12:28 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.60.01              Driver Version: 551.76         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
Segmentation fault

@jmangelson

I am seeing the same behavior.

Do we know if this is a problem with a specific driver?

@Rui-K

Rui-K commented Mar 15, 2024

same issue but even less information in WSL2, nothing printed out rather than Segmentation fault, not even heads including version

Update: I did nothing but reboot my computer; without trying nvidia-smi in Windows first, I tried it directly in WSL and it worked with no error.

@GitHubUserC

same issue but even less information in WSL2, nothing printed out rather than Segmentation fault, not even heads including version

Update: Did nothing, reboot my computer, without trying nvidia-smi in windows, I directly tried it in WSL, worked with no error.

Doesn't work for me @Rui-K

@jmangelson

I still see the segfault.

However, if I run nvidia-smi.exe from within WSL it displays correctly.

Additionally, if I try running programs that use CUDA they do run.
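
For anyone wanting to double-check the same thing, something like this works (a sketch; assumes Windows interop is enabled and, for the last line, that PyTorch is installed):

/mnt/c/Windows/system32/nvidia-smi.exe                        # Windows binary via WSL interop
python3 -c "import torch; print(torch.cuda.is_available())"   # CUDA itself still works despite the segfault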

@elsaco

elsaco commented Mar 17, 2024

@jaubourg you're just launching a Windows executable from within WSL, which is a PE blob from /mnt/c/Windows/system32/nvidia-smi.exe. The issue is with the Linux version of nvidia-smi, which is mounted under /usr/lib/wsl/lib from Windows; it's a different binary.

[elsaco@texas ~]$ file /mnt/c/Windows/system32/nvidia-smi.exe
/mnt/c/Windows/system32/nvidia-smi.exe: PE32+ executable (console) x86-64, for MS Windows, 7 sections
[elsaco@texas ~]$ file /usr/lib/wsl/lib/nvidia-smi
/usr/lib/wsl/lib/nvidia-smi: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=db77481740c9f1334e47d8f2ffde53b34b2bc0dc, stripped

The nvidia-smi utility does not use any of the CUDA libraries. What is the output of ldd -v /usr/lib/wsl/lib/nvidia-smi? Are there any unresolved libs?

This is the output on my system:

[elsaco@texas ~]$ ldd -v /usr/lib/wsl/lib/nvidia-smi
        linux-vdso.so.1 (0x00007ffcde975000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fce8e56a000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fce8e489000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fce8e484000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fce8e2a2000)
        librt.so.1 => /lib64/librt.so.1 (0x00007fce8e29d000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fce8e577000)

        Version information:
        /usr/lib/wsl/lib/nvidia-smi:
                librt.so.1 (GLIBC_2.2.5) => /lib64/librt.so.1
                libdl.so.2 (GLIBC_2.2.5) => /lib64/libdl.so.2
                libpthread.so.0 (GLIBC_2.3.2) => /lib64/libpthread.so.0
                libpthread.so.0 (GLIBC_2.2.5) => /lib64/libpthread.so.0
                libc.so.6 (GLIBC_2.7) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
        /lib64/libpthread.so.0:
                libc.so.6 (GLIBC_ABI_DT_RELR) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
        /lib64/libm.so.6:
                ld-linux-x86-64.so.2 (GLIBC_PRIVATE) => /lib64/ld-linux-x86-64.so.2
                libc.so.6 (GLIBC_ABI_DT_RELR) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.4) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
                libc.so.6 (GLIBC_PRIVATE) => /lib64/libc.so.6
        /lib64/libdl.so.2:
                libc.so.6 (GLIBC_ABI_DT_RELR) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
        /lib64/libc.so.6:
                ld-linux-x86-64.so.2 (GLIBC_2.2.5) => /lib64/ld-linux-x86-64.so.2
                ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
                ld-linux-x86-64.so.2 (GLIBC_PRIVATE) => /lib64/ld-linux-x86-64.so.2
        /lib64/librt.so.1:
                libc.so.6 (GLIBC_ABI_DT_RELR) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
                libc.so.6 (GLIBC_PRIVATE) => /lib64/libc.so.6

and both Windows and Linux nvidia-smi work.

@zivchen9993

Had the same issue and it kept me up for a very long time... The thing that fixed it for me was uninstalling the NVIDIA driver (which I had updated to 551.76) and installing an older one (NOT 551.61, which also didn't work): 537.58 from October 2023 in my case, though it was a pretty random choice.
(I have the MX550, so the drivers correspond to that GPU model.)
Hope it helps you as well.

@AlexTo

AlexTo commented Mar 20, 2024

I think it is an issue with the NVIDIA 551 driver. It worked for me with the previous NVIDIA 537, but after upgrading I got the segmentation fault in WSL2 as well.

@Triple-Z

Same; downgrading to 537 solved my problem.

@nocturneatfiftyhz

nocturneatfiftyhz commented Mar 20, 2024

I uninstalled the NVIDIA driver and installed v537.58 as advised in the last few days, and the Segmentation fault on WSL2 disappeared. Thanks for the replies guys!

@GitHubUserC

I uninstalled the NVIDIA driver and installed v537.58 as advised in the last few days, and the Segmentation fault on WSL2 disappeared. Thanks for the replies guys!

same, downgrade to 537 solve my problem.

@asasine

asasine commented Apr 8, 2024

Tried a few different versions and it seems like everything 538+ is broken.

  • 537.58: works
  • 537.99: works
  • 538.49: segmentation fault
  • 551.86: segmentation fault

@eyabesbes

eyabesbes commented Apr 9, 2024

I uninstalled the NVIDIA driver and installed v537.58 as advised in the last few days, and the Segmentation fault on WSL2 disappeared. Thanks for the replies guys!

same, downgrade to 537 solve my problem.

Hi,
I'm having the same issue on WSL2 but not on Windows 11. How can I downgrade to 537?

Tue Apr 9 20:40:24 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.10              Driver Version: 551.61         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
Segmentation fault

@elsaco

elsaco commented Apr 9, 2024

@eyabesbes you have to uninstall the current NVIDIA Windows driver, then install version 537. The NVIDIA WSL libraries are part of the Windows installer.

The NVIDIA Studio drivers seem to work okay:

[elsaco@texas ~]$ nvidia-smi
Tue Apr  9 15:57:32 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.65                 Driver Version: 551.86         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A4000               On  |   00000000:61:00.0  On |                  Off |
| 41%   34C    P8             10W /  140W |     610MiB /  16376MiB |      2%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

@CaptainRui1000

Good news: found a solution.
Bad news: the RTX 4080 SUPER has no earlier driver to downgrade to.
Sad.

@maxzaikin

maxzaikin commented Apr 12, 2024

Having the exact same issue here:
nvidia-smi (Windows output)
Sat Apr 13 04:58:35 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.86                 Driver Version: 551.86         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA T1200 Laptop GPU      WDDM  |   00000000:01:00.0 Off |                  N/A |
| N/A   50C    P0             12W /   45W |       0MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

nvidia-smi (Ubuntu WSL2)

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.65                 Driver Version: 551.86         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
Segmentation fault

I have Windows 11 Pro with the latest updates.

Has anybody figured out how to fix this issue? I think it is the root cause of TensorFlow not seeing my GPU... sad

@maxzaikin

So far, here is my solution:
As of Apr 13, 2024 I downgraded from 551.86-quadro-rtx-desktop-notebook-win10-win11-64bit-international-dch-whql (built on CUDA 12.4) to 536.96-quadro-rtx-desktop-notebook-win10-win11-64bit-international-dch-whql (built on CUDA 12.2), and the segmentation fault disappeared. It seems to me the problem is that the Windows Ubuntu WSL2 image does not support the latest NVIDIA drivers.

@kziemski

I'm wondering if it's a question of NVIDIA/Microsoft maintaining support for the current Ubuntu version, and whether Ubuntu 24.04 in WSL works with NVIDIA driver versions past 538, because that would mean support for CUDA past 12.2. I need to get past 12.2 in order to match the CUDA >= 12.3 compatibility requirement of JAX and other libraries.

If WSL2 / Ubuntu 24.04 / driver 55x does work, then I can think about transitioning from Ubuntu 22 to 24.

Are there any WSL people even in this group? I'd love to be able to update my GPU drivers at some point.
Is this NVIDIA's problem or Microsoft's problem?

@elsaco

elsaco commented Jul 19, 2024

In my experience, it's a hit-or-miss issue depending on the NVIDIA driver version. With WSL 2.3.11 and NVIDIA driver 555.99 it works:

elsaco@eleven:~$ wslinfo --wsl-version
2.3.11
elsaco@eleven:~$ nvidia-smi
Fri Jul 19 13:41:16 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.52.01              Driver Version: 555.99         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070 ...    On  |   00000000:01:00.0  On |                  N/A |
|  0%   43C    P8              3W /  220W |     576MiB /  12282MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

However, there were times when nvidia-smi would segfault. It only takes one update for it to fail again!

@AlexTo

AlexTo commented Jul 20, 2024

@kziemski I've been using WSL2 with the 54x and 55x driver versions. I can run PyTorch with CUDA, the NVIDIA Container Toolkit, etc. inside my WSL2 Ubuntu without any issues; my code can use CUDA normally. If you don't care about nvidia-smi giving errors, you can try upgrading the drivers and see whether it has any impact on your code.
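
For reference, the usual Container Toolkit sanity check looks roughly like this (a sketch; the image tag is just an example, pick a CUDA base image that matches your driver):

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi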

@kziemski

@kziemski I've been using WSL2 with 54x and 55x versions. I can run Pytorch with CUDA, NVidia Container Toolkit etc.. inside my WSL2 Ubuntu without any issues. My code can utilize CUDA normally. I think if you don't care about nvidia-smi giving errors, you can try to upgrade the drivers to see if it has any impact on your code.

I think this is a bit confusing, because I thought this issue was tied to the common issue of the GPU device not being found within WSL2 and in Docker via WSL2. The last version that worked was 537.58; afterwards, running nbody, for instance, causes "nvidia device not found". I've been waiting for a version past 537.58 that works in Ubuntu 22.04.

@AlexTo

AlexTo commented Jul 21, 2024

@kziemski I thought so too, as the first thing I did after installing WSL2 was run nvidia-smi to check for the presence of the GPU. It turns out only nvidia-smi gives errors; everything else seems to work normally.

@kziemski

@kziemski I've been using WSL2 with 54x and 55x versions. I can run Pytorch with CUDA, NVidia Container Toolkit etc.. inside my WSL2 Ubuntu without any issues. My code can utilize CUDA normally. I think if you don't care about nvidia-smi giving errors, you can try to upgrade the drivers to see if it has any impact on your code.

I think this is bit confusing because i thought this issue was tied to and a common issue with gpu device not being found within wsl2 and docker via wsl2 the last version that worked was 537.58 afterwards running nbody for instance causes nvidia device not found. I've been waiting for a version past 537.58 that will work in ubuntu 22.04.

@kziemski I thought so too as the first thing I did after installing WSL2 was to nvidia-smi to check for the presence of the GPU. Turns out, only nvidia-smi gives errors, everything else seems to work normally.

@AlexTo, as of some 55x.xx version it still wasn't working, but as of 560.70 it does work.
I will stick with this driver for a while, but I think that now that I'm in 56x.xx territory rather than 537.xx, it gives access to something I can't remember at the moment.

"Device not found" definitely happened with the last 55x I tried when running an nbody sample, so today's a good day.

@AlexTo

AlexTo commented Jul 21, 2024

@kziemski interesting; so far, all versions (538, 54x, 555, 559) work for me, but I'm on an RTX/Quadro, not the GeForce series.

image

@kziemski

@kziemski interesting, so far, all versions (538, 54x, 555, 559) work for me but I'm on a RTX/Quadro not Geforce series.

image

@AlexTo, that might be the difference. Given the way WSLg, the WSL libs, and Docker Desktop all function, I'm honestly surprised it works at all.

@Zhoneym

Zhoneym commented Jul 28, 2024

(screenshot attached)

@ahousley

ahousley commented Aug 9, 2024

Confirmed: I still get the nvidia-smi segmentation fault in WSL2 on the latest driver:

  • Driver Version: 560.76
  • NVIDIA-SMI 560.28.03
  • CUDA Version: 12.6
  • WSL 2.2.4
  • Ubuntu 22.04.4 LTS
  • NVIDIA RTX 3000 Ada Laptop GPU

Otherwise, the GPU workloads I tested on WSL2 are working (also via Docker), and nvidia-smi works from PowerShell.

@mjdiff

mjdiff commented Aug 19, 2024

The problem still exists with the 560 drivers.

5.15.153.1-microsoft-standard-WSL2 #1 SMP Fri Mar 29 23:14:13 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Mon Aug 19 15:38:44 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.28.03              Driver Version: 560.76         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+======

@arturdaraujo

Same problem here
image

@narbhar

narbhar commented Aug 23, 2024

Can someone from Microsoft work with NVIDIA to fix this issue? My work laptop auto-updated the driver to 555.99 and this issue came back again. The GPU is an RTX A2000. It happens in a Podman container as well as in Ubuntu on WSL2.
image

@SJoJoK

SJoJoK commented Sep 10, 2024

same here

@Zhoneym

Zhoneym commented Sep 14, 2024

(screenshot attached)

@singidunumx

singidunumx commented Sep 30, 2024

For me, 538.15-quadro-rtx-desktop-notebook-win10-win11-64bit-international-dch-whql.exe worked; I have an A2000.

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.01             Driver Version: 538.15       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A2000 Laptop GPU    On  | 00000000:01:00.0 Off |                  N/A |
| N/A   57C    P0              8W /   38W |      0MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

@Zhoneym

Zhoneym commented Oct 1, 2024

The Windows NVIDIA GPU driver 565.90 supports CUDA up to 12.7.

The nvidia-smi segmentation fault issue on some GPUs in the WSL2 environment, which has persisted since version 538.15, has been fixed in 565.90.

This version also fixes the startup crash of Dying Light 2: Enhanced Edition.

(screenshot attached)

@bwirt

bwirt commented Oct 5, 2024

The nvidia-smi segmentation fault issue on some GPUs in the WSL2 environment, which has persisted since version 538.15, has been fixed in 565.90.

Disagree; at least for me, 565.90 still gives a segmentation fault on my system.

@z0ow

z0ow commented Oct 7, 2024

Windows version of NVIDIA GPU driver 565.90, CUDA level up to 12.7

The nvidia-smi segmentation fault issue on some GPUs in the WSL2 environment, which has persisted since version 538.15, has been fixed in 565.90.

This version also fixes the startup crash of Dying Light 2: Enhanced Edition.

(screenshot attached)

It also fixes my problem; my platform is Windows 11 26120.1930 with a laptop GTX 1650 Ti and driver 565.90.

@netcaster1

565.90 fixed my problem, and it looks like it treats my L40S GPU as an NPU after the version upgrade.

@vTuanpham

Shutting down WSL with wsl --shutdown and rerunning it worked for me.
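
For reference, the sequence is run from the Windows side (a minimal sketch):

wsl --shutdown     # stop all running WSL distros and the WSL VM
wsl                # start the default distro again, then retry nvidia-smi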

@rdmtinez

Apparently, this is a feature, not a bug. NVIDIA-SMI support is limited... although there is no mention of the segmentation fault you might be seeing in the following docs: https://docs.nvidia.com/cuda/wsl-user-guide/index.html

@maxzaikin

Apparently, this is a feature not a bug. NVIDIA-SMI support is limited... although there is not mention of Segmentation fault you might be seeing in the following docs: https://docs.nvidia.com/cuda/wsl-user-guide/index.html

Absolutely agree. I have lived with this "feature" for 6 months already, with no negative side effects on my ML process.

Best regards,
Maks.

@maheshmrs

maheshmrs commented Dec 13, 2024

You can check the GPU details with the command below as well. This works fine for me on WSL2.

nvidia-smi --list-gpus

Output of the command: GPU 0: NVIDIA GeForce RTX 2050 (UUID: GPU-2b9a833b-6fb9-b25e-1833-7ff832a835eb)

Also, if you want the CUDA runtime, simply install PyTorch; the package pulls in the CUDA libraries it needs for the GPU automatically.
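
A minimal sketch of that route (the index URL is an example for the CUDA 12.4 wheels; adjust to your setup):

pip install torch --index-url https://download.pytorch.org/whl/cu124
python3 -c "import torch; print(torch.version.cuda, torch.cuda.get_device_name(0))"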

@DanielAtCosmicDNA

In my case the problem seems to be associated with the libnvidia-ml.so.1 file:

Image
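
One way to see which libnvidia-ml.so.1 is actually being used (a sketch, assuming the standard /usr/lib/wsl/lib mount and that strace is installed):

ldconfig -p | grep libnvidia-ml                               # copies known to the loader
ls -l /usr/lib/wsl/lib/libnvidia-ml.so.1                      # the copy mounted from the Windows driver
strace -e trace=openat nvidia-smi 2>&1 | grep libnvidia-ml    # which copy nvidia-smi opens at runtime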

@voycey

voycey commented Dec 17, 2024

Download the latest drivers from NVIDIA. The "Latest" one is only the latest stable version; there are beta / New Feature Branch builds out there that are newer, but maybe not as stable or as well tested:

Image

This solved the problem for me
