nvidia-smi segmentation fault in wsl2 but not in Windows #11277
Comments
I am also seeing this issue. Interestingly, the nvidia-smi version is different on mine: NVIDIA-SMI 545.29.06, Driver Version: 551.61, CUDA Version: 12.4, and glxinfo returns the same output as the poster above.
I have the exact same issue.

Environment: the distro is Ubuntu-22.04. nvidia-smi in Windows prints the normal table, but inside Ubuntu under WSL2 it prints only Segmentation fault.

Everything was working fine last week. I didn't install anything new in Ubuntu; I only updated the NVIDIA drivers in Windows, and I highly suspect that's the problem. I sadly don't remember which driver version I had before, since I hadn't updated for quite some time (not that I know how to downgrade drivers to test my theory). This is highly blocking, as I need CUDA for my daily work.
Also, lspci does not list the two NVIDIA cards as NVIDIA devices. I don't know if this is normal or if they should be reported as actually NVIDIA.
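For anyone with the same question, a minimal sketch of what to check, assuming a standard WSL2 setup. In WSL2 the GPU is paravirtualized rather than passed through as a PCI device, so generic lspci entries are expected:

```bash
# Typically shows Microsoft virtual devices, not the NVIDIA cards themselves.
lspci | grep -iE 'nvidia|3d|vga'

# The paravirtualized GPU interface that WSL2 actually uses.
ls -l /dev/dxg
```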
Just to update: I thought this segmentation fault was causing my issue, but I am using a Tesla P4 and data-center drivers. I was unable to use PyTorch or anything without rebooting into safe mode, running DDU, and clean-installing my drivers. I still had issues after reinstalling drivers, so I went into WSL and removed all NVIDIA and CUDA packages, rebooted/DDU/clean-reinstalled one more time, and now I can use CUDA like normal. I still see regular nvidia-smi output in PowerShell but a segmentation fault in WSL, yet I can still run all my applications. Posting this in case someone misidentifies this as a different issue they are having, like I did.
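A sketch of what removing the in-distro packages might look like (the package-name patterns are an assumption and apt treats them as regex-like matches, so review what they select before confirming):

```bash
# Remove any NVIDIA/CUDA packages installed *inside* Ubuntu on WSL2;
# the WSL driver stack should come only from the Windows installer.
sudo apt-get purge 'nvidia-*' 'libnvidia-*' 'cuda-*' 'libcudnn*'
sudo apt-get autoremove -y
sudo apt-get autoclean
```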
I use CUDA inside Docker images launched within WSL2's Ubuntu, and the graphics card is not found even though it worked before, so the issue is clearly not limited to nvidia-smi in my case. Just to be extra precise.
I'm running on exactly the same environment and experiencing the same problems as @themizzi.
@themizzi which driver are you on? Below is my output using Game Ready Driver ver. 551.67; notice the nvidia-smi version it reports.
Same issue, but with even less information in WSL2: nothing is printed other than Segmentation fault, not even the header with the version.
Same problem here.
I am seeing the same behavior. Do we know if this is a problem with a specific driver?
Update: I did nothing but reboot my computer. Without trying nvidia-smi in Windows first, I ran it directly in WSL and it worked with no error.
Doesn't work for me, @Rui-K.
I still see the segfault. However, if I run nvidia-smi.exe from within WSL, it displays correctly. Additionally, programs that use CUDA do run. The interop invocation is sketched below.
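A minimal sketch of that invocation, assuming the default Windows driver install location (WSL interop lets you call the Windows binary directly):

```bash
# Runs the *Windows* nvidia-smi via WSL interop; this exercises the
# Windows driver, not the Linux-side WSL libraries, which is why it
# can succeed while the Linux nvidia-smi segfaults.
/mnt/c/Windows/System32/nvidia-smi.exe
```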
@jaubourg you're just launching a Windows executable from within WSL, which is a different code path: it talks to the Windows driver directly rather than going through the Linux-side WSL libraries.
The nvidia-smi utility is not using any of the CUDA libraries. What is the output on your system? On mine, both Windows and Linux nvidia-smi work.
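For anyone comparing setups, a sketch of where to look: the path below is where WSL2 mounts the Linux-side components that ship with the Windows NVIDIA driver.

```bash
# The Windows NVIDIA driver provides these Linux-side files (libcuda.so,
# nvidia-smi, ...); WSL2 mounts them into every distro. A segfault here
# therefore points at the Windows driver package, not anything installed
# inside Ubuntu.
ls -l /usr/lib/wsl/lib

# Running the binary by its full path makes clear which nvidia-smi crashes:
/usr/lib/wsl/lib/nvidia-smi
```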
Had the same issue and it kept me up for a very long time. The thing that fixed it for me was uninstalling the NVIDIA driver (which I had updated to 551.76) and installing an older one (NOT 551.61, which also didn't work): 537.58 from October 2023 in my case, though that was a pretty random choice.
I think it is an issue with the NVIDIA 551 driver. It works for me with the previous NVIDIA 537, but after upgrading I got a segmentation fault in WSL2 as well.
Same, downgrading to 537 solved my problem.
I uninstalled the NVIDIA driver and installed v537.58 as advised in the last few days, and the Segmentation fault on WSL2 disappeared. Thanks for the replies, guys!
Tried a few different versions, and it seems like everything 538+ is broken.
Hi, here is my nvidia-smi output from Tue Apr 9 20:40:24 2024:
@eyabesbes you have to uninstall the current NVIDIA Windows driver, then install version 537. The NVIDIA WSL libraries are part of the Windows installer. NVIDIA Studio drivers seem to work okay.
Good news: found a solution.
Having the exact same issue here: nvidia-smi in Ubuntu WSL2 segfaults. I have Win 11 Pro with the latest updates. Has anybody figured out how to fix this issue? I think it is the root cause of my TensorFlow not seeing the GPU... sad.
So far, here is my solution:
Are there any WSL people even in this group? I'd love to be able to update my GPU drivers at some point.
In my experience, it's a hit-or-miss issue depending on the NVIDIA driver version. With some versions everything worked; however, there were times when nvidia-smi would segfault. It only takes one update for it to fail again!
@kziemski I've been using WSL2 with 54x and 55x versions. I can run PyTorch with CUDA, NVIDIA Container Toolkit, etc. inside my WSL2 Ubuntu without any issues; my code can utilize CUDA normally. I think if you don't care about nvidia-smi itself, the segfault is harmless. A quick check is sketched below.
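A minimal sanity check, assuming PyTorch is installed in the WSL2 Python environment (these commands are illustrative, not from the thread):

```bash
# Verify CUDA is usable even while the full nvidia-smi table segfaults.
python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"

# On some affected drivers the short listing still works even when the
# default table output crashes.
nvidia-smi -L
```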
I think this is a bit confusing, because I thought this issue was tied to (and a common symptom of) the GPU device not being found within WSL2 and Docker via WSL2. The last version that worked was 537.58; afterwards, running nbody for instance causes "NVIDIA device not found". I've been waiting for a version past 537.58 that will work in Ubuntu 22.04.
@kziemski I thought so too, as that was the first thing I checked after installing WSL2.
@AlexTo as of some 55x.xx version it still wasn't working, but as of 560.70 it does work. "Device not found" definitely happened with the last 55x I tried when running an nbody sample, so today's a good day.
@kziemski interesting; so far, all versions (538, 54x, 555, 559) work for me, but I'm on an RTX/Quadro, not the GeForce series.
Confirmed, I still get the segfault; otherwise, the GPU workloads I tested on WSL2 are working (also via Docker). An example of the Docker check is below.
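One way such a Docker check might look, assuming GPU support is wired up in Docker Desktop or the NVIDIA Container Toolkit (the CUDA image tag here is an example, not from the thread):

```bash
# Request all GPUs and run nvidia-smi inside the container; if the
# container sees the device, CUDA workloads generally work even when
# the host-side WSL nvidia-smi segfaults.
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```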
The problem still exists with the 560 drivers.
Same here.
For me, 538.15-quadro-rtx-desktop-notebook-win10-win11-64bit-international-dch-whql.exe worked; I have an A2000.
Disagree; at least for me, 565.90 still gives a segmentation fault on my system.
It also fixed my problem; my platform is Windows 11 26120.1930 on a laptop GTX 1650 Ti, with driver 565.90.
565.90 fixed my problem too, although it looks like it treats my L40S GPU as an NPU after the upgrade.
Shut down WSL with wsl --shutdown, then relaunch it; see the sketch below.
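A sketch of that restart, using the standard WSL CLI (run the first command from PowerShell as `wsl --shutdown`, or invoke wsl.exe from inside the distro via interop; the distro name below is an example):

```bash
# Stops the entire WSL2 VM so the next launch picks up the freshly
# installed Windows driver (and its /usr/lib/wsl/lib contents).
wsl.exe --shutdown

# Then reopen the distro and re-test:
#   wsl -d Ubuntu-22.04
#   nvidia-smi
```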
Apparently, this is a feature, not a bug: nvidia-smi support on WSL is documented as limited, although there is no mention of the Segmentation fault you might be seeing in the following docs: https://docs.nvidia.com/cuda/wsl-user-guide/index.html
Absolutely agree. I have lived with this "feature" for 6 months already, with no negative side effects on my ML process. Best regards,
You can check the GPU details with nvidia-smi -L as well; this is working fine for me on WSL2. Output of the command: GPU 0: NVIDIA GeForce RTX 2050 (UUID: GPU-2b9a833b-6fb9-b25e-1833-7ff832a835eb). Also, if you want the CUDA toolkit, simply install PyTorch; that package pulls in the CUDA runtime for the GPU automatically. A sketch follows.
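A minimal sketch of that route, assuming a CUDA 12.1 wheel (the index URL and version tag are examples; pick the build matching your driver):

```bash
# Install a PyTorch build that bundles the CUDA runtime libraries.
pip install torch --index-url https://download.pytorch.org/whl/cu121

# Confirm the bundled CUDA runtime sees the GPU from inside WSL2.
python3 -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
```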
Windows Version
10.0.22631.3235
WSL Version
2.1.4.0
Are you using WSL 1 or WSL 2?
WSL 2
Kernel Version
5.15.146.1-2
Distro Version
Ubuntu 22.04
Other Software
GeForce GTX 1650 Ti with GeForce Game Ready Driver version 551.76
Repro Steps
Run nvidia-smi in Windows and get the normal output.
Run nvidia-smi in WSL2 Ubuntu and get a Segmentation fault.
Expected Behavior
I am expecting no segmentation fault and successful output in WSL 2.
Actual Behavior
I get a segmentation fault in WSL2 as described above.
Diagnostic Logs
No response