WSL2 CUDA Does Not Respect CUDA - Sysmem Fallback Policy
#11050
Comments
@chengzeyi try running your command multiple times. In my case the OOM triggered on the 2nd run:
nvidia-smi output:
@elsaco Yeah, that's the bug: the first run should trigger an OOM, not the following runs. In your case 30 GB x 2 is too large even for the sysmem fallback, so it only fails on the second run.
This is what I'm using to get around this issue. Memory allocation now behaves like bare-metal Linux, but on WSL2.
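The workaround itself isn't quoted above. One possible approach (an assumption on my part, not necessarily what was posted) is to cap PyTorch's caching allocator at the card's physical VRAM, so oversized allocations raise an OOM error instead of silently spilling into shared system memory:

# Hypothetical sketch, not the workaround actually posted in this thread.
# Cap PyTorch's caching allocator at the card's physical VRAM so allocations
# larger than the GPU can hold raise an OOM error instead of spilling into
# WSL2's shared system memory.
import torch

physical_vram = 24 * 1024**3  # adjust to the card's real VRAM (24 GiB assumed here)
reported_total = torch.cuda.get_device_properties(0).total_memory
torch.cuda.set_per_process_memory_fraction(physical_vram / reported_total, device=0)

# A 30 GiB allocation should now fail with torch.cuda.OutOfMemoryError:
x = torch.ones((30, 1024, 1024, 1024), dtype=torch.uint8, device="cuda")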
Just ran into this issue today and was briefly confused why my processing speed dropped like a stone when it looked like things were fitting into VRAM and I knew I had sysmem fallback off. Thanks for the workaround, but I too hope to see action on this.
Windows Version
Microsoft Windows [Version 10.0.22635.3061]
WSL Version
2.1.0.0
Are you using WSL 1 or WSL 2?
WSL 2
Kernel Version
5.15.137.3-1
Distro Version
Ubuntu 22.04
Other Software
Repro Steps
1. With a recent version of the NVIDIA GPU driver, disable CUDA - Sysmem Fallback Policy globally in the Windows NVIDIA driver settings and restart the computer.
2. In a WSL terminal, execute the following command to allocate a tensor that exceeds the total GPU memory available (30 GiB required vs. 24 GiB available).
python3 -c 'import torch; x = torch.ones((30, 1024, 1024, 1024), dtype=torch.uint8, device="cuda"); print(x.device)'
Instead of throwing an OOM exception, the command executes successfully.
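A quick way to check where the tensor actually landed is to compare what the caching allocator reports against the card's physical VRAM (a minimal sketch, assuming a 24 GiB card):

# Minimal diagnostic sketch, assuming a 24 GiB card: if the allocator reports
# more bytes in use than the GPU physically has, the allocation was served
# from the sysmem fallback pool instead of raising an OOM error.
import torch

x = torch.ones((30, 1024, 1024, 1024), dtype=torch.uint8, device="cuda")

allocated = torch.cuda.memory_allocated(0)                   # bytes handed out by the allocator
reported = torch.cuda.get_device_properties(0).total_memory  # total memory torch reports
print(f"allocated:      {allocated / 1024**3:.1f} GiB")
print(f"reported total: {reported / 1024**3:.1f} GiB")
print("exceeds physical 24 GiB VRAM:", allocated > 24 * 1024**3)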
However, the expected behaviour is to throw an OOM exception so that applications like StableDiffusion WebUI can detect the shortage of available GPU memory and choose a memory-efficient way to run inference. The current behaviour silently uses the slow system-memory fallback, which makes inference really slow.
The total available GPU memory reported by torch also looks weird:
python3 -c 'import torch; print(torch.cuda.get_device_properties(0).total_memory)'
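The detection pattern such applications rely on is roughly a try/except around the allocation, falling back to a memory-efficient path on OOM. A minimal sketch of that pattern (not StableDiffusion WebUI's actual code):

# Minimal sketch of the OOM-detection pattern the report refers to, not the
# actual StableDiffusion WebUI code. With sysmem fallback silently re-enabled
# under WSL2, the except branch is never reached and the memory-efficient
# path is never chosen.
import torch

def allocate_or_fall_back():
    try:
        # Fast path: keep everything in VRAM.
        return torch.ones((30, 1024, 1024, 1024), dtype=torch.uint8, device="cuda")
    except torch.cuda.OutOfMemoryError:  # older PyTorch raises a plain RuntimeError
        print("Not enough VRAM, switching to a memory-efficient code path")
        return None

allocate_or_fall_back()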
Expected Behavior
Executing the following command throws an OOM exception
python3 -c 'import torch; x = torch.ones((30, 1024, 1024, 1024), dtype=torch.uint8, device="cuda"); print(x.device)'
Actual Behavior
The command executes successfully, which is not the expected behaviour.
python3 -c 'import torch; x = torch.ones((30, 1024, 1024, 1024), dtype=torch.uint8, device="cuda"); print(x.device)'
Diagnostic Logs
No response