WSL2 CUDA Does Not Respect CUDA - Sysmem Fallback Policy #11050

Open
chengzeyi opened this issue Jan 20, 2024 · 5 comments

chengzeyi commented Jan 20, 2024

Windows Version

Microsoft Windows [Version 10.0.22635.3061]

WSL Version

2.1.0.0

Are you using WSL 1 or WSL 2?

  • WSL 2

Kernel Version

5.15.137.3-1

Distro Version

Ubuntu 22.04

Other Software

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.36                 Driver Version: 546.33       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        On  | 00000000:01:00.0  On |                  Off |
|  0%   38C    P8              16W / 450W |   1528MiB / 24564MiB |      8%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        38      G   /Xwayland                                 N/A      |
+---------------------------------------------------------------------------------------+
torch                     2.3.0.dev20231227+cu121

Repro Steps

With a recent version of NVIDIA GPU Driver, in Windows NVIDIA Driver Settings, disable CUDA - Sysmem Fallback Policy globally and restart the computer.

[screenshot: NVIDIA Control Panel with CUDA - Sysmem Fallback Policy disabled]

In the WSL terminal, execute the following command to allocate a tensor that exceeds the total available GPU memory (30 GiB required vs. 24 GiB available).

python3 -c 'import torch; x = torch.ones((30, 1024, 1024, 1024), dtype=torch.uint8, device="cuda"); print(x.device)'

Instead of throwing an OOM exception, the command executes successfully.

[screenshot: command output showing the allocation succeeding]

However, the expected behaviour is to throw an OOM exception so that applications like Stable Diffusion WebUI can detect that the available GPU memory is insufficient and choose a memory-efficient way to do inference. The current behaviour silently falls back to slow system memory, which makes inference extremely slow.
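
For reference, this is roughly the detection pattern such applications rely on (a hypothetical sketch, not actual WebUI code); it only works if the oversized allocation really raises:

import torch

def allocate_or_fallback(shape):
    # Hypothetical helper: try the allocation on the GPU and switch to a
    # memory-efficient path only if the driver actually raises an OOM error,
    # rather than silently spilling into slow shared system memory.
    try:
        return torch.empty(shape, dtype=torch.uint8, device="cuda")
    except torch.cuda.OutOfMemoryError:
        # e.g. tile the workload or run a low-VRAM code path
        return torch.empty(shape, dtype=torch.uint8, device="cpu")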

The total GPU memory reported by torch also looks weird.

python3 -c 'import torch; print(torch.cuda.get_device_properties(0).total_memory)'

[screenshot: the reported total memory value]
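
One way to cross-check the reported value (a sketch; torch.cuda.mem_get_info wraps cudaMemGetInfo and returns free and total bytes for the device):

import torch

# Total memory according to the CUDA device properties vs. the free/total
# numbers reported by the driver through cudaMemGetInfo.
props_total = torch.cuda.get_device_properties(0).total_memory
free, total = torch.cuda.mem_get_info(0)
print(f"device properties total: {props_total / 2**30:.2f} GiB")
print(f"mem_get_info free/total: {free / 2**30:.2f} / {total / 2**30:.2f} GiB")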

Expected Behavior

Executing the following command throws an OOM exception

python3 -c 'import torch; x = torch.ones((30, 1024, 1024, 1024), dtype=torch.uint8, device="cuda"); print(x.device)'

Actual Behavior

The command completes without error, which is not expected.

python3 -c 'import torch; x = torch.ones((30, 1024, 1024, 1024), dtype=torch.uint8, device="cuda"); print(x.device)'

Diagnostic Logs

No response

@chengzeyi chengzeyi changed the title WSL2 CUDA Does Not Respot CUDA - Sysmem Fallback Policy WSL2 CUDA Does Not Respect CUDA - Sysmem Fallback Policy Jan 20, 2024

elsaco commented Jan 21, 2024

@chengzeyi try running your command multiple times. In my case the OOM triggered on the 2nd run:

(phi-2-env) root@texas:/mnt/e/ai/phi-2# python
Python 3.9.18 (main, Sep 11 2023, 13:41:44)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> x = torch.ones((30, 1024, 1024, 1024), dtype=torch.uint8, device="cuda"); print(x.device)
cuda:0
>>> x.device
device(type='cuda', index=0)
>>> x = torch.ones((30, 1024, 1024, 1024), dtype=torch.uint8, device="cuda"); print(x.device)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 30.00 GiB. GPU 0 has a total capacty of 15.99 GiB of which 0 bytes is free. Including non-PyTorch memory, this process has 17179869184.00 GiB memory in use. Of the allocated memory 30.00 GiB is allocated by PyTorch, and 0 bytes is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

nvidia-smi output:

Sun Jan 21 10:04:33 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.36                 Driver Version: 546.33       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A4000               On  | 00000000:61:00.0  On |                  Off |
| 41%   33C    P8              11W / 140W |    515MiB / 16376MiB |      3%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                 N/A      |
+---------------------------------------------------------------------------------------+

@chengzeyi
Author

@elsaco Yeah, that's the bug: the first run should trigger an OOM, not a later one. In your case 30 GiB x 2 is too large even for the fallback system memory, so it only fails on the second run.

@OneBlue OneBlue added the GPU label Jan 23, 2024
@Pipyakas

This is what I'm using to get around this issue:

import torch
torch.cuda.set_per_process_memory_fraction(1.0, 0)

Memory allocation now behaves like bare-metal Linux, but on WSL2.
Of course, this only works for PyTorch, but I'd rather have that than nothing at all.
I hope the team works on this soon.
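
Putting the workaround together, roughly (a sketch assuming device 0 and the behaviour described above, not independently verified):

import torch

# Cap the caching allocator at 100% of the reported device memory on device 0,
# so an allocation that does not fit raises OutOfMemoryError instead of
# spilling into shared system memory (per this workaround).
torch.cuda.set_per_process_memory_fraction(1.0, 0)

try:
    x = torch.ones((30, 1024, 1024, 1024), dtype=torch.uint8, device="cuda")
    print(x.device)
except torch.cuda.OutOfMemoryError:
    print("OOM raised; switch to a memory-efficient path here")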

@strawberrymelonpanda

Just ran into this issue today and was briefly confused why my processing speed dropped like a stone when it looked like things were fitting into VRAM and I knew I had sysmem fallback off.

Thanks for the workaround, but I too hope to see action on this.
