WSL2 CUDA Does Not Respect CUDA - Sysmem Fallback Policy #11050

Open
chengzeyi opened this issue Jan 20, 2024 · 5 comments

chengzeyi commented Jan 20, 2024

Windows Version

Microsoft Windows [Version 10.0.22635.3061]

WSL Version

2.1.0.0

Are you using WSL 1 or WSL 2?

  • WSL 2

Kernel Version

5.15.137.3-1

Distro Version

Ubuntu 22.04

Other Software

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.36                 Driver Version: 546.33       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        On  | 00000000:01:00.0  On |                  Off |
|  0%   38C    P8              16W / 450W |   1528MiB / 24564MiB |      8%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        38      G   /Xwayland                                 N/A      |
+---------------------------------------------------------------------------------------+
torch                     2.3.0.dev20231227+cu121

Repro Steps

With a recent version of NVIDIA GPU Driver, in Windows NVIDIA Driver Settings, disable CUDA - Sysmem Fallback Policy globally and restart the computer.

[screenshot: NVIDIA Control Panel with CUDA - Sysmem Fallback Policy disabled]

In the WSL terminal, execute the following command to allocate a tensor that exceeds the total available GPU memory (30 GiB required vs. 24 GiB available).

python3 -c 'import torch; x = torch.ones((30, 1024, 1024, 1024), dtype=torch.uint8, device="cuda"); print(x.device)'

Instead of throwing an OOM exception, the command executes successfully.

[screenshot: command output showing the allocation succeeding]

However, the expected behaviour is to throw an OOM exception so that applications like Stable Diffusion WebUI can detect that the available GPU memory is insufficient and choose a memory-efficient way to do inference. The current behaviour silently falls back to slow system memory, which makes inference extremely slow.
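
For reference, this is roughly the detection pattern such applications rely on (a hypothetical sketch, not actual WebUI code); it only works if the oversized allocation really raises:

import torch

def allocate_or_fallback(shape):
    # Hypothetical helper: try the allocation on the GPU and switch to a
    # memory-efficient path only if the driver actually raises an OOM error,
    # rather than silently spilling into slow shared system memory.
    try:
        return torch.empty(shape, dtype=torch.uint8, device="cuda")
    except torch.cuda.OutOfMemoryError:
        # e.g. tile the workload or run a low-VRAM code path
        return torch.empty(shape, dtype=torch.uint8, device="cpu")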

The total GPU memory reported by torch also looks weird.

python3 -c 'import torch; print(torch.cuda.get_device_properties(0).total_memory)'

[screenshot: the reported total memory value]
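
One way to cross-check the reported value (a sketch; torch.cuda.mem_get_info wraps cudaMemGetInfo and returns free and total bytes for the device):

import torch

# Total memory according to the CUDA device properties vs. the free/total
# numbers reported by the driver through cudaMemGetInfo.
props_total = torch.cuda.get_device_properties(0).total_memory
free, total = torch.cuda.mem_get_info(0)
print(f"device properties total: {props_total / 2**30:.2f} GiB")
print(f"mem_get_info free/total: {free / 2**30:.2f} / {total / 2**30:.2f} GiB")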

Expected Behavior

Executing the following command throws an OOM exception

python3 -c 'import torch; x = torch.ones((30, 1024, 1024, 1024), dtype=torch.uint8, device="cuda"); print(x.device)'

Actual Behavior

The command completes without error, which is not expected.

python3 -c 'import torch; x = torch.ones((30, 1024, 1024, 1024), dtype=torch.uint8, device="cuda"); print(x.device)'

Diagnostic Logs

No response

@chengzeyi chengzeyi changed the title WSL2 CUDA Does Not Respot CUDA - Sysmem Fallback Policy WSL2 CUDA Does Not Respect CUDA - Sysmem Fallback Policy Jan 20, 2024

elsaco commented Jan 21, 2024

@chengzeyi try running your command multiple times. In my case the OOM triggered on the 2nd run:

(phi-2-env) root@texas:/mnt/e/ai/phi-2# python
Python 3.9.18 (main, Sep 11 2023, 13:41:44)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> x = torch.ones((30, 1024, 1024, 1024), dtype=torch.uint8, device="cuda"); print(x.device)
cuda:0
>>> x.device
device(type='cuda', index=0)
>>> x = torch.ones((30, 1024, 1024, 1024), dtype=torch.uint8, device="cuda"); print(x.device)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 30.00 GiB. GPU 0 has a total capacty of 15.99 GiB of which 0 bytes is free. Including non-PyTorch memory, this process has 17179869184.00 GiB memory in use. Of the allocated memory 30.00 GiB is allocated by PyTorch, and 0 bytes is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

nvidia-smi output:

Sun Jan 21 10:04:33 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.36                 Driver Version: 546.33       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A4000               On  | 00000000:61:00.0  On |                  Off |
| 41%   33C    P8              11W / 140W |    515MiB / 16376MiB |      3%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                 N/A      |
+---------------------------------------------------------------------------------------+

@chengzeyi
Author

@elsaco Yeah, that's the bug: the first run should trigger an OOM, not a later one. In your case 30 GiB x 2 is too large even for the fallback system memory, so it only fails on the second run.

@OneBlue OneBlue added the GPU label Jan 23, 2024
@Pipyakas

This is what I'm using to get around this issue:

import torch
torch.cuda.set_per_process_memory_fraction(1.0, 0)

Memory allocation now behaves like bare-metal Linux, but on WSL2.
Of course, this only works for PyTorch, but I'd rather have that than nothing at all.
I hope the team works on this soon.
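
Putting the workaround together, roughly (a sketch assuming device 0 and the behaviour described above, not independently verified):

import torch

# Cap the caching allocator at 100% of the reported device memory on device 0,
# so an allocation that does not fit raises OutOfMemoryError instead of
# spilling into shared system memory (per this workaround).
torch.cuda.set_per_process_memory_fraction(1.0, 0)

try:
    x = torch.ones((30, 1024, 1024, 1024), dtype=torch.uint8, device="cuda")
    print(x.device)
except torch.cuda.OutOfMemoryError:
    print("OOM raised; switch to a memory-efficient path here")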

@strawberrymelonpanda

Just ran into this issue today and was briefly confused why my processing speed dropped like a stone when it looked like things were fitting into VRAM and I knew I had sysmem fallback off.

Thanks for the workaround, but I too hope to see action on this.
