RuntimeError: cuDNN Frontend error: [cudnn_frontend] Error: No execution plans support the graph. #9704
Could you try the 2.4.0 stable release and see if the problem persists?
If you're running in a notebook, make sure to restart it and please do a clean reinstall of v0.30.3. AuraFlow was released in v0.30.0, so this should not lead to any errors. Just to be sure that there are no longer any environment errors, could you paste the output of your environment info?
Hello, the problem is now solved. Thank you for your time and consideration. Here are the versions that worked for me:
I am facing the same error on
I'm on an H100; I'm guessing this has to do with the new cuDNN SDPA backend introduced in PyTorch 2.5.
Yes, this seems like a problem with torch 2.5.0, and I've been able to reproduce this now as well. We'll need to take a look into how best to fix this (either on our end, or we could talk with the PyTorch folks), cc @sayakpaul @DN6 @yiyixuxu. Re-opening the issue for now.
As a workaround, you can disable the cuDNN backend via https://pytorch.org/docs/stable/backends.html#torch.backends.cuda.enable_cudnn_sdp. Would you mind opening an issue on PyTorch with a smallish repro? I can then forward it to the NVIDIA folks.
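For anyone who needs that workaround right away, a minimal sketch (assuming PyTorch 2.5 with CUDA available) would be:

```python
import torch

# Disable the cuDNN SDPA backend globally; scaled_dot_product_attention will
# then fall back to the flash / memory-efficient / math implementations.
torch.backends.cuda.enable_cudnn_sdp(False)
```

Placing this before any pipeline or text-encoder forward pass should avoid the failing backend.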
Yes, IMO, this should be reported to
Can you link the frontend issue?
Seems to be NVIDIA/cudnn-frontend#75 and NVIDIA/cudnn-frontend#78.
Having this issue as well, but only on Linux. No problems with CUDA on Windows.
The performance degradation is from 6 it/s to 2.5 it/s using SDXL, with everything the same except that one param. Links to the issues are already posted below.
The cuDNN issues linked are generic across any unsupported config and may not correspond to this particular issue. Would it be possible to link a shorter repro, as I'm currently trying to clone
Here's the shortest reproduction; like I said, it's when transformers uses SDPA to process CLIP:

```python
import torch
from transformers import CLIPTextModel, AutoTokenizer

device = torch.device('cuda')
tokenizer = AutoTokenizer.from_pretrained("openai/clip-vit-base-patch32")
encoder = CLIPTextModel.from_pretrained(
    "openai/clip-vit-base-patch32", cache_dir='/mnt/models/huggingface'
).to(device=device, dtype=torch.float16)

inputs = tokenizer(["a photo of a cat", "a photo of a dog"], padding=True, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}
print(inputs)

outputs = encoder(**inputs)
print(outputs)
```

By the way, I just noticed that there is no issue when using torch.float32, but nobody uses torch.float32 anymore.
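If it helps narrow things down, one hedged way to check that the backend (rather than the model code) is at fault in the snippet above is to pin SDPA to a specific backend around the forward pass, e.g. with the context manager from torch.nn.attention:

```python
from torch.nn.attention import sdpa_kernel, SDPBackend

# If the error disappears when the math backend is forced, the cuDNN backend
# chosen by default under torch 2.5.0 is the likely culprit.
with sdpa_kernel(SDPBackend.MATH):
    outputs = encoder(**inputs)
print(outputs.last_hidden_state.shape)
```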
Thanks, and it's not happening with
@vladmandic I am not seeing the same error locally with cuDNN 9.3. Which GPU are you on? I will try 9.1.7 in the meantime.
```python
print(f'torch={torch.__version__} cuda={torch.version.cuda} cuDNN={torch.backends.cudnn.version()} device={torch.cuda.get_device_name(0)} cap={torch.cuda.get_device_capability(0)}')
```

Note that the CUDA and cuDNN versions are the ones that come with torch. If torch 2.5 requires a newer cuDNN, it should handle its installation.
Yes, 9.1.0.70 is the cuDNN version that comes with torch, and I didn't see the failure on L40, L4, or RTX 6000 Ada, which are also sm89 (it is able to generate and run a kernel). I'm thinking that maybe the issue is the CUDA version; I will also try that later.
Even a clean environment didn't help me. I had to install torch 2.4.0 to get rid of the issue.
Hmm. How much speedup does one get when using CLIP with SDPA? I remember when we incorporated SDPA into CLIP the speedup wasn't that significant. We could verify this by instantiating the CLIP text encoder with:

```python
text_encoder = CLIPTextModel.from_pretrained(..., attn_implementation="eager", ...)
pipeline = DiffusionPipeline.from_pretrained(..., text_encoder=text_encoder)
```

Cc: @ArthurZucker
I tried using
You mean changing the CLIP (and potentially other models from `transformers`)? I guess we have a couple of ways, but I think we could pass this info to the loading step. Something like (pseudo-code):

```python
if is_transformers_model:
    if is_transformers_version(...):
        if is_torch_version(">=", "2.5"):
            loading_kwargs.update({"attn_implementation": "eager"})
```

@DN6 WDYT? Or maybe @ArthurZucker from `transformers`?
@vladmandic does your output look similar to this?
[SDPA-CUDNN] Make CuDNN Attention Opt in (#138522)

# Summary

Currently we have a `cudnn_order` that says on H100 w/ new enough CuDNN backend (we ship a 9.1 version in OSS) try to run CuDNN attention first. We have already encountered a few bugs with the release of 2.5:

1. #138529
2. huggingface/diffusers#9704
3. #138354

In light of the above we are going to make the CuDNN backend Opt-in by default. This can be done easily with the context manager for choosing backends, i.e.:

```python
from torch.nn.attention import sdpa_kernel, SDPBackend

with sdpa_kernel(SDPBackend.CUDNN_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)
```

This PR puts the CuDNN backend as the lowest precedence in the backend list, meaning that the Math backend will always be chosen unless disabled (which is done via the context manager).

Cc @atalman

Pull Request resolved: #138522
Approved by: https://github.com/ngimel, https://github.com/eqy, https://github.com/malfet
(cherry picked from commit 9a9a0ab)
Co-authored-by: drisspg <[email protected]>
@vladmandic we have an RC available; would you mind trying with this version of PyTorch:
Tried both 2.5.1-rc and 2.6.0-nightly and both look fine - thanks!

But... this basically moves the cuDNN backend from highest priority to lowest priority, so it behaves the same as previous versions of torch, and there is still an underlying issue with cuDNN... Yup - confirmed with:

```python
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_math_sdp(False)
```

This makes SDPA pick the cuDNN backend and the issue is back.
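For completeness, the opt-in path from the PR above can also be used to force the cuDNN backend directly instead of disabling the other backends (a standalone sketch with arbitrary tensor shapes):

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

q = k = v = torch.randn(2, 8, 77, 64, device="cuda", dtype=torch.float16)

# Restrict SDPA to the cuDNN backend only; on affected setups this may
# reproduce the "No execution plans support the graph" error.
with sdpa_kernel(SDPBackend.CUDNN_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)
```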
@vladmandic That is expected. The goal of this is to fix the default behavior. We are currently making cuDNN always the lowest priority so that math will be picked before cuDNN. You can of course manually select the cuDNN backend and you will hit the existing error. @eqy is working on the actual fix to the cuDNN backend so that we can increase its priority in the future.
Yup, makes total sense, just wanted to confirm - thanks.
@vladmandic in the meantime we tried to repro the issue on a 4090 but were unable to see it on our end (both the cuDNN team and my own local testing). Could you share some more details about your environment? Is it, e.g., Windows? In the meantime I'm working on cuDNN robustness on sm8x and have found similar issues, but it would be good if we could guarantee your specific use case was covered.
Ahhh, that made me wonder... Running on the host: no issues! But running in a VM: the error appears. So it's something about virtualization.
Ran into the same issue; the workaround suggested above fixed it for me for now.
Thanks @felixniemeyer :)
@vladmandic thanks for leading the charge here! Also, thanks to @eqy @drisspg for the help!
Sure!
Yes, it happens on Linux systems without any para/hardware virtualization.
Could you post the repro(s) that you are running if they are different? @bghira @felixniemeyer (The same error message can be triggered by different root causes, e.g., compilation failure due to environment differences vs. compilation failure due to incorrect code generation.)
I wasn't able to identify any cause; we had CUDA 12.4 images working okay, but there is no clear link between library versions and this error.
Repro means a way to reproduce the error message, right? I was following this guide to train a Stable Diffusion 1.5 LoRA. This was the command I was executing after setting up according to the guide:

I created a venv at the diffusers root level and installed the requirements. Can you reproduce it like this?
@vladmandic just checking in here. Does this issue go away with PyTorch 2.5.1?
Yes.
Thanks a lot for the minimal reproduction, but I think the issue above is better off in the PyTorch repo, no?
Describe the bug
Hello. I tried the Img2Img pipeline and encountered the error shown in the attached images. Could you please check it for me? Thank you.
Reproduction
Logs
No response
System Info
diffusers 0.30.3
Python 3.9.20
Who can help?
No response