
cudnn._compiled_module.cudnnGraphNotSupportedError: [cudnn_frontend] Error: No execution plans built successfully. #75

Open
ifromeast opened this issue May 13, 2024 · 2 comments


@ifromeast

When running `pytest test/python_fe` on the latest version, it returns:

        graph.validate()
        graph.build_operation_graph()
        graph.create_execution_plans([cudnn.heur_mode.A, cudnn.heur_mode.FALLBACK])
>       graph.check_support()
E       cudnn._compiled_module.cudnnGraphNotSupportedError: [cudnn_frontend] Error: No execution plans built successfully.

test/python_fe/test_matmul_bias_relu.py:278: cudnnGraphNotSupportedError
==================================================================================== warnings summary =====================================================================================
test/python_fe/test_apply_rope.py::test_apply_rope
  /home/vipuser/miniconda3/envs/llm-env/lib/python3.10/site-packages/torch/random.py:159: UserWarning: CUDA reports that you have 8 available devices, and you have used fork_rng without explicitly specifying which devices are being used. For safety, we initialize *every* CUDA device by default, which can be quite slow if you have a lot of CUDAs. If you know that you are only making use of a few CUDA devices, set the environment variable CUDA_VISIBLE_DEVICES or the 'devices' keyword argument of fork_rng with the set of devices you are actually using. For example, if you are using CPU only, set device.upper()_VISIBLE_DEVICES= or devices=[]; if you are using device 0 only, set CUDA_VISIBLE_DEVICES=0 or devices=[0].  To initialize all devices and suppress this warning, set the 'devices' keyword argument to `range(torch.cuda.device_count())`.
    warnings.warn(message)

test/python_fe/test_conv_genstats.py::test_conv_genstats
  /mnt/zzd/llm.c/cudnn-frontend/test/python_fe/test_conv_genstats.py:14: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)
    conv_output = torch.nn.functional.conv2d(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================================= short test summary info =================================================================================
FAILED test/python_fe/test_apply_rope.py::test_apply_rope - cudnn._compiled_module.cudnnGraphNotSupportedError: [cudnn_frontend] Error: No execution plans built successfully.
FAILED test/python_fe/test_batchnorm.py::test_bn_relu_with_mask - cudnn._compiled_module.cudnnGraphNotSupportedError: [cudnn_frontend] Error: No execution plans built successfully.
FAILED test/python_fe/test_batchnorm.py::test_drelu_dadd_dbn - cudnn._compiled_module.cudnnGraphNotSupportedError: [cudnn_frontend] Error: No execution plans built successfully.
FAILED test/python_fe/test_conv_bias.py::test_conv_bias_relu - cudnn._compiled_module.cudnnGraphNotSupportedError: [cudnn_frontend] Error: No execution plans built successfully.
FAILED test/python_fe/test_conv_bias.py::test_conv_relu - cudnn._compiled_module.cudnnGraphNotSupportedError: [cudnn_frontend] Error: No execution plans built successfully.
FAILED test/python_fe/test_conv_bias.py::test_conv3d_bias_leaky_relu - cudnn._compiled_module.cudnnGraphNotSupportedError: [cudnn_frontend] Error: No execution plans built successfully.
FAILED test/python_fe/test_conv_bias.py::test_leaky_relu_backward - cudnn._compiled_module.cudnnGraphNotSupportedError: [cudnn_frontend] Error: No execution plans built successfully.
FAILED test/python_fe/test_conv_bias.py::test_conv_int8 - cudnn._compiled_module.cudnnGraphNotSupportedError: [cudnn_frontend] Error: No execution plans built successfully.
FAILED test/python_fe/test_conv_genstats.py::test_conv_genstats - cudnn._compiled_module.cudnnGraphNotSupportedError: [cudnn_frontend] Error: No execution plans built successfully.
FAILED test/python_fe/test_conv_reduction.py::test_reduction - cudnn._compiled_module.cudnnGraphNotSupportedError: [cudnn_frontend] Error: No execution plans built successfully.
FAILED test/python_fe/test_matmul_bias_relu.py::test_matmul_bias_relu[param_extract0] - cudnn._compiled_module.cudnnGraphNotSupportedError: [cudnn_frontend] Error: No execution plans built successfully.
FAILED test/python_fe/test_matmul_bias_relu.py::test_matmul_bias_relu[param_extract1] - cudnn._compiled_module.cudnnGraphNotSupportedError: [cudnn_frontend] Error: No execution plans built successfully.
FAILED test/python_fe/test_matmul_bias_relu.py::test_matmul_bias_relu[param_extract4] - cudnn._compiled_module.cudnnGraphNotSupportedError: [cudnn_frontend] Error: No execution plans built successfully.
FAILED test/python_fe/test_matmul_bias_relu.py::test_matmul_bias_relu[param_extract5] - cudnn._compiled_module.cudnnGraphNotSupportedError: [cudnn_frontend] Error: No execution plans built successfully.
================================================================ 14 failed, 3514 skipped, 2 warnings in 100.57s (0:01:40) =================================================================
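For reference, the failing call sequence from the traceback can be wrapped so an unsupported graph is reported rather than raised mid-test. This is a hedged sketch only: the names (`graph.check_support()`, `cudnn.heur_mode.A`, `cudnn._compiled_module.cudnnGraphNotSupportedError`) are taken from the traceback above, not verified against a particular cudnn-frontend release, and the import is guarded so the snippet loads even without the bindings installed.

```python
# Hedged sketch based on the traceback above: probe whether cuDNN can build
# an execution plan for a frontend graph instead of letting check_support()
# raise inside a test. All names come from the traceback, not a verified API.
try:
    import cudnn  # cudnn-frontend Python bindings
except ImportError:
    cudnn = None  # bindings unavailable; the helper cannot be used

def graph_is_supported(graph):
    """Return True if cuDNN builds an execution plan for `graph`, else False."""
    if cudnn is None:
        raise RuntimeError("cudnn-frontend Python bindings are not installed")
    graph.validate()
    graph.build_operation_graph()
    graph.create_execution_plans([cudnn.heur_mode.A, cudnn.heur_mode.FALLBACK])
    try:
        graph.check_support()
    except cudnn._compiled_module.cudnnGraphNotSupportedError as err:
        print(f"graph not supported: {err}")
        return False
    return True
```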

My environment: CUDA 12.4, cuDNN 9.1, driver version 550.54.15, Ubuntu 22.04.
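When reporting environment details like these, it can also help to include the cuDNN version PyTorch itself links against, since it may differ from the system cuDNN. A small hedged helper using the documented `torch.backends.cudnn.version()` API, guarded so it degrades gracefully when PyTorch is not installed:

```python
# Report the cuDNN version code PyTorch was built against (this may differ
# from the system cuDNN noted above). Returns None if PyTorch is missing or
# PyTorch itself reports no cuDNN.
def torch_cudnn_version():
    try:
        import torch
    except ImportError:
        return None
    return torch.backends.cudnn.version()  # int version code, or None

print(torch_cudnn_version())
```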

@Anerudhan
Collaborator

Thanks @ifromeast for following up on this from the llm.c repo.

I have added an experimental branch, issues/75_and_78, which prints cudaGetLastError().

Please run:

CUDNN_LOGLEVEL_DBG=3 CUDNN_LOGDEST_DBG=backend_api.log CUDNN_FRONTEND_LOG_FILE=fe.log CUDNN_FRONTEND_LOG_INFO=1 pytest -s test/python_fe

and attach both backend_api.log and fe.log so we can help debug.

Thanks

@YixuanSeanZhou

@Anerudhan I am seeing this error as well when trying to run the matmul example here. I am running it from my own script rather than from within the repo.

Here are the log files you requested

backend_api.log
fe.log

Could you please take a look? Thanks!
