[Question]: CUDA error: an illegal memory access was encountered when running benchmark_e2e.py #86
Comments
Addition: a different error is reported on another Triton version (2.1.0), with minference-0.1.5.post1.
Hi @lepangdan, thanks for your feedback. This issue seems to be caused by insufficient hardware resources. Could you please provide details about the type of GPU you are using and the size of your CPU memory? Additionally, if you need to test with 1M tokens (or any input exceeding 200K tokens) within 80GB of GPU memory, you need to enable the kv_cache_cpu option. Also, are you able to run inference for 100K or 500K tokens without issues? Please try the following scripts:
# For 1M tokens
python experiments/benchmarks/benchmark_e2e.py --attn_type minference_with_dense --context_window 1_000_000 --kv_cache_cpu
# For 100K tokens
python experiments/benchmarks/benchmark_e2e.py --attn_type minference_with_dense --context_window 100_000
# For 500K tokens
python experiments/benchmarks/benchmark_e2e.py --attn_type minference_with_dense --context_window 500_000 --kv_cache_cpu
Let us know the results so we can further diagnose the issue.
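(For reference, the same offloading can also be enabled from the Python API. The sketch below follows the usage pattern in the MInference README; the kv_cache_cpu keyword is assumed to mirror the --kv_cache_cpu flag, and the model name is only an example, so treat both as assumptions rather than code taken from this repo.)

```python
# Sketch only: argument names follow the MInference README pattern; the
# kv_cache_cpu keyword is assumed to correspond to the --kv_cache_cpu CLI flag.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from minference import MInference

model_name = "gradientai/Llama-3-8B-Instruct-Gradient-1048k"  # example long-context model

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# Patch the model; keeping the KV cache in CPU memory is what allows >200K-token
# inputs to fit alongside the weights in 80GB of GPU memory.
minference_patch = MInference("minference_with_dense", model_name, kv_cache_cpu=True)
model = minference_patch(model)
```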
Hi @iofu728,
My hardware configuration: 1x A100 (80GB GPU memory), 11 CPU cores, 100GB CPU memory.
During the debugging process the package versions have changed; current version information:
Results for the tests you mentioned:
python experiments/benchmarks/benchmark_e2e.py --attn_type minference_with_dense --context_window 100_000
python experiments/benchmarks/benchmark_e2e.py --attn_type minference_with_dense --context_window 500_000 --kv_cache_cpu
python experiments/benchmarks/benchmark_e2e.py --attn_type minference_with_dense --context_window 1_000_000 --kv_cache_cpu
Additional information:
I wonder if the versions of the packages matter. I am not certain; do you have any information on this? Looking forward to your reply. Thanks in advance.
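Given the resource-pressure diagnosis above, it can help to log how close a run gets to the 80GB GPU / 100GB host limits. The following is only a hedged sketch: psutil is an extra dependency, and run_benchmark() is a hypothetical placeholder for the failing call, not something defined in the repo.

```python
# Hedged sketch for checking memory headroom around the failing call.
# run_benchmark() is a hypothetical placeholder, not a function from this repo.
import torch
import psutil  # third-party; install separately with `pip install psutil`

def report_memory(tag: str) -> None:
    """Print peak GPU allocation and currently available host memory."""
    gpu_peak_gib = torch.cuda.max_memory_allocated() / 1024**3
    host_free_gib = psutil.virtual_memory().available / 1024**3
    print(f"[{tag}] peak GPU alloc: {gpu_peak_gib:.1f} GiB | free CPU RAM: {host_free_gib:.1f} GiB")

torch.cuda.reset_peak_memory_stats()
try:
    run_benchmark()  # placeholder for the 100K/500K/1M run that crashes
finally:
    report_memory("after benchmark")
```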
@iofu728 Adding more test results for your information:
hf:
minference:
Thank you for the information. However, it's a bit strange that "minference" runs normally while "minference_with_dense" doesn't. Please check whether your local […] Additionally, […]
It works after installing […]. I noticed that the key point is in the […]
The mentioned error […] Additionally, it might be helpful to add a warning when the […]
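The exact condition for that proposed warning is cut off above, but based on the earlier guidance that inputs beyond roughly 200K tokens need KV-cache offloading on an 80GB card, one plausible shape for it is sketched below. This is purely hypothetical: the check_benchmark_args helper and its arguments simply mirror the CLI flags, and the 200K threshold comes from the maintainer's comment, not from the code.

```python
import warnings

# Hypothetical guard; the 200K threshold follows the maintainer's guidance above,
# and the argument names mirror the --context_window / --kv_cache_cpu CLI flags.
KV_CPU_THRESHOLD = 200_000

def check_benchmark_args(context_window: int, kv_cache_cpu: bool) -> None:
    if context_window > KV_CPU_THRESHOLD and not kv_cache_cpu:
        warnings.warn(
            f"context_window={context_window} exceeds {KV_CPU_THRESHOLD} tokens; "
            "without --kv_cache_cpu the KV cache may not fit in 80GB of GPU memory "
            "and the run may fail.",
            RuntimeWarning,
        )
```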
Describe the bug
Hi,
I am running into an issue when executing:
python experiments/benchmarks/benchmark_e2e.py --attn_type minference_with_dense --context_window 1_000_000
The error is:
Then I tried setting CUDA_LAUNCH_BLOCKING=1 and got the following error:
RuntimeError: Triton Error [CUDA]: an illegal memory access was encountered
Would you be able to provide any guidance on the possible causes of this error, or suggest debugging steps? Thanks in advance!
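As a side note on the CUDA_LAUNCH_BLOCKING step mentioned above: CUDA launches are asynchronous by default, so the illegal-access error can be reported far from the kernel that caused it. One way to force synchronous launches from inside Python (equivalent to prefixing the shell command with CUDA_LAUNCH_BLOCKING=1) is sketched here; setting the variable before CUDA is initialised is the important part.

```python
# Set CUDA_LAUNCH_BLOCKING before any CUDA work so kernel launches run
# synchronously and the failing Triton kernel is reported at its real call site.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported afterwards so the setting applies when CUDA initialises
print(torch.cuda.is_available())
```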
Steps to reproduce
No response
Expected Behavior
No response
Logs
No response
Additional Information
triton version: 2.2.0
torch version: 2.1.1+cu121
CUDA version: 12.2
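To compare environments against the versions listed above, the same triad can be printed from Python; note that torch.version.cuda reports the CUDA build PyTorch was compiled against (12.1 for 2.1.1+cu121), which can differ from the system CUDA 12.2 shown here.

```python
# Print the versions relevant to this report.
import torch
import triton

print("triton:", triton.__version__)
print("torch:", torch.__version__)
print("torch CUDA build:", torch.version.cuda)  # toolkit torch was built with, may differ from nvidia-smi
```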