v2.6.3's flash_attn_varlen_func runs faster than v2.7.0.post2's flash_attn_varlen_func on H100 #1338
Comments
Please try compiling with CUDA 12.3.
I believe my CUDA version is 12.4.
Not sure if CUDA 12.4 was the issue.
What matters is the version of nvcc, not the CUDA driver. You can install the CUDA software toolkit (including nvcc) alongside whichever driver version you have.
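A quick way to confirm which nvcc (toolkit) a build would pick up versus what the driver reports is sketched below; it assumes nvcc and nvidia-smi are on PATH and that PyTorch is installed.

```python
# Sketch: report the CUDA toolkit (nvcc) vs. the CUDA version PyTorch was built with
# and what the driver exposes. Commands here are standard but the exact output
# formatting of nvidia-smi may vary.
import subprocess
import torch

print("CUDA version PyTorch was built with:", torch.version.cuda)

# nvcc is what actually matters when compiling flash-attn from source.
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)

# The nvidia-smi header shows the driver version and the max CUDA version it supports.
smi = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
print("\n".join(smi.splitlines()[:3]))
```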
I found v2.6.3's flash_attn_varlen_func runs faster than v2.7.0.post2's flash_attn_varlen_func on H100.

code
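The linked repro code is not included here; the sketch below shows one way such a comparison could be timed, with hypothetical shapes (batch, sequence length, heads, head dim are placeholders, not the reporter's actual settings). Running it under v2.6.3 and v2.7.0.post2 separately would reproduce the kind of numbers reported below.

```python
# Minimal varlen-attention timing sketch (assumed shapes, not the original benchmark).
import torch
from flash_attn import flash_attn_varlen_func

batch, seqlen, nheads, headdim = 8, 4096, 32, 128
total = batch * seqlen
device = "cuda"

q = torch.randn(total, nheads, headdim, device=device, dtype=torch.bfloat16)
k = torch.randn(total, nheads, headdim, device=device, dtype=torch.bfloat16)
v = torch.randn(total, nheads, headdim, device=device, dtype=torch.bfloat16)
# Cumulative sequence lengths [0, seqlen, 2*seqlen, ...], int32 as the kernel expects.
cu_seqlens = torch.arange(0, (batch + 1) * seqlen, seqlen, device=device, dtype=torch.int32)

# Warm up before timing.
for _ in range(10):
    flash_attn_varlen_func(q, k, v, cu_seqlens, cu_seqlens, seqlen, seqlen, causal=True)
torch.cuda.synchronize()

# Time 100 forward passes with CUDA events.
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
for _ in range(100):
    flash_attn_varlen_func(q, k, v, cu_seqlens, cu_seqlens, seqlen, seqlen, causal=True)
end.record()
torch.cuda.synchronize()
print(f"100 iterations: {start.elapsed_time(end):.1f} ms")
```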
Result from using v2.6.3 on H100:
Result from using v2.7.0.post2 on H100:
The runtime is 150 ms (v2.6.3) vs. 221 ms (v2.7.0.post2).