Dear Developers,
When I run the attn kernel on an RTX 4090 with the default parameters, the launch fails with "too many resources requested for launch."
I found that the 4090 provides 65,536 registers per block and that each thread of this kernel uses 162 registers. A block can therefore hold at most 65,536 / 162 ≈ 404 threads, so BLOCK_SIZE cannot exceed about 404.
Have any of you faced this issue, and do you have any solutions?
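The arithmetic above can be redone as a rough Python sketch. The first function is the naive 65,536 / 162 bound. The second adds an assumption not stated in this thread: that registers are allocated per warp in units of 256, the granularity NVIDIA's occupancy calculator lists for recent architectures. That assumption tightens the limit. The constant and function names are hypothetical and do not come from the repository.

```python
# Rough register-budget estimate for one thread block.
REGS_PER_BLOCK = 65_536  # "Registers per block" reported for the 4090
WARP_SIZE = 32           # threads per warp
REG_ALLOC_UNIT = 256     # ASSUMED per-warp register allocation unit

def max_threads_naive(regs_per_thread: int) -> int:
    """Back-of-the-envelope bound from the issue: registers per block / registers per thread."""
    return REGS_PER_BLOCK // regs_per_thread

def max_threads_warp_granular(regs_per_thread: int) -> int:
    """Round each warp's register usage up to the allocation unit, then count whole warps."""
    regs_per_warp = regs_per_thread * WARP_SIZE
    # Ceiling division to the next multiple of REG_ALLOC_UNIT.
    regs_per_warp = -(-regs_per_warp // REG_ALLOC_UNIT) * REG_ALLOC_UNIT
    return (REGS_PER_BLOCK // regs_per_warp) * WARP_SIZE

print(max_threads_naive(162))          # naive bound
print(max_threads_warp_granular(162))  # bound under the per-warp allocation assumption
```

Under that assumption, 162 registers per thread rounds up to 168, so a block fits at most 12 warps (384 threads) rather than 404. For the real numbers, cudaFuncGetAttributes reports the compiled kernel's numRegs and maxThreadsPerBlock, which is more reliable than hand arithmetic.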
kexve changed the title from "[4090 attn] cudaCheckError(): too many resources requested for launch" to "[bug report][4090 attn] cudaCheckError(): too many resources requested for launch" on Jun 4, 2024.
I ran into this as well. Thanks for writing this up; I would have had no idea how to debug it otherwise. harness.impl defines BLOCK_SIZE as 32 * NUM_WORKERS, and 4090_ker.cu sets NUM_WORKERS to 16, which gives 512 threads per block, above the ~404 limit. After setting NUM_WORKERS to 8 (256 threads), I was able to compile and run the kernel. I don't know whether floor(404 / 32) = 12 would be a better value; I kept it a power of 2 in case that matters.
I also ran this without a data file, using whatever values `new` gave me for k/q/v/o_ref, so it may still fail with real data.
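The NUM_WORKERS reasoning above can be sketched in Python: given a per-block thread limit, the largest NUM_WORKERS that fits is floor(limit / 32), optionally rounded down to a power of two. The helper names are hypothetical. 404 is the naive limit from the issue. 384 is a tighter estimate that assumes registers are allocated per warp in 256-register units.

```python
WARP_SIZE = 32  # threads per warp

def block_size(num_workers: int) -> int:
    # BLOCK_SIZE = 32 * NUM_WORKERS, as described for harness.impl (one warp per worker).
    return WARP_SIZE * num_workers

def largest_num_workers(max_threads: int, power_of_two: bool = False) -> int:
    """Largest NUM_WORKERS whose block_size() stays within max_threads."""
    n = max_threads // WARP_SIZE
    if power_of_two and n > 0:
        n = 1 << (n.bit_length() - 1)  # round down to the nearest power of two
    return n

for limit in (404, 384):  # naive bound, warp-granular estimate
    print(limit,
          block_size(16) <= limit,  # does the default NUM_WORKERS = 16 fit?
          largest_num_workers(limit),
          largest_num_workers(limit, power_of_two=True))
```

Both limits reject the default NUM_WORKERS = 16 and allow up to 12, or 8 if a power of two is required. This sketch only checks the thread count. It does not check whether the kernel itself tolerates a non-power-of-two NUM_WORKERS, for example whether its tile loops assume an even split across workers.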