[bug report][4090 attn] cudaCheckError(): too many resources requested for launch #37

kexve · 2024-06-04T03:39:16Z

Dear Developers,
When I run the attn kernel on an RTX 4090 with the default parameters, the launch fails with "too many resources requested for launch."

I found that "Registers per block" on the 4090 is 65,536 and that each thread uses 162 registers. BLOCK_SIZE therefore cannot exceed floor(65,536 / 162) = 404 threads.
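The arithmetic above can be sketched as a small host-side helper. This is only an illustration of the bound described in this post: the function names are hypothetical and not part of the repository, and the calculation ignores the hardware's register allocation granularity, so the real limit may be somewhat lower.

```cpp
#include <cassert>

// Threads per warp on NVIDIA GPUs.
constexpr int kWarpSize = 32;

// Naive upper bound on threads per block imposed by the register file:
// every thread needs regs_per_thread registers, and the block as a whole
// may not use more than regs_per_block.
int max_threads_by_registers(int regs_per_block, int regs_per_thread) {
    assert(regs_per_thread > 0);
    return regs_per_block / regs_per_thread;  // integer division floors
}

// The same bound expressed in whole warps, since a block is launched
// in units of kWarpSize threads.
int max_whole_warps(int regs_per_block, int regs_per_thread) {
    return max_threads_by_registers(regs_per_block, regs_per_thread) / kWarpSize;
}
```

With the numbers reported here, max_threads_by_registers(65536, 162) gives 404 and max_whole_warps(65536, 162) gives 12, i.e. at most 384 threads when the block size is a multiple of the warp size.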

Have any of you faced this issue, and do you have any solutions?

ahepp · 2024-07-19T21:01:32Z

I ran into this as well. Thanks for writing this up; I would have had no idea how to debug it otherwise. I notice that harness.impl defines BLOCK_SIZE as 32 * NUM_WORKERS, and 4090_ker.cu defines NUM_WORKERS as 16, so the default BLOCK_SIZE is 512, which is above the 404 limit. I was able to compile and run the kernel after setting NUM_WORKERS to 8. I don't know whether floor(404/32) = 12 would be a better value; I left it as a power of 2 in case that was important.
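As a rough check of the values discussed in this reply, the sketch below tests whether a given NUM_WORKERS fits the register budget, assuming BLOCK_SIZE = 32 * NUM_WORKERS as described for harness.impl. The helper names are hypothetical, and the check only covers registers; it says nothing about shared memory or about whether the kernel itself is correct for a worker count that is not a power of 2.

```cpp
#include <cassert>

// Threads per warp on NVIDIA GPUs.
constexpr int kWarpSize = 32;

// BLOCK_SIZE as described in this thread: 32 * NUM_WORKERS.
int block_size(int num_workers) {
    return kWarpSize * num_workers;
}

// Naive register check: total registers needed by the block must not
// exceed the per-block register limit. Allocation granularity is ignored.
bool fits_register_budget(int num_workers, int regs_per_thread, int regs_per_block) {
    assert(num_workers > 0 && regs_per_thread > 0);
    return block_size(num_workers) * regs_per_thread <= regs_per_block;
}
```

With 162 registers per thread and 65,536 registers per block, NUM_WORKERS = 16 needs 512 * 162 = 82,944 registers and does not fit, while 8 (41,472 registers) and 12 (62,208 registers) both fit under this naive check.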

I also ran this without a data file, using whatever uninitialized memory new gave me for k/q/v/o_ref, so it might still fail with real data.

kexve changed the title ~~[4090 attn] cudaCheckError(): too many resources requested for launch~~ [bug report][4090 attn] cudaCheckError(): too many resources requested for launch Jun 4, 2024
