Account for additional local memory requirements in Global dispatch #127
* Not accounting for all local memory requirements in dispatch causes errors on A100.
* This PR accounts for the additional required memory.
* It also adds a regression test.
* This is not an ideal fix for the issue: the memory requirements depend on the ordering of the sub-impls, and this information is not available at dispatch time. A more significant refactor is required.
aec8049 to 8ecffbc
@AD2605 is it reasonable to summarise your changes as follows: the global implementation now sets the number of sub-groups used in a kernel from `num_scalars_in_local_mem` instead of using `PORTFFT_SGS_IN_WG`?
Yes, that is correct, and it was the cause of the error.
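To make the exchange above concrete, here is a minimal sketch of the general idea: capping the number of sub-groups per work-group by what fits in local memory rather than always using a fixed compile-time default. This is not the code from this PR. The function `fit_subgroups_to_local_mem` and all of its parameters are hypothetical names chosen for illustration, and the real memory requirement depends on details (such as the ordering of the sub-impls) that this sketch ignores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of portFFT): returns how many sub-groups per
// work-group can be used without exceeding the device's local memory.
//   default_sgs_in_wg : the fixed default (the role played by PORTFFT_SGS_IN_WG)
//   scalars_per_sg    : local-memory scalars each sub-group needs (assumed constant)
//   fixed_scalars     : additional scalars needed regardless of sub-group count
//   scalar_bytes      : size of one scalar in bytes
//   local_mem_bytes   : local memory available per work-group in bytes
std::size_t fit_subgroups_to_local_mem(std::size_t default_sgs_in_wg,
                                       std::size_t scalars_per_sg,
                                       std::size_t fixed_scalars,
                                       std::size_t scalar_bytes,
                                       std::size_t local_mem_bytes) {
  assert(scalar_bytes > 0 && scalars_per_sg > 0);
  const std::size_t capacity = local_mem_bytes / scalar_bytes;
  if (capacity <= fixed_scalars) {
    return 0;  // the fixed requirement alone does not leave room for any sub-group
  }
  const std::size_t max_sgs = (capacity - fixed_scalars) / scalars_per_sg;
  // Never exceed the default; shrink below it only when memory forces us to.
  return std::min(default_sgs_in_wg, max_sgs);
}
```

With a 48 KiB local memory budget, 8-byte scalars, 1024 scalars per sub-group and 2048 fixed scalars, this sketch would reduce a default of 8 sub-groups to 4. A return value of 0 signals that the configuration cannot be dispatched this way at all.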
As commit author, the GitHub interface doesn't let me approve what has become Atharva's PR. But LGTM.
Given that I have an approval from Hugh, I am approving this from my side to meet the two-approval criterion and going ahead with the merge.