Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Account for additional local memory requirements in Global dispatch #127

Merged
merged 9 commits into from
Jan 5, 2024

Conversation

hjabird
Copy link
Contributor

@hjabird hjabird commented Jan 3, 2024

  • Not accounting for all local memory requirements in dispatch causes errors on A100.
  • This PR accounts for the additional required memory.
  • And also adds a regression test.
  • This is not an ideal fix for the issue - the memory requirements depend on the ordering of the sub-impls. This information is not available at dispatch time. A more significant refactor is required.

Checklist

Tick if relevant:

  • [n/a] New files have a copyright
  • [n/a] New headers have an include guards
  • [n/a] API is documented with Doxygen
  • New functionalities are tested
  • Tests pass locally
  • Files are clang-formatted

src/portfft/descriptor.hpp Outdated Show resolved Hide resolved
src/portfft/descriptor.hpp Outdated Show resolved Hide resolved
@hjabird hjabird changed the title Account for additional local memory requirements in Global dispatch Draft: Account for additional local memory requirements in Global dispatch Jan 3, 2024
@hjabird hjabird marked this pull request as draft January 3, 2024 11:26
* Not accounting for all local memory requirements in dispatch causes
errors on A100.
* This PR accounts for the additional required memory.
* And also adds a regression test.
* This is not an ideal fix for the issue - the memory requirements
depend on the ordering of the sub-impls. This information is not
available at dispatch time. A more significant refactor is required.
@hjabird hjabird force-pushed the hjab/fix_nvidia_regression_15360 branch from aec8049 to 8ecffbc Compare January 3, 2024 16:41
@hjabird hjabird changed the title Draft: Account for additional local memory requirements in Global dispatch Account for additional local memory requirements in Global dispatch Jan 4, 2024
@hjabird hjabird marked this pull request as ready for review January 4, 2024 10:19
Rbiessy
Rbiessy previously approved these changes Jan 4, 2024
src/portfft/descriptor.hpp Outdated Show resolved Hide resolved
Copy link
Contributor Author

@hjabird hjabird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AD2605 is it reasonable to summarise your changes as adjusting the global implementation to adjust the number of sub-groups used in a kernel to that set by num_scalars_in_local_mem instead of using PORTFFT_SGS_IN_WG?

src/portfft/common/global.hpp Show resolved Hide resolved
src/portfft/dispatcher/global_dispatcher.hpp Show resolved Hide resolved
src/portfft/descriptor.hpp Outdated Show resolved Hide resolved
@AD2605
Copy link
Contributor

AD2605 commented Jan 5, 2024

@AD2605 is it reasonable to summarise your changes as adjusting the global implementation to adjust the number of sub-groups used in a kernel to that set by num_scalars_in_local_mem instead of using PORTFFT_SGS_IN_WG?

Yes, that would be correct and was the cause of the error

Copy link
Contributor Author

@hjabird hjabird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As commit author, the Github interface doesn't let me approve what has become Atharva's PR. But LGTM.

@AD2605
Copy link
Contributor

AD2605 commented Jan 5, 2024

Given that I have an approval from Hugh, Approving this from my side to hit the 2 approval criteria and going ahead wuth the merge.

@AD2605 AD2605 merged commit e429002 into main Jan 5, 2024
1 check passed
@AD2605 AD2605 deleted the hjab/fix_nvidia_regression_15360 branch January 5, 2024 17:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants