Account for additional local memory requirements in Global dispatch #127

hjabird · 2024-01-03T10:09:26Z

Not accounting for all local memory requirements at dispatch time causes errors on A100.
This PR accounts for the additional required memory and adds a regression test.
This is not an ideal fix: the memory requirements depend on the ordering of the sub-impls, and this information is not available at dispatch time. A more significant refactor is required.

Checklist

Tick if relevant:

[n/a] New files have a copyright
[n/a] New headers have include guards
[n/a] API is documented with Doxygen
New functionalities are tested
Tests pass locally
Files are clang-formatted

src/portfft/descriptor.hpp

hjabird

@AD2605 is it reasonable to summarise your changes as adjusting the global implementation to adjust the number of sub-groups used in a kernel to that set by num_scalars_in_local_mem instead of using PORTFFT_SGS_IN_WG?

AD2605 · 2024-01-05T13:28:44Z

> @AD2605 is it reasonable to summarise your changes as adjusting the global implementation to adjust the number of sub-groups used in a kernel to that set by num_scalars_in_local_mem instead of using PORTFFT_SGS_IN_WG?

Yes, that is correct, and it was the cause of the error.

hjabird

As the commit author, I can't approve what has become Atharva's PR through the GitHub interface, but LGTM.

AD2605 · 2024-01-05T17:34:14Z

Given that I have an approval from Hugh, I'm approving this from my side to meet the two-approval criterion and going ahead with the merge.

AD2605 reviewed Jan 3, 2024

src/portfft/descriptor.hpp (outdated, resolved)

Rbiessy reviewed Jan 3, 2024

src/portfft/descriptor.hpp (outdated, resolved)

AD2605 reviewed Jan 3, 2024

test/unit_test/instantiate_fft_tests.hpp (outdated, resolved)

hjabird changed the title ~~Account for additional local memory requirements in Global dispatch~~ Draft: Account for additional local memory requirements in Global dispatch Jan 3, 2024

hjabird marked this pull request as draft January 3, 2024 11:26

AD2605 mentioned this pull request Jan 3, 2024

Allocating correct amount of Local memory in Global Implementation #128 (merged)

hjabird added 2 commits January 3, 2024 15:22

Adjust Global/Subgroup required mem calc

8ecffbc

hjabird force-pushed the hjab/fix_nvidia_regression_15360 branch from aec8049 to 8ecffbc January 3, 2024 16:41

hjabird changed the title ~~Draft: Account for additional local memory requirements in Global dispatch~~ Account for additional local memory requirements in Global dispatch Jan 4, 2024

hjabird marked this pull request as ready for review January 4, 2024 10:19

Rbiessy previously approved these changes Jan 4, 2024

AD2605 reviewed Jan 4, 2024

src/portfft/descriptor.hpp (outdated, resolved)

AD2605 added 4 commits January 4, 2024 23:17

fix size 15360 on nvidia

6c24574

Merge branch 'atharva/15360_fix' into hjab/fix_nvidia_regression_15360

4df9c39

restore fits_in_local_memory_subgroup

4e9d3fb

remove duplicate regression test suite

2ee722e

AD2605 dismissed Rbiessy’s stale review via 2ee722e January 5, 2024 10:43

AD2605 added 2 commits January 5, 2024 10:57

remove local mem check while selecting WI implementation

b523a3c

remove unused LocalRange Variable

7041405

hjabird commented Jan 5, 2024

src/portfft/common/global.hpp (resolved)

src/portfft/dispatcher/global_dispatcher.hpp (resolved)

src/portfft/descriptor.hpp (outdated, resolved)

fix remaining warnings

296c3de

hjabird commented Jan 5, 2024

Rbiessy approved these changes Jan 5, 2024

AD2605 approved these changes Jan 5, 2024

AD2605 merged commit e429002 into main Jan 5, 2024
1 check passed

AD2605 deleted the hjab/fix_nvidia_regression_15360 branch January 5, 2024 17:35
