Generate predicates for `cp.async.bulk` normally #1903

zasdfgbnm · 2024-03-11T22:55:07Z

On the current main branch, predicate generation is skipped for every cp.async.bulk op. This is not by design; it is a quick hack that lets us build out TMA incrementally. Currently, TMA can only be used in a <<<1, 1>>> kernel, and only to copy an entire tensor rather than part of one. Under these limitations, skipping the predicates is safe.

However, skipping predicate generation for TMA no longer makes sense now that we are adding support for non-trivial cases. For example, in #1484, an if (threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0) is created manually in the double-buffering pass as a temporary solution. I have also just started working on allowing TMA to be used in a non-<<<1, 1>>> kernel, where a thread predicate is clearly needed.

This PR re-enables predicate generation for TMA. For all code already on the main branch, this PR should be a no-op: I do not expect any change in the generated code for any TMA test. However, #1484 will be affected: the if (threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0) should no longer be created manually in the double-buffering pass. Instead, the double-buffering pass should leave the TMA op as-is and let the predicate generation pass handle it.
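
To illustrate why the predicate is redundant in a <<<1, 1>>> kernel but required otherwise, here is a minimal host-side sketch. It is not nvFuser code: `Dim3`, `isElectedThread`, and `countIssuingThreads` are hypothetical names made up for this example, and it assumes the manual check is meant to select the single thread whose index is zero in all three dimensions.

```cpp
// Hypothetical stand-in for CUDA's threadIdx / blockDim (illustration only).
struct Dim3 {
  unsigned x, y, z;
};

// Thread predicate: true only for the thread at index (0, 0, 0) of the block.
bool isElectedThread(Dim3 tid) {
  return tid.x == 0 && tid.y == 0 && tid.z == 0;
}

// Counts how many threads of one block would issue the bulk copy,
// with or without the thread predicate guarding it.
unsigned countIssuingThreads(Dim3 block_dim, bool use_predicate) {
  unsigned count = 0;
  for (unsigned z = 0; z < block_dim.z; ++z) {
    for (unsigned y = 0; y < block_dim.y; ++y) {
      for (unsigned x = 0; x < block_dim.x; ++x) {
        if (!use_predicate || isElectedThread(Dim3{x, y, z})) {
          ++count;
        }
      }
    }
  }
  return count;
}
```

With a block of `Dim3{1, 1, 1}`, exactly one thread issues the copy whether or not the predicate is applied, which matches the expectation that this PR is a no-op for the existing tests. With a block of `Dim3{4, 2, 1}`, all eight threads would issue the copy without the predicate, and only one does with it.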

zasdfgbnm · 2024-03-11T22:55:23Z

!build

naoyam

LGTM

drzejan2 · 2024-03-13T14:03:41Z

Discussed offline: for #1484, I will initially revert this change manually to stabilize the changes made to the double-buffering pass. Then I will remove the revert, so that #1484 depends on the built-in predicate analysis.

Generate predicates for cp.async.bulk normally

f704b57

zasdfgbnm marked this pull request as ready for review March 12, 2024 00:26

zasdfgbnm requested review from naoyam and drzejan2 March 12, 2024 00:26

naoyam approved these changes Mar 12, 2024

drzejan2 approved these changes Mar 13, 2024

zasdfgbnm merged commit 8c45661 into main Mar 13, 2024
34 of 35 checks passed

zasdfgbnm deleted the zasdfgbnm-patch-5 branch March 13, 2024 14:49
