Generate predicates for cp.async.bulk normally #1903

Merged 1 commit into main from zasdfgbnm-patch-5 on Mar 13, 2024


Collaborator

@zasdfgbnm zasdfgbnm commented Mar 11, 2024

On our current main branch, predicate generation for cp.async.bulk is skipped entirely. This is not because it should be skipped; it is just a quick hack that lets us build out TMA incrementally. Currently, TMA can only be used in a <<<1, 1>>> kernel, and only to copy an entire tensor rather than part of one. Under these limitations, skipping the predicates makes sense.

However, skipping predicate generation for TMA no longer makes sense now that we are adding support for non-trivial cases. For example, in #1484, an if (threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0) is manually created in the double-buffering pass as a temporary solution. Also, I have just started working on allowing TMA to be used in non-<<<1, 1>>> kernels, where a thread predicate is clearly needed.

This PR re-enables predicate generation for TMA. For all code already on the main branch, this PR should be a no-op: I do not expect any change in the generated code for any TMA test. However, #1484 will be affected: the if (threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0) should no longer be created manually in the double-buffering pass. Instead, the double-buffering pass should leave the TMA op as-is and let the predicate generation pass handle it.
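
To illustrate why a thread predicate matters once a kernel has more than one thread per block, here is a minimal host-side C++ sketch. It is not nvFuser code and does not issue any real cp.async.bulk instruction. `Dim3`, `electsThread`, and `countIssuingThreads` are hypothetical names made up for this example, and the sketch assumes the intended predicate checks all three thread index components so that exactly one thread per block issues the copy.

```cpp
#include <cassert>

// Hypothetical host-side stand-in for CUDA's built-in threadIdx / blockDim.
struct Dim3 {
  unsigned x, y, z;
};

// Thread predicate that elects a single thread of the block.
// Assumption: the check covers the x, y, and z components.
inline bool electsThread(const Dim3& threadIdx) {
  return threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0;
}

// Walks every thread index of a block and counts how many threads
// would pass the predicate and therefore issue the guarded copy.
inline unsigned countIssuingThreads(const Dim3& blockDim) {
  assert(blockDim.x > 0 && blockDim.y > 0 && blockDim.z > 0);
  unsigned issued = 0;
  for (unsigned z = 0; z < blockDim.z; ++z) {
    for (unsigned y = 0; y < blockDim.y; ++y) {
      for (unsigned x = 0; x < blockDim.x; ++x) {
        if (electsThread(Dim3{x, y, z})) {
          // In a real kernel, the TMA copy would be issued here.
          ++issued;
        }
      }
    }
  }
  return issued;
}
```

With a single-thread block such as `Dim3{1, 1, 1}`, the predicate is trivially true, which matches the observation that skipping it was harmless in a <<<1, 1>>> kernel. For a larger block such as `Dim3{32, 4, 2}`, only one of the 256 threads passes, so the copy is issued once rather than once per thread.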

@zasdfgbnm
Collaborator Author

!build

@zasdfgbnm zasdfgbnm marked this pull request as ready for review March 12, 2024 00:26
@zasdfgbnm zasdfgbnm requested review from naoyam and drzejan2 March 12, 2024 00:26
Collaborator

@naoyam naoyam left a comment

LGTM

@drzejan2
Collaborator

Discussed offline. For #1484, I will initially revert this change manually to stabilize the changes made to the double-buffering pass. Then I will drop that revert, so that #1484 relies on the built-in predicate analysis.

@zasdfgbnm zasdfgbnm merged commit 8c45661 into main Mar 13, 2024
34 of 35 checks passed
@zasdfgbnm zasdfgbnm deleted the zasdfgbnm-patch-5 branch March 13, 2024 14:49