Support different cp.async and load stages for operands A and B #2211

Open
jacobhinkle opened this issue May 7, 2024 · 2 comments

@jacobhinkle
Collaborator

As suggested in this comment: #2105 (comment)

Async loads can only be used with vectorization sizes of 4, 8, or 16. If either operand only supports a vectorization size of 1 or 2 elements, we disable async loads altogether (in #2105) and limit the double-buffer load stages for both operands to 2. This means we do synchronous double buffering of both operands even if one of them is fully vectorizable.

Instead, we should enable different load stages and cp.async settings for each operand.
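
To illustrate the difference, here is a rough sketch of the shared setting versus a per-operand setting. This is not nvFuser code: `OperandLoadConfig`, `operand_config`, `current_behavior`, and `proposed_behavior` are hypothetical names made up for this example, and the only rules it encodes are the ones stated above (async loads need vectorization 4, 8, or 16; the synchronous fallback is limited to 2 stages).

```python
from dataclasses import dataclass

# Vectorization sizes for which async loads are usable, per the description above.
ASYNC_VECTORIZATIONS = (4, 8, 16)


@dataclass(frozen=True)
class OperandLoadConfig:
    """Hypothetical per-operand load setting (not an nvFuser type)."""
    use_cp_async: bool
    stages: int


def operand_config(vectorization: int, requested_stages: int) -> OperandLoadConfig:
    """Choose settings for one operand without looking at the other."""
    if vectorization in ASYNC_VECTORIZATIONS:
        return OperandLoadConfig(use_cp_async=True, stages=requested_stages)
    # Synchronous fallback: double buffering limited to 2 stages.
    return OperandLoadConfig(use_cp_async=False, stages=min(requested_stages, 2))


def current_behavior(vec_a: int, vec_b: int, requested_stages: int):
    """One shared setting: a single narrow operand disables async loads for both."""
    a = operand_config(vec_a, requested_stages)
    b = operand_config(vec_b, requested_stages)
    if a.use_cp_async and b.use_cp_async:
        return a, b
    shared = OperandLoadConfig(use_cp_async=False, stages=min(requested_stages, 2))
    return shared, shared


def proposed_behavior(vec_a: int, vec_b: int, requested_stages: int):
    """Independent settings: each operand keeps whatever it can support."""
    return (
        operand_config(vec_a, requested_stages),
        operand_config(vec_b, requested_stages),
    )
```

With A at vectorization 8, B at vectorization 1, and 4 requested stages, `current_behavior(8, 1, 4)` puts both operands on synchronous 2-stage loads, while `proposed_behavior(8, 1, 4)` keeps A on cp.async with 4 stages and only drops B to the synchronous path.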

jacobhinkle self-assigned this May 7, 2024
jacobhinkle added the enhancement (New feature or request) and Matmuls labels May 7, 2024
kevinstephano removed the enhancement (New feature or request) label Oct 30, 2024
@kevinstephano
Collaborator

Is this still relevant for Hopper?

@jacobhinkle
Collaborator Author

> Is this still relevant for Hopper?

Yes, I believe so, but it is very low priority as it only affects unaligned inputs.
