Add cuda arch guard to skip ampere matmul tests on Hopper GPUs #3324

Merged (1 commit) on Nov 1, 2024

rdspring1
Collaborator

This PR adds NVFUSER_TEST_CUDA_ARCH_RANGE_GUARD to the Ampere matmul tests because the Hopper MultiMatmulScheduler will not support them.
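
To illustrate the idea behind an architecture range guard, here is a minimal sketch of the skip decision. It is not nvFuser's macro: `encodeArch` and `shouldSkipForArch` are hypothetical names, and the half-open range `[lower, upper)` is an assumption made for this example. Ampere reports compute capability 8.x and Hopper reports 9.0, so a range of 8.0 up to (but excluding) 9.0 would skip on Hopper.

```cpp
// Hypothetical helpers for illustration only; not part of nvFuser.

// Encode a compute capability (major.minor) as one comparable integer.
constexpr int encodeArch(int major, int minor) {
  return major * 100 + minor;
}

// Returns true when the device's compute capability lies outside the
// half-open range [lower, upper), i.e. when a guarded test should skip.
// The exclusive upper bound is an assumption of this sketch.
constexpr bool shouldSkipForArch(
    int dev_major,
    int dev_minor,
    int lower_major,
    int lower_minor,
    int upper_major,
    int upper_minor) {
  return encodeArch(dev_major, dev_minor) <
      encodeArch(lower_major, lower_minor) ||
      encodeArch(dev_major, dev_minor) >=
      encodeArch(upper_major, upper_minor);
}
```

In a GoogleTest body, a guard built on such a check would typically call `GTEST_SKIP()` when the function returns true, so the test is reported as skipped rather than failed.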

@rdspring1 rdspring1 changed the title from "Add cuda arch guard to skip ampere matmul tests on Hopper gpus" to "Add cuda arch guard to skip ampere matmul tests on Hopper GPUs" Nov 1, 2024
@rdspring1
Collaborator Author

!build

Collaborator

@jacobhinkle jacobhinkle left a comment

LGTM.

@rdspring1 rdspring1 merged commit f08bd51 into main Nov 1, 2024
40 of 41 checks passed
@rdspring1 rdspring1 deleted the ampere_guard branch November 1, 2024 16:28
rdspring1 added a commit that referenced this pull request Nov 8, 2024
This PR modifies `schedulePrologues` to use TMA loads to move mma
operands to shared memory. Stacked on
#3324 and
#3310.

## Details
1. Load input operands into shared memory via a
`CpAsyncBulkTensorTile` LoadStoreOp.
2. Replace the `LdMatrix` operation with a basic set.
3. Modify `scheduleOperandSmemStores` to apply swizzling to avoid bank
conflicts.
4. Refactor `swizzleSharedMemory` by moving the analysis component to a
separate function named `analyzeSwizzleSharedMemory`.
5. Create a `tmaSwizzleSharedMemory` function that uses
`analyzeSwizzleSharedMemory` and then finds the appropriate TMA swizzle
format.
6. Disable loop rotation. There is an issue with TMA loads and circular
buffering, and it is unclear whether loop rotation is required for
Hopper matmul.
7. The Hopper matmul tests are expected to give incorrect results.
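
As background for the swizzling mentioned in item 3, the sketch below shows a generic XOR swizzle on a tile whose rows are exactly 32 four-byte words wide. It is an illustration under stated assumptions, not the layout that `analyzeSwizzleSharedMemory` computes or a TMA swizzle format: `kBanks`, `bankLinear`, `swizzledCol`, and `bankSwizzled` are hypothetical names, and the 32-bank, 4-byte-word shared memory model is the usual one for NVIDIA GPUs.

```cpp
// Hypothetical illustration of XOR swizzling; not nvFuser code.

// Assumed shared memory model: 32 banks, each 4 bytes wide.
constexpr int kBanks = 32;

// Bank hit by element (row, col) of a row-major tile with kBanks words per
// row. Every element of a given column maps to the same bank, so a
// column-wise access by 32 threads is a 32-way bank conflict.
constexpr int bankLinear(int row, int col) {
  return (row * kBanks + col) % kBanks;
}

// XOR swizzle: permute the columns within each row. For col in [0, kBanks)
// the result stays in [0, kBanks) because kBanks is a power of two.
constexpr int swizzledCol(int row, int col) {
  return col ^ (row % kBanks);
}

// Bank hit by the same logical element after swizzling. For a fixed logical
// column, rows 0..31 now land in 32 distinct banks.
constexpr int bankSwizzled(int row, int col) {
  return (row * kBanks + swizzledCol(row, col)) % kBanks;
}
```

With this mapping, reading logical column 3 from rows 5 and 17 touches the same bank in the linear layout but two different banks in the swizzled one. The cost is that both the store and the load must apply the same permutation.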