Add cuda arch guard to skip ampere matmul tests on Hopper GPUs #3324

rdspring1 · 2024-11-01T02:58:03Z

This PR adds NVFUSER_TEST_CUDA_ARCH_RANGE_GUARD to the Ampere matmul tests because the Hopper MultiMatmulScheduler will not support them.
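
As a rough illustration of what an architecture range guard checks, here is a minimal sketch. It is not the nvFuser implementation: the names `ComputeCapability`, `archInRange`, and `shouldSkipTest`, and the half-open range convention, are assumptions made for this example. The real macro's arguments and skip behavior are defined in the nvFuser test utilities.

```cpp
#include <utility>

// Hypothetical type, not nvFuser's code: a compute capability as
// (major, minor), e.g. {8, 0} for Ampere or {9, 0} for Hopper.
using ComputeCapability = std::pair<int, int>;

// Returns true when `device` lies in the half-open range [lower, upper).
// std::pair compares lexicographically, so major is compared before minor.
// The half-open convention is an assumption made for this sketch.
inline bool archInRange(
    ComputeCapability device,
    ComputeCapability lower,
    ComputeCapability upper) {
  return device >= lower && device < upper;
}

// A test guarded this way would skip itself when the device is out of range.
inline bool shouldSkipTest(
    ComputeCapability device,
    ComputeCapability lower,
    ComputeCapability upper) {
  return !archInRange(device, lower, upper);
}
```

With a lower bound of 8.0 and an upper bound of 9.0, an Ampere device (8.0 or 8.6) runs the test, while a Hopper device (9.0) skips it.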

rdspring1 · 2024-11-01T02:58:20Z

!build

jacobhinkle

LGTM.

This PR modifies `schedulePrologues` to use TMA loads to move mma operands to shared memory. Stacked on #3324 and #3310.

## Details

1. Input operands are loaded into shared memory via `CpAsyncBulkTensorTile` LoadStoreOp.
2. Replace `LdMatrix` operation with basic set.
3. Modified `scheduleOperandSmemStores` to apply swizzling to avoid bank conflicts.
4. Refactor `swizzleSharedMemory` by moving the analysis component to a separate function named `analyzeSwizzleSharedMemory`.
5. Create `tmaSwizzleSharedMemory` function that uses `analyzeSwizzleSharedMemory` and then finds the appropriate tma swizzle format.
6. Disable loop rotation. There is an issue with tma loads and circular buffering. Not sure if loop rotation is required for hopper matmul.
7. Expect hopper matmul tests to give incorrect results.
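
The swizzling mentioned in item 3 can be illustrated with a generic XOR swizzle. This is only a sketch of the general technique, not the logic in `scheduleOperandSmemStores` or `tmaSwizzleSharedMemory`. The names `kChunksPerRow` and `swizzleChunk`, and the layout of 128-byte rows split into 16-byte chunks, are assumptions chosen for the example.

```cpp
#include <cstdint>

// Assumed layout for this sketch: each shared-memory row is 128 bytes,
// viewed as 8 chunks of 16 bytes. With 32 banks of 4 bytes each, the same
// chunk index in consecutive rows would map to the same banks.
constexpr std::uint32_t kChunksPerRow = 8;

// XOR the chunk index with the row index (mod 8). Within a row this is a
// permutation of the chunks; down a column it spreads accesses across all
// eight chunk positions, so they no longer collide on the same banks.
constexpr std::uint32_t swizzleChunk(std::uint32_t row, std::uint32_t chunk) {
  return chunk ^ (row % kChunksPerRow);
}
```

Because XOR is its own inverse, applying `swizzleChunk` with the same row to a swizzled index recovers the original chunk, so the same function can be used when writing and when reading.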

Add guard to skip ampere matmul tests on hopper

3ae7baa

rdspring1 added Matmuls Top-Down Matmul Dev labels Nov 1, 2024

rdspring1 changed the title ~~Add cuda arch guard to skip ampere matmul tests on Hopper gpus~~ Add cuda arch guard to skip ampere matmul tests on Hopper GPUs Nov 1, 2024

rdspring1 mentioned this pull request Nov 1, 2024

Load mma operands to shared memory with TMA #3320

Merged

rdspring1 requested review from jacobhinkle and protonu November 1, 2024 03:09

jacobhinkle approved these changes Nov 1, 2024

rdspring1 merged commit f08bd51 into main Nov 1, 2024
40 of 41 checks passed

rdspring1 deleted the ampere_guard branch November 1, 2024 16:28
