Add TT, TN, NT, NN tests for HopperMultipleMatmulScheduler #3310
Conversation
I created TN test with
LGTM, although there is a fair amount of code duplication that could be reduced with parametrization. Also, just a note that we could test the allocation domain here as well.
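For reference, a minimal sketch of the parametrization idea using a standard gtest parametrized fixture; the fixture and test names are hypothetical, and `MmaLayout` is assumed to be nvFuser's existing layout enum:

```cpp
// Hypothetical parametrized fixture; illustrative only, not the PR's actual code.
class HopperMatmulLayoutTest : public ::testing::TestWithParam<MmaLayout> {};

TEST_P(HopperMatmulLayoutTest, BroadcastedInputs) {
  const MmaLayout layout = GetParam();
  // Build the fusion for `layout` (TT/TN/NT/NN) and run the shared checks here.
}

INSTANTIATE_TEST_SUITE_P(
    HopperMultipleMatmulScheduler,
    HopperMatmulLayoutTest,
    ::testing::Values(
        MmaLayout::TT, MmaLayout::TN, MmaLayout::NT, MmaLayout::NN));
```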
auto tv0 = makeContigConcreteTensor({-1, -1, 1}, dtype); // A [M, K, b]
auto tv1 = makeContigConcreteTensor({1, -1, -1}, dtype); // B [b, K, N]
In this case the MmaOp input order is MKN, and the output gets reordered with a root->logical reordering to MNK. In the TN case there's no such reordering because the logical order of the inputs is MNK. Note that we can also have allocation domains set on the inputs. Maybe we could parametrize all the combinations, i.e. the orders of the allocation and logical domains of the inputs?
> Maybe we could parametrize all the combinations, i.e. the orders of the allocation and logical domains of the inputs?
When the allocation and logical domains are different, would the input operands no longer be contiguous, concrete tensors?
They can be contiguous, concrete, and have permuted allocation domain. Contiguity is with respect to allocation domain, so e.g. a tensor of logical shape [5, 7] and stride [7, 1] is contiguous, but so is one with logical shape [5, 7] and stride [1, 5]. The latter would correspond to having a swapped allocation domain in nvFuser.
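As a concrete illustration, a minimal sketch of setting up such an input; the `setAllocationDomain` call and the matching ATen tensor follow the usual nvFuser test pattern, but the shapes and dtype are assumptions for illustration:

```cpp
// Logical shape [5, 7] with stride [1, 5]: contiguous with respect to a
// swapped (column-major) allocation domain rather than the logical order.
auto tv = makeContigConcreteTensor({5, 7}, DataType::Half);
tv->setAllocationDomain({tv->axis(1), tv->axis(0)}, /*new_contiguity=*/true);

// The corresponding ATen input has the same logical sizes but permuted strides.
auto options = at::TensorOptions().dtype(at::kHalf).device(at::kCUDA, 0);
auto t = at::empty_strided({5, 7}, {1, 5}, options);
```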
!test
This PR modifies `schedulePrologues` to use TMA loads to move mma operands to shared memory. Stacked on #3324 and #3310.

## Details
1. Input operands are loaded into shared memory via the `CpAsyncBulkTensorTile` LoadStoreOp (see the sketch after this list).
2. Replace the `LdMatrix` operation with a basic set.
3. Modify `scheduleOperandSmemStores` to apply swizzling to avoid bank conflicts.
4. Refactor `swizzleSharedMemory` by moving the analysis component to a separate function named `analyzeSwizzleSharedMemory`.
5. Create a `tmaSwizzleSharedMemory` function that uses `analyzeSwizzleSharedMemory` and then finds the appropriate TMA swizzle format.
6. Disable loop rotation. There is an issue with TMA loads and circular buffering; it is unclear whether loop rotation is required for Hopper matmul.
7. Expect Hopper matmul tests to give incorrect results.
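For item 1, a minimal sketch of staging an operand into shared memory with a TMA load, assuming the usual nvFuser scheduling calls (`cacheAfter` taking a `LoadStoreOpType`, plus `setMemoryType`); the exact call sites in the PR may differ:

```cpp
// Stage operand A into shared memory via a TMA (CpAsyncBulkTensorTile) load.
auto tv0_smem = tv0->cacheAfter(LoadStoreOpType::CpAsyncBulkTensorTile);
tv0_smem->setMemoryType(MemoryType::Shared);
```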
This PR creates four tests for the `HopperMultipleMatmulScheduler`. Each test covers a different matmul layout (TT, TN, NT, and NN) where the input arguments are already broadcasted.
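As an example of the shape conventions, the TN case could be sketched roughly as follows; the helpers (`makeContigConcreteTensor`, `fusedMultiplySum`) are the ones nvFuser matmul tests typically use, and the dtype is an illustrative assumption:

```cpp
Fusion fusion;
FusionGuard fg(&fusion);

// TN: A is [M, K] and B is [N, K]; the inputs arrive already broadcasted to 3D.
auto tv0 = makeContigConcreteTensor({-1, 1, -1}, DataType::Half); // A [M, b, K]
auto tv1 = makeContigConcreteTensor({1, -1, -1}, DataType::Half); // B [b, N, K]
fusion.addInput(tv0);
fusion.addInput(tv1);

// Multiply and reduce over K (axis 2) to produce the [M, N] result.
auto tv2 = fusedMultiplySum(tv0, tv1, {2});
fusion.addOutput(tv2);
```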