Multi-matmul scheduler: add test and schedule smem operand store #2913
Conversation
//! domain must be set as loop domain. For the case of new swizzle, this domain
//! must be set as allocation domain.
template <bool legacy = true>
AbstractTensor swizzleSharedMemory(TensorView* shared_mem_tv) {
Copied straight from matmul.cpp
!build
// Take the consumer of each input, which is the smem store
tvs.push_back(v->uses().at(0)->output(0)->as<TensorView>());
Currently this just checks the smem store. But `compareTvs` is recursive, so we can change this to add the smem loads next, then the MmaOp outputs, and eventually the fusion outputs, in order to check every tensor in the fusion.
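For illustration, a rough sketch of how that extension might look, building on the snippet above. The extra consumer hop to the smem load is an assumption for illustration, not code from this PR:

```cpp
// Hypothetical extension (not in this PR): seed compareTvs with deeper
// tensors by walking one more consumer hop past each smem store.
for (Val* v : fusion->inputs()) {
  // Consumer of the fusion input: the smem store checked today
  auto* smem_store = v->uses().at(0)->output(0)->as<TensorView>();
  tvs.push_back(smem_store);
  // One hop further: the smem load. Since compareTvs recurses, adding
  // deeper seeds (MmaOp outputs, fusion outputs) widens the check.
  auto* smem_load = smem_store->uses().at(0)->output(0)->as<TensorView>();
  tvs.push_back(smem_load);
}
```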
!build
ASSERT_FALSE(testing::Test::HasFailure()) << suffix;
}

void compareSchedules() {
What is the future plan for this check? I don't think in the long term we want to keep the legacy scheduler just for testing purposes. Will we completely remove those tests?
Yes. Once we remove the old scheduler we will remove this file.
This generalizes `AbstractTensor` to the templated struct `AbstractTensorWithInfo<Info>` and introduces a special-case subclass `TaggedAbstractTensor<Tag>`. This can be used by passing an enum class for `Tag`, and it holds an `unordered_set<Tag>` for each dimension. Merging and swizzling union these sets, and splitting duplicates the set.

Note that a lot of code had to be moved out of the cpp into the header because of templatization. However, there are no changes to the `Dispatch*` classes; the `AbstractTensorWithInfo` methods like split, merge, swizzle, etc. are just changed to add calls to `Info::merge`. Related to #2913, which specializes this as `using AbstractMatmulTensor = TaggedAbstractTensor<MatmulDimRole>`.
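To make the tag semantics concrete, here is a minimal, self-contained toy sketch of the behavior described above. This is not the nvFuser API; the enum and helpers are illustrative stand-ins:

```cpp
#include <unordered_set>
#include <utility>
#include <vector>

// Illustrative stand-ins: each abstract dimension carries a set of tags.
enum class MatmulDimRole { Batch, M, N, K };
using TagSet = std::unordered_set<MatmulDimRole>;

// Merging axes i and i+1 unions their tag sets (models Info::merge).
void mergeTags(std::vector<TagSet>& dims, size_t i) {
  dims[i].insert(dims[i + 1].begin(), dims[i + 1].end());
  dims.erase(dims.begin() + i + 1);
}

// Splitting axis i duplicates its tag set onto both outputs.
void splitTags(std::vector<TagSet>& dims, size_t i) {
  TagSet copy = dims[i];
  dims.insert(dims.begin() + i, std::move(copy));
}

int main() {
  // An [M, K] tensor: merging yields one axis tagged {M, K}; splitting
  // that axis yields two axes, each tagged {M, K}.
  std::vector<TagSet> dims = {{MatmulDimRole::M}, {MatmulDimRole::K}};
  mergeTags(dims, 0); // dims == [{M, K}]
  splitTags(dims, 0); // dims == [{M, K}, {M, K}]
  return 0;
}
```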
checkConcreteStaticDim(swizzle_domain[-2]);
checkConcreteStaticDim(swizzle_domain[-1]);
Implemented `checkConcreteStaticDim` so we don't need to convert to `IterDomain*` for this call now.
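One plausible shape for such a helper, sketched here as an assumption rather than the code from this PR. In particular, `representativeId` is a hypothetical accessor that extracts an `IterDomain*` from an `AbstractId`:

```cpp
// Sketch: validate that a swizzled dimension is concrete with a static
// extent, taking the AbstractId directly instead of an IterDomain*.
void checkConcreteStaticDim(const AbstractId& abs_id) {
  IterDomain* id = representativeId(abs_id); // hypothetical accessor
  NVF_ERROR(
      !id->isBroadcast() && !id->isReduction(),
      "Swizzled dimensions must be concrete: ",
      id->toString());
  NVF_ERROR(
      id->extent()->isConstInt(),
      "Swizzled dimensions must have a static extent: ",
      id->toString());
}
```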
!build
I will investigate switching to BROADCAST separately, but for now that is breaking `mma_utils::canonicalDimOrdering`, so I'm disabling it.
!build
@@ -1128,6 +1210,28 @@ std::vector<MatmulDimRole> canonicalizeMmaTvOrdering(
  return roles;
}

void mergeConsecutiveAxesWithSameRole( |
Will we also be interested in merging non-consecutive axes for more flexibility?
This will pick up that case too: the dims are consecutive after a reordering, so the original order can be anything. After we merge the refactor, I want to add some tests with exotic multi-dim combinations to start exercising this code more.
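A toy sketch of the reorder-then-merge idea, illustrative only and not the `mma_utils` implementation:

```cpp
#include <vector>

enum class Role { Batch, M, N, K };

// After a prior reordering groups same-role axes together, one
// left-to-right pass merges each run of consecutive equal roles;
// collapsing a run models repeated tv->merge(i) calls.
std::vector<Role> mergeConsecutiveSameRole(const std::vector<Role>& roles) {
  std::vector<Role> out;
  for (Role r : roles) {
    if (out.empty() || out.back() != r) {
      out.push_back(r); // a new run starts
    } // else: same role as the previous axis, so it merges away
  }
  return out;
}

// Even if the original order was [M, K, M, K], reordering it to
// [M, M, K, K] first makes this pass reduce it to [M, K], so
// non-consecutive same-role axes are picked up too.
```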
@@ -98,25 +534,22 @@ class MultipleMatmulScheduler {
  // and dimension roles for all tensors in the fusion
  findPatterns();
  translatePatterns();
  // translatePatterns changes the TensorView graph, so we build the IdModel
  // afterward
  buildIdModel();
  findRoles();
Does `findRoles` need IdModel, or is it already updated somewhere before?
It is used by `findRoles()`, but it is updated at the end of `translatePatterns()` now. I moved it there to make the logic in this scope a bit clearer.
}

// Recursively compare scalar values
void compareScalars(Val* v_orig, Val* v_new) {
Can this be simplified to:
`return cloner_->clone(v_orig)->sameAs(v_new);`
If the above doesn't just work, then don't bother investing more time to make it work. This is a test that we will throw out in the future, so I don't really care about having a clean and elegant implementation.
This would work if we had cloned the scheduled fusion, but here I have cloned the unscheduled fusion then scheduled them both. That means that some of these scalars are new in both the original and the clone. When I clone those over using `cloner_->clone(v_orig)`, we wind up with a new undefined scalar, which fails the `sameAs` check if the original had a definition.
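A rough sketch of the recursive structural comparison this implies; the actual `compareScalars` body in the diff may differ, and the fallback logic below is an assumption:

```cpp
void compareScalars(Val* v_orig, Val* v_new) {
  // Fast path: the scalar predates scheduling and maps across the clone
  if (cloner_->clone(v_orig)->sameAs(v_new)) {
    return;
  }
  // Otherwise both scalars were created during scheduling, so compare
  // their definitions structurally and recurse on corresponding inputs
  Expr* def_orig = v_orig->definition();
  Expr* def_new = v_new->definition();
  ASSERT_EQ(def_orig == nullptr, def_new == nullptr);
  if (def_orig == nullptr) {
    EXPECT_TRUE(v_orig->sameAs(v_new)); // leaves, e.g. constants
    return;
  }
  ASSERT_TRUE(typeid(*def_orig) == typeid(*def_new)); // same op type
  ASSERT_EQ(def_orig->inputs().size(), def_new->inputs().size());
  for (size_t i = 0; i < def_orig->inputs().size(); ++i) {
    compareScalars(def_orig->input(i), def_new->input(i));
  }
}
```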
`main` is currently not compiling due to the changes introduced in #2913. There was a change on main that I didn't see between when CI passed and when I merged. This PR fixes that.
This PR follows up on #2719 toward the eventual goal of merging #2458. Here I am introducing a parametrized test suite that I will use to check that the generated code matches the current scheduler for a variety of scenarios. I am also scheduling the first consumer tensor in the fusion: the smem store of operands. This shows how we can selectively check that the schedule is correct before we have everything implemented, since the test only checks these tensors and nothing else in the fusion.
In the next PRs, I will schedule the rest of the prologue, as well as the mma result and the epilogue tensors. In each case, I will change the test to check more and more of the fusion for correctness.
This PR introduces `AbstractMatmulTensor`, an `AbstractTensor` for which each dimension can be tagged with a role. This lets us track the roles of each dimension during abstract tensor scheduling. It is currently only used in `blockTileTensors` and `mma_utils::makeTile`, but in the future we can imagine using it more.