Initial resize scheduler #3556

naoyam · 2024-12-10T05:54:10Z

This is a very preliminary version of a new scheduler mainly targeted at RoPE. I will incrementally extend it to be more flexible and performant, but for now it only handles a fusion that has pointwise ops and a single Resize-based tensor op such as SliceOp or PadOp. The scheduling strategy is also pretty naive at the moment and is manually demonstrated in #3549 and #3555. The main point is that resize-based tensor ops like SliceOp and PadOp no longer need to have fusion inputs as their inputs.
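To make "pointwise ops plus a single resize-based op" concrete, here is a minimal sketch in plain C++ (not the nvFuser API; `pointwiseSin`, `sliceRange`, and `padZeros` are hypothetical names made up for illustration). It only shows the shape behavior: a pointwise op keeps the extent, a slice shrinks it, and a pad grows it, and the slice consumes an intermediate result rather than the original input.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pointwise op: the output has the same extent as the input.
std::vector<double> pointwiseSin(const std::vector<double>& in) {
  std::vector<double> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = std::sin(in[i]);
  }
  return out;
}

// Slice-like resize: the output extent shrinks to (stop - start).
std::vector<double> sliceRange(
    const std::vector<double>& in,
    std::size_t start,
    std::size_t stop) {
  assert(start <= stop && stop <= in.size());
  return std::vector<double>(
      in.begin() + static_cast<std::ptrdiff_t>(start),
      in.begin() + static_cast<std::ptrdiff_t>(stop));
}

// Pad-like resize: the output extent grows by (left + right), filled with 0.
std::vector<double> padZeros(
    const std::vector<double>& in,
    std::size_t left,
    std::size_t right) {
  std::vector<double> out(left + in.size() + right, 0.0);
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[left + i] = in[i];
  }
  return out;
}
```

For example, `sliceRange(pointwiseSin(x), 1, 3)` slices the result of a pointwise op, which is the pattern where the producer of the slice is not a fusion input.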

The new scheduler is currently placed after the reduction scheduler and before the transpose and pointwise schedulers:

    SchedulerType::ExprEval,
    SchedulerType::NoOp,
    SchedulerType::Matmul,
    SchedulerType::Reduction,
    SchedulerType::Resize, // <-- New
    SchedulerType::Transpose,
    SchedulerType::PointWise,
    SchedulerType::InnerPersistent,
    SchedulerType::OuterPersistent,
    SchedulerType::InnerOuterPersistent};

https://github.com/NVIDIA/Fuser/pull/3556/files#diff-c0d261d44c61935fa2d5398f0ac52bd6ea077c6892fb5629c03a425a55fc32f2R64-R74
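As a rough illustration of why the position in this list matters, the sketch below assumes that the list is consumed in order and that the first scheduler accepting a fusion is chosen. That selection rule, the `kPriorityOrder` name, and the `pickScheduler` helper are assumptions for this example, not code from the PR.

```cpp
#include <array>
#include <functional>
#include <optional>

enum class SchedulerType {
  ExprEval,
  NoOp,
  Matmul,
  Reduction,
  Resize,
  Transpose,
  PointWise,
  InnerPersistent,
  OuterPersistent,
  InnerOuterPersistent
};

// Priority order as listed in the PR description.
constexpr std::array<SchedulerType, 10> kPriorityOrder = {
    SchedulerType::ExprEval,
    SchedulerType::NoOp,
    SchedulerType::Matmul,
    SchedulerType::Reduction,
    SchedulerType::Resize,
    SchedulerType::Transpose,
    SchedulerType::PointWise,
    SchedulerType::InnerPersistent,
    SchedulerType::OuterPersistent,
    SchedulerType::InnerOuterPersistent};

// Hypothetical helper: return the first scheduler, in priority order, for
// which the caller-supplied predicate says "can schedule".
std::optional<SchedulerType> pickScheduler(
    const std::function<bool(SchedulerType)>& can_schedule) {
  for (SchedulerType type : kPriorityOrder) {
    if (can_schedule(type)) {
      return type;
    }
  }
  return std::nullopt;
}
```

Under this assumption, a fusion that both the resize and pointwise schedulers could take goes to the resize scheduler, which is consistent with the segmentation changes seen in some existing tests.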

There are several small changes to some of the existing tests, mainly those on segmentation and alias support, since this new scheduler may change how a fusion is segmented when resize is used. There's one thing I haven't addressed (#3556 (comment)), which I'm tracking in a separate issue.

naoyam · 2024-12-10T22:17:51Z

tests/cpp/test_resize.cpp

@@ -4096,64 +4108,85 @@ TEST_F(ResizeTest, PropagateSliceToInputs) {
  auto tv0 = makeConcreteTensor(shape);
  fusion.addInput(tv0);

-  auto tv1 = set(tv0);
+  // Don't use set here as it gets taken by the no-op scheduler
+  auto tv1 = sin(tv0);


The changes from set to sin or cos are just to prevent the preseg transformation from kicking in.

naoyam · 2024-12-10T22:21:17Z

tests/cpp/test_resize.cpp

Nothing changed in the tests here, except for replacing set with sin and disabling one test. I just extended some of the existing tests to use the resize scheduler as well. Not all patterns are supported yet, so the unsupported ones just call GTEST_SKIP for now.

naoyam · 2024-12-10T22:21:42Z

csrc/scheduler/tools/domain_map.h

This is just moved from pointwise_utils.h

naoyam · 2024-12-10T22:22:14Z

csrc/scheduler/tools/domain_map.cpp

Just moved from pointwise_utils to domain_map

naoyam · 2024-12-10T22:23:57Z

csrc/scheduler/resize.cpp

+
+namespace nvfuser {
+
+bool ResizeScheduler::canScheduleCompileTime(Fusion* fusion) {


In this initial version, I'm trying to make it very restrictive. Will have several follow-up PRs to schedule the whole RoPE module.

naoyam · 2024-12-10T22:25:43Z

csrc/scheduler/pointwise_utils.h

 #include <scheduler/utils.h>

 namespace nvfuser {
 namespace pointwise_utils {

-// DomainMap uses the ComputeAtMap to find a reference TensorView


This part is moved to scheduler/tools/domain_map.h

naoyam · 2024-12-10T22:26:15Z

csrc/scheduler/pointwise.cpp

@@ -29,37 +29,6 @@ namespace {
 // Unused at the moment, commenting for clang tidy
 constexpr int64_t kThreadX = 128;

-class DomainMap : public pointwise_utils::DomainMap {


This part is moved to pointwise_utils.h so that it can also be used by the resize scheduler

naoyam · 2024-12-10T22:26:29Z

csrc/scheduler/pointwise_utils.h

@@ -74,5 +30,44 @@ inline int64_t nRootDims(const TensorView* tv) {
  return tv_n_dims;
 }

+class DomainMap : public scheduler_tools::DomainMap {


This is moved from pointwise.cpp

naoyam · 2024-12-10T23:03:09Z

csrc/scheduler/pointwise.cpp

@@ -432,19 +403,11 @@ std::unique_ptr<PointwiseParams> getPointwiseHeuristics(
  return params;
 }

-// Return reference tensor view.


Just moved to pointwise_utils

naoyam · 2024-12-10T23:03:52Z

csrc/scheduler/pointwise_utils.h

+};
+
+// Return reference tensor view.
+inline TensorView* getReferenceTensor(Fusion* fusion) {


Moved from pointwise.cpp. Also shortened the name a bit (was getReferenceTensorView)

naoyam · 2024-12-11T04:00:01Z

!test

naoyam · 2024-12-11T04:03:14Z

tests/cpp/test_alias.cpp

@@ -520,6 +520,9 @@ TEST_F(AliasTest, AliasOutputBeforeNonAliasOutput) {
  testValidate(
      executor_cache.fusion(), out_tensors, {in_tensor}, __LINE__, __FILE__);

+  // TODO: Fix the alias support


This is broken for now. Need to understand how it actually works before this PR.

naoyam · 2024-12-11T04:03:40Z

tests/cpp/test_alias.cpp

@@ -959,34 +962,6 @@ TEST_F(AliasTest, SourceIsBothInputAndOutput) {
  EXPECT_EQ(in_tensor.data_ptr(), out_tensors[1].data_ptr());
 }

-TEST_F(AliasTest, SegmentBoundary) {


Probably not relevant as this isn't segmented anymore

naoyam · 2024-12-11T04:04:17Z

tests/cpp/test_gpu3.cpp

  const auto num_segments = kernel_runtime->fusionSegments()->groups().size();
-  NVF_CHECK(num_segments == 3, "Expect 3 segments, got: ", num_segments);
-  for (const auto& exec : kernel_runtime->executors()) {
+  EXPECT_EQ(num_segments, 2) << "Expect 2 segments, got: " << num_segments;


This is now just segmented into two kernels

naoyam · 2024-12-11T04:04:42Z

tests/cpp/test_gpu3.cpp

    if (!exec->isA<KernelExecutor>()) {
      continue;
    }
+    if (kernel_runtime->schedulerHeuristics()


The gmem requirement isn't relevant for the resize scheduler

jacobhinkle

LGTM

jacobhinkle · 2024-12-12T14:27:01Z

csrc/scheduler/resize.cpp

+  const auto outermost_pos = (int64_t)old2new.size();
+  ref_tv->flatten(outermost_pos);
+  ref_tv->split(outermost_pos, 128);
+  ref_tv->split(outermost_pos, 1 << 14);


This is a little off-topic but is there any particular reason to split by 16K here? In the pointwise scheduler I think the split is into 64K blocks. Would it be better to just make this a heuristic param so that it could be set to some multiple of the number of SMs?

Nothing particular. This is just an arbitrary, simple schedule I put here for now. These heuristic parameters will need to go through the tuning process once all the building blocks are in place (much like the matmul scheduler development).
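For reference, the arithmetic behind the two splits can be sketched as follows. This assumes that each split is an inner split, i.e. `split(pos, factor)` turns an extent `N` into `[ceilDiv(N, factor), factor]`, and that the second split applies to the outer result of the first. `splitExtents` is a hypothetical helper written for this example, and the factors are parameters only to show how they could become heuristic values as suggested above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr std::int64_t ceilDiv(std::int64_t a, std::int64_t b) {
  return (a + b - 1) / b;
}

// Returns the extents {outer, middle, inner} obtained by flattening to a
// single extent and then splitting by inner_factor and middle_factor.
std::array<std::int64_t, 3> splitExtents(
    std::int64_t flattened_extent,
    std::int64_t inner_factor = 128,
    std::int64_t middle_factor = std::int64_t{1} << 14) {
  assert(flattened_extent > 0 && inner_factor > 0 && middle_factor > 0);
  // After split(pos, 128): [ceilDiv(N, 128), 128]
  const std::int64_t after_first = ceilDiv(flattened_extent, inner_factor);
  // After split(pos, 1 << 14): [ceilDiv(after_first, 16384), 16384, 128]
  const std::int64_t outer = ceilDiv(after_first, middle_factor);
  return {outer, middle_factor, inner_factor};
}
```

With the defaults, one outer iteration covers 128 * 16384 = 2,097,152 elements, so a tensor with 5,000,000 elements gets an outer extent of 3.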

jacobhinkle · 2024-12-12T14:29:19Z

The multiple uses of the name DomainMap might be a little confusing even though they're in different namespaces/files. You could call one a PointwiseDomainMap or something.

naoyam · 2024-12-12T18:37:42Z

> The multiple uses of the name DomainMap might be a little confusing even though they're in different namespaces/files. You could call one a PointwiseDomainMap or something.

Yeah, I thought so too, but each is already in its own namespace like pointwise_utils, so it would look like pointwise_utils::PointwiseDomainMap, which I thought might look redundant.

jjsjann123

The mechanical changes look straightforward to me. Taking a look at the actual scheduler + tests.

jjsjann123 · 2024-12-13T00:49:24Z

csrc/scheduler/resize.cpp

+
+  inlineMost();
+
+  markAliases(fusion);


Tagging @wujingyue: I see this call in the pointwise, reduction, and no-op schedulers.

Shouldn't this be called after any scheduling is done in general?

naoyam · 2024-12-13T20:29:45Z

> The multiple uses of the name DomainMap might be a little confusing even though they're in different namespaces/files. You could call one a PointwiseDomainMap or something.
>
> Yeah, I thought so too, but each is already in its own namespace like pointwise_utils, so it would look like pointwise_utils::PointwiseDomainMap, which I thought might look redundant.

I think redundancy is better than ambiguity, so I changed the name as you suggested @jacobhinkle

…bolicSizes) (#3578)

Stacked on #3585

`StmtSort::getStmtsTo` may not grab all active iter domains if IDs are connected in an unconventional way. For example, we can set the loop domain of a tensor as a producer of its logical domain, but due to the nature of `IterVisitor`, such ID dependency patterns are not supported, meaning `StmtSort::getStmtsTo` would fail to grab all valid IDs and their exprs.

I just recently noticed this issue while working on #3556; specifically, the issue was exposed as an inconsistent replacement of extent vals. I've been experimenting with such domain patterns, but I hadn't seen this before, likely because I was using only static shape tensors for convenience.

To fix the issue, I added a variation of `StmtSort::getStmtsTo`, which traverses a fusion as usual but stops at TensorView. For each TensorView, instead of using `IterVisitor`, it uses `TensorDomain::getAllStatements()`, which combines both `TensorDomain::allIDs()` and `TensorDomain::allExprs()`, and traverses the IDs and exprs in the returned order. It's a somewhat naive implementation, but I think this is good enough for now, and I don't have any other immediate idea to try.

I changed `ValReplacementMutator` to use the new interface. That's the only use for now.

Co-authored-by: Jacob Hinkle

naoyam · 2024-12-13T22:41:54Z

!test

naoyam · 2024-12-13T23:52:11Z

!test

naoyam · 2024-12-15T16:14:41Z

!test

naoyam · 2024-12-15T17:04:38Z

!test

naoyam force-pushed the resize_scheduler_initial_version branch 2 times, most recently from 5bde3d4 to 7e7db61 Compare December 10, 2024 20:05

Base automatically changed from rotation_residual_support to main December 10, 2024 22:46

naoyam added 2 commits December 10, 2024 17:35

Always enable IdModel-based indexing when resize is used

11f5dce

Don't run the tests without IdModel

05ea88f

naoyam mentioned this pull request Dec 11, 2024

Alias support with the resize scheduler #3572

Closed

naoyam added 7 commits December 10, 2024 22:11

fix

0ad9fea

Allocation ordering fix

4f14988

Merge remote-tracking branch 'origin/enable_id_model_for_resize' into…

f3ce2d9

… enable_id_model_for_resize

rotation + residual

0d35147

wip

7934e63

move DomainMap to its own file

9e71bc5

Use the reference finder of pointwise scheduler

57600bd

naoyam force-pushed the resize_scheduler_initial_version branch from 4ad2ff7 to e8cb381 Compare December 11, 2024 09:22

naoyam changed the base branch from main to enable_id_model_for_resize December 11, 2024 09:22

naoyam added 2 commits December 11, 2024 13:42

Fix the failed alias test thanks to @wujingyue

c80dd91

cleanup

9167cf0

wujingyue approved these changes Dec 11, 2024

wujingyue reviewed Dec 11, 2024

naoyam added 2 commits December 11, 2024 15:14

cleanup

df63df2

cleanup

52acb42

jacobhinkle approved these changes Dec 12, 2024

wujingyue approved these changes Dec 12, 2024

jjsjann123 reviewed Dec 12, 2024

jjsjann123 reviewed Dec 13, 2024

naoyam mentioned this pull request Dec 13, 2024

Grab all IDs and exprs with StmtSort::getAllStmts (and fix replaceSymbolicSizes) #3578

Merged

naoyam and others added 5 commits December 12, 2024 17:50

Merge branch 'main' into resize_scheduler_initial_version

6363298

Merge branch 'main' into resize_scheduler_initial_version

ca09b93

PR feedback

0d0a4d6

fix

be3aee9

Rename DomainMap to PointwiseDomainMap

4368e80

naoyam added 2 commits December 13, 2024 12:54

Merge remote-tracking branch 'origin/main' into resize_scheduler_init…

2a6f059

…ial_version

Merge remote-tracking branch 'origin/main' into resize_scheduler_init…

91e7d3e

…ial_version

merge fix

96ac0fa

naoyam added 2 commits December 15, 2024 08:13

python frontend fix

7e9413a

fix pattern match

40dd2c2

fix

8056cfa
