
Add slice tests to demonstrate manual scheduling #2898

Merged: 21 commits into main from slice_manual_scheduling_tests, Sep 5, 2024

Conversation

naoyam (Collaborator) commented Sep 4, 2024:

Slice and concat patterns without rotation. See #2851.

Stacked on #2897, #2875
Closes #2870

naoyam added the rope label on Sep 4, 2024
naoyam requested a review from jjsjann123 on September 4, 2024 at 02:47
Base automatically changed from idmodel_resize_indexing_war to main September 4, 2024 17:54
auto resize = dynamic_cast<Resize*>(id->definition());
if (resize == nullptr) {
continue;
}
jjsjann123 (Collaborator):

nitpick:

if (auto resize = dynamic_cast<Resize*>(id->definition())) {

IrBuilder::create<Val>(-1, DataType::Index),
IrBuilder::create<Val>(-1, DataType::Index));

tv2->setLoopDomain({tv2_loop_id});
jjsjann123 (Collaborator):

Naive question: why are we manually adding the Resize with (-1, -1) here? Wasn't tv2->getLogicalDomain().at(0) already equivalent to tv1->getLogicalDomain().at(0), and hence can't we just reuse the loop domain as tv2->setLoopDomain(tv1->getLogicalDomain())?

tv1->getLoopDomain().at(0),
IrBuilder::create<Val>(1, DataType::Index),
IrBuilder::create<Val>(1, DataType::Index));
tv1->setLoopDomain({tv1_padded});
jjsjann123 (Collaborator):

I sorta see the need to create tv1_padded. Just for clarification: is there a requirement on the dependencies of a loop domain? I.e., can we use the domain of its consumer as the loop domain of the producer?

i.e. tv1->setLoopDomain(tv2->getLogicalDomain())

naoyam (Collaborator, Author):

As discussed offline, it's because each tensor needs its own domains and exprs at the moment.
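
To make the pattern being discussed concrete, here is a rough, hypothetical sketch of such a manual loop-domain setup (the identifiers and the Resize constructor argument order are my assumptions, not code taken from the PR):

// Hypothetical sketch: give tv2 (a slice of tv1) a loop ID that is equivalent
// to tv1's pre-slice logical ID while still owning its own IterDomains.
// Reusing tv1->getLogicalDomain().at(0) directly is not possible because each
// tensor needs its own domains and exprs.
auto tv2_loop_id = tv1->getLogicalDomain().at(0)->cloneWithoutRFactor();
// Connect the clone to tv2's existing logical ID with an explicit Resize expr.
// The (-1, -1) expansion mirrors the slice, so tv2_loop_id keeps the
// pre-slice extent. (Assumed argument order: output ID, input ID, left
// expansion, right expansion.)
IrBuilder::create<Resize>(
    tv2->getLogicalDomain().at(0),
    tv2_loop_id,
    IrBuilder::create<Val>(-1, DataType::Index),
    IrBuilder::create<Val>(-1, DataType::Index));
tv2->setLoopDomain({tv2_loop_id});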

naoyam changed the base branch from main to sync_info_idmodel on September 5, 2024 at 05:30
void validateResize(Fusion* fusion) {
auto fusion_vals = fusion->usedMathVals();
for (auto tv : ir_utils::filterByType<TensorView>(fusion_vals)) {
// Make sure resize is only used as part of root to logical transformations
naoyam (Collaborator, Author):

This no longer holds: with the manual loop-domain scheduling added in this PR, Resize exprs can also appear outside the root-to-logical transformations.

naoyam (Collaborator, Author) commented Sep 5, 2024:

!build

naoyam (Collaborator, Author) commented Sep 5, 2024:

!build

Base automatically changed from sync_info_idmodel to main September 5, 2024 16:34

fusion.addOutput(tv3);

tv2->setLoopDomain(tv2->getRootDomain());
jjsjann123 (Collaborator):

Just to clarify here, we would want tv2's loop domain to be identical to tv3's after the pad?

Here we can reuse the root domain simply because the pad cancels out the slice.
So if we were to have a pad with different pad widths, we'd want to clone tv3->getLogicalDomain() and then add a manual Resize afterwards to establish the pad widths?

naoyam (Collaborator, Author):

In this case, we can use the logical domain of the input or the logical domain of the output. Both are equivalent, but that equivalence relationship is not detected by IdModel. Adding some analysis to detect that relationship is part of the overall plan, but it isn't strictly required.

In this test, I'm using the input logical domain, which is mapped with the tv1 logical and the tv2 root domains. Since each tensor needs to use its own IDs,

tv2->setLoopDomain(tv2->getRootDomain())

is the simplest way to set the loop domain of tv2.

If the padding width of tv3 is different, then it'd be unclear what loop domain should be used, especially for the right-half case.

jjsjann123 (Collaborator):

> If the padding width of tv3 is different, then it'd be unclear what loop domain should be used, especially for the right-half case.

Does this mean we require the final output of the fusion to have the same shape (along the slice/pad dimension) as the original input whenever there is a chain of slices and pads?

Is this a requirement on the patterns we support? It feels a bit strange to have; looking at the pad-only test above (PadScheduledLikeConsumer), we don't have such a restriction there.
Moving on to the chains of slice/pad (concat) below, how would this restriction apply there? Does it translate to each matching pair of slice and pad needing to cancel each other out?

naoyam (Collaborator, Author):

If it's only a pad, it should be fine. But if there's a slice and then a pad, there will be:

  1. input domain
  2. sliced domain of the input domain
  3. padded domain of the sliced domain

The requirement is that #1 and #3 correspond to the same domain. If not, it's unclear which domain to generate a loop for. Technically, it should be a domain with an extent of max(#1, #3), but we don't have that capability.

If there's only a pad,

  1. input domain
  2. padded domain of the input domain

Then we can just use #2 as it's larger than #1.
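
To put a concrete (hypothetical) example behind the #1/#2/#3 description, assuming nvFuser's slice and pad ops and a {left, right} pad-width convention:

// #1: tv1's domain, extent shape[0]
// #2: sliced domain, extent shape[0] / 2 (the left half)
auto tv2 = slice(tv1, {{fusion.zeroVal(), IrBuilder::create<Val>(shape[0] / 2)}});
// #3: pad the right side by shape[0] / 2, restoring the extent to shape[0].
// Since #3 matches #1, a loop over #1 covers everything, which is why
// tv2->setLoopDomain(tv2->getRootDomain()) works here.
auto tv3 = pad(tv2, {fusion.zeroVal(), IrBuilder::create<Val>(shape[0] / 2)});
// If the pad widths differed, #3 would no longer match #1, and the loop
// domain would have to cover max(#1, #3), which is not supported.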

left_half,
fusion.zeroVal(),
IrBuilder::create<Val>(shape[0] / 2, DataType::Index));
tv6->setLoopDomain(tv6_loop);
jjsjann123 (Collaborator):

I'm not quite following what's happening here. Looks like we replay the left_half transform here (slice_resize -> pad_resize). My naive questions:

  1. Why do we need this replay in tv6_loop? Since the slice and pad cancel each other out, can't we just use tv2->getRootDomain()[0] directly?
  2. As a follow-up: if we do need to replay the transform for some reason, why can we just ignore the right half?

naoyam (Collaborator, Author):

Are you asking why this won't work?

std::vector<IterDomain*> tv6_loop{
        tv2->getRootDomain()[0]->cloneWithoutRFactor(),
    };
tv6->setLoopDomain(tv6_loop);

This won't work as this tv6_loop domain is just a copy of the tv2 domain and is not connected with anything yet, i.e., it doesn't have any definition or use expr. We need to define how it is related to the existing domains of tv6.

To connect the domain to tv6, we can use either the left or the right half; it doesn't matter. I should have mentioned that in a comment.

jjsjann123 (Collaborator):

Please bear with more of my naive questions.

> a copy of the tv2 domain and is not connected with anything yet, i.e., it doesn't have any definition or use expr.

I thought tv2->getRootDomain() is consumed by a resize to create its logical domain. Since we have
auto tv2 = slice(tv1, {{fusion.zeroVal(), IrBuilder::create<Val>(shape[0] / 2)}});

> We need to define how it is related to the existing domains of tv6

Since we have a chain of slice -> pad -> binary_add, and with the exact graph, if we view the IterDomain as a ValGroup, I think the connection is well established.

Of course this is just theoretical discussion ^^^.

> We need to define how it is related to the existing domains of tv6

Does this mean our analysis today requires that the chain of relationships be resolved within the IterDomains of a single TensorDomain?

naoyam (Collaborator, Author):

> I thought tv2->getRootDomain() is consumed by a resize to create its logical domain. Since we have
> auto tv2 = slice(tv1, {{fusion.zeroVal(), IrBuilder::create<Val>(shape[0] / 2)}});

That is true, but tv2->getRootDomain()[0]->cloneWithoutRFactor() is not. And since it's just a plain copy, it isn't mapped with anything.

> Since we have a chain of slice -> pad -> binary_add, and with the exact graph, if we view the IterDomain as a ValGroup, I think the connection is well established.

Again, that's true with tv2->getRootDomain()[0] but not with its clone.

> Does this mean our analysis today requires that the chain of relationships be resolved within the IterDomains of a single TensorDomain?

Yes and no, depending on what you're trying to get from the analysis. In this case, however, since the clone is not mapped with anything, there's nothing we can get from it without connecting it to other domains through replays.
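
A small sketch of the distinction being made here (hypothetical, reusing names from the discussion):

// tv2->getRootDomain()[0] is connected: it is the input of the Resize that
// slice() created, so the exact graph can map it to tv1's logical ID.
IterDomain* connected = tv2->getRootDomain()[0];
// Its clone, however, is a brand-new ID with no definition and no uses, so
// IdModel has nothing to map it with:
IterDomain* dangling = connected->cloneWithoutRFactor();
// dangling->definition() == nullptr and dangling->uses() is empty.
// That is why the test instead replays the slice/pad Resizes onto tv6's own
// IDs (via the left half here, though the right half would work just as
// well), so the new loop ID is connected to tv6's logical domain through
// tv6's own exprs.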

jjsjann123 (Collaborator):

> but tv2->getRootDomain()[0]->cloneWithoutRFactor() is not. And since it's just a plain copy

🤯 Ha, I see what I'm missing now. Thanks a lot for walking me through this 🙇

}

// RoPE pattern except for the rotation
TEST_F(ResizeTest, SliceSliceConcatConcat) {
jjsjann123 (Collaborator):

Note for myself, one last example to go through.

jjsjann123 (Collaborator) left a comment:

Thanks a lot for walking me through these examples!

naoyam merged commit 0058da9 into main on Sep 5, 2024 (5 checks passed)
naoyam deleted the slice_manual_scheduling_tests branch on September 5, 2024 at 23:19

Successfully merging this pull request may close these issues.

Create manually scheduled scheduling examples for RoPE-like patterns
Participants: naoyam, jjsjann123