Add initial_loop_domain_ to TensorDomain #2987

naoyam · 2024-09-21T20:23:31Z

It turned out it's necessary to remember the initial loop domain, which is either the logical domain set by the constructor or the new loop domain set by setLoopDomain. When a loop domain has an extra ID, TensorDomain::allIDs may miss IDs that solely depend on the extra ID.

naoyam · 2024-09-21T20:23:37Z

!build

csarofeen · 2024-09-22T18:45:16Z

When a loop domain has an extra ID

When does that happen?

naoyam · 2024-09-23T15:50:54Z

I'm still prototyping, but, for example:

t0 = [i0, i1] // input
t1 = [i0] // input

t2 = t0[:, n:] // slice t0 to [i0, i2] 
t3 = broadcast(t1, {false, true}) // [i0, b3]
t4 = t2 + t3 // [i0, i2]

The producer-based scheduling would use the loop domain of t0, [i0, i1], as the reference loop domain. For t1, I think it'd be the most natural to create a new broadcast ID and set its loop domain as [i0, b4]. This b4 could be also exactly mapped with b3.

zasdfgbnm · 2024-09-23T16:41:18Z

csrc/ir/nodes.cpp

@@ -3614,6 +3617,7 @@ void TensorDomain::setLoopDomain(std::vector<IterDomain*> new_loop_domain) {
      ". Logical: ",
      toDelimitedString(logical_domain_));
  loop_domain_ = std::move(new_loop_domain);
+  initial_loop_domain_ = loop_domain_;


Should we just append everything in loop_domain_ here to additional_ids_? In allIDs, we might want to do:

for (auto i : c10::irange(all_domains.size() - 1)) { for (auto j : c10::irange(all_domains.size() - 1)) {

instead of

for (auto i : c10::irange(all_domains.size() - 1)) { for (auto j : c10::irange(i + 1, all_domains.size())) {

I did think about it, but since additional_ids_ are just "additional", it seems to me it could be useful in some cases. For example, we could cache all IDs between logical and initial_loop to speed up TensorDomain::allIDs.

additional_ids_ was the mechanism designed to allow loop domain to have a new broadcast, and the reason why it exists was to solve almost exactly the same problem as this PR solves. Moving forward, looks like we will not continue with the "allow loop domain to have a new broadcast" direction. But I don't want to have two mechanisms for the same thing. We should pick either of them, but only one should be picked. It makes sense to speed up TensorDomain::allIDs, and it might also make sense to call the data structure we use for the speedup additional_ids_, but it will be a new thing that is totally unrelated to today's additional_ids_ except they happen to have the same name.

Which mechanism to pick I believe is mostly a question of interface design. Today, we have:

void TensorDomain::broadcast(int64_t axis, Val* extent) { axis = nvfuser::wrapDim(axis, nDims() + 1); IterDomain* id = IterDomainBuilder(fusion()->zeroVal(), extent) .iter_type(IterType::Broadcast) .build(); loop_domain_.insert(loop_domain_.begin() + axis, id); additional_ids_.push_back(id); }

That creates a new broadcast on the loop domain. But if we just do setLoopDomain(something with manually created new broadcasts), we will fail.

And in this PR, we are not using an interface like TensorDomain::broadcast, so it will not have the problem of the above mechanism. But if we just do a setLoopDomain(getLoopDomain()), we will fail.

In summary, I believe neither the additional_ids_ mechanism nor the one in this PR is safe. The additional_ids_ is more convenient for the TensorDomain::broadcast-like interface, that is, we have a method function that says "add this new ID to this axis". And the one in this PR is more convenient for setLoopDomain way of inserting new IDs.

I think my question for now is: do we want to pause a bit to think about interface design, or do we want to just quickly unblock for now and do some cleanup later?

Per our offline discussion, we decided that it's more important to move forward and get some progress before too much worrying about interface designs. See also #3000

…n-conventional loop domains (#2983) Part of #2902. See the linked design doc for the background context. ~Stacked on #2984~ Stacked on #2987 The IdModel-based analysis is required for non-conventional loop domains as the dependency from the logical to loop domains may not be just one way from logical to loop. However, the IdModel-based analysis does not support broadcast forwarding, so we can't just completely switch to the new method. For now, I think it makes most sense to use the existing method whenever logical and loop domains are generated in the conventional scheduling primitives such as split and merge. The new method is only used otherwise. I also considered implementing the broadcast forwarding in the IdModel-based approach as well, but it doesn't seem to me worthwhile doing so at this moment. I thought maybe using the Permissive graph would make it trivial, but there seem to be (not a small number of) subtle corner cases of mappings that may not be considered in the Permissive graph, e.g., if the indexed domains should be mapped or not. It should be certainly possible to extend the Permissive graph to address these corner cases, but another factor I think we should consider is that the broadcast forwarding itself may not be necessary with the setLoopDomain-based approach. We should probably set the right loop domain from the beginning instead of trying to make different loop domains to be inlinable.

naoyam mentioned this pull request Sep 21, 2024

Extend MaxPosCalculartor::getMaxProducerPosFromConsumer to support non-conventional loop domains #2983

Merged

naoyam requested a review from zasdfgbnm September 21, 2024 20:47

zasdfgbnm reviewed Sep 23, 2024

View reviewed changes

zasdfgbnm approved these changes Sep 23, 2024

View reviewed changes

naoyam merged commit 3d82ae9 into main Sep 23, 2024
38 checks passed

naoyam deleted the initial_loop_domain branch September 23, 2024 23:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add initial_loop_domain_ to TensorDomain #2987

Add initial_loop_domain_ to TensorDomain #2987

naoyam commented Sep 21, 2024

naoyam commented Sep 21, 2024

csarofeen commented Sep 22, 2024

naoyam commented Sep 23, 2024

zasdfgbnm Sep 23, 2024

naoyam Sep 23, 2024

zasdfgbnm Sep 23, 2024

zasdfgbnm Sep 23, 2024 •

edited

Loading

naoyam Sep 23, 2024

Add initial_loop_domain_ to TensorDomain #2987

Add initial_loop_domain_ to TensorDomain #2987

Conversation

naoyam commented Sep 21, 2024

naoyam commented Sep 21, 2024

csarofeen commented Sep 22, 2024

naoyam commented Sep 23, 2024

zasdfgbnm Sep 23, 2024

Choose a reason for hiding this comment

naoyam Sep 23, 2024

Choose a reason for hiding this comment

zasdfgbnm Sep 23, 2024

Choose a reason for hiding this comment

zasdfgbnm Sep 23, 2024 • edited Loading

Choose a reason for hiding this comment

naoyam Sep 23, 2024

Choose a reason for hiding this comment

zasdfgbnm Sep 23, 2024 •

edited

Loading