Extend MaxPosCalculator::getMaxProducerPosFromConsumer to support non-conventional loop domains #2983

naoyam · 2024-09-21T00:33:11Z

Part of #2902. See the linked design doc for the background context.

~~Stacked on #2984~~
Stacked on #2987

The IdModel-based analysis is required for non-conventional loop domains because the dependency between the logical and loop domains may not run in one direction only, from logical to loop. However, the IdModel-based analysis does not support broadcast forwarding, so we can't simply switch to the new method entirely.

For now, I think it makes the most sense to keep using the existing method whenever the loop domain is generated from the logical domain with the conventional scheduling primitives such as split and merge. The new method is used only otherwise.

I also considered implementing broadcast forwarding in the IdModel-based approach, but it doesn't seem worthwhile at this moment. I thought using the Permissive graph might make it trivial, but there seem to be quite a few subtle corner cases of mappings that the Permissive graph may not account for, e.g., whether the indexed domains should be mapped.

It should certainly be possible to extend the Permissive graph to address these corner cases, but another factor to consider is that broadcast forwarding itself may not be necessary with the setLoopDomain-based approach. We should probably set the right loop domain from the beginning instead of trying to make different loop domains inlinable.
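As a rough illustration of the selection rule described above, here is a toy sketch. It is not the code in this PR: `ToyTensor`, `Method`, and `chooseMethod` are hypothetical names, and the sketch assumes that both the producer and the consumer must have conventional loop domains for the existing method to be used.

```cpp
// Hypothetical stand-in for the properties the real check inspects.
struct ToyTensor {
  // True if the loop domain is derived from the logical domain only
  // through conventional primitives such as split and merge.
  bool conventional_loop_domain = true;
};

// The two analyses discussed in the description.
enum class Method {
  ExistingReplay, // existing method; supports broadcast forwarding
  IdModel // new method; handles non-conventional loop domains
};

// Assumption: the existing method is kept only when both tensors are
// conventionally scheduled. Anything else falls back to the new method.
Method chooseMethod(const ToyTensor& producer, const ToyTensor& consumer) {
  if (producer.conventional_loop_domain && consumer.conventional_loop_domain) {
    return Method::ExistingReplay;
  }
  return Method::IdModel;
}
```

With this shape, tensors scheduled only with split and merge keep the existing behavior, including broadcast forwarding, and only tensors whose loop domain was set otherwise take the new path.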

naoyam · 2024-09-21T00:33:24Z

!build

naoyam · 2024-09-21T00:51:30Z

csrc/inlining.cpp

+  // could only change with setLoopDomain
+  const bool may_need_forwarding =
+      lower_utils::hasRootToLoopLinearTransformations(producer) &&
+      !ir_utils::compareDomains(


Side note: here, the argument order of compareDomains needs to be loop and then logical. It doesn't seem to work with logical and then loop. I'll fix it in a separate PR.

Issue: #2985

naoyam · 2024-09-21T00:58:10Z

!build

naoyam · 2024-09-21T02:38:55Z

csrc/inlining.h

@@ -55,7 +62,7 @@ class MaxPosCalculator {
  size_t getMaxProducerPosFromConsumer(
      TensorView* producer,
      TensorView* consumer,
-      bool best_effort) const;
+      bool best_effort);


Needed to drop const because inliningGraph lazily builds a graph.
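A minimal sketch of why lazy construction forces the signature change. `ToyGraph` and `ToyMaxPosCalculator` are hypothetical stand-ins, not the nvFuser classes:

```cpp
#include <memory>

// Hypothetical stand-in for the graph that is built on demand.
struct ToyGraph {
  int num_nodes = 0;
};

class ToyMaxPosCalculator {
 public:
  // Non-const: the first call mutates the object by building the graph.
  const ToyGraph& inliningGraph() {
    if (!graph_) {
      graph_ = std::make_unique<ToyGraph>();
      ++build_count_;
    }
    return *graph_;
  }

  // Must also be non-const because it calls the non-const accessor above.
  int getMaxPos() {
    return inliningGraph().num_nodes;
  }

  // Number of times the graph has been built (for illustration only).
  int buildCount() const {
    return build_count_;
  }

 private:
  std::unique_ptr<ToyGraph> graph_;
  int build_count_ = 0;
};
```

An alternative in C++ would be to declare the cached member `mutable` and keep the methods const, at the cost of hiding the mutation from callers. The diff above takes the simpler route of dropping const.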

naoyam · 2024-09-21T03:11:07Z

csrc/inlining.cpp

+    auto pairwise_logical_map = PairwiseLogicalDomainMap(producer, consumer);
+    auto replay_CasP = BestEffortReplay::replayCasP(
+        consumer, producer, -1, pairwise_logical_map);
+    auto p2c_replay_map = replay_CasP.getReplay();
+
+    for (const auto producer_pos : c10::irange(producer->nDims())) {
+      // If the producer position is mismatching with the consumer, then we can
+      // not inline into this position, otherwise the max producer position of
+      // the consumer will become invalid and expression sort will fail.
+      if (TransformReplay::getMatchedLeafPosWithoutReplayCasP(
+              consumer, producer, producer_pos + 1) < 0) {
+        return producer_pos;
+      }
+      auto map_it = p2c_replay_map.find(producer->axis(producer_pos));
+      if (map_it != p2c_replay_map.end()) {
+        auto c_id = map_it->second;
+        if (!isAllowedID(c_id, consumer, best_effort, true, false, true)) {
+          return producer_pos;
+        }
+      }


This part is the same as before.

This is a follow-up to #2937, which allowed the loop domain of a tensor to have extra IDs (e.g., for inlining a 1D tensor to a 2D tensor). When working on #2983, I realized `TensorDomain::allIDs` fails to find all IDs when the loop domain has extra IDs. As mentioned in the added code comment, the issue is primarily because `IRBFS::getExprsBetween` is asymmetric with respect to its two domain parameters. See the added test for a simple concrete example.

It turned out it's necessary to remember the initial loop domain, which is either the logical domain set by the constructor or the new loop domain set by setLoopDomain. When a loop domain has an extra ID, TensorDomain::allIDs may miss IDs that solely depend on the extra ID.
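The following toy model sketches why remembering the initial loop domain helps. It is not nvFuser's implementation: IDs are plain strings, `ToyExpr`, `ToyDomain`, `reachableForward`, and the two `allIds...` helpers are hypothetical, and the traversal is a simple forward reachability pass rather than `IRBFS::getExprsBetween`.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy stand-in for an IterDomain expression (e.g., a split): inputs -> outputs.
struct ToyExpr {
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

struct ToyDomain {
  std::vector<std::string> logical;
  std::vector<std::string> initial_loop; // as set by a setLoopDomain-like call
  std::vector<std::string> loop; // current loop domain
  std::vector<ToyExpr> exprs;
};

// Collect every ID reachable by following exprs forward from `starts`.
// An expr fires only once all of its inputs are known.
std::set<std::string> reachableForward(
    const std::vector<std::string>& starts,
    const std::vector<ToyExpr>& exprs) {
  std::set<std::string> known(starts.begin(), starts.end());
  bool changed = true;
  while (changed) {
    changed = false;
    for (const auto& expr : exprs) {
      bool ready = true;
      for (const auto& in : expr.inputs) {
        if (known.count(in) == 0) {
          ready = false;
          break;
        }
      }
      if (!ready) {
        continue;
      }
      for (const auto& out : expr.outputs) {
        if (known.insert(out).second) {
          changed = true;
        }
      }
    }
  }
  return known;
}

// Traverse from the logical domain only, then add the loop IDs themselves.
std::set<std::string> allIdsFromLogical(const ToyDomain& d) {
  auto ids = reachableForward(d.logical, d.exprs);
  ids.insert(d.loop.begin(), d.loop.end());
  return ids;
}

// Also start the traversal from the remembered initial loop domain.
std::set<std::string> allIdsWithInitialLoop(const ToyDomain& d) {
  std::vector<std::string> starts = d.logical;
  starts.insert(starts.end(), d.initial_loop.begin(), d.initial_loop.end());
  auto ids = reachableForward(starts, d.exprs);
  ids.insert(d.loop.begin(), d.loop.end());
  return ids;
}

// Logical domain {i0}; the loop domain is set to {i0, i1}, where i1 is an
// extra ID. i1 is split into (i1o, i1i), and i1i is split again into (a, b).
ToyDomain makeExample() {
  ToyDomain d;
  d.logical = {"i0"};
  d.initial_loop = {"i0", "i1"};
  d.loop = {"i0", "i1o", "a", "b"};
  d.exprs.push_back(ToyExpr{{"i1"}, {"i1o", "i1i"}});
  d.exprs.push_back(ToyExpr{{"i1i"}, {"a", "b"}});
  return d;
}
```

In this example, `allIdsFromLogical` misses `i1` and `i1i`, which appear in neither the logical nor the current loop domain and depend only on the extra ID. `allIdsWithInitialLoop` finds all six IDs.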

naoyam · 2024-09-21T20:39:05Z

!build

naoyam · 2024-09-23T23:44:20Z

csrc/logical_domain_map.cpp

@@ -742,7 +742,7 @@ bool ComputeAtLogicalDomainMap::canMap(
    const TensorDomain* td_b,
    const IterDomain* id_b) const {
  NVF_ERROR(
-      id_b->definition() == nullptr || id_b->isRFactorProduct(),
+      td_b->isLogical(id_b) || td_b->isRoot(id_b),


Bug fix. Related: #2961 (comment)

naoyam · 2024-09-23T23:48:13Z

csrc/ir/utils.cpp

@@ -797,7 +797,8 @@ std::vector<TensorView*> getTVsWithDynamicTransform(Fusion* fusion) {
 CompareDomainResult compareDomains(
    std::vector<IterDomain*> dom0,
    const std::vector<IterDomain*>& dom1,
-    const std::vector<IterDomain*>& additional_ids) {
+    const std::vector<IterDomain*>& additional_ids,
+    bool ignore_broadcast) {


Broadcast IDs also need to be considered to detect extra IDs added by setLoopDomain.
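A toy sketch of the effect of such a flag. This is not `ir_utils::compareDomains`: `ToyId` and `hasExtraIds` are hypothetical, and IDs are matched by name only.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for an IterDomain.
struct ToyId {
  std::string name;
  bool is_broadcast;
};

// Returns true if `loop` contains an ID that does not appear in `logical`.
// When `ignore_broadcast` is true, broadcast IDs in `loop` are skipped, so an
// extra broadcast ID goes undetected.
bool hasExtraIds(
    const std::vector<ToyId>& loop,
    const std::vector<ToyId>& logical,
    bool ignore_broadcast) {
  for (const auto& id : loop) {
    if (ignore_broadcast && id.is_broadcast) {
      continue;
    }
    const bool found = std::any_of(
        logical.begin(), logical.end(), [&](const ToyId& other) {
          return other.name == id.name;
        });
    if (!found) {
      return true;
    }
  }
  return false;
}
```

With a logical domain `{i0}` and a loop domain `{i0, b1}` where `b1` is an extra broadcast ID, the extra ID is reported only when `ignore_broadcast` is false.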

zasdfgbnm · 2024-09-25T21:33:37Z

csrc/inlining.cpp

+  // TODO: Consider caching these properties in TensorView as they
+  // could only change with setLoopDomain
+  const bool may_need_forwarding =
+      lower_utils::hasRootToLoopLinearTransformations(producer) &&


Should we make this ir_utils instead of lower_utils? It's strange that inlining uses lower utils.

…rPosition. (#3003) Follow up to #2983. `TensorView::updateMaxProducerPosition` is used to set the max producer position when `inlineAt` is used. With this PR, `inlineMost` should work with `setLoopDomain`. This should be the last remaining piece to enable inlineMost with loop domains set by `setLoopDomain`. There are a couple more issues to address in lowering, though.

naoyam mentioned this pull request Sep 21, 2024

Fix TensorDomain::allIDs when loop domain has extra IDs #2984

Merged

naoyam force-pushed the get_max_producer_pos branch from a697732 to 3f4e06c Compare September 21, 2024 02:37

naoyam changed the base branch from main to all_ids_with_extra_loop_ids September 21, 2024 02:37

naoyam requested a review from zasdfgbnm September 21, 2024 02:44

Base automatically changed from all_ids_with_extra_loop_ids to main September 21, 2024 04:25

naoyam marked this pull request as draft September 21, 2024 17:33

naoyam removed the request for review from zasdfgbnm September 21, 2024 17:33

naoyam added 4 commits September 21, 2024 13:19

Extend MaxPosCalculartor::getMaxProducerPosFromConsumer to support non-conventional loop domains

0298c97

remove debug print

3ebe834

Add another test

89df21e

naoyam force-pushed the get_max_producer_pos branch from 3f4e06c to 89df21e Compare September 21, 2024 20:37

naoyam changed the base branch from main to initial_loop_domain September 21, 2024 20:37

naoyam marked this pull request as ready for review September 21, 2024 20:39

fix may_need_forwarding

1a82283

Base automatically changed from initial_loop_domain to main September 23, 2024 23:41

Merge branch 'main' into get_max_producer_pos

e60c4ba

naoyam added a commit that referenced this pull request Sep 24, 2024

Support non-conventional loop domains in TensorView::updateMaxProducerPosition. Follow up to #2983

8c02a0a

naoyam mentioned this pull request Sep 24, 2024

Support non-conventional loop domains in TensorView::updateMaxProducerPosition. #3003

Merged

zasdfgbnm approved these changes Sep 25, 2024

zasdfgbnm reviewed Sep 25, 2024

move hasRootToLoopLinearTransformations

0dab59b

naoyam merged commit 3b9a7e3 into main Sep 26, 2024
5 checks passed

naoyam deleted the get_max_producer_pos branch September 26, 2024 01:47
