-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend MaxPosCalculartor::getMaxProducerPosFromConsumer to support non-conventional loop domains #2983
Conversation
!build |
// could only change with setLoopDomain | ||
const bool may_need_forwarding = | ||
lower_utils::hasRootToLoopLinearTransformations(producer) && | ||
!ir_utils::compareDomains( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
side note: here, the order of compreDomains
needs to be loop
and logical
. It doesn't seem to work with logical and
loop`. I'll fix it in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue: #2985
!build |
a697732
to
3f4e06c
Compare
@@ -55,7 +62,7 @@ class MaxPosCalculator { | |||
size_t getMaxProducerPosFromConsumer( | |||
TensorView* producer, | |||
TensorView* consumer, | |||
bool best_effort) const; | |||
bool best_effort); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needed to drop const
as inliningGraph
lazily builds a graph
auto pairwise_logical_map = PairwiseLogicalDomainMap(producer, consumer); | ||
auto replay_CasP = BestEffortReplay::replayCasP( | ||
consumer, producer, -1, pairwise_logical_map); | ||
auto p2c_replay_map = replay_CasP.getReplay(); | ||
|
||
for (const auto producer_pos : c10::irange(producer->nDims())) { | ||
// If the producer position is mismatching with the consumer, then we can | ||
// not inline into this position, otherwise the max producer position of | ||
// the consumer will become invalid and expression sort will fail. | ||
if (TransformReplay::getMatchedLeafPosWithoutReplayCasP( | ||
consumer, producer, producer_pos + 1) < 0) { | ||
return producer_pos; | ||
} | ||
auto map_it = p2c_replay_map.find(producer->axis(producer_pos)); | ||
if (map_it != p2c_replay_map.end()) { | ||
auto c_id = map_it->second; | ||
if (!isAllowedID(c_id, consumer, best_effort, true, false, true)) { | ||
return producer_pos; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is the same as before.
This is a follow-up to #2937, which allowed the loop domain of a tensor to have extra IDs (e.g., for inlining a 1D tensor to a 2D tensor). When working on #2983, I realized `TensorDomain::allIDs` has a problem of finding all IDs with extra loop IDs. As mentioned in the added code comment, the issue is primarily because `IRBFS::getExprsBetween` is asymmetric with respect to its two domain parameters. See the added test for a simple concrete example.
It turned out it's necessary to remember the initial loop domain, which is either the logical domain set by the constructor or the new loop domain set by setLoopDomain. When a loop domain has an extra ID, TensorDomain::allIDs may miss IDs that solely depend on the extra ID.
non-conventional loop domains
3f4e06c
to
89df21e
Compare
!build |
@@ -742,7 +742,7 @@ bool ComputeAtLogicalDomainMap::canMap( | |||
const TensorDomain* td_b, | |||
const IterDomain* id_b) const { | |||
NVF_ERROR( | |||
id_b->definition() == nullptr || id_b->isRFactorProduct(), | |||
td_b->isLogical(id_b) || td_b->isRoot(id_b), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug fix. Related: #2961 (comment)
@@ -797,7 +797,8 @@ std::vector<TensorView*> getTVsWithDynamicTransform(Fusion* fusion) { | |||
CompareDomainResult compareDomains( | |||
std::vector<IterDomain*> dom0, | |||
const std::vector<IterDomain*>& dom1, | |||
const std::vector<IterDomain*>& additional_ids) { | |||
const std::vector<IterDomain*>& additional_ids, | |||
bool ignore_broadcast) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Broadcast IDs also need to be considered to detect extra IDs added by setLoopDomain
.
TensorView::updateMaxProducerPosition. Follow up to #2983
csrc/inlining.cpp
Outdated
// TODO: Consider caching these properties in TensorView as they | ||
// could only change with setLoopDomain | ||
const bool may_need_forwarding = | ||
lower_utils::hasRootToLoopLinearTransformations(producer) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make this ir_utils
instead of lower_utils
? It's strange that inlining uses lower utils.
…rPosition. (#3003) Follow up to #2983. `TensorView::updateMaxProducerPosition` is used to set the max producer position when `inlineAt` is used. With this PR, `inlineMost` should work with `setLoopDomain`. This should be the last remaining piece to enable inlineMost with loop domains set by `setLoopDomain`. There are a couple more issues to address in lowering, though.
Part of #2902. See the linked design doc for the background context.
Stacked on #2984Stacked on #2987
The IdModel-based analysis is required for non-conventional loop domains as the dependency from the logical to loop domains may not be just one way from logical to loop. However, the IdModel-based analysis does not support broadcast forwarding, so we can't just completely switch to the new method.
For now, I think it makes most sense to use the existing method whenever logical and loop domains are generated in the conventional scheduling primitives such as split and merge. The new method is only used otherwise.
I also considered implementing the broadcast forwarding in the IdModel-based approach as well, but it doesn't seem to me worthwhile doing so at this moment. I thought maybe using the Permissive graph would make it trivial, but there seem to be (not a small number of) subtle corner cases of mappings that may not be considered in the Permissive graph, e.g., if the indexed domains should be mapped or not.
It should be certainly possible to extend the Permissive graph to address these corner cases, but another factor I think we should consider is that the broadcast forwarding itself may not be necessary with the setLoopDomain-based approach. We should probably set the right loop domain from the beginning instead of trying to make different loop domains to be inlinable.