Extend MaxPosCalculator::getMaxProducerPosFromConsumer to support non-conventional loop domains #2983

Merged (7 commits) on Sep 26, 2024

@naoyam naoyam commented Sep 21, 2024

Part of #2902. See the linked design doc for the background context.

Stacked on #2984
Stacked on #2987

The IdModel-based analysis is required for non-conventional loop domains because the dependency between the logical and loop domains may not run in just one direction, from logical to loop. However, the IdModel-based analysis does not support broadcast forwarding, so we can't completely switch to the new method.

For now, I think it makes most sense to use the existing method whenever logical and loop domains are generated in the conventional scheduling primitives such as split and merge. The new method is only used otherwise.

I also considered implementing broadcast forwarding in the IdModel-based approach as well, but it doesn't seem worthwhile at this moment. I thought using the Permissive graph might make it trivial, but there seem to be quite a few subtle corner cases of mappings that the Permissive graph may not consider, e.g., whether the indexed domains should be mapped.

It should certainly be possible to extend the Permissive graph to address these corner cases, but another factor I think we should consider is that broadcast forwarding itself may not be necessary with the setLoopDomain-based approach. We should probably set the right loop domain from the beginning instead of trying to make different loop domains inlinable.
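
The selection rule described above can be illustrated with a minimal sketch. It is not the nvFuser implementation: `ToyTensor`, `Analysis`, and `chooseAnalysis` are hypothetical names, and the sketch assumes the "conventional" check is applied to both the producer and the consumer.

```cpp
#include <cassert>

// Hypothetical summary of a tensor's scheduling state (not an nvFuser type).
struct ToyTensor {
  // True when the loop domain was derived from the logical domain purely by
  // conventional primitives such as split and merge.
  bool conventional_loop_domain;
};

// The two analyses discussed in the description.
enum class Analysis { ExistingReplay, IdModelBased };

// Assumption: the existing method (which supports broadcast forwarding) is
// kept only when both tensors are conventional; anything else goes to the
// IdModel-based analysis.
Analysis chooseAnalysis(const ToyTensor& producer, const ToyTensor& consumer) {
  if (producer.conventional_loop_domain && consumer.conventional_loop_domain) {
    return Analysis::ExistingReplay;
  }
  return Analysis::IdModelBased;
}
```

Under this assumption, a tensor whose loop domain was set with setLoopDomain to something not derivable by split and merge would be routed to the IdModel-based path, and every other case keeps the existing behavior.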

naoyam commented Sep 21, 2024

!build

// could only change with setLoopDomain
const bool may_need_forwarding =
lower_utils::hasRootToLoopLinearTransformations(producer) &&
!ir_utils::compareDomains(
Collaborator Author

Side note: here, the argument order of `compareDomains` needs to be loop and then logical. It doesn't seem to work with logical and then loop. I'll fix it in a separate PR.

Collaborator Author

Issue: #2985

naoyam commented Sep 21, 2024

!build

@naoyam naoyam force-pushed the get_max_producer_pos branch from a697732 to 3f4e06c Compare September 21, 2024 02:37
@naoyam naoyam changed the base branch from main to all_ids_with_extra_loop_ids September 21, 2024 02:37
@@ -55,7 +62,7 @@ class MaxPosCalculator {
   size_t getMaxProducerPosFromConsumer(
       TensorView* producer,
       TensorView* consumer,
-      bool best_effort) const;
+      bool best_effort);
Collaborator Author

Needed to drop const as inliningGraph lazily builds a graph
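
As a generic C++ illustration of why lazy construction forces this change (a sketch with hypothetical names, not the nvFuser classes): a member function that builds and stores a member on first use modifies the object, so it cannot be `const`, and neither can any caller that goes through it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for an analysis graph that is expensive to build.
struct ToyGraph {
  std::vector<int> nodes;
};

class ToyCalculator {
 public:
  // Non-const: the first call builds and stores the graph.
  const ToyGraph& inliningGraph() {
    if (!graph_) {
      graph_ = std::make_unique<ToyGraph>(ToyGraph{{0, 1, 2}});
      ++build_count_;
    }
    return *graph_;
  }

  // Also non-const, because it calls the non-const inliningGraph().
  std::size_t maxPos() { return inliningGraph().nodes.size(); }

  // Read-only accessor, so it can stay const.
  int buildCount() const { return build_count_; }

 private:
  std::unique_ptr<ToyGraph> graph_;  // empty until first use
  int build_count_ = 0;
};
```

The general alternative in C++ is to declare the cached member `mutable` so the accessor can remain `const`; the diff above drops `const` instead.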

@naoyam naoyam requested a review from zasdfgbnm September 21, 2024 02:44
Comment on lines +180 to +199
  auto pairwise_logical_map = PairwiseLogicalDomainMap(producer, consumer);
  auto replay_CasP = BestEffortReplay::replayCasP(
      consumer, producer, -1, pairwise_logical_map);
  auto p2c_replay_map = replay_CasP.getReplay();

  for (const auto producer_pos : c10::irange(producer->nDims())) {
    // If the producer position is mismatching with the consumer, then we can
    // not inline into this position, otherwise the max producer position of
    // the consumer will become invalid and expression sort will fail.
    if (TransformReplay::getMatchedLeafPosWithoutReplayCasP(
            consumer, producer, producer_pos + 1) < 0) {
      return producer_pos;
    }
    auto map_it = p2c_replay_map.find(producer->axis(producer_pos));
    if (map_it != p2c_replay_map.end()) {
      auto c_id = map_it->second;
      if (!isAllowedID(c_id, consumer, best_effort, true, false, true)) {
        return producer_pos;
      }
    }
Collaborator Author

This part is the same as before.

naoyam added a commit that referenced this pull request Sep 21, 2024
This is a follow-up to #2937, which allowed the loop domain of a tensor
to have extra IDs (e.g., for inlining a 1D tensor to a 2D tensor).

When working on #2983, I realized `TensorDomain::allIDs` may fail to find
all IDs when the loop domain has extra IDs. As mentioned in the added code
comment, the issue is primarily because `IRBFS::getExprsBetween` is
asymmetric with respect to its two domain parameters. See the added test
for a simple concrete example.
Base automatically changed from all_ids_with_extra_loop_ids to main September 21, 2024 04:25
@naoyam naoyam marked this pull request as draft September 21, 2024 17:33
@naoyam naoyam removed the request for review from zasdfgbnm September 21, 2024 17:33
It turned out it's necessary to remember the initial loop domain, which
is either the logical domain set by the constructor or the new loop
domain set by setLoopDomain. When a loop domain has an extra ID,
TensorDomain::allIDs may miss IDs that solely depend on the extra ID.
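
A toy model may make the missed-ID case concrete. This is a sketch only: it does not reproduce `TensorDomain::allIDs` or `IRBFS::getExprsBetween`, and the ID names, `Edges`, `forwardReachable`, and `collectIds` are all hypothetical. Here `i0` is the only logical ID, `i1` is an extra loop ID, `i1` is split into `i2` and `i3`, and `i2` is split into `i4` and `i5`.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy ID graph: maps an input ID to the IDs produced from it.
using Edges = std::map<std::string, std::vector<std::string>>;

// Collect every ID reachable by following edges forward from `start`.
std::set<std::string> forwardReachable(
    const Edges& edges, const std::vector<std::string>& start) {
  std::set<std::string> seen(start.begin(), start.end());
  std::vector<std::string> stack(start.begin(), start.end());
  while (!stack.empty()) {
    std::string id = stack.back();
    stack.pop_back();
    auto it = edges.find(id);
    if (it == edges.end()) {
      continue;
    }
    for (const auto& out : it->second) {
      if (seen.insert(out).second) {
        stack.push_back(out);
      }
    }
  }
  return seen;
}

// IDs found by traversing forward from `start`, plus the current loop IDs.
std::set<std::string> collectIds(
    const Edges& edges,
    const std::vector<std::string>& start,
    const std::vector<std::string>& loop) {
  std::set<std::string> ids = forwardReachable(edges, start);
  ids.insert(loop.begin(), loop.end());
  return ids;
}

// i1 -> {i2, i3}, i2 -> {i4, i5}; i0 has no transformations.
Edges toyEdges() {
  return {{"i1", {"i2", "i3"}}, {"i2", {"i4", "i5"}}};
}
```

With the current loop domain `{i0, i4, i5, i3}`, seeding from the logical domain `{i0}` yields four IDs and misses `i1` and `i2`, while seeding from the remembered initial loop domain `{i0, i1}` yields all six.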
@naoyam naoyam force-pushed the get_max_producer_pos branch from 3f4e06c to 89df21e Compare September 21, 2024 20:37
@naoyam naoyam changed the base branch from main to initial_loop_domain September 21, 2024 20:37
naoyam commented Sep 21, 2024

!build

@naoyam naoyam marked this pull request as ready for review September 21, 2024 20:39
Base automatically changed from initial_loop_domain to main September 23, 2024 23:41
@@ -742,7 +742,7 @@ bool ComputeAtLogicalDomainMap::canMap(
     const TensorDomain* td_b,
     const IterDomain* id_b) const {
   NVF_ERROR(
-      id_b->definition() == nullptr || id_b->isRFactorProduct(),
+      td_b->isLogical(id_b) || td_b->isRoot(id_b),
Collaborator Author

Bug fix. Related: #2961 (comment)

@@ -797,7 +797,8 @@ std::vector<TensorView*> getTVsWithDynamicTransform(Fusion* fusion) {
 CompareDomainResult compareDomains(
     std::vector<IterDomain*> dom0,
     const std::vector<IterDomain*>& dom1,
-    const std::vector<IterDomain*>& additional_ids) {
+    const std::vector<IterDomain*>& additional_ids,
+    bool ignore_broadcast) {
Collaborator Author

Broadcast IDs also need to be considered to detect extra IDs added by setLoopDomain.
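
The effect of such a flag can be sketched with a toy comparison. This is not the logic of `compareDomains`; `ToyId` and `hasExtraId` are hypothetical, and IDs are matched by name purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an iteration domain.
struct ToyId {
  std::string name;
  bool is_broadcast;
};

// Returns true if `loop` contains an ID that does not appear in `logical`.
// When ignore_broadcast is true, broadcast IDs are skipped before comparing.
bool hasExtraId(
    const std::vector<ToyId>& logical,
    const std::vector<ToyId>& loop,
    bool ignore_broadcast) {
  for (const auto& id : loop) {
    if (ignore_broadcast && id.is_broadcast) {
      continue;
    }
    const bool found = std::any_of(
        logical.begin(), logical.end(), [&](const ToyId& l) {
          return l.name == id.name;
        });
    if (!found) {
      return true;
    }
  }
  return false;
}
```

In this toy, an extra broadcast ID in the loop domain goes unnoticed when broadcasts are ignored and is reported only when they are considered, which is the situation the comment describes.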

naoyam added a commit that referenced this pull request Sep 24, 2024
TensorView::updateMaxProducerPosition.

Follow up to #2983
// TODO: Consider caching these properties in TensorView as they
// could only change with setLoopDomain
const bool may_need_forwarding =
lower_utils::hasRootToLoopLinearTransformations(producer) &&
Collaborator

Should we make this ir_utils instead of lower_utils? It's strange that inlining uses lower utils.

@naoyam naoyam merged commit 3b9a7e3 into main Sep 26, 2024
5 checks passed
@naoyam naoyam deleted the get_max_producer_pos branch September 26, 2024 01:47
naoyam added a commit that referenced this pull request Sep 26, 2024
…rPosition. (#3003)

Follow up to #2983.

`TensorView::updateMaxProducerPosition` is used to set the max producer
position when `inlineAt` is used. With this PR, `inlineMost` should work
with `setLoopDomain`.

This should be the last remaining piece to enable inlineMost with loop
domains set by `setLoopDomain`. There are a couple more issues to
address in lowering, though.