-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add initial_loop_domain_ to TensorDomain #2987
Conversation
It turned out it's necessary to remember the initial loop domain, which is either the logical domain set by the constructor or the new loop domain set by setLoopDomain. When a loop domain has an extra ID, TensorDomain::allIDs may miss IDs that solely depend on the extra ID.
!build |
When does that happen? |
I'm still prototyping, but, for example:
The producer-based scheduling would use the loop domain of |
@@ -3614,6 +3617,7 @@ void TensorDomain::setLoopDomain(std::vector<IterDomain*> new_loop_domain) { | |||
". Logical: ", | |||
toDelimitedString(logical_domain_)); | |||
loop_domain_ = std::move(new_loop_domain); | |||
initial_loop_domain_ = loop_domain_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just append everything in loop_domain_
here to additional_ids_
? In allIDs
, we might want to do:
for (auto i : c10::irange(all_domains.size() - 1)) {
for (auto j : c10::irange(all_domains.size() - 1)) {
instead of
for (auto i : c10::irange(all_domains.size() - 1)) {
for (auto j : c10::irange(i + 1, all_domains.size())) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did think about it, but since additional_ids_
are just "additional", it seems to me it could be useful in some cases. For example, we could cache all IDs between logical
and initial_loop
to speed up TensorDomain::allIDs
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
additional_ids_
was the mechanism designed to allow loop domain to have a new broadcast, and the reason why it exists was to solve almost exactly the same problem as this PR solves. Moving forward, looks like we will not continue with the "allow loop domain to have a new broadcast" direction. But I don't want to have two mechanisms for the same thing. We should pick either of them, but only one should be picked. It makes sense to speed up TensorDomain::allIDs
, and it might also make sense to call the data structure we use for the speedup additional_ids_
, but it will be a new thing that is totally unrelated to today's additional_ids_
except they happen to have the same name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which mechanism to pick I believe is mostly a question of interface design. Today, we have:
void TensorDomain::broadcast(int64_t axis, Val* extent) {
axis = nvfuser::wrapDim(axis, nDims() + 1);
IterDomain* id = IterDomainBuilder(fusion()->zeroVal(), extent)
.iter_type(IterType::Broadcast)
.build();
loop_domain_.insert(loop_domain_.begin() + axis, id);
additional_ids_.push_back(id);
}
That creates a new broadcast on the loop domain. But if we just do setLoopDomain(something with manually created new broadcasts)
, we will fail.
And in this PR, we are not using an interface like TensorDomain::broadcast
, so it will not have the problem of the above mechanism. But if we just do a setLoopDomain(getLoopDomain())
, we will fail.
In summary, I believe neither the additional_ids_
mechanism nor the one in this PR is safe. The additional_ids_
is more convenient for the TensorDomain::broadcast
-like interface, that is, we have a method function that says "add this new ID to this axis". And the one in this PR is more convenient for setLoopDomain
way of inserting new IDs.
I think my question for now is: do we want to pause a bit to think about interface design, or do we want to just quickly unblock for now and do some cleanup later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per our offline discussion, we decided that it's more important to move forward and get some progress before too much worrying about interface designs. See also #3000
…n-conventional loop domains (#2983) Part of #2902. See the linked design doc for the background context. ~Stacked on #2984~ Stacked on #2987 The IdModel-based analysis is required for non-conventional loop domains as the dependency from the logical to loop domains may not be just one way from logical to loop. However, the IdModel-based analysis does not support broadcast forwarding, so we can't just completely switch to the new method. For now, I think it makes most sense to use the existing method whenever logical and loop domains are generated in the conventional scheduling primitives such as split and merge. The new method is only used otherwise. I also considered implementing the broadcast forwarding in the IdModel-based approach as well, but it doesn't seem to me worthwhile doing so at this moment. I thought maybe using the Permissive graph would make it trivial, but there seem to be (not a small number of) subtle corner cases of mappings that may not be considered in the Permissive graph, e.g., if the indexed domains should be mapped or not. It should be certainly possible to extend the Permissive graph to address these corner cases, but another factor I think we should consider is that the broadcast forwarding itself may not be necessary with the setLoopDomain-based approach. We should probably set the right loop domain from the beginning instead of trying to make different loop domains to be inlinable.
It turned out it's necessary to remember the initial loop domain, which is either the logical domain set by the constructor or the new loop domain set by setLoopDomain. When a loop domain has an extra ID, TensorDomain::allIDs may miss IDs that solely depend on the extra ID.