pointwise scheduler fails to validate reference tv #3513
Conversation
!test
This reverts commit 7333806.
!test
🤞
!test
Accidentally hit this old issue again when I was playing with slice. #2514 (comment)
!test --diff-bench
Ah, this actually can cause an issue. For example, suppose we pick a tensor as a reference that has a broadcast ID, and that broadcast ID comes from a fusion input tensor. Suppose that broadcast ID is also used by a pad op, generating a non-broadcast ID, and that non-broadcast ID is NOT included in the reference. More specifically:
Here, suppose we choose the tensor that carries the broadcast ID as the reference (tv2 in the sketch below).
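To make that concrete, here is a minimal sketch of such a fusion. The names tv0/tv2/tv3 and the shapes are illustrative only, assuming the standard nvFuser C++ test helpers (makeConcreteTensor, and the pad op from ops/alias.h); this is not the snippet from the original comment.

// Sketch only -- names and shapes are illustrative.
auto fusion_ptr = std::make_unique<Fusion>();
auto fusion = fusion_ptr.get();
FusionGuard fg(fusion);

// tv0: [b0, i1] -- fusion input whose outer ID b0 is a broadcast
auto tv0 = makeConcreteTensor({1, -1});
fusion->addInput(tv0);

// tv2: [b0, i1] -- keeps the broadcast ID; a tempting reference candidate
auto tv2 = neg(tv0);
fusion->addOutput(tv2);

// tv3: [i3, i1] -- pad resizes b0 into a non-broadcast ID i3,
// and i3 is NOT covered by the reference tv2
auto tv3 = pad(
    tv0,
    {IrBuilder::create<Val>(0L),   // i1 left
     IrBuilder::create<Val>(0L),   // i1 right
     IrBuilder::create<Val>(1L),   // b0 left
     IrBuilder::create<Val>(1L)}); // b0 right
fusion->addOutput(tv3);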
I was worried about the same thing, and I was thinking about changing the … but it turns out we don't need that. You can look at the other example I added for pad (there's a typo in the comment on tv0; I'll fix that). It's very similar to yours. I think the difference between this example and the original issue we had is due to …
!test --diff-bench
auto fusion_ptr = std::make_unique<Fusion>();
auto fusion = fusion_ptr.get();
FusionGuard fg(fusion);
If I'm not mistaken, this is the example you brought up in your comment, @naoyam.
wait... the definition isn't right....
Hmm, something looks off to me. For example, padding seems to be done with i0 as its input, so this doesn't seem like a good example of padding a broadcast ID.
Note that the order of the padding widths is from inner to outer, just like the PyTorch pad.
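A hedged example of that ordering, for a hypothetical 2-D tensor tv with domain [I0, I1] (pad widths come in left/right pairs, innermost dimension first, as in torch.nn.functional.pad):

// {I1_left, I1_right, I0_left, I0_right}
auto padded = pad(
    tv,
    {IrBuilder::create<Val>(0L),   // I1 (inner) left
     IrBuilder::create<Val>(0L),   // I1 (inner) right
     IrBuilder::create<Val>(1L),   // I0 (outer) left
     IrBuilder::create<Val>(1L)}); // I0 (outer) right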
Sorry, I got confused. I think your example looks like mine, where the broadcast IDs mapped between the two operators.
The example I had here is actually slightly different and it doesn't map. Let me play with this one a bit more.
Ah, so after I fixed the example, it turns out I'm causing a regression now: the added example can't be scheduled as a single fusion, while in ToT the example compiles and runs as a single fusion.
I'm trying a small refactor to avoid that.
Playing with slice/pad gave me slightly more confidence in our transform propagation. 😄
// -> concrete map to i1
// So T3 is contained by T2. See test `PointwiseTest.DomainMapPad1`
auto concrete_source_id_out =
    ca_map_.getConcreteMappedID(source_id_out, IdMappingMode::PERMISSIVE);
This is the change I made in order to avoid the regression in the added test PointwiseTest.DomainMapPad1.
Looks like this did work with our tests.
@naoyam Let me know what you think about this change.
!test --diff-bench
errr... what's with CI 😭
!test --diff-bench
My gut feeling, based on my naive understanding of transform propagation, is that if we actually have an …

The other scenario is also pretty interesting to me. #3576

NOTE for myself: I think I should go have a look at how transform propagation actually works to verify this, for peace of mind.
Fixes: #3512
When picking the reference tv, the pointwise scheduler fails to validate that the transformations on the reference tv can be safely propagated to all outputs in the fusion. The issue occurs when an IterDomain that's not in the reference tv is merged with another dimension in the output tv, preventing the merge on the reference tv from being propagated to the target.
This PR adds an optional check areAllOutputIdsMappedTo in nvfuser::pointwise_utils::DomainMap::isValidReference.
The added check verifies that all source producer IterDomains producing the IterDomains on outputs are covered by the reference tv. This is safe for the pointwise scheduler, since the scheduler checks that there's no reversible view present in the fusion.
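As a rough sketch of the coverage idea only: the helper below is hypothetical, not the actual PR code. The real check walks back to the source producer IterDomains, while this simplification just compares logical-domain IDs through the permissive compute-at map (the same ca_map_.getConcreteMappedID call shown in the diff above).

// Hypothetical sketch; assumes nvFuser's ComputeAtMap and ir_utils.
bool coversAllOutputIds(
    TensorView* reference,
    Fusion* fusion,
    const ComputeAtMap& ca_map) {
  // Collect the permissive concrete IDs reachable from the reference.
  std::unordered_set<IterDomain*> ref_ids;
  for (IterDomain* id : reference->getLogicalDomain()) {
    ref_ids.insert(ca_map.getConcreteMappedID(id, IdMappingMode::PERMISSIVE));
  }
  // Every output ID must map into that set; otherwise transformations
  // propagated from the reference cannot reach it.
  for (auto out : ir_utils::filterByType<TensorView>(fusion->outputs())) {
    for (IterDomain* id : out->getLogicalDomain()) {
      auto concrete = ca_map.getConcreteMappedID(id, IdMappingMode::PERMISSIVE);
      if (ref_ids.count(concrete) == 0) {
        return false;
      }
    }
  }
  return true;
}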
The check is optional and is disabled by the transpose scheduler, where the reference_tv is not supposed to cover the entire fusion, but rather a subset of the fusion's I/O tensors. We should extend that in future PRs.