
pointwise scheduler fails to validate reference tv #3513

Open · wants to merge 72 commits into base: main

Conversation

@jjsjann123 (Collaborator) commented Dec 2, 2024

Fixes: #3512

When picking the reference tv, the pointwise scheduler fails to validate that the transformations on the reference tv can be safely propagated to all outputs in the fusion. The issue occurs when an IterDomain that's not in the reference tv is merged with another dimension in an output tv, preventing the merge on the reference tv from being propagated to that target.

This PR adds an optional check, areAllOutputIdsMappedTo, in nvfuser::pointwise_utils::DomainMap::isValidReference.

The added check verifies that all source producer IterDomains producing the IterDomains on outputs are covered by the reference tv. This is safe for the pointwise scheduler, since the scheduler checks that there's no reversible view present in the fusion.

The check is optional and is disabled by the transpose scheduler, where the reference_tv is not supposed to cover the entire fusion, but rather a subset of fusion IO tensors. We should extend that in future PRs.
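
To make the intent concrete, here is a toy, self-contained sketch of the coverage idea; this is my own illustration with stand-in types, not nvFuser's actual implementation:

#include <algorithm>
#include <set>
#include <vector>

// Toy stand-in: represent each tensor by the set of source IterDomains
// (here just ints) that produce its logical domain.
using IdSet = std::set<int>;

// A reference is valid only if, for every output, all of the output's
// source ids are also among the reference's source ids.
bool isValidReference(const IdSet& ref_sources,
                      const std::vector<IdSet>& output_sources) {
  for (const IdSet& out : output_sources) {
    // std::set iterates in sorted order, so std::includes applies.
    if (!std::includes(ref_sources.begin(), ref_sources.end(),
                       out.begin(), out.end())) {
      return false;  // an output depends on an id the reference lacks
    }
  }
  return true;
}

int main() {
  IdSet ref = {0};                          // reference covers i0 only
  std::vector<IdSet> outs = {{0}, {0, 1}};  // second output also uses i1
  // Invalid: i1 is not covered, so a merge involving i1 on that output
  // could not be replayed from the reference.
  return isValidReference(ref, outs) ? 0 : 1;
}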

@jjsjann123 (Collaborator Author)

!test

@jjsjann123 jjsjann123 changed the title Pw scheduler reference find patch [SMOKE TEST] Pw scheduler reference find patch Dec 2, 2024
@jjsjann123 (Collaborator Author)

!test

@jjsjann123 (Collaborator Author)

🤞

@jjsjann123 jjsjann123 changed the title [SMOKE TEST] Pw scheduler reference find patch pointwise scheduler fails to validate reference tv Dec 5, 2024
@jjsjann123 (Collaborator Author)

!test

@jjsjann123 jjsjann123 requested a review from naoyam December 5, 2024 10:37
@jjsjann123 (Collaborator Author)

Accidentally hit this old issue again when I was playing with slice. #2514 (comment)
I don't think that one should count towards reference tv selection.

@jjsjann123 (Collaborator Author)

!test --diff-bench

@naoyam (Collaborator) commented Dec 12, 2024

> But now I'm worried that this check might not work for pad, where we go from a broadcast ID to a non-broadcast ID... I'm checking that with a test.

Ah, this actually can cause an issue. For example, suppose we pick a tensor as a reference that has a broadcast ID, and that broadcast ID comes from a fusion input tensor. Suppose that broadcast ID is also used by a pad op, generating a non-broadcast ID, and that non-broadcast ID is NOT included in the reference.

More specifically:

t0 = [i0, b1] // fusion input
t1 = [i2, i3, i4, b5] // fusion input

// t2: [b6, b7, i0, b1]
t2 = broadcast(t0, [true, true, false, false]) 

t3 = add(t1, t2) // fusion output


// t4: [i0, i8]
t4 = pad(t0) // fusion output

Here, suppose we choose t3 as a reference. Clearly, we can't schedule the i8 domain of t4. However, since the propagation back from i8 reaches b1, which is also reachable from t3, the reference validation would miss flagging the invalid reference.
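
For concreteness, this is roughly what that fusion looks like as an nvFuser C++ test. A sketch only: I'm assuming the usual test helpers (makeConcreteTensor, broadcast, add, pad), that extent-1 dimensions come out as broadcast IDs, and that the two pad widths apply to the innermost dimension, so b1 is the one padded.

auto fusion_ptr = std::make_unique<Fusion>();
auto fusion = fusion_ptr.get();
FusionGuard fg(fusion);

// t0 = [i0, b1]
auto tv0 = makeConcreteTensor({-1, 1});
fusion->addInput(tv0);
// t1 = [i2, i3, i4, b5]
auto tv1 = makeConcreteTensor({-1, -1, -1, 1});
fusion->addInput(tv1);

// t2 = [b6, b7, i0, b1]
auto tv2 = broadcast(tv0, {true, true, false, false});
// t3: fusion output
auto tv3 = add(tv1, tv2);
fusion->addOutput(tv3);

// t4 = [i0, i8]: pad widths are inner-to-outer, so this pads b1 into
// the non-broadcast i8
auto tv4 = pad(tv0, {fusion->oneVal(), fusion->oneVal()});
fusion->addOutput(tv4);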

@jjsjann123 (Collaborator Author) commented Dec 12, 2024

> Clearly, we can't schedule the i8 domain of t4.

I was worried about the same thing, and I was thinking about changing the get_source_iter_domains here so that outputs of resize are also treated as sources.

But it turns out we don't need that: transform propagation can actually schedule a fusion like that from t3.

You can look at the other example I added for pad (there's a typo in the comment on tv0, I'll fix that); it's very similar to yours. I also just tried your example and the scheduling seems to work. I'll add it to the test as well.

I think the difference between this example and the original issue is due to resize, which links the input ID to the output ID, so transform propagation works (I think?!). I was planning to ask you that question tomorrow, since it was getting late when I ran into this.
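
My mental model of the ID graph here (unverified): pad introduces a Resize op, so we roughly get

b1 (in t0) --Resize--> i8 (in t4)

i.e. i8 is not an unconnected source ID; it hangs off b1, which is reachable from the reference, so transform propagation has an edge to traverse.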

@jjsjann123 (Collaborator Author)

!test --diff-bench

auto fusion_ptr = std::make_unique<Fusion>();
auto fusion = fusion_ptr.get();
FusionGuard fg(fusion);

@jjsjann123 (Collaborator Author)

If I'm not mistaken, this is the example you brought up in your comment, @naoyam

@jjsjann123 (Collaborator Author)

wait... the definition isn't right....

@naoyam (Collaborator)

Hmm, something looks off to me. For example, padding seems to be done with i0 as its input, so this doesn't seem like a good example of padding a broadcast ID.

@naoyam (Collaborator)

Note that the order of the padding widths is from inner to outer, just like the PyTorch pad.

@jjsjann123 (Collaborator Author)

Sorry, I got confused. I think your example looks like mine, where the broadcast IDs map between the two operators.

The example I had here is actually slightly different and it doesn't map. Let me play with this one a bit more.

@jjsjann123 (Collaborator Author)

Ah, so after I fixed the example, it turns out I'm causing a regression: the added example can't be scheduled as a single fusion, while at ToT the example compiles and runs as a single fusion.

I'm trying a small refactor to avoid that.

Playing with slice/pad made me slightly more confident in our transform propagation. 😄

// -> concrete map to i1
// So T3 is contained by T2. See test `PointwiseTest.DomainMapPad1`.
// Using PERMISSIVE mode here so that broadcast IDs map to the concrete
// IDs they are resolved with.
auto concrete_source_id_out =
    ca_map_.getConcreteMappedID(source_id_out, IdMappingMode::PERMISSIVE);
@jjsjann123 (Collaborator Author)

This is the change I made in order to avoid the regression in the added test, PointwiseTest.DomainMapPad1.

@jjsjann123 (Collaborator Author)

Looks like this did work with our tests.

@naoyam Let me know what you think about this change.

@jjsjann123 (Collaborator Author)

!test --diff-bench

@jjsjann123 (Collaborator Author)

errr... what's with CI 😭

@jjsjann123 (Collaborator Author)

!test --diff-bench

@jjsjann123 jjsjann123 requested a review from naoyam December 12, 2024 17:46
@jjsjann123 (Collaborator Author) commented Dec 12, 2024

My gut feeling, based on my naive understanding of transform propagation: if we actually had an expandOp, or even just used a resize to link the IDs that are expanded, I feel nvFuser might be able to codegen the original issue #3512 without a problem.

The other scenario is also pretty interesting to me. #3576

NOTE for myself: I think I should go have a look at how transform propagation actually works to verify this, for peace of mind.

Successfully merging this pull request may close these issues.

pointwise scheduler picks invalid reference TensorView