Allow output to alias intermediate tensor, step-2 to solve group norm segmentation issue #2375 #2405

Merged
merged 31 commits into from
Jul 19, 2024

@liqiangxl liqiangxl commented Jun 13, 2024

This PR is step-2 to solve #2375
Allow an output to alias an intermediate tensor when:
(1) the analysis is called from a pre-segment pass
(2) the op that produces the output interferes with a reduction; see outputInterferingReduction

After this PR, the last reshape in group norm (which produces the output tensor) becomes a no-op; see the newly added test OutputAliasIntermediate.

The utility function getRepresentativeReductionTv is added for convenience and will be used to simplify the reduction/normalization schedulers.

@liqiangxl liqiangxl changed the title Llu/group norm step1 alias Allow output to alias intermediate tensor, step-2 to solve group norm segmentation issue #2375 Jul 15, 2024
@liqiangxl
Collaborator Author

!build

@liqiangxl liqiangxl requested a review from wujingyue July 15, 2024 22:32
@liqiangxl liqiangxl marked this pull request as ready for review July 15, 2024 22:32
// If allow output to alias intermediate, then we don't need to walk up
// the chain to find an input or output root. It changes the last reshape
// in group norm to a no-op.
if (!allow_output_alias_intermediate) {
Collaborator

Thanks for the patch. Have you read #2395 (comment)? I tagged you there but the notification apparently didn't land. Would that solve this groupnorm issue?

Collaborator Author

@liqiangxl liqiangxl Jul 16, 2024

Yes, and it looks great. IIUC, step 1 in the proposal is "Run alias analysis as we do today". However, the current AliasAnalysisResult::finalize needs to walk up the chain to find an input or output root, so it misses the group norm case where the output aliases an intermediate tensor. That's why I need this patch.
Previously, I also needed to patch MarkAliasesPreparePass::runPass (see commit ), but with #2529 that patch is no longer required.

Collaborator

AliasAnalysisResult::finalize needs to walk up the chain to find an input or output root and it will miss the case in group norm where the output alias an intermediate tensor.

Yes, that's an important implementation detail that needs to be reworked. alias_to_source_ does keep aliases between intermediate tensors. So I'll have to put segment_set based on that without having to find the fusion I/O.

Overall, I feel lots of this PR is going to be superseded. However, if you need it in a week, I don't mind reviewing it. Wdyt?

Collaborator Author

Thanks. It would be great to have this PR merged so that group norm works as expected, assuming it won't cause trouble for your planned work to revise alias analysis.

@liqiangxl
Collaborator Author

!build

csrc/alias_analysis.h (outdated, resolved)
csrc/scheduler/no_op.cpp (outdated, resolved)
Comment on lines 439 to 440
// If allow output to alias intermediate, then we don't need to walk up
// the chain to find an input or output root. It changes the last reshape
Collaborator

I'd hope it's backed by numbers, but a more reasonable approach seems to be to walk up until we reach a tensor that is known to be global, e.g., when outputInterferingReduction returns true or, not in this PR, the input of a slice or pad.

Collaborator Author

I'm a little confused about "a tensor that is known to be global". Here I am trying to let the output tensor alias a tensor that is not in global memory (not an input or output, but an intermediate). Later code then segments this output tensor = someOp(an intermediate tensor) into a no-op.

Here outputInterferingReduction depends on the whole fusion, not on a single tensor, so I didn't move it into the walk-up process. But it looks like we should also check that the alias in alias_to_source_ is actually defined by a view op; then we don't need to walk up. So the code is changed as follows:
(1) stop_at_view = may_alias_intermediate && outputInterferingReduction(fusion) is moved to finalize()
(2) only walk up if (!isOpsToStop(alias->definition(), stop_at_view))
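
For readers following along, the two changes above can be sketched on a toy model. This is not the nvFuser code: ToyOp, ToyTv, toyIsOpToStop and toyFindRoot are hypothetical names, and the rule that the stop check fires only for view ops is an assumption made for illustration. It only shows how a stop_at_view flag could let an output keep an intermediate tensor as its alias root instead of requiring a fusion input or output.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical toy types; they only mimic the shape of the discussion.
enum class ToyOp { None, View, Other };

struct ToyTv {
  std::string name;
  ToyOp definition;   // kind of op that defines this tensor
  bool is_fusion_io;  // true for fusion inputs and outputs
};

using ToyAliasMap = std::unordered_map<const ToyTv*, const ToyTv*>;

// Assumption: we stop only when the flag is set and the op is a view.
bool toyIsOpToStop(ToyOp definition, bool stop_at_view) {
  return stop_at_view && definition == ToyOp::View;
}

// Returns the tensor that `alias` is finally recorded as aliasing,
// or nullptr when no acceptable root is found.
const ToyTv* toyFindRoot(
    const ToyAliasMap& alias_to_source,
    const ToyTv* alias,
    bool stop_at_view) {
  assert(alias != nullptr);
  auto it = alias_to_source.find(alias);
  if (it == alias_to_source.end()) {
    return nullptr;
  }
  const ToyTv* root = it->second;
  if (toyIsOpToStop(alias->definition, stop_at_view)) {
    // Keep the direct source, even if it is an intermediate tensor.
    return root;
  }
  // Otherwise walk up until a fusion input or output is reached.
  while (!root->is_fusion_io) {
    auto next = alias_to_source.find(root);
    if (next == alias_to_source.end()) {
      return nullptr;  // chain ended at an intermediate: alias is dropped
    }
    root = next->second;
  }
  return root;
}
```

With an intermediate tensor feeding a view that produces the output, the walk-up variant finds no root, while the stop_at_view variant returns the intermediate.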

Collaborator

Makes sense. As discussed offline, it may make more sense to walk up until you find a non-view. E.g., in ... -> t0 -> (View) -> t1 -> (View) -> out, it's better to segment at t0 instead of t1.
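
The suggestion to walk up until a non-view is found can be illustrated with a minimal sketch. It assumes a toy single-producer chain; OpKind, ToyTensor and findSegmentPoint are hypothetical names for illustration and are not part of nvFuser.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy IR: each tensor has at most one producer.
enum class OpKind { Input, View, Pointwise };

struct ToyTensor {
  std::string name;
  OpKind def;                 // op that defines this tensor
  const ToyTensor* producer;  // nullptr for fusion inputs
};

// Walks up from `out` through consecutive view ops and returns the first
// tensor that is not itself defined by a view.
const ToyTensor* findSegmentPoint(const ToyTensor* out) {
  assert(out != nullptr);
  const ToyTensor* cur = out;
  while (cur->def == OpKind::View && cur->producer != nullptr) {
    cur = cur->producer;
  }
  return cur;
}
```

For the chain ... -> t0 -> (View) -> t1 -> (View) -> out, calling findSegmentPoint on out returns t0 rather than t1, which matches the segmentation point proposed in the comment.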

tests/cpp/test_gpu_view.cpp (outdated, resolved)
tests/cpp/test_gpu_view.cpp (outdated, resolved)
@liqiangxl
Collaborator Author

!build

@liqiangxl
Collaborator Author

!build

@wujingyue wujingyue self-requested a review July 19, 2024 05:18
@liqiangxl liqiangxl merged commit 15bdf9f into main Jul 19, 2024
5 checks passed
@liqiangxl liqiangxl deleted the llu/group_norm_step1_alias branch July 19, 2024 17:12
wujingyue added a commit that referenced this pull request Jul 19, 2024
wujingyue added a commit that referenced this pull request Jul 21, 2024
wujingyue added a commit that referenced this pull request Jul 21, 2024
wujingyue added a commit that referenced this pull request Jul 21, 2024
wujingyue added a commit that referenced this pull request Jul 21, 2024
wujingyue added a commit that referenced this pull request Jul 21, 2024
wujingyue added a commit that referenced this pull request Jul 26, 2024
wujingyue added a commit that referenced this pull request Jul 26, 2024
wujingyue added a commit that referenced this pull request Jul 27, 2024
wujingyue added a commit that referenced this pull request Jul 27, 2024