Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schedule epilogue (for Hopper Matmul) by propagation backward from output - smem epilogue not supported. #3580

Merged
merged 6 commits into from
Dec 14, 2024

Conversation

protonu
Copy link
Collaborator

@protonu protonu commented Dec 12, 2024

This adds support for scheduling epilogue for the hopper matmul scheduler.
We don't support smem epilogue as yet.

We also don't honor the vectorization_factor as yet for the store to output. That'll be covered in a separate PR.

Copy link
Collaborator

@jacobhinkle jacobhinkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Only comments are around code organization.

csrc/scheduler/hopper_multi_matmul.cpp Show resolved Hide resolved
Comment on lines 428 to 434
// Apply mma common transformation
c->split(-2, getM(params_->mma_macro));
c->split(-1, getN(params_->mma_macro));
// [..., Mo, No, Mio, Mii, Nio, Nii]
// -> [..., Mo, No, Mio, Nio, Mii, Nii]
c->reorder({{-3, -2}});
c->merge(-4);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this pattern a lot. Can we make this a utility method in this class and give it an informative name?

c->reorder({{-3, -2}});
c->merge(-4);

// [...., Mii, Nii] ->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is a little hard for me to follow since there are some intermediate splits that are not shown. For example we start with Mii then get Miii(8) and Mii/16, so is that an (Mii/2)/8 (I think so)?
Since this is a complicated function maybe it is best to just skip to the last line (line 439) that shows the final sizes, and maybe give them different names.

Comment on lines 442 to 445
auto s =
mma_utils::MmaSwizzler::scheduleMmaOutputAllocation(c->getLoopDomain());
c->setLoopDomain(s.as<IterDomain*>());
c->axis(-5)->parallelize(ParallelType::TIDy);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned in #3575 (comment) that this whole chunk should basically be a new method.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll wait for @rdspring1 to land his PR then use his fix or add one.

tests/cpp/test_matmul_scheduler.cpp Show resolved Hide resolved
tests/cpp/test_matmul_scheduler.cpp Show resolved Hide resolved
Copy link
Collaborator

@rdspring1 rdspring1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll wait to review until you pull the latests changes but the PR looks good to me. 🎉

@protonu protonu force-pushed the pbasu_mma_epilogue_hopper_no_smem_epilogue branch from 94f39a1 to be5f18b Compare December 13, 2024 18:41
@protonu
Copy link
Collaborator Author

protonu commented Dec 13, 2024

!build

2,
{c},
{ParallelType::BIDx, ParallelType::BIDy, ParallelType::BIDz});
// Propato to (not including) the splitk output if there is a splitk
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Propato to (not including) the splitk output if there is a splitk
// Propagate to (not including) the splitk output if there is a splitk

Copy link
Collaborator

@rdspring1 rdspring1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Just be aware of #3590. The current use_smem_epilogue epilogue will have conflicts. Your changes should be fine.

tests/cpp/test_matmul_scheduler.cpp Show resolved Hide resolved
csrc/scheduler/hopper_multi_matmul.cpp Show resolved Hide resolved
scheduler_utils::BoundedDirectionalTransformPropagator::backward(
d,
-1,
mma_results_,
propagate_to,
scheduler_utils::BoundedDirectionalTransformPropagator::Options()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note: we'll need to revisit this path for cases where TMA is disabled. We need to make sure it is doing coalesced stores here (can check with ncu). I imagine that we will almost always want to use TMA, but it's possible that we would skip it if it increases smem usage too much. In those cases we'll hit this path.

csrc/scheduler/hopper_multi_matmul.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@jacobhinkle jacobhinkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I think this gets us close to full functionality. BTW since smem epilogue is supported please change the title and PR description.

csrc/scheduler/hopper_multi_matmul.cpp Outdated Show resolved Hide resolved
@protonu
Copy link
Collaborator Author

protonu commented Dec 13, 2024

LGTM. I think this gets us close to full functionality. BTW since smem epilogue is supported please change the title and PR description.

Not as yet - this PR doesn't support smem epilogue.

@jacobhinkle
Copy link
Collaborator

LGTM. I think this gets us close to full functionality. BTW since smem epilogue is supported please change the title and PR description.

Not as yet - this PR doesn't support smem epilogue.

Oh OK nevermind then.

@protonu protonu force-pushed the pbasu_mma_epilogue_hopper_no_smem_epilogue branch from 895df0f to 09cb407 Compare December 14, 2024 01:42
@protonu
Copy link
Collaborator Author

protonu commented Dec 14, 2024

!test

@protonu protonu merged commit d53be45 into main Dec 14, 2024
48 checks passed
@protonu protonu deleted the pbasu_mma_epilogue_hopper_no_smem_epilogue branch December 14, 2024 14:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants