Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP][DO NOT REVIEW] Schedule epilogue for Hopper - without smem epilogue #3565

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

protonu
Copy link
Collaborator

@protonu protonu commented Dec 11, 2024

No description provided.

d->axis(-1)->parallelize(ParallelType::Vectorize);
}
scheduleFusionInputsForEpilogue();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this path, since we are not using TMA we should respect the params_->supported_vec_size.epilogue parameter to set the vectorization width. See scheduleOutputTensor in this file, which we should either update and use or remove.

In the TMA path (the else branch) we might not be able to use scheduleFusionInputsForEpilogue as is because it is currently propagating the schedule back from the output d, which is scheduled for a TMA store not a vectorized load. We could potentially just do that propagation then check the innermost dims and merge/split/unroll as needed in order to hit the target vectorization for the epilogue.

@protonu protonu force-pushed the pbasu_wip_mma_epilogue branch 2 times, most recently from d4e9ae6 to 75c9a86 Compare December 11, 2024 19:21
@protonu protonu force-pushed the pbasu_wip_mma_epilogue branch from 75c9a86 to 99aace1 Compare December 11, 2024 21:57
@@ -1310,6 +1310,77 @@ TEST_F(MatmulSchedulerTest, EpilogueAlpha) {
NVF_CHECK(outputs[0].allclose(tref, 0.001, 0.001));
}

TEST_F(MatmulSchedulerTest, MatmulHopperCastBiasCast) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine for now but since this test is not Hopper-specific iin the future (once #3440 is trimmed and merged) we could probably remove it and just unguard some other tests like the MatmulOp and LinearOp translation tests which use the same fusion but with a matmul instead of fusedMultiplySum.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants