-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add MatmulParams::cluster_dims parameter #3574
Conversation
Following #3557 we can specify the cluster size for our fusions. Currently we don't do anything explicitly with CGAs, but this can help guarantee that tiles are scheduled onto GPCs in pairs. Each GPC has a number of TPCs, each of which holds 2 SMs, so this lets us take advantage of caching at the TPC and GPC level for operand loads, in addition to L2.
!test |
This reverts commit e3f611c.
!test |
tests/cpp/test_matmul.cpp
Outdated
@@ -3663,7 +3663,7 @@ TEST_F(HopperMatmulTest, HSH_NT_128BSwizzle) { | |||
const int64_t cta_m = 2 * getM(macro); | |||
const int64_t cta_n = 1 * getN(macro); | |||
|
|||
constexpr std::tuple<int64_t, int64_t, int64_t> cluster_dims{2, 1, 1}; | |||
constexpr std::tuple<int, int, int> cluster_dims{2, 1, 1}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nitpick: can we stick with int64_t
for consistency?
constexpr std::tuple<int, int, int> cluster_dims{2, 1, 1}; | |
constexpr std::tuple<int64_t, int64_t, int64_t> cluster_dims{2, 1, 1}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was mostly doing that because the MatmulParams
entries are int
, but we should probably just change MatmulParams
instead (in another PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Much smaller PR now...
!build |
Following #3557 we can specify the cluster size for our fusions. Currently we don't do anything explicitly with CGAs, but this can help guarantee that tiles are scheduled onto GPCs in pairs. Each GPC has a number of TPCs, each of which holds 2 SMs, so this lets us take advantage of caching at the TPC and GPC level for operand loads, in addition to L2.
This PR enables this with a default size of
{2, 1, 1}
for the Hopper scheduler. The parameter is ignored in the Ampere scheduler.It is not yet plumbed into the heuristic plugin API yet. I thought maybe we should wait until we have more parameters related to CGAs to do that.