Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MatmulParams::CircularBufferOptions::smem_circular_buffer_prefetch_gap #3558

Merged
merged 5 commits into from
Dec 12, 2024

Conversation

jacobhinkle
Copy link
Collaborator

@jacobhinkle jacobhinkle commented Dec 10, 2024

Previously we had hardcoded to use smem_circular_buffer_stage - 1 but this gives us more flexibility in our heuristic. We will use smem_circular_buffer_stage - smem_circular_buffer_prefetch_gap as the prefetch distance. Parametrizing this way lets us modify the number of stages without needing to adjust the prefetch gap each time, since it will most commonly just be 1.

@jacobhinkle
Copy link
Collaborator Author

!test

@@ -41,10 +41,13 @@ class MatmulParams : public HeuristicParams {
// greater than one. Otherwise it is ignored.
int smem_circular_buffer_stage = 2;

int smem_circular_buffer_prefetch = 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the default be -1, which means stage - 1?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stage - 1 seems like the fastest default.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option would be to have a parameter prefetch_gap and we set the prefetch value to stages - prefetch_gap. This would default to one. That way we would not need to be so careful when updating stages. What do you guys think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw @rdspring1 and @zasdfgbnm I implemented this in the latest pushed change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a preference for prefetch or prefetch_gap but I bet I'll get them confused at some point. 😆

@jacobhinkle
Copy link
Collaborator Author

!build

@jacobhinkle jacobhinkle changed the title Add MatmulParams::CircularBufferOptions::smem_circular_buffer_prefetch Add MatmulParams::CircularBufferOptions::smem_circular_buffer_prefetch_gap Dec 11, 2024
@jacobhinkle
Copy link
Collaborator Author

!build

@@ -41,10 +41,18 @@ class MatmulParams : public HeuristicParams {
// greater than one. Otherwise it is ignored.
int smem_circular_buffer_stage = 2;

// The circular buffering prefetch distance will be set to
// smem_circular_buffer_stage - smem_circular_buffer_prefetch_gap
// This value must be positive since the prefetch distance must be strictly
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we enforce this condition in setUpCircularBuffering?
e.g.,

NVF_THROW(smem_circular_buffer_prefetch_gap >= 0 && smem_circular_buffer_prefetch_gap < smem_circular_buffer_stage, "smem_circular_buffer_prefetch_gap is ", smem_circular_buffer_prefetch_gap,
" but is expected to be positive and less than number of stages - ", smem_circular_buffer_stage);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I can add that assert since the error message will be clearer than what we get from tv->circularBuffer when it's violated.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the latter condition, is there any problem with having gap == num_stages?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The kernel should still run.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked that prefix=0 does work. Added the assertion 0 < gap <= stages in latest push.

@jacobhinkle
Copy link
Collaborator Author

!build

@jacobhinkle jacobhinkle merged commit 5716c09 into main Dec 12, 2024
17 checks passed
@jacobhinkle jacobhinkle deleted the prefetch_param branch December 12, 2024 00:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants