-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add MatmulParams::CircularBufferOptions::smem_circular_buffer_prefetch_gap #3558
Conversation
!test |
csrc/scheduler/matmul_heuristic.h
Outdated
@@ -41,10 +41,13 @@ class MatmulParams : public HeuristicParams { | |||
// greater than one. Otherwise it is ignored. | |||
int smem_circular_buffer_stage = 2; | |||
|
|||
int smem_circular_buffer_prefetch = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the default be -1
, which means stage - 1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stage - 1
seems like the fastest default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option would be to have a parameter prefetch_gap
and we set the prefetch value to stages - prefetch_gap
. This would default to one. That way we would not need to be so careful when updating stages. What do you guys think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw @rdspring1 and @zasdfgbnm I implemented this in the latest pushed change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a preference for prefetch
or prefetch_gap
but I bet I'll get them confused at some point. 😆
!build |
!build |
@@ -41,10 +41,18 @@ class MatmulParams : public HeuristicParams { | |||
// greater than one. Otherwise it is ignored. | |||
int smem_circular_buffer_stage = 2; | |||
|
|||
// The circular buffering prefetch distance will be set to | |||
// smem_circular_buffer_stage - smem_circular_buffer_prefetch_gap | |||
// This value must be positive since the prefetch distance must be strictly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we enforce this condition in setUpCircularBuffering
?
e.g.,
NVF_THROW(smem_circular_buffer_prefetch_gap >= 0 && smem_circular_buffer_prefetch_gap < smem_circular_buffer_stage, "smem_circular_buffer_prefetch_gap is ", smem_circular_buffer_prefetch_gap,
" but is expected to be positive and less than number of stages - ", smem_circular_buffer_stage);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I can add that assert since the error message will be clearer than what we get from tv->circularBuffer
when it's violated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the latter condition, is there any problem with having gap == num_stages
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The kernel should still run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked that prefix=0
does work. Added the assertion 0 < gap <= stages
in latest push.
!build |
Previously we had hardcoded to use
smem_circular_buffer_stage - 1
but this gives us more flexibility in our heuristic. We will usesmem_circular_buffer_stage - smem_circular_buffer_prefetch_gap
as the prefetch distance. Parametrizing this way lets us modify the number of stages without needing to adjust the prefetch gap each time, since it will most commonly just be 1.