Suboptimal segmentation for RoPE #2599
wujingyue added a commit that referenced this issue on Aug 19, 2024.
wujingyue added a commit that referenced this issue on Aug 20, 2024.
The attempted fix had a mixed effect on performance: #2815 (comment). So this is still open. I'm reassigning this to @jjsjann123, the POC for RoPE.
We can close this one for now, since with the preseg passes we no longer segment across […]. Suboptimal segmentation still remains an issue for RoPE, though, and @naoyam's scheduler change should be able to resolve that.
TL;DR
The RoPE module in Llama-2 (and possibly other configs as well) is over-segmented into six pointwise kernels. Two should suffice. This issue blocks Lightning-AI/lightning-thunder#731, although we don't necessarily have to get down to two kernels to unblock it.
To reproduce
Problem
The following figure shows the current suboptimal segmentation
The highlighted ops form the three pointwise kernels for Q. There's a similar pattern for K that's omitted for simplicity, so there are six pointwise kernels in total. The remaining meta ops are, as expected, segmented out as "no-op" regions.
Note: For now, I omitted the `Pad`s before a `Cat` for simplicity. Soonish, #2373, pending, may move `Pad`s upstream, affecting segmentation.

There are two main reasons we reached this state.
1. […] `Slice` and `Pad` (and thus before `Cat`).
2. `Squeeze` and the red `ToFloat` were merged too early. This made it impossible to keep merging the red and the green segments, because `Squeeze` is a transitive input to the `Cat` and the merge would lead to a cycle. Several sub-reasons contributed to that order:
   - […] `+` […] its operands, the segmenter only merged the right operand instead of both.
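
To illustrate why a transitive input blocks a merge, here is a minimal sketch of the acyclicity check a segmenter has to pass before fusing two segments. It is not nvFuser's implementation: the function names (`reachable`, `merge_creates_cycle`) and the three-node toy graph (`red`, `cat`, `green`) are hypothetical and only mimic the shape described above, where the red segment feeds the green one both directly and through a separate segment containing the `Cat`.

```python
def reachable(succ, sources, target):
    """Depth-first search: can `target` be reached from any node in `sources`?"""
    stack = list(sources)
    seen = set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(succ.get(node, ()))
    return False


def merge_creates_cycle(edges, a, b):
    """Return True if fusing segments `a` and `b` would make the segment DAG cyclic.

    The merge is unsafe exactly when some path between `a` and `b` passes
    through a third segment: that segment would then be both a consumer and
    a producer of the merged segment.
    """
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, set()).add(dst)
    # Ignore the direct edge between a and b; only indirect paths matter.
    via_a = reachable(succ, succ.get(a, set()) - {b}, b)
    via_b = reachable(succ, succ.get(b, set()) - {a}, a)
    return via_a or via_b


# Hypothetical toy graph: red -> green directly, and red -> cat -> green.
edges = [("red", "green"), ("red", "cat"), ("cat", "green")]

print(merge_creates_cycle(edges, "red", "green"))  # True: cat sits in between
print(merge_creates_cycle(edges, "red", "cat"))    # False
print(merge_creates_cycle(edges, "cat", "green"))  # False
```

In this toy graph, merging `red` with `green` first is rejected, while merging `cat` into either neighbor is allowed, which is why the order in which candidate merges are tried affects how many segments remain at the end.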