Ring-based decomposition for Allgather+GEMM overlap ATen implementation #3392
f8ecd94 to abaf220
Did you run nsys, and did it show overlap?
LGTM otherwise
d429e5c to f469ee4
@wujingyue I ran an experiment that showed some overlap, but I need to spend more time on it, with more detailed profiling instrumentation, to understand it fully.
When ready, please share a screenshot of the Nsight profile here for the record. Even if the observed overlap is not perfect, just seeing some overlap might be enough to validate the algorithm and merge the PR. After that, fully understanding how to get the best performance will be non-trivial.
This is a better screenshot. The nameless ...'s are nccl sendrecv, followed by gemm. You can see the sendrecv is fully overlapped with the gemm in a few cases. The profile still looks different from the one in the Nemo Megatron Parallelization Techniques document, though, which is not what I was expecting.
23459eb to 08073ef
It's hard to tell from the figure which blocks are gemm and which are allgather -- they all look like blue boxes starting with "void". But I believe what you said!
Thanks! Sorry, the ones that say "void" are GEMM kernels; the smaller boxes are all ncclSendrecv.
Co-authored-by: samnordmann <[email protected]>
Co-authored-by: Jingyue Wu <[email protected]>
c482ec9 to 06aae9b
!build
Implementation using ATen of https://docs.google.com/document/d/1Fzr9Zs2Dqfj3e4yR8LKxFrRqC1EkMUfQczJMYQGJQUI/edit?tab=t.0#heading=h.5x7hptdjzhet
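
For readers without access to the linked document, here is a minimal single-process sketch of what a ring-based decomposition of Allgather+GEMM generally looks like. It is not the code from this PR: the function names `matmul` and `ring_allgather_matmul` are hypothetical, plain Python lists stand in for tensors, and a list rotation stands in for the NCCL send/recv. It assumes the usual scheme in which each rank multiplies the shard it currently holds while that shard is forwarded to the next rank in the ring.

```python
def matmul(a, b):
    """Plain-Python matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def ring_allgather_matmul(shards, b):
    """Simulate D ranks; rank r starts with shards[r] and a full copy of b.

    Returns out[r][src]: the output row-block that rank r computed from the
    shard originally owned by rank src. For every rank, the blocks taken in
    src order equal concat(shards) @ b.
    """
    d = len(shards)
    held = list(shards)  # held[r]: shard currently resident on rank r
    out = [[None] * d for _ in range(d)]
    for step in range(d):
        # Compute on the resident shard. On real hardware this GEMM is the
        # work that would run concurrently with the send/recv below.
        for r in range(d):
            src = (r - step) % d  # rank that originally owned held[r]
            out[r][src] = matmul(held[r], b)
        # Ring shift (stand-in for send/recv): rank r receives the shard that
        # rank (r - 1) % d was holding. No shift is needed after the last step.
        if step < d - 1:
            held = [held[(r - 1) % d] for r in range(d)]
    return out


if __name__ == "__main__":
    shards = [[[1, 2]], [[3, 4]], [[5, 6]]]  # three ranks, one row each
    b = [[1, 2], [3, 4]]
    for rank, blocks in enumerate(ring_allgather_matmul(shards, b)):
        rows = [row for block in blocks for row in block]
        print(rank, rows)
```

In this sketch the communication and the GEMM of a given step touch the same shard read-only, which is what makes overlapping them possible in principle; whether a profile actually shows that overlap depends on stream and kernel scheduling, as discussed in the comments above.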