Ring-based decomposition for Allgather+GEMM overlap ATen implementation #3392

Merged
nsarka merged 19 commits into NVIDIA:main from nsarka/ring-ag-overlap on Dec 13, 2024

@nsarka nsarka commented Nov 11, 2024

@nsarka nsarka self-assigned this Nov 11, 2024
Review threads on tests/cpp/test_multidevice_overlap.cpp (resolved)
@nsarka nsarka force-pushed the nsarka/ring-ag-overlap branch from f8ecd94 to abaf220 Compare November 12, 2024 18:10
Collaborator

@wujingyue wujingyue left a comment

Did you nsys and did that show overlapping?

Collaborator

@wujingyue wujingyue left a comment

LGTM otherwise

@nsarka nsarka force-pushed the nsarka/ring-ag-overlap branch 2 times, most recently from d429e5c to f469ee4 Compare December 5, 2024 20:16
@nsarka
Member Author

nsarka commented Dec 6, 2024

Did you nsys and did that show overlapping?

@wujingyue I ran an experiment that showed some overlap, but it seems like I need to spend more time on it (and add more detailed profiling instrumentation) to understand it fully.

@samnordmann
Collaborator

When ready, please share a screenshot of the Nsight profile here for the record. Even if the observed overlap is not perfect, just seeing some overlap might be enough to validate the algorithm and merge the PR; after that, fully understanding how to get the best performance will be non-trivial.

@nsarka
Member Author

nsarka commented Dec 6, 2024

image

Here is a screenshot. This is with M=K=N=2^12 on sjc-dc2-dgx1-2 (4x V100)
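
For scale, here is a rough back-of-the-envelope sketch of what one ring step would involve at this problem size. It assumes A is split evenly along M across the 4 devices and uses 2-byte elements; neither the sharding axis nor the dtype is stated in this thread, so treat both as assumptions.

```python
# Problem size quoted in the comment above.
M = K = N = 2 ** 12
num_devices = 4                      # 4x V100

# Assumption (not stated in the thread): A is split evenly along M.
rows_per_shard = M // num_devices

# One ring step multiplies a (rows_per_shard x K) block by the (K x N) matrix.
flops_per_step = 2 * rows_per_shard * K * N      # one multiply and one add per term
elements_per_shard = rows_per_shard * K          # what one send/recv would move

# Assumption: 2-byte elements (e.g. fp16); the dtype is not given in the thread.
bytes_per_element = 2
bytes_per_step = elements_per_shard * bytes_per_element

print(rows_per_shard, flops_per_step, elements_per_shard, bytes_per_step)
```

Under these assumptions, each step pairs a GEMM of 2^35 floating-point operations with a transfer of 2^22 elements, which is the pair of operations the profile would need to show running concurrently.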

@nsarka
Member Author

nsarka commented Dec 6, 2024

This is a better screenshot:

image

The unnamed "..." boxes are NCCL sendrecv kernels, each followed by a GEMM. You can see that the sendrecv is fully overlapped with the GEMM in a few cases. The profile still looks different from the one in the NeMo Megatron Parallelization Techniques document, though, which is not what I was expecting.
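
For readers following the thread, the schedule being profiled can be sketched in a single process. This is a minimal illustration, not the PR's ATen implementation: the function names `matmul` and `ring_allgather_gemm` are hypothetical, and it assumes A is row-sharded across ranks, every rank holds all of B, and shards travel from rank r to rank r + 1.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def ring_allgather_gemm(a_shards, b):
    """Simulate D ranks computing C = A @ B with A row-sharded across ranks.

    a_shards[r] is the row block of A initially owned by rank r; every rank
    holds the full B. Returns out, where out[r][s] is rank r's result for
    row block s of C.
    """
    d = len(a_shards)
    held = list(a_shards)            # held[r]: shard rank r holds at this step
    out = [[None] * d for _ in range(d)]
    for step in range(d):
        # Compute: rank r currently holds the shard that originated on
        # rank (r - step) mod d, so it can fill that row block of C.
        for r in range(d):
            out[r][(r - step) % d] = matmul(held[r], b)
        # Communicate: each rank passes its shard to rank r + 1 and receives
        # from rank r - 1. On real hardware this send/recv is the part that
        # would run concurrently with the GEMM; here it is sequential.
        if step < d - 1:
            held = [held[(r - 1) % d] for r in range(d)]
    return out


a_shards = [[[1, 2]], [[3, 4]], [[5, 6]], [[7, 8]]]   # 4 ranks, one row each
b = [[1, 0, 2], [0, 1, 3]]
out = ring_allgather_gemm(a_shards, b)

# Every rank should end up with the same C as a direct A @ B.
full_a = [row for shard in a_shards for row in shard]
reference = matmul(full_a, b)
gathered = [[row for block in out[r] for row in block] for r in range(4)]
all_match = all(g == reference for g in gathered)
```

The point of the decomposition is that each step's GEMM depends only on the shard already in hand, so the transfer of the next shard has no data dependency on it and can, in principle, be hidden behind the compute.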

@nsarka nsarka force-pushed the nsarka/ring-ag-overlap branch 2 times, most recently from 23459eb to 08073ef Compare December 9, 2024 15:11
@nsarka
Member Author

nsarka commented Dec 10, 2024

After swapping the order that comm and compute are posted, the overlap looks almost perfect:

image

@wujingyue
Collaborator

It's hard to tell from the figure which blocks are GEMM and which are allgather. They all look like blue boxes starting with "void". But I believe what you said!

@nsarka
Member Author

nsarka commented Dec 11, 2024

Thanks! Sorry, the ones that say "void" are GEMM kernels; the smaller boxes are all ncclSendrecv.

@nsarka nsarka force-pushed the nsarka/ring-ag-overlap branch from c482ec9 to 06aae9b Compare December 12, 2024 21:53
@nsarka
Member Author

nsarka commented Dec 12, 2024

!build

@nsarka nsarka merged commit 568f04f into NVIDIA:main Dec 13, 2024
16 checks passed