Support nccl-test with mscclpp-nccl on H100 GPUs #404

seagater · 2024-12-09T19:04:58Z

Implement ncclMemAlloc and ncclMemFree on top of mscclpp::allocSharedPhysicalCuda. A global map stores the association between each raw pointer and its shared pointer.
Add placeholder APIs ncclCommRegister() and ncclCommDeregister().
The build passes with the Docker image ghcr.io/microsoft/mscclpp/mscclpp:base-cuda12.4 on an H100 node.
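
The raw-pointer-to-shared-pointer map described above can be illustrated with a minimal sketch. This is not the code from this PR: `allocShared`, `sketchMemAlloc`, `sketchMemFree`, `sketchLiveAllocations`, and the `g_ptrMap` global are hypothetical names, and plain host memory stands in for mscclpp::allocSharedPhysicalCuda so that the example runs without a GPU. The mutex is likewise an assumption about how concurrent callers might be handled.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the real allocator. The PR uses
// mscclpp::allocSharedPhysicalCuda; host memory is used here only to keep
// the sketch runnable without CUDA.
static std::shared_ptr<char> allocShared(std::size_t bytes) {
  char* raw = static_cast<char*>(std::malloc(bytes));
  if (raw == nullptr) return nullptr;
  // The custom deleter releases the buffer when the last owner goes away.
  return std::shared_ptr<char>(raw, [](char* p) { std::free(p); });
}

// Global map from the raw pointer handed to the caller to the shared
// pointer that keeps the allocation alive.
static std::unordered_map<void*, std::shared_ptr<char>> g_ptrMap;
static std::mutex g_ptrMapMutex;  // assumption: callers may be concurrent

// Hypothetical analogue of ncclMemAlloc. Returns 0 on success, 1 on failure.
int sketchMemAlloc(void** ptr, std::size_t size) {
  if (ptr == nullptr || size == 0) return 1;
  std::shared_ptr<char> owner = allocShared(size);
  if (!owner) return 1;
  std::lock_guard<std::mutex> lock(g_ptrMapMutex);
  *ptr = owner.get();
  g_ptrMap[*ptr] = std::move(owner);
  return 0;
}

// Hypothetical analogue of ncclMemFree. Erasing the map entry drops the
// shared pointer, which runs the deleter if no other owner remains.
int sketchMemFree(void* ptr) {
  std::lock_guard<std::mutex> lock(g_ptrMapMutex);
  auto it = g_ptrMap.find(ptr);
  if (it == g_ptrMap.end()) return 1;  // unknown pointer or double free
  g_ptrMap.erase(it);
  return 0;
}

// Number of allocations currently tracked by the map.
std::size_t sketchLiveAllocations() {
  std::lock_guard<std::mutex> lock(g_ptrMapMutex);
  return g_ptrMap.size();
}
```

A caller would pass the address of a `void*` to `sketchMemAlloc`, use the returned raw pointer, and later hand that same pointer to `sketchMemFree`. Keeping the shared pointer in the map lets a C-style API expose raw pointers while the allocation's lifetime is still managed by reference counting.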

Binyang2014 · 2024-12-12T02:07:54Z

Closing, as this work has moved to another branch.

seagater added 9 commits December 8, 2024 00:12

Initial support of msccl-nccl api for nccl-test on H100

c429209

Update communicatorMap, getCurrentNcclComm, ncclMemAlloc and ncclMemFree

ba7e4d2

Revise code for msccl-nccl api support for nccl-test on H100

9fcab7e

Fix some build issues

4298abe

Directly store the shared pointer to a map without the communicator and commId; Remove the cudaDeleter

129d31b

Add placeholder API ncclCommRegister(); Remove some included headers

8943759

Update interface of ncclCommRegister

fad6050

Add placeholder API ncclCommDeregister

5c97835

Merge branch 'main' into qinghuazhou/nccl-test-support-mscclpp-nccl-H100

37e5951

Binyang2014 closed this Dec 12, 2024
