TL/CUDA: Linear Broadcast for GPU #948

ikryukov · 2024-03-22T14:30:30Z

What

A linear CUDA broadcast implementation with active set support.

Why?

Functional improvement and parity with other communication libraries.
Ability to place many ranks on a single GPU.
No GPU blocking: communication is initiated from the host.
The active set can be used to emulate P2P send/receive on top of the broadcast collective.
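
To illustrate the last point, here is a minimal sketch of how a two-rank active set could express a point-to-point transfer as a broadcast rooted at the sender. The `active_set_t` struct and both helper functions are hypothetical stand-ins written for this example. They assume an active set is described by a start rank, a stride, and a size, and they are not taken from this PR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an active-set descriptor: the ranks
 * start, start + stride, ..., start + (size - 1) * stride take part. */
typedef struct {
    int64_t start;
    int64_t stride;
    int64_t size;
} active_set_t;

/* Build a two-rank active set so that a broadcast rooted at src
 * behaves like a send from src to dst. */
static active_set_t p2p_active_set(int64_t src, int64_t dst)
{
    assert(src != dst); /* a rank cannot send to itself here */
    active_set_t set = { .start = src, .stride = dst - src, .size = 2 };
    return set;
}

/* Return 1 if rank belongs to the active set, 0 otherwise. */
static int active_set_contains(active_set_t set, int64_t rank)
{
    for (int64_t i = 0; i < set.size; i++) {
        if (set.start + i * set.stride == rank) {
            return 1;
        }
    }
    return 0;
}
```

Under these assumptions, ranks outside the set skip the collective entirely, so only the sender and the receiver take part in the transfer.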

How?

A naive approach in which the root rank writes data to its own shared buffer and the other ranks read from it through NVLink.
The algorithm allows multiple simultaneous peer-to-peer (P2P) communications to be in flight, distinguished only by user-defined tags. During execution, it finds the first free barrier and acquires it. The barrier and its associated resources then serve as the scratch space for the operation.
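
The barrier selection described above can be sketched with C11 atomics. This is an illustration under stated assumptions, not the code from this PR: the pool size `NUM_BARS`, the `TAG_FREE` marker, the `bar_t` layout, and all function names are hypothetical. The sketch assumes the root claims a free slot with a compare-and-swap on a 64-bit tag and that peers locate the same slot by searching for that tag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NUM_BARS 4          /* hypothetical pool size */
#define TAG_FREE UINT64_MAX /* hypothetical marker for a free slot */

typedef struct {
    _Atomic uint64_t tag; /* tag of the operation that owns the slot */
} bar_t;

/* Mark every slot in the pool as free. */
static void bars_init(bar_t *bars)
{
    for (int i = 0; i < NUM_BARS; i++) {
        atomic_init(&bars[i].tag, TAG_FREE);
    }
}

/* Root side: claim the first free slot for tag.
 * Returns the slot index, or -1 if every slot is in use. */
static int bar_acquire(bar_t *bars, uint64_t tag)
{
    assert(tag != TAG_FREE);
    for (int i = 0; i < NUM_BARS; i++) {
        uint64_t expected = TAG_FREE;
        /* Succeeds only if the slot was still free, so two roots
         * racing for the same slot cannot both win it. */
        if (atomic_compare_exchange_strong(&bars[i].tag, &expected, tag)) {
            return i;
        }
    }
    return -1;
}

/* Peer side: find the slot already claimed for tag, or -1 if none. */
static int bar_find(bar_t *bars, uint64_t tag)
{
    for (int i = 0; i < NUM_BARS; i++) {
        if (atomic_load(&bars[i].tag) == tag) {
            return i;
        }
    }
    return -1;
}

/* Return a slot to the pool once the operation has completed. */
static void bar_release(bar_t *bars, int idx)
{
    atomic_store(&bars[idx].tag, TAG_FREE);
}
```

In this sketch a return value of -1 from `bar_acquire` or `bar_find` means the caller has to retry later, which fits a host-driven progress loop that polls until the slot becomes available.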

ikryukov · 2024-03-22T14:32:07Z

Configuration string:
./configure --with-ucx=$HPCX_UCX_DIR --with-cuda=/usr/local/cuda --with-mpi=$HPCX_MPI_DIR --enable-gtest --prefix=$PWD/install --with-nvcc-gencode="-gencode=arch=compute_80,code=sm_80" --enable-debug
Run string:
mpirun --mca coll ^hcoll --mca coll_ucc_enable 0 -x LD_LIBRARY_PATH=/home/ikryukov/work/ucc/install/lib:$LD_LIBRARY_PATH -x UCC_CLS=basic -x UCC_TLS=ucp,cuda -x xUCC_LOG_LEVEL=info -x UCC_TL_CUDA_LOG_LEVEL=debug -x UCC_LOG_LEVEL=info -x UCC_CONFIG_FILE= -np 2 ./install/bin/ucc_test_mpi -c bcast --teams world -M cuda -O 0 -S 2

swx-jenkins3 · 2024-03-22T14:33:43Z

Can one of the admins verify this patch?

samnordmann

Looks good to me! Thanks!
I only left some minor remarks. Can you, in addition, add this algo to the tests?

src/components/tl/cuda/tl_cuda.h

src/components/tl/cuda/bcast/bcast_linear.c

ikryukov · 2024-08-26T11:27:41Z

> Looks good to me! Thanks! I only left some minor remarks. Can you, in addition, add this algo to the tests?

Thanks for the review. I addressed the comments and added a test to validate bcast for CUDA too.

src/components/tl/cuda/bcast/bcast_linear.c

src/components/tl/cuda/tl_cuda.h

janjust

looks good

manjugv · 2024-09-18T17:12:28Z

ping @Sergei-Lebedev

samnordmann · 2024-12-11T16:25:30Z

ok to test

ikryukov marked this pull request as draft March 22, 2024 14:32

ikryukov force-pushed the cuda_bcast branch from 7c22cb5 to c9e7048 Compare April 12, 2024 13:42

ikryukov force-pushed the cuda_bcast branch 3 times, most recently from 00f3922 to 6caea67 Compare July 4, 2024 15:38

ikryukov force-pushed the cuda_bcast branch from e6f4223 to f6d7536 Compare August 2, 2024 15:54

ikryukov marked this pull request as ready for review August 2, 2024 16:17

Sergei-Lebedev requested review from janjust, Sergei-Lebedev and samnordmann August 14, 2024 09:14

Sergei-Lebedev added the Ready-for-Review label Aug 14, 2024

samnordmann reviewed Aug 19, 2024

ikryukov force-pushed the cuda_bcast branch from 11c7311 to 99264d8 Compare August 23, 2024 13:17

Sergei-Lebedev reviewed Aug 30, 2024

src/components/tl/cuda/bcast/bcast_linear.c (outdated)

src/components/tl/cuda/bcast/bcast_linear.c

src/components/tl/cuda/tl_cuda.h (outdated)

samnordmann self-requested a review September 9, 2024 08:16

samnordmann approved these changes Sep 9, 2024

janjust approved these changes Sep 9, 2024

ikryukov force-pushed the cuda_bcast branch from 8b8720c to 87c4424 Compare October 28, 2024 10:36

manjugv added WIP - Don't Merge and removed Ready-for-Review labels Nov 13, 2024

ikryukov force-pushed the cuda_bcast branch from 54e6440 to 0d9ab19 Compare November 14, 2024 12:18

janjust added Ready-for-Review Code-Review-Required and removed WIP - Don't Merge labels Dec 2, 2024

ikryukov force-pushed the cuda_bcast branch from 4c25b18 to 17a4545 Compare December 12, 2024 14:12

ikryukov added 29 commits December 30, 2024 12:48

TL/CUDA: fix build

9eadde5

TL/CUDA: fixed comments

5c5ae84

TL/CUDA: select free bar using atomic

d3a60be

TL/CUDA: fix

a23b68a

TL/CUDA: replace free tag

ac83e63

TL/CUDA: fix bar tag init val

c083fa2

TL/CUDA: added tag print

67edb77

TL/CUDA: changed tag to 64bits

3fc1c3e

TL/CUDA: fixed linter errors

40c6ea8

TL/CUDA: bar init logic in progress

644012c

TL/CUDA: fix CI build

62f6ba7

TL/CUDA: removed unused var

43d1900

TL/CUDA: refactor bar init

b914f53

TL/CUDA: fix ci

53ca155

TL/CUDA: added bar stage

1658638

TL/CUDA: revert bar stage

da95927

TL/CUDA: free bar in progress

1dd83ba

TL/CUDA: completion barrier

b29bcf7

TL/CUDA: removed prints

e16d9d4

TL/CUDA: fix format

7b0d87f

TL/CUDA: remove unused atomic

da0c021

TL/CUDA: removed unused include

b0da90f

TL/CUDA: fix clang compilation

5c1e28b

TL/CUDA: fixed comments,format,removed bool

da19729

TL/CUDA: hide functions

ee09b83

TL/CUDA: fix bug in non active set version

904c671

TL/CUDA: fixed build

3486f89

TL/CUDA: added assertions

9f2e9d1

TL/CUDA: addressed comments

ccc60a5

ikryukov force-pushed the cuda_bcast branch from 6a63062 to ccc60a5 Compare December 30, 2024 11:48
