
Sequence Parallel Forward Transformer #3338

Merged: 6 commits into NVIDIA:main on Nov 19, 2024

Conversation

@cowanmeg (Collaborator) commented on Nov 4, 2024:

Sequence parallel forward transformer layer and multi-headed attention tests.

  1. Cleans up sharding annotations in Forward fusion definitions. Only sharding changes and inputs are explicitly sharded.
  2. Updates the outputs of mha and mlp to be structs with named TVs to make the code more readable (see the sketch after this list).
  3. Dropout probability is temporarily set to 0. This will be fixed in a later PR to use philox seed and offset with validation.
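
A minimal sketch of the named-TV return from item 2, under stated assumptions: TensorView is forward-declared here as a stand-in for nvFuser's type, the struct name MhaResult and every member except output are hypothetical, and only the .output member is evidenced by the diff reviewed below. This is an illustration, not the PR's actual definition.

struct TensorView;  // stand-in for nvFuser's TensorView type

// Returning a struct of named TVs instead of a positional std::vector<TensorView*>
// lets call sites write mha(...).output rather than indexing, e.g. mha(...)[3].
struct MhaResult {
  TensorView* linear0 = nullptr;  // hypothetical: first linear projection
  TensorView* sdpa = nullptr;     // hypothetical: scaled dot-product attention output
  TensorView* linear1 = nullptr;  // hypothetical: output projection
  TensorView* output = nullptr;   // the TV call sites previously fetched via [3]
};

// Usage at a call site (mirrors the diff shown later in this conversation):
//   auto mha_out = mha(mha_in, mha_w0, mha_b0, mha_w1, mha_b1, mesh).output;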

@cowanmeg (Collaborator, Author):

!build

@wujingyue (Collaborator) left a comment:

Thanks! I'm still reviewing the MHA part...

(Several review threads on tests/cpp/test_multidevice_transformer.cpp; all resolved, one marked outdated.)
@@ -1074,11 +1305,11 @@ TEST_P(DistributedTransformerTest, Forward) {
   auto ln_input = castOp(DataType::Float, x);
   auto ln0 = layer_norm(ln_input, norm_shape, ln0_w, ln0_b, eps);
   auto mha_in = castOp(dtype, ln0.output);
-  auto mha_out = mha(mha_in, mha_w0, mha_b0, mha_w1, mha_b1, mesh)[3];
+  auto mha_out = mha(mha_in, mha_w0, mha_b0, mha_w1, mha_b1, mesh).output;
A collaborator replied:

👍

@cowanmeg (Collaborator, Author):

!build

@cowanmeg merged commit 6f0909e into NVIDIA:main on Nov 19, 2024
16 checks passed
@liqiangxl (Collaborator):

check DistributedTransformerTest.MultiheadAttention_SP/__half
!test

@liqiangxl (Collaborator):

!test

Priya2698 pushed a commit that referenced this pull request on Nov 20, 2024: Sequence parallel forward transformer layer and multi-headed attention tests.

jacobhinkle pushed a commit that referenced this pull request on Dec 3, 2024: Sequence parallel forward transformer layer and multi-headed attention tests.
