Adding the new feature of FPDT #6462

YJHMITWEB · 2024-08-29T23:48:00Z

FPDT can only be used with this version of Megatron-DeepSpeed.

delock · 2024-08-30T01:17:12Z

deepspeed/sequence/fpdt_layer.py

+    return out, lse
+
+
+def single_all_to_all(input_, scatter_idx, gather_idx, group):


Hi @YJHMITWEB, is this single_all_to_all the same as the one in DeepSpeed/deepspeed/sequence/layer.py (line 41 in 89c4d9f)?

def single_all_to_all(input, scatter_idx, gather_idx, batch_dim_idx, group, async_op=False, handle=None, type=None):

We have not tested the non-blocking all-to-all with our FPDT design, so we use the original blocking version. If the non-blocking version is preferred, we can test it.

This is fixed. We now use the single_all_to_all from layer.py.
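For readers unfamiliar with the scatter/gather arguments discussed in this thread, here is a minimal sketch of the data movement an all-to-all of this kind is generally understood to perform in sequence parallelism. It is a pure-Python simulation with no communication, not the DeepSpeed implementation; `simulate_all_to_all` and the token labels are hypothetical names, and the layout (scatter the head dimension, gather the sequence dimension) is an assumption chosen for illustration.

```python
def simulate_all_to_all(per_rank):
    """Simulate an all-to-all that scatters heads and gathers sequence.

    per_rank[r] is a nested list of shape [local_seq][heads] held by rank r.
    Returns a list where entry r has shape [full_seq][heads // world].
    Hypothetical helper for illustration only; no real communication happens.
    """
    world = len(per_rank)
    heads = len(per_rank[0][0])
    assert heads % world == 0, "heads must divide evenly across ranks"
    heads_per_rank = heads // world

    out = []
    for dst in range(world):
        rows = []
        # Gather: take sequence shards from every source rank, in rank order.
        for src in range(world):
            for row in per_rank[src]:
                # Scatter: dst only receives its own slice of the heads.
                rows.append(row[dst * heads_per_rank:(dst + 1) * heads_per_rank])
        out.append(rows)
    return out


# Two ranks, each holding 2 of 4 sequence positions and all 4 heads.
per_rank = [
    [[f"t{s}h{h}" for h in range(4)] for s in (0, 1)],
    [[f"t{s}h{h}" for h in range(4)] for s in (2, 3)],
]
result = simulate_all_to_all(per_rank)
```

After the exchange each simulated rank holds the full sequence but only its own subset of heads, which is why a single shared helper can serve both the existing sequence-parallel layer and the FPDT layer.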

delock · 2024-08-30T01:20:22Z

deepspeed/sequence/fpdt_layer.py

+
+                compute_stream.wait_stream(offload_stream)
+                compute_stream.synchronize()
+                with torch.cuda.stream(offload_stream):


torch.cuda.stream(stream) should be replaced by get_accelerator().stream(stream); the same applies to the other occurrences in this file.
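The requested change can be sketched as routing the stream context through an accelerator object instead of calling torch.cuda directly. The snippet below is a hedged illustration that runs without a GPU: `RecordingAccelerator` is a hypothetical stand-in that only mimics the `stream()` method, and `offload_chunk` is an invented helper. In DeepSpeed the object would come from `deepspeed.accelerator.get_accelerator()`.

```python
import contextlib


class RecordingAccelerator:
    """Hypothetical stand-in for the object returned by get_accelerator().

    It only records which stream contexts were entered and exited, so the
    sketch can run on any machine.
    """

    def __init__(self):
        self.calls = []

    @contextlib.contextmanager
    def stream(self, stream):
        self.calls.append(("enter", stream))
        try:
            yield
        finally:
            self.calls.append(("exit", stream))


def offload_chunk(accelerator, offload_stream, copy_fn):
    """Run copy_fn under offload_stream via the accelerator abstraction."""
    # Before: with torch.cuda.stream(offload_stream):
    with accelerator.stream(offload_stream):
        return copy_fn()


acc = RecordingAccelerator()
out = offload_chunk(acc, "offload_stream", lambda: "chunk copied")
```

Because the call site depends only on the accelerator interface, the same code path can run on non-CUDA backends that DeepSpeed supports.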

delock · 2024-08-30T01:21:28Z

deepspeed/sequence/fpdt_layer.py

+            grad_qkv_linear_bias = torch.zeros(qkv_linear_bias.shape, device=qkv_linear_weight.device, dtype=torch.float)
+
+        grad_global_attn_output_chunk = single_all_to_all(grad_output[:, :chunk_size].contiguous(), scatter_idx, gather_idx, spg)
+        torch.cuda.synchronize()


Here it should be get_accelerator().synchronize().

Got it, thanks.

This one is solved.
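Since both review threads above flag direct torch.cuda calls, a small check like the following could help find any remaining occurrences in a file. This is a rough sketch, not part of the PR: `find_direct_cuda_calls` is a hypothetical helper, and the regular expression only catches the simple `torch.cuda.<name>(` form.

```python
import re

# Matches direct CUDA calls such as torch.cuda.synchronize( or torch.cuda.stream(
DIRECT_CUDA_CALL = re.compile(r"\btorch\.cuda\.(\w+)\s*\(")


def find_direct_cuda_calls(source):
    """Return (line_number, function_name) for each direct torch.cuda.* call."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("#", 1)[0]  # ignore trailing comments
        for match in DIRECT_CUDA_CALL.finditer(code):
            hits.append((lineno, match.group(1)))
    return hits


sample = """\
compute_stream.wait_stream(offload_stream)
with torch.cuda.stream(offload_stream):
    pass
torch.cuda.synchronize()
get_accelerator().synchronize()
"""

hits = find_direct_cuda_calls(sample)
```

Each reported line is a candidate for the equivalent `get_accelerator()` call suggested by the reviewer.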

inkcherry · 2024-08-30T05:03:39Z

deepspeed/sequence/layer.py

@@ -168,7 +196,8 @@ def __init__(
        self.gather_idx = gather_idx
        self.sp_overlap_comm = False
        self.overlap_handles = None
-        self.sp_stream = sp_stream
+        self.sp_stream = None
+        # self.sp_stream = sp_stream


Why comment out this line?

This one is solved.

fix format and add unit test for fpdt

[FPDT](https://arxiv.org/abs/2408.16978) can only be used with [this version](microsoft/Megatron-DeepSpeed#441) of Megatron-DeepSpeed.

Co-authored-by: Jinghan Yao
Co-authored-by: Sam Ade Jacobs
Co-authored-by: Logan Adams
Co-authored-by: Olatunji Ruwase
Co-authored-by: Masahiro Tanaka

Jinghan Yao and others added 6 commits August 3, 2024 00:04

fix the bug of deepspeed sequence parallel working with batch size larger than 1

c076827

Merge branch 'master' into master

1b8a8c1

apply yapf formatting

ed34e89

Formatting fixes

89b119e

Merge branch 'microsoft:master' into master

7db5798

add FPDT

0beff24

YJHMITWEB requested a review from tjruwase as a code owner August 29, 2024 23:48

Merge branch 'master' into master

4522ed7

YJHMITWEB mentioned this pull request Aug 30, 2024

Adding the new feature of FPDT microsoft/Megatron-DeepSpeed#441

Open

delock reviewed Aug 30, 2024

inkcherry reviewed Aug 30, 2024

tohtana reviewed Aug 30, 2024

deepspeed/sequence/layer.py (outdated, resolved)

tohtana reviewed Aug 30, 2024

deepspeed/sequence/fpdt_layer.py (outdated, resolved)

tjruwase and others added 9 commits September 6, 2024 17:59

Merge branch 'master' into master

c15d1d8

modify streams

69f3892

modify streams

8ef9f5a

Merge branch 'master' into master

b43c5ec

remove duplication of alltoall

a55d1f5

Merge branch 'master' of github.com:YJHMITWEB/DeepSpeed

1cbd59d

remove duplication of pos

6bfd76f

fix format

4eeadca

Merge branch 'master' into master

8994991

tohtana requested a review from loadams as a code owner October 10, 2024 15:49

Jinghan Yao and others added 5 commits October 10, 2024 19:00

fix format and add unit test for fpdt

128286c

Merge branch 'master' of github.com:YJHMITWEB/DeepSpeed

386f606

fix format and add unit test for fpdt

add einops

ebea5b0

add flashattn

5c8eec8

Merge branch 'master' into master

a7e175a

tohtana and others added 18 commits November 6, 2024 17:42

add cron for test and reporting for nightly CI failures

9e811b8

add multiGPU fpdt unit test

a7522da

add multiGPU fpdt unit test

209adab

add multiGPU fpdt unit test

dbeea8a

add multiGPU fpdt unit test

845e42d

add multiGPU fpdt unit test

8b2549c

add multiGPU fpdt unit test

058c973

add multiGPU fpdt unit test

0dcc234

add multiGPU fpdt unit test

d1be5d3

add multiGPU fpdt unit test

3a0feba

add multiGPU fpdt unit test

8c57812

add multiGPU fpdt unit test

43decf6

add multiGPU fpdt unit test

d39585c

add multiGPU fpdt unit test

389b1a3

add multiGPU fpdt unit test

958f3bf

add multiGPU fpdt unit test

af025c5

Merge branch 'master' into master

2230377

Merge branch 'master' into master

7636690

tohtana approved these changes Nov 26, 2024

tohtana added this pull request to the merge queue Nov 26, 2024

loadams mentioned this pull request Nov 26, 2024

Flops profiler support einops.einsum #6755

Open

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 26, 2024

loadams added this pull request to the merge queue Nov 26, 2024

github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Nov 26, 2024

tohtana added this pull request to the merge queue Nov 26, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 26, 2024

loadams added this pull request to the merge queue Nov 26, 2024

github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Nov 27, 2024
