Add AllReduce distributed strategy design #373
base: develop
Conversation
9dba66f to 59f653b
Codecov Report
@@            Coverage Diff             @@
##           develop     #373      +/-   ##
===========================================
- Coverage    87.75%   87.69%    -0.07%
===========================================
  Files           33       33
  Lines         1503     1503
===========================================
- Hits          1319     1318        -1
- Misses         121      122        +1
  Partials        63       63
Single-process multi-GPU is not the recommended mode,
because of its scatter/gather overhead and GIL contention in every forward pass.
So, let's focus on DistributedDataParallel.
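For reference, below is a minimal PyTorch (Python) sketch of the DistributedDataParallel mode the quoted text recommends, assuming a one-process-per-GPU launch on a single node (e.g. via torchrun); the model, data, and hyperparameters are placeholders, not part of this design.

```python
# Minimal DistributedDataParallel training step, one process per GPU.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # Rank and world size come from the environment set up by the launcher.
    dist.init_process_group(backend="nccl")
    local_rank = dist.get_rank()  # assumes a single node, so rank == local GPU id
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(10, 1).to(local_rank)        # placeholder model
    ddp_model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    inputs = torch.randn(32, 10, device=local_rank)      # placeholder batch
    targets = torch.randn(32, 1, device=local_rank)

    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(ddp_model(inputs), targets)
    loss.backward()   # gradients are AllReduced across all processes here
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

In this mode the gradient synchronization during backward is an AllReduce across processes, which is the kind of collective the AllReduce strategy in this design relies on.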
GoTorch does not have a GIL; does single-process multi-GPU
mode fit GoTorch?
The answer is no. There are two reasons:
- The overhead of scatter/gather is also non-negligible. We once used scatter/parallel-do/gather to support multi-GPU AllReduce in Paddle with C++, and from that experience the speedup ratio was not very good (see the sketch after this list for the per-step pattern).
- We can only use scatter/gather across the GPUs of a single node; it cannot scale to multi-node multi-GPU training.
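For context, the sketch below shows roughly the per-step scatter/gather pattern that single-process multi-GPU training performs, written with PyTorch's helper functions; the module, batch, and device ids are placeholders. Because one process drives all the work, it can only reach the GPUs visible to that process, which is why the pattern does not extend to multiple nodes.

```python
# Roughly what a DataParallel-style forward pass repeats on every step.
# It only addresses the GPUs of one process/node; there is no cross-node path.
import torch
from torch.nn.parallel import replicate, scatter, parallel_apply, gather


def single_process_multi_gpu_forward(module, batch, device_ids, output_device=0):
    replicas = replicate(module, device_ids)    # copy the model to each GPU, every step
    shards = scatter(batch, device_ids)         # split the batch across the local GPUs
    outputs = parallel_apply(replicas, shards)  # run each replica on its shard
    return gather(outputs, output_device)       # collect outputs back on one GPU


if __name__ == "__main__":
    # Usage sketch; requires at least two visible GPUs.
    model = torch.nn.Linear(10, 1).cuda()              # placeholder model
    batch = torch.randn(32, 10, device="cuda:0")       # placeholder batch
    out = single_process_multi_gpu_forward(model, batch, device_ids=[0, 1])
    print(out.shape)
```

The replicate and gather steps run on every iteration and serialize through the single driving process, which is where the overhead mentioned above comes from.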
Here it is for better review.