1) Will the accuracy of MG-WFBP be the same as the accuracy of the original SGD?
I think MG-WFBP effectively uses a smaller batch size, because it merges gradients before updating the weights, which seems to change the original SGD.
2) Can the approach of MG-WFBP be used with Adam?
Adam may take longer in the backward pass, which could reduce the communication-computation conflict problem discussed in many papers.
The accuracy of MG-WFBP is the same as that of the original synchronized SGD (S-SGD). For any given mini-batch size, MG-WFBP averages the gradients in a way that is consistent with the averaging operation of S-SGD. The reason MG-WFBP runs faster than S-SGD is that it merges gradients at the "right" positions so that more communication is hidden behind computation.
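Here is a minimal sketch (not the actual MG-WFBP code; the worker count and tensor sizes are made up) showing why merging gradients into one buffer before averaging cannot change the result:

```python
# Sketch: averaging a merged (concatenated) gradient buffer is
# mathematically identical to averaging each layer's gradient
# separately, so MG-WFBP's updates match S-SGD's exactly.
import torch

P = 4  # hypothetical number of workers
# Per-layer gradients from each worker (two layers for illustration).
grads = [[torch.randn(3), torch.randn(5)] for _ in range(P)]

# S-SGD: average each layer's gradient across workers.
layerwise = [torch.stack([g[l] for g in grads]).mean(0) for l in range(2)]

# MG-WFBP-style: merge layers into one buffer per worker, average the
# merged buffer once, then split back into per-layer views.
merged = [torch.cat(g) for g in grads]
avg_merged = torch.stack(merged).mean(0)
split = torch.split(avg_merged, [3, 5])

for a, b in zip(layerwise, split):
    assert torch.allclose(a, b)  # identical gradients -> identical updates
```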
MG-WFBP can also be applied to many first-order optimizers, including Adam, as the key idea of MG-WFBP is to schedule the gradients for communication. So one can take the averaged gradients from P workers and feed them into their own optimizer.
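For example, here is a hedged sketch of feeding averaged gradients into PyTorch's Adam. `allreduce_average` is a hypothetical stand-in for whatever communication scheduler (e.g., MG-WFBP's merged all-reduce) produces the averaged gradients; with `torch.distributed` it would be an all-reduce followed by a division by the world size:

```python
import torch

def allreduce_average(t: torch.Tensor) -> torch.Tensor:
    # Placeholder for the cross-worker average; with torch.distributed:
    #   dist.all_reduce(t); t /= dist.get_world_size()
    return t

model = torch.nn.Linear(10, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x, y = torch.randn(8, 10), torch.randn(8, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()

# Replace each local gradient with the cross-worker average, then let
# Adam consume the averaged gradients exactly as it would under S-SGD.
for p in model.parameters():
    if p.grad is not None:
        p.grad = allreduce_average(p.grad)
opt.step()
opt.zero_grad()
```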
Hope the responses address your concerns. Thanks.