Hello, I found a problem with the execution time #4
Hi, may I know whether you have reset the alpha and beta values (https://github.com/HKBU-HPML/MG-WFBP/blob/master/distributed_optimizer.py#L166) to values measured on your own cluster? MG-WFBP needs this information (you can use nccl-tests to estimate it) to generate a merging solution that fits the cluster.
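One way to estimate alpha and beta is to time all-reduce at several message sizes (for example with the `all_reduce_perf` binary from nccl-tests) and fit a straight line t ≈ alpha + beta · size. The sketch below is a plain least-squares fit under that assumption. `fit_alpha_beta` is a hypothetical helper, not part of MG-WFBP, and you would still need to convert the units (seconds vs. milliseconds, bytes vs. number of elements) to whatever `distributed_optimizer.py` expects.

```python
from typing import Sequence, Tuple


def fit_alpha_beta(sizes: Sequence[float], times: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of the model t = alpha + beta * size.

    sizes: message sizes of the measured all-reduce calls (e.g. bytes).
    times: measured all-reduce times for those sizes (e.g. seconds).
    Returns (alpha, beta): the startup latency and the per-unit transfer time.
    """
    n = len(sizes)
    if n != len(times) or n < 2:
        raise ValueError("need at least two (size, time) pairs of equal length")
    mean_s = sum(sizes) / n
    mean_t = sum(times) / n
    # Ordinary least squares for a line: beta = cov(s, t) / var(s).
    sxx = sum((s - mean_s) ** 2 for s in sizes)
    if sxx == 0:
        raise ValueError("message sizes must not all be equal")
    sxy = sum((s - mean_s) * (t - mean_t) for s, t in zip(sizes, times))
    beta = sxy / sxx
    alpha = mean_t - beta * mean_s  # intercept = fixed startup cost
    return alpha, beta
```

Measure a wide range of sizes. If only large messages are timed, the intercept (alpha) is poorly constrained, and if only small ones are timed, beta is.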
Thanks for your reply. I will try your suggestion.
I used the benchmark function (MG-WFBP/distributed_optimizer.py, line 105 in 5b8ad54)
Is this the reason?
Hi, the startup time (i.e., alpha) is quite large, so MG-WFBP tends to merge all layers into a single one. You may need to check your hardware configuration. Several factors can cause high latency (e.g., whether P2P is supported, the number of CPU PCIe lanes, the PCIe version).
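To see why a large startup time pushes the merge toward a single message, here is a deliberately simplified timeline model of wait-free backpropagation. Each gradient becomes ready after its backward computation. Each merged group is one all-reduce costing alpha + beta · size. A group can start only when all of its gradients are ready and the previous all-reduce has finished. This is a sketch under those assumptions, not MG-WFBP's merging algorithm. `simulate_wfbp` and its parameters are hypothetical, and the model ignores the forward pass and any contention between computation and communication, so real measurements can deviate from it.

```python
from typing import List, Sequence, Tuple


def simulate_wfbp(
    backward_times: Sequence[float],
    sizes: Sequence[float],
    groups: List[List[int]],
    alpha: float,
    beta: float,
) -> Tuple[float, float]:
    """Simplified WFBP timeline under the model t_comm = alpha + beta * size.

    backward_times[i] and sizes[i] describe gradient i in the order the backward
    pass produces them (index 0 = first gradient ready). Each entry of `groups`
    is a list of gradient indices sent together as one all-reduce, listed in
    the order the all-reduces are issued.
    Returns (finish_time, non_overlapped_comm_time), measured from the start of
    the backward pass.
    """
    # Time at which each gradient becomes available.
    ready: List[float] = []
    t = 0.0
    for bt in backward_times:
        t += bt
        ready.append(t)
    backward_end = t

    link_free = 0.0  # time at which the previous all-reduce finishes
    for group in groups:
        # A merged all-reduce waits for its latest gradient and for the link.
        start = max(max(ready[i] for i in group), link_free)
        link_free = start + alpha + beta * sum(sizes[i] for i in group)

    finish = max(backward_end, link_free)
    return finish, finish - backward_end
```

With four identical layers (backward time 1.0, size 1.0, beta = 0.5), alpha = 2.0 gives a finish time of 11.0 when each layer is sent separately versus 8.0 when all four are merged. With alpha = 0.1 the ranking flips: about 4.6 per-layer versus 6.1 merged. Merging n messages saves (n - 1) · alpha of startup cost but delays communication until the last gradient in the group is ready, so a large alpha favors merging everything.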
OK, I will check my hardware configuration. Thanks for your suggestion.
I ran the program with the single-layer algorithm (using the __merge function to merge every layer) and found that the non-overlapped time is larger than with the MG-WFBP algorithm, which is as expected. However, the time per iteration and per epoch is lower than with MG-WFBP.