Distributed computing (eg, multi-GPU) support #13
Comments
@erensezener Did you figure this out? :)
@gizacard Would you mind providing some instructions on this? Which options should be set? Thanks
@gizacard I wanted to train using multi-GPU (4 GPUs), and for that I used the
Although I am not aiming for a slurm job, the code here requires me to set
After setting these parameters, when I run the code, the training never starts, though without distributed training (single GPU) it works fine. Can you guide me on whether I am doing this correctly? Thanks
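For anyone attempting the same setup, here is a minimal, hypothetical sketch of how PyTorch multi-GPU training is typically initialized. The exact flags that train_reader.py expects are not shown in this thread, so treat the launcher command, environment variables, and names below as assumptions rather than FiD's actual interface.

```python
# Minimal sketch (not FiD's code) of standard torch.distributed setup.
# Launch with torchrun, e.g.:
#   torchrun --nproc_per_node=4 this_script.py
# (older PyTorch versions use `python -m torch.distributed.launch`, which passes
# --local_rank as a CLI argument instead of the LOCAL_RANK environment variable)
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def setup_distributed() -> int:
    # torchrun exports RANK, WORLD_SIZE, LOCAL_RANK, MASTER_ADDR and MASTER_PORT,
    # so init_method="env://" picks everything up from the environment.
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl", init_method="env://")
    return local_rank


if __name__ == "__main__":
    local_rank = setup_distributed()
    model = torch.nn.Linear(16, 16).cuda(local_rank)  # placeholder model
    model = DDP(model, device_ids=[local_rank])       # wrap for gradient sync
    # ... build dataloaders with DistributedSampler and run the training loop ...
    dist.destroy_process_group()
```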
Something like this worked for me
@fabrahman Could you provide an update on this issue? I have exactly the same issue, and I found that the code freezes without any error message after executing line 194 of train_reader.py.
@Duemoo I also encountered this problem: using multiple GPUs, I found that the code freezes without any error message after executing line 194 of train_reader.py.
@szh-max Hi, I also encountered this problem and solved it by updating the torch version to torch==1.10.0 and
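For others hitting the same silent freeze, a small hypothetical debugging aid (not something posted in this thread): enabling NCCL's own logging before initialization usually reveals whether the processes ever complete the rendezvous, or which rank is stuck.

```python
# Hypothetical diagnostic: turn on NCCL logging before the first
# torch.distributed / NCCL call to see where a hang occurs.
import os

os.environ.setdefault("NCCL_DEBUG", "INFO")       # verbose NCCL setup/rendezvous logs
os.environ.setdefault("NCCL_DEBUG_SUBSYS", "ALL") # include all NCCL subsystems
# These must be set before torch.distributed.init_process_group runs;
# exporting them in the shell that launches the job works just as well.
```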
I see that there is some code supporting multi-GPUs, e.g. here and here.
However, I don't see an option/flag to actually utilize distributed computing. Could you clarify?
Thank you.
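Pending clarification from the maintainers, one generic way to verify whether a given run is actually distributed is to query torch.distributed at runtime. The check below uses only standard PyTorch calls and makes no assumption about FiD's options.

```python
# Hypothetical runtime check (generic torch.distributed API, not an FiD flag):
# confirms whether the process has initialized a distributed backend.
import torch.distributed as dist

if dist.is_available() and dist.is_initialized():
    print(f"distributed run: rank {dist.get_rank()} of {dist.get_world_size()}")
else:
    print("single-process run: no distributed backend initialized")
```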