Multi gpu training; RuntimeError: [...] LightningModule has parameters that were not used in producing the loss returned by training_step. #8
Comments
Having the same issue with multiple GPUs.
There is a DDP config in the configs folder, which sets the ddp strategy flag needed to use multiple GPUs. As for the consequences of unused parameters, I am not very well versed in that at the moment.
In /configs/trainer/ddp.yaml, set strategy: ddp_find_unused_parameters_true
In /configs/train.yaml, set trainer: ddp
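For reference, a minimal sketch of what that config change resolves to at the Trainer level (assuming Lightning 2.x; the accelerator="gpu" flag is my addition, not from this repo's configs):

```python
from lightning.pytorch import Trainer

# Sketch: Trainer-level equivalent of the Hydra config change above.
trainer = Trainer(
    accelerator="gpu",                           # assumption: GPU machine
    devices="auto",                              # use all visible GPUs
    strategy="ddp_find_unused_parameters_true",  # DDP that tolerates unused parameters
)
```

As far as I understand, the string strategy name is shorthand for a DDP strategy with find_unused_parameters enabled, which tells DDP to tolerate parameters that receive no gradient in a given step.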
This solution also helped me when running on multiple GPUs.
Added devices="auto" in train.py to utilize multiple GPUs. Training terminates shortly after start with the RuntimeError quoted in the issue title.

Adding strategy='ddp_find_unused_parameters_true' to the trainer instantiation fixes it (all GPUs used).

The batch_idx argument is not used when the loss is returned by def training_step(self, batch: Any, batch_idx: int): in baselightingmodule.py, but I'm not sure about the consequences/effect on training/quality. Making this issue for visibility.
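For context, a minimal sketch of the training_step signature being discussed (a hypothetical module, not the repo's actual baselightingmodule.py). Note that batch_idx being an unused function argument is harmless; the DDP RuntimeError refers to registered module parameters (weights) that never contribute to the returned loss:

```python
from typing import Any

import torch
from lightning.pytorch import LightningModule


class SketchModule(LightningModule):
    """Hypothetical stand-in for the repo's LightningModule."""

    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(10, 1)

    def training_step(self, batch: Any, batch_idx: int):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self.net(x), y)
        # `batch_idx` is simply an unused argument here; that is harmless.
        # The DDP RuntimeError is about registered parameters (e.g. an unused
        # sub-module) that never contribute to `loss`; the
        # ddp_find_unused_parameters_true strategy tells DDP to tolerate them.
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```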