When running on HPG, the print statement that reports "Training batch is on device _" only ever shows "device 0". Are we missing the computations happening on the other GPU (i.e. "device 1"), which the program reports is available at startup, or is that second GPU just not being used during training for some reason?
Oh what lol.
In Wandb, which is where I was checking, there are actually two runs going. The second one has less information in its logs, but it does report which device the operations are on, and both training and validation show "device 1", which is the second GPU. Cool! (I think.)
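For context, a minimal sketch of why this happens, assuming the training script uses PyTorch DistributedDataParallel launched with `torchrun` (an assumption, the actual setup isn't shown in this issue): each GPU runs in its own process, so each process prints its own device line, and if `wandb.init()` is called in every process without a rank guard, each process also creates its own W&B run, which matches seeing "device 0" in one log and "device 1" in a second run.

```python
# Minimal sketch, assuming a PyTorch DDP setup launched with torchrun
# (hypothetical -- the repo's actual training script isn't shown here).
import os
import torch
import torch.distributed as dist


def report_device():
    # torchrun sets LOCAL_RANK; each DDP process is pinned to one GPU.
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)
    device = torch.device(f"cuda:{local_rank}")

    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()

    # Every process prints its own line, so with 2 GPUs you should see
    # both "device 0" and "device 1" -- but each process writes to its
    # own stdout (and its own W&B run, if wandb.init() runs per process),
    # so any single log will only ever show one device.
    batch = torch.randn(8, 4, device=device)
    print(f"[rank {rank}] Training batch is on device {batch.device}")

    dist.destroy_process_group()


if __name__ == "__main__":
    # Launch with: torchrun --nproc_per_node=2 this_file.py
    report_device()
```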