train.py doesn't use GPU #14
The script checks if CUDA is available, but I don't know much about how Colab works with external scripts. Something you can try, to validate this, is to create a small script like:

```python
import torch
print(f'Torch available: {torch.cuda.is_available()}')
```

and run it as an external script, the same way train.py is run.
You were right! I didn't execute it as an external script, as Google Colab works with Jupyter notebooks in interactive mode only.

It still doesn't solve my problem, but you found the root cause. I will keep you updated if I find a solution.
Update: verify the correct CUDA version which PyTorch can see.

@tristandeleu Do you have any idea why this is still so slow?
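A minimal sketch of one common way to check the CUDA version PyTorch can see (whether this matches the command originally posted is an assumption):

```python
import torch

# CUDA toolkit version this PyTorch build was compiled against;
# None for CPU-only builds.
print(torch.version.cuda)

# Runtime check that a GPU is actually visible.
print(torch.cuda.is_available())
```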
I don't have any formal benchmark for how fast this should be, unfortunately. But looking at some logs, I was getting one epoch done in 92s on a Titan Xp (this includes 100 batches of training and 250 batches of validation per epoch; among these 92s, validation was taking about 35s) for MiniImagenet 5-way 5-shot with the settings from the paper (`--num-steps 5 --step-size 0.01 --batch-size 4 --num-batches 100 --hidden-size 32`) and 8 workers for data loading.

Hope this helps!
This definitely helps as reference.
Will do more tests and report back.
It seems that the problem is solved, although I cannot reproduce the reason. The benchmark for your run (reproduce Deleu paper) ran smoothly in 80s per epoch using an NVIDIA V100. Only one detail was missing to reproduce your experiment.
Good news!
That is good news! I/O can be a really big factor.

The 250 batches of validation per epoch is something I have in an internal version of the code: it comes from the fact that I am using a fixed subset of 1000 tasks from the meta-validation split for the evaluation at each epoch. The code in this repo uses `num_batches * batch_size` random tasks from the meta-validation split at each epoch (they are different at every epoch). Using a fixed subset of tasks for evaluation is a better option, much closer to the way we do validation in standard supervised learning.

I won't be able to push the corresponding code, because it has a number of internal dependencies unfortunately. However, I can give you some steps to reproduce it yourself if you want:

- Create the subset of indices for a specific dataset; I'm taking MiniImagenet 5-way 5-shot as an example. I have a utility function to do that (which is not perfect, because it could return fewer tasks than requested, but that did the trick for me):

```python
import random

from torchmeta.datasets.helpers import miniimagenet

def create_indices(dataset, num_tasks):
    # Sample num_tasks class combinations; the set deduplicates,
    # so this can return fewer tasks than requested.
    indices = set()
    for _ in range(num_tasks):
        indices.add(tuple(random.sample(range(len(dataset.dataset)),
                                        dataset.num_classes_per_task)))
    indices = list(list(x) for x in indices)
    return indices

dataset = miniimagenet('data', shots=5, ways=5, meta_val=True)
indices = create_indices(dataset, 1000)  # Sample 1000 tasks
```

- Save the indices in a separate file to freeze this subset of indices and reload it later:

```python
import json

with open('path/to/val_indices/miniimagenet_5way_5shot.json', 'w') as f:
    json.dump(indices, f)  # json.dump returns None, no need to assign it
```

- In `get_benchmark_by_name`, use the `Subset` class from PyTorch to only take a subset from `meta_val_dataset`:

```python
import json

from torch.utils.data import Subset

# Load the frozen indices
with open('path/to/val_indices/miniimagenet_5way_5shot.json', 'r') as f:
    meta_val_indices = json.load(f)

meta_val_dataset = Subset(meta_val_dataset, meta_val_indices)
```

This is missing a lot of the logic (e.g. how to fetch the correct json file for a specific `dataset`, `shots` and `ways` arguments of `get_benchmark_by_name`), but I hope this helps!
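The caveat that `create_indices` can return fewer tasks than requested comes from the set deduplicating repeated class combinations. A self-contained toy illustration (hypothetical sizes; tuples are sorted here so that class order is ignored and collisions are guaranteed, unlike the original function, which keeps the sampled order):

```python
import random

def create_indices_toy(dataset_len, classes_per_task, num_tasks, seed=0):
    # Mirrors the deduplication in create_indices on plain integers.
    rng = random.Random(seed)
    indices = set()
    for _ in range(num_tasks):
        combo = tuple(sorted(rng.sample(range(dataset_len), classes_per_task)))
        indices.add(combo)
    return [list(x) for x in indices]

# Only C(10, 5) = 252 distinct class combinations exist, so 1000
# requests necessarily collapse to at most 252 tasks.
tasks = create_indices_toy(dataset_len=10, classes_per_task=5, num_tasks=1000)
print(len(tasks))  # < 1000
```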
This is super helpful, thanks a lot.
Coming from psychology, it will be challenging to implement this function flawlessly but I will try.
Thanks again!
I am running train.py successfully on my local machine (MacBook Pro 16).
Yet in Google Colab, it seems to take an endless time to start the first epoch (an empty progress bar is shown).
I verified that CUDA is available:
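A minimal check of this kind (a sketch; the exact snippet used in the issue is an assumption):

```python
import torch

if torch.cuda.is_available():
    # e.g. 'Tesla P100-PCIE-16GB' on a Colab GPU runtime
    print(torch.cuda.get_device_name(0))
else:
    print('CUDA not available')
```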
The script starts as:
and gives:
So this makes 242s, or about 4 minutes, per iteration, whereas the same configuration and identical code on my MacBook Pro without GPU takes only about 2.4s per iteration, a factor of 100:
Why is this the case?
A hint is that the GPU is not used, as Colab shows a popup window after some minutes saying:
Warning: you are connected to a GPU runtime, but not utilizing the GPU. Change to a standard runtime
Another difference is 16 CPU cores on the MacBook vs. 2 in Google Colab. This doesn't account for the factor of 100 between them but might be a hint.
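The core count on either machine can be confirmed from inside Python; a minimal stdlib sketch (using the count as the number of data-loading workers is a common heuristic, assumed here rather than taken from this thread):

```python
import os

# Number of CPU cores visible to this process (Colab GPU runtimes
# typically report 2).
cores = os.cpu_count() or 1
print(cores)

# A common heuristic: one data-loading worker per core.
num_workers = cores
```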
I am convinced that this code must be super fast when running on an NVIDIA P100.
So I would be very grateful for any hints!