python download_datasets_fast.py --simulation_downloads_only --skip_confirmation very slow #3

Open
LukeLIN-web opened this issue May 21, 2023 · 2 comments


LukeLIN-web commented May 21, 2023

python download_datasets_fast.py --simulation_downloads_only --skip_confirmation

Trying to download ogbn-papers100M
Downloading https://salient-datasets-ae.s3.amazonaws.com/ogbn-papers100M.zip
Downloaded 0.36 GB: 1%|▏ | 372/37979 [18:45<33:07:56, 3.17s/it]

Is this download speed reasonable?

Second, I tried downloading only ogbn-arxiv and ogbn-products. It still needs about 1 hour:


[Info] Will try to download: ogbn-arxiv,ogbn-products
[Info] Will *not* try to download: ogbn-papers100M,MAG240


Trying to download ogbn-arxiv
Skip ogbn-arxiv because it already exists.


Trying to download ogbn-products
Skip ogbn-products because it already exists.


#################################################################################################
######## 2. Downloading pre-generated partition labels for OGB datasets                  ########
#################################################################################################


Downloading https://salientplus-datasets-ae.s3.amazonaws.com/partition-data.zip
Downloaded 0.04 GB:   3%|█▎                                     | 40/1202 [01:01<1:06:48,  3.45s/it]

timkaler (Member) commented

Would you mind sharing some details about your internet connection and your geographic location? We may need to mirror the datasets to give more robust access to people downloading from Europe, Asia, etc. Downloading the partition labels should be very fast, but at present all data is hosted on the East Coast of the United States, so it's possible that people in Europe or Asia will have difficulty downloading these datasets quickly from there.

LukeLIN-web (Author) commented


I am in Saudi Arabia :) https://fast.com/ reports up to 130 Mbps.
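
For comparison, the progress lines in the first comment imply an average throughput of only a few Mbps, far below the link speed reported here. A rough calculation follows. It assumes the "Downloaded ... GB" figure counts decimal gigabytes (10^9 bytes) and that the elapsed times in the progress bars are accurate. The helper name `throughput_mbps` is made up for this illustration and is not part of download_datasets_fast.py.

```python
def throughput_mbps(gigabytes: float, elapsed: str) -> float:
    """Average throughput in megabits per second.

    gigabytes: amount downloaded, assumed to be decimal GB (10^9 bytes).
    elapsed:   elapsed time as "MM:SS" or "HH:MM:SS", as shown in the progress bar.
    """
    seconds = 0
    for part in elapsed.split(":"):
        seconds = seconds * 60 + int(part)
    # 1 GB = 8000 megabits under the decimal-GB assumption.
    return gigabytes * 8000 / seconds


# Figures copied from the two progress lines in the log above.
papers = throughput_mbps(0.36, "18:45")     # ogbn-papers100M.zip
partition = throughput_mbps(0.04, "01:01")  # partition-data.zip

print(f"ogbn-papers100M: {papers:.1f} Mbps")
print(f"partition-data:  {partition:.1f} Mbps")
print(f"share of a 130 Mbps link used: {papers / 130:.1%}")
```

The 0.04 GB figure is rounded to two decimals in the log, so the second number is only approximate. Either way, the transfer uses a small fraction of the available bandwidth, which suggests the bottleneck is the path to the hosting region rather than the local connection.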
