Question: Is litdata faster when loading local dataset or network storage s3 dataset? #428
Comments
Hi! Thanks for your contribution, great first issue! |
Another question: can I use |
Hey @2catycm, yes, some users have reported increased speed even when running locally. We don't support sssfs, but it shouldn't be hard to add if you want to. Feel free to make a PR. Best, |
Thanks for your reply. I am trying to use the vtab-1k dataset locally and optimized it with litdata. On a subset of length 800, litdata is about 1.41 times faster than a PyTorch Dataset (147 ms -> 104 ms). I am not sure whether my benchmark is appropriate, since I only iterate over the dataset and haven't used it for training yet.
%%timeit
bar = tqdm(train_dataset)
for i, data in enumerate(bar):
    pass |
Hey @2catycm, yes, this is appropriate. We benchmark by iterating over the dataset for two epochs in the cloud and one epoch locally. |
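For readers who want to reproduce this kind of measurement outside a notebook, here is a minimal sketch of an iterate-only benchmark. It uses only the Python standard library; the helper name `time_epochs` and the toy list standing in for a dataset are hypothetical and not part of litdata or this thread. It assumes the dataset can be iterated more than once.

```python
import time
from typing import Iterable, List


def time_epochs(dataset: Iterable, epochs: int = 2) -> List[float]:
    """Return the wall-clock seconds spent iterating `dataset` once per epoch.

    Hypothetical helper for illustration; `dataset` must be re-iterable.
    """
    durations = []
    for _ in range(epochs):
        start = time.perf_counter()
        for _sample in dataset:
            pass  # only data loading is measured; there is no training step
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Stand-in for a real dataset of length 800 (an assumption for this sketch).
    toy_dataset = list(range(800))
    for epoch, seconds in enumerate(time_epochs(toy_dataset, epochs=2)):
        print(f"epoch {epoch}: {seconds * 1e3:.3f} ms")
```

Reporting each epoch separately can help distinguish first-pass costs (for example, a cold cache or an initial download) from later passes, which is one reason to run more than one epoch against remote storage.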
Hey @2catycm. We could probably make it slightly faster too. |
When my storage is large enough to download the dataset locally, should I still use litdata's streaming api?