Thanks for the great work! Is it possible to change the cache directory to a user-specified one, e.g. via an additional parameter on StreamingDataset?
Motivation
Right now the cache path is hardcoded and cannot be changed unless we hack the constants.py file. In our case, an extremely slow disk happens to be mounted at the home directory, which litdata currently uses for caching. This becomes the biggest bottleneck, even though our connection to S3 is fast enough.
Actually, I found an ad hoc way to make this work, which I believe is not documented. For those curious, here's a minimal example:
from litdata import StreamingDataset
from litdata.streaming.resolver import Dir

input_dir = "<path-to-input-dir>"
# For S3 inputs, wrap the URL in a Dir so that `path` points at the desired cache directory.
if input_dir.startswith("s3://"):
    input_dir = Dir(url=input_dir, path="<path-to-cache-dir>")
dataset = StreamingDataset(input_dir)
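If you need this in more than one place, the same workaround can be wrapped in a small helper. This is only a sketch built on the snippet above: `split_input_dir` and `make_dataset` are hypothetical names of my own, not part of litdata, and it assumes `Dir` keeps accepting `url` and `path` as shown.

```python
from typing import Optional, Tuple

# Assumption: only S3 URLs are treated as remote, as in the snippet above.
REMOTE_PREFIXES = ("s3://",)


def split_input_dir(input_dir: str, cache_dir: str) -> Tuple[Optional[str], str]:
    """Return (url, path) for the dataset source.

    Remote inputs keep their URL and are cached under `cache_dir`;
    local inputs have no URL and are read from `input_dir` directly.
    """
    if input_dir.startswith(REMOTE_PREFIXES):
        return input_dir, cache_dir
    return None, input_dir


def make_dataset(input_dir: str, cache_dir: str):
    """Build a StreamingDataset whose remote data is cached in `cache_dir`."""
    # Imported lazily so split_input_dir stays usable without litdata installed.
    from litdata import StreamingDataset
    from litdata.streaming.resolver import Dir

    url, path = split_input_dir(input_dir, cache_dir)
    source = Dir(path=path, url=url) if url else path
    return StreamingDataset(source)
```

Usage would then be `dataset = make_dataset("<path-to-input-dir>", "<path-to-cache-dir>")`. Local paths pass through unchanged, so the cache directory only matters for S3 inputs.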