🐛 Bug

PR #237 broke streaming from S3: iterating a `StreamingDataLoader` hangs when `num_workers > 1`.
To Reproduce

```python
from litdata.streaming import StreamingDataLoader, StreamingDataset, TokensLoader

if __name__ == "__main__":
    train_dataset = StreamingDataset(
        # input_dir="s3://tinyllama-template/slimpajama/train",
        input_dir="/teamspace/s3_connections/tinyllama-template/slimpajama/train",
        item_loader=TokensLoader(block_size=128),
    )
    train_dataloader = StreamingDataLoader(train_dataset, shuffle=True, batch_size=1, num_workers=4)

    for batch in train_dataloader:  # hangs
        print(batch)
        break
```
The `next()` call on the iterator hangs, but only with `num_workers > 1`.
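To make the hang show up as an error instead of a stuck process (for example in CI), one option is to fetch the first batch on a helper thread and give up after a timeout. This is a minimal sketch, not part of litdata: `first_batch_with_timeout` is a hypothetical helper, and it assumes the loader can be iterated from a non-main thread.

```python
import threading
from typing import Any, Dict, Iterable


def first_batch_with_timeout(loader: Iterable[Any], timeout: float = 120.0) -> Any:
    """Return the first item produced by `loader`.

    Hypothetical helper (not part of litdata). Raises TimeoutError if no item
    arrives within `timeout` seconds, so a hang surfaces as an error.
    """
    result: Dict[str, Any] = {}

    def fetch() -> None:
        try:
            # iter() starts the loader (for a DataLoader, this spawns its workers)
            result["value"] = next(iter(loader))
        except BaseException as exc:  # forward StopIteration or loader errors to the caller
            result["error"] = exc

    # daemon=True: a stuck fetch thread will not block interpreter shutdown
    thread = threading.Thread(target=fetch, daemon=True)
    thread.start()
    thread.join(timeout)

    if thread.is_alive():
        raise TimeoutError(f"no batch within {timeout} s; the loader may be hanging")
    if "error" in result:
        raise result["error"]
    return result["value"]
```

With the repro above, `first_batch_with_timeout(train_dataloader, timeout=120)` would be expected to return a batch with `num_workers` set to 0 or 1 and to raise `TimeoutError` with `num_workers=4` while the regression is present. The timeout value is an arbitrary assumption; the first chunk download from S3 can be slow, so it should be generous. The abandoned helper thread is not stopped, only detached.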
Expected behavior

The loop returns data. It worked before PR #237.
Environment

- Installation method (conda, pip, source): pip
awaelchli