Streaming from s3 hangs when num_workers > 1 #306

Closed
awaelchli opened this issue Aug 6, 2024 · 0 comments · Fixed by #312
Labels: bug, help wanted

awaelchli commented Aug 6, 2024

🐛 Bug

PR #237 broke streaming from S3.

To Reproduce

from litdata.streaming import StreamingDataLoader, StreamingDataset, TokensLoader

if __name__ == "__main__":
    train_dataset = StreamingDataset(
        # input_dir="s3://tinyllama-template/slimpajama/train",
        input_dir="/teamspace/s3_connections/tinyllama-template/slimpajama/train",
        item_loader=TokensLoader(block_size=128),
    )
    train_dataloader = StreamingDataLoader(train_dataset, shuffle=True, batch_size=1, num_workers=4)

    for batch in train_dataloader:  # hangs
        print(batch)
        break

The next() call on the dataloader iterator never returns.
The hang occurs only with num_workers > 1.
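
To turn the silent hang into a hard failure (for example, when bisecting around #237), the first next() call can be wrapped in a timeout. The following is a minimal sketch using only the standard library. The helper name `next_with_timeout` and the timeout values are hypothetical and not part of litdata or PyTorch, and it is an assumption that calling next() on a dataloader iterator from a helper thread is acceptable for this reproduction.

```python
import threading


def next_with_timeout(iterator, timeout_s=60.0):
    """Return next(iterator), or raise TimeoutError if it takes longer than timeout_s."""
    result = {}

    def _fetch():
        # Run next() in a helper thread so the caller can give up waiting.
        try:
            result["value"] = next(iterator)
        except BaseException as exc:  # also captures StopIteration
            result["error"] = exc

    worker = threading.Thread(target=_fetch, daemon=True)
    worker.start()
    worker.join(timeout_s)

    if worker.is_alive():
        # The helper thread is still blocked; it is a daemon thread, so it
        # will not keep the process alive at exit.
        raise TimeoutError(f"next() did not return within {timeout_s} s")
    if "error" in result:
        raise result["error"]
    return result["value"]


# Hypothetical usage with the reproduction script above:
# batch = next_with_timeout(iter(train_dataloader), timeout_s=120)
```

This does not fix or explain the hang. It only makes the failure observable in a script or CI job, and the blocked helper thread stays alive until the process exits.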

Expected behavior

The dataloader returns a batch.
This worked before PR #237.

Environment

  • PyTorch Version (e.g., 1.0): 2.3
  • OS (e.g., Linux): Linux
  • How you installed PyTorch (conda, pip, source): pip
  • Build command you used (if compiling from source):
  • Python version: 3.11
  • CUDA/cuDNN version:
  • GPU models and configuration:
  • Any other relevant information:

Additional context

awaelchli added the bug and help wanted labels on Aug 6, 2024
awaelchli self-assigned this on Aug 6, 2024
awaelchli changed the title from "Streaming from s3 hangs" to "Streaming from s3 hangs when num_workers > 1" on Aug 6, 2024