From 5b47672fec0671bb10b072afd7a32c2f821b3655 Mon Sep 17 00:00:00 2001
From: William Falcon
Date: Fri, 5 Jul 2024 16:46:32 -0400
Subject: [PATCH] stream datasets example

---
 README.md | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/README.md b/README.md
index 709825ec..9229cdcd 100644
--- a/README.md
+++ b/README.md
@@ -226,6 +226,21 @@ map(
 ✅ Stream datasets
+
+Most large datasets are stored in the cloud and may not fit on local disks. Streaming enables fast data transfer from remote locations to training machines. With optimized formatting such as LitData's chunking, data transfer can be faster than local disk access.
+
+Once you've optimized the dataset with LitData, stream it as follows:
+```python
+from litdata import StreamingDataset, StreamingDataLoader
+
+dataset = StreamingDataset('s3://my-bucket/my-data', shuffle=True)
+dataloader = StreamingDataLoader(dataset, batch_size=64)
+
+for batch in dataloader:
+    process(batch)  # Replace with your data processing logic
+
+```
+
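To give a rough intuition for why chunking helps streaming, here is a minimal, standard-library-only sketch. It is not LitData's actual chunk format or implementation; the helper names `write_chunks` and `stream_records` and the length-prefixed layout are assumptions made purely for illustration. The idea is that many small samples are packed into a few larger blobs, so each (simulated) remote fetch returns several samples at once.

```python
import io
import struct
from typing import Iterable, Iterator, List


def write_chunks(records: Iterable[bytes], chunk_bytes: int) -> List[bytes]:
    """Pack length-prefixed records into chunks of at most ~chunk_bytes each.

    Hypothetical helper for illustration; not part of LitData.
    """
    chunks: List[bytes] = []
    buf = io.BytesIO()
    for rec in records:
        # Each entry is a 4-byte little-endian length followed by the payload.
        entry = struct.pack("<I", len(rec)) + rec
        # Start a new chunk when this entry would exceed the size budget.
        if buf.tell() > 0 and buf.tell() + len(entry) > chunk_bytes:
            chunks.append(buf.getvalue())
            buf = io.BytesIO()
        buf.write(entry)
    if buf.tell() > 0:
        chunks.append(buf.getvalue())
    return chunks


def stream_records(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield records chunk by chunk, as if each chunk were one remote fetch."""
    for chunk in chunks:
        offset = 0
        while offset < len(chunk):
            (size,) = struct.unpack_from("<I", chunk, offset)
            offset += 4
            yield chunk[offset:offset + size]
            offset += size


# Ten small samples packed into chunks of at most 40 bytes.
records = [f"sample-{i}".encode() for i in range(10)]
chunks = write_chunks(records, chunk_bytes=40)
restored = list(stream_records(chunks))
```

In this toy setup the ten samples end up in a handful of chunks, so a reader needs far fewer fetches than one per sample. Real chunk sizes would be much larger, and a production format would also need an index, compression, and shuffling support.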