
stream datasets example
williamFalcon authored Jul 5, 2024
1 parent da1c817 commit 5b47672
Showing 1 changed file with 15 additions and 0 deletions.
15 changes: 15 additions & 0 deletions README.md
@@ -226,6 +226,21 @@ map(
<details>
<summary> ✅ Stream datasets</summary>
&nbsp;

Most large datasets are stored in the cloud and may not fit on local disk. Streaming transfers data from remote storage to the training machines on demand. With optimized formatting such as chunking in LitData, data transfer can be faster than local disk access.

Once you've optimized the dataset with LitData, stream it as follows:
```python
from litdata import StreamingDataset, StreamingDataLoader

dataset = StreamingDataset('s3://my-bucket/my-data', shuffle=True)
dataloader = StreamingDataLoader(dataset, batch_size=64)

for batch in dataloader:
    process(batch)  # Replace with your data processing logic

```
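
To give a rough intuition for why chunking helps, here is a minimal, standard-library-only sketch of the idea. It is not LitData's on-disk format or implementation: `write_chunks`, `stream_samples`, the `chunk-*.bin` naming, and the use of `pickle` are all hypothetical choices made for illustration. Samples are packed into a few larger files, and the reader loads one chunk at a time, so each (simulated) remote request returns many samples.

```python
import pickle
import random
import tempfile
from pathlib import Path
from typing import Any, Iterator, List


def write_chunks(samples: List[Any], out_dir: Path, chunk_size: int) -> List[Path]:
    """Pack samples into chunk files holding up to `chunk_size` samples each."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(samples), chunk_size):
        # Hypothetical file naming; LitData uses its own chunk format.
        path = out_dir / f"chunk-{start // chunk_size:05d}.bin"
        path.write_bytes(pickle.dumps(samples[start:start + chunk_size]))
        paths.append(path)
    return paths


def stream_samples(
    chunk_paths: List[Path], shuffle: bool = False, seed: int = 0
) -> Iterator[Any]:
    """Yield samples chunk by chunk, keeping only one chunk in memory."""
    order = list(chunk_paths)
    if shuffle:
        # Shuffle at chunk granularity so reads stay sequential within a chunk.
        random.Random(seed).shuffle(order)
    for path in order:
        # One read per chunk stands in for one remote request per chunk.
        yield from pickle.loads(path.read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    chunk_paths = write_chunks(list(range(10)), Path(tmp), chunk_size=4)
    streamed = list(stream_samples(chunk_paths))
```

In this toy setup, ten samples end up in three chunk files, so reading the full dataset takes three reads instead of ten. The real library handles the remote storage, caching, and shuffling details for you.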

</details>

<details>
