Releases: Lightning-AI/litdata
Release v0.2.8
What's Changed
- Update README.md by @tchaton in #143
- Performance improvement for processing by @sritterginkgo in #146
- Fix: Resolve drop_last not passed down from the StreamingDataLoader to the datasets by @tchaton in #147
- Bump pytest from 8.2.0 to 8.2.1 by @dependabot in #148
- LitData release version bump 0.2.8 by @tchaton in #153
New Contributors
- @sritterginkgo made their first contribution in #146
Full Changelog: v0.2.7...v0.2.8
Release v0.2.7
What's Changed
- Fix the NoHeaderTensorSerializer for 1D tensors (other than tokens) by @enrico-stauss in #124
- Fix infinite sleep when loading local compressed dataset. by @wzf03 in #127
- Fix configuration of a custom serializer for one of the predefined types by @enrico-stauss in #125
- Add dist env detection via env vars by @gkroiz in #95
- Fix empty tensor deserialization by @enrico-stauss in #131
- Bump JamesIves/github-pages-deploy-action from 4.5.0 to 4.6.1 by @dependabot in #132
- Prevent race deletion by @tchaton in #136
- Add support for exact iteration by @tchaton in #139
- Bump LitData version 0.2.7 by @tchaton in #142
New Contributors
- @enrico-stauss made their first contribution in #124
- @wzf03 made their first contribution in #127
- @gkroiz made their first contribution in #95
Full Changelog: v0.2.6...v0.2.7
Release 0.2.6
What's Changed
- Bump pytest from 8.0.2 to 8.2.0 by @dependabot in #115
- Bump coverage from 7.4.4 to 7.5.0 by @dependabot in #117
- Bump pytest-cov from 4.1.0 to 5.0.0 by @dependabot in #116
- Resolve some bugs by @tchaton in #121
- Add support for iterate_over_all for the CombinedDataset by @tchaton in #122
- Update version 0.2.6 by @tchaton in #123
Full Changelog: v0.2.5...v0.2.6
Release 0.2.5
What's Changed
Full Changelog: v0.2.4...v0.2.5
Release 0.2.4
What's Changed
- Update LitGPT references in README.md by @rasbt in #90
- Don't raise a runtimeError if the downloader doesn't exist. by @tchaton in #98
- Added call to setup function of serializer class to set data format by @vgurev in #96
- Fix map() failing to create dataset when input_dir is None by @awaelchli in #100
- Streamingdataset torch compatibility by @yhl48 in #108
- Move to version 0.2.4 by @tchaton in #109
New Contributors
- @rasbt made their first contribution in #90
- @vgurev made their first contribution in #96
- @awaelchli made their first contribution in #100
- @yhl48 made their first contribution in #108
Full Changelog: v0.2.3...v0.2.4
Release 0.2.3
Full Changelog: v0.2.2...v0.2.3
Release 0.2.2
Couple of tiny fixes.
Release 0.2.1
Minor fixes.
Release 0.2.0
⚡ Welcome to Lightning Data
We developed StreamingDataset to optimize training on large datasets stored in the cloud while prioritizing speed, affordability, and scalability.
It is built for distributed multi-GPU and multi-node training of large models (with DDP, FSDP, etc.) and is designed to improve accuracy, performance, and ease of use. Data is streamed in as it is needed, so training can run efficiently regardless of where the data is stored.
The StreamingDataset is compatible with any data type, including images, text, video, audio, geospatial, and multimodal data, and it is a drop-in replacement for your PyTorch IterableDataset class. For example, it is used by Lit-GPT to pretrain LLMs.
This is the first release of litdata. From now on, we will track all changes in a CHANGELOG.md file.
Thanks to all contributors.