Bugfix: inconsistent streaming dataloader state (specific to StreamingDataset) #318

Merged


bhimrazy
Collaborator

@bhimrazy bhimrazy commented Aug 9, 2024

What does this PR do?

Fixes #263 and #316.

  • Resuming the dataloader with a new dataset fails.
  • Iterating over the dataloader after loading a state saved at the end of one complete epoch throws an error.
  • After loading a state saved partway through the first epoch, the dataloader does not reset once that epoch completes.
  • Loading a state with num_workers=0 throws a num-workers error.
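
The resume behaviour these fixes aim for can be illustrated with a small, self-contained toy. This is only a sketch under stated assumptions: `ToyResumableLoader` and its fields (`current_epoch`, `_samples_yielded`, `_restored`) are hypothetical names invented for illustration, it runs in a single process (roughly the num_workers=0 case), and it does not reflect litdata's actual implementation.

```python
class ToyResumableLoader:
    """Toy stand-in for a resumable dataloader (hypothetical, NOT litdata code)."""

    def __init__(self, items, batch_size=1):
        self.items = list(items)
        self.batch_size = batch_size
        self.current_epoch = 0
        self._samples_yielded = 0  # progress within the current epoch
        self._restored = False     # True only until the next iteration starts

    def state_dict(self):
        # Snapshot just enough to continue from the same position later.
        return {
            "current_epoch": self.current_epoch,
            "samples_yielded": self._samples_yielded,
        }

    def load_state_dict(self, state):
        self.current_epoch = state["current_epoch"]
        self._samples_yielded = state["samples_yielded"]
        self._restored = True

    def __iter__(self):
        mid_epoch = self._restored and self._samples_yielded < len(self.items)
        if mid_epoch:
            # Partial-epoch state: continue where the saved run stopped.
            start = self._samples_yielded
        else:
            # Fresh run, or a state saved after a complete epoch:
            # reset the counter and begin a new epoch from the first sample.
            start = 0
            self._samples_yielded = 0
            self.current_epoch += 1
        self._restored = False  # later epochs must always start from zero
        for i in range(start, len(self.items), self.batch_size):
            batch = self.items[i:i + self.batch_size]
            self._samples_yielded += len(batch)
            yield batch


# Save state after two batches, then resume in a new loader.
loader = ToyResumableLoader(range(10), batch_size=2)
it = iter(loader)
next(it)
next(it)
state = loader.state_dict()

resumed = ToyResumableLoader(range(10), batch_size=2)
resumed.load_state_dict(state)
rest_of_epoch = list(resumed)  # remaining batches of the interrupted epoch
next_epoch = list(resumed)     # a full epoch, starting again from sample 0
```

In this toy, the restored flag is cleared as soon as iteration begins, so only the first pass after `load_state_dict` skips ahead. A state saved at the end of an epoch falls through to the reset branch rather than producing an empty or failing iteration.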

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@bhimrazy bhimrazy self-assigned this Aug 9, 2024
@bhimrazy bhimrazy marked this pull request as draft August 9, 2024 07:05
@bhimrazy bhimrazy changed the title [WIP] : Bugfix/316 streaming dataloader state [WIP] : Bugfix/316: inconsistent streaming dataloader state Aug 9, 2024
Review thread on src/litdata/streaming/dataset.py (outdated, resolved)
Review thread on src/litdata/streaming/dataloader.py (resolved)

codecov bot commented Aug 9, 2024

Codecov Report

Attention: Patch coverage is 84.61538% with 2 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@9791488). Learn more about missing BASE report.

Additional details and impacted files
@@          Coverage Diff          @@
##             main   #318   +/-   ##
=====================================
  Coverage        ?    79%           
=====================================
  Files           ?     34           
  Lines           ?   4988           
  Branches        ?      0           
=====================================
  Hits            ?   3919           
  Misses          ?   1069           
  Partials        ?      0           

Review thread on src/litdata/streaming/dataloader.py (outdated, resolved)
Review thread on src/litdata/streaming/dataset.py (outdated, resolved)
@bhimrazy bhimrazy changed the title [WIP] : Bugfix/316: inconsistent streaming dataloader state Bugfix: inconsistent streaming dataloader state Aug 11, 2024
@bhimrazy bhimrazy marked this pull request as ready for review August 12, 2024 17:50
@bhimrazy bhimrazy changed the title Bugfix: inconsistent streaming dataloader state Bugfix: inconsistent streaming dataloader state (specific to StreamingDataset) Aug 14, 2024
Collaborator

@tchaton tchaton left a comment

Looks really good ;) Thanks.

@tchaton tchaton merged commit 4dfd98c into Lightning-AI:main Aug 14, 2024
28 checks passed
@bhimrazy bhimrazy deleted the bugfix/316-streaming-dataloader-state branch August 14, 2024 18:15
Development

Successfully merging this pull request may close these issues.

Resuming Training with New Dataset Fails
2 participants