Resuming StreamingDataloader with num_workers=0 fails #24
Comments
Following up on this, what is the recommended practice/solution? I was able to load my checkpoint, manually change num_workers from 0 to 1, and save it again so that it passes the check when the state_dict is loaded, but I wanted to know if there's a better workaround.
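For anyone wanting to try the same workaround, here is a minimal sketch of the patching step. The checkpoint layout and the key names (`dataloader`, `num_workers`, `num_samples_yielded`) are assumptions made for illustration, not LitData's actual state format, and the helper `patch_num_workers` is hypothetical.

```python
import copy


def patch_num_workers(checkpoint, new_num_workers=1, key="num_workers"):
    """Return a copy of `checkpoint` in which every `key` entry equal to 0 is replaced.

    The checkpoint is assumed to be a nested structure of dicts and lists.
    """
    patched = copy.deepcopy(checkpoint)  # leave the loaded checkpoint untouched

    def _walk(node):
        if isinstance(node, dict):
            for name, value in node.items():
                if name == key and value == 0:
                    node[name] = new_num_workers
                else:
                    _walk(value)
        elif isinstance(node, list):
            for item in node:
                _walk(item)

    _walk(patched)
    return patched


# Hypothetical checkpoint contents, for illustration only.
checkpoint = {"dataloader": {"num_workers": 0, "num_samples_yielded": [128]}}
patched = patch_num_workers(checkpoint)
```

Loading and re-saving the file is left out on purpose. If the checkpoint is a regular PyTorch file, `torch.load` and `torch.save` would wrap this step, but that depends on how the checkpoint was written.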
Hi, I've also encountered this issue, but with a non-zero num_workers. Any ideas/updates on what's going on here?
Hey @ukasschmit, LitData doesn't support changing the number of workers when resuming. Can you restart with 16 workers?
Bug description
Using a StreamingDataloader with num_workers=0 works, but resuming the state does not. There is an explicit length check for the state that fails.
Using num_workers=0 is maybe not very meaningful for real applications, but it might be good for debugging and testing purposes. Alternatively, if that's difficult to support, then StreamingDataloader could just force having num_workers>=1. I think we should do something about it, since 0 is the default for the dataloader and users might forget to set it and then run into this error, which could confuse them.
What version are you seeing the problem on?
master
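To make the failure mode in the bug description concrete, the following is a rough sketch of how a per-worker length check can reject a state saved with num_workers=0, and how normalising 0 to 1 would avoid it. It assumes the saved state keeps one progress entry per worker under a hypothetical key `num_samples_yielded`. The function names are invented for illustration and this is not LitData's actual implementation.

```python
def effective_num_workers(num_workers: int) -> int:
    """Treat num_workers=0 (loading in the main process) as a single worker."""
    return max(1, num_workers)


def validate_resume_state(state: dict, num_workers: int) -> None:
    """Raise if the saved state was produced with a different worker count."""
    expected = effective_num_workers(num_workers)
    found = len(state["num_samples_yielded"])  # hypothetical key: one entry per worker
    if found != expected:
        raise ValueError(
            f"State was saved with {found} worker(s) but the dataloader now uses {expected}."
        )


# Hypothetical state saved by a run with num_workers=0: a single progress entry.
saved_state = {"num_samples_yielded": [128]}

# Comparing against the raw setting fails, because the list has 1 entry, not 0.
naive_check_passes = len(saved_state["num_samples_yielded"]) == 0

# After normalising 0 to 1, the same state is accepted without raising.
validate_resume_state(saved_state, num_workers=0)
```

Under these assumptions, the check still rejects a real change in worker count (for example resuming with 2 workers), which matches the restriction that the number of workers cannot change when resuming.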
How to reproduce the bug
Error messages and logs
Environment
Current environment
More info
No response
Moved from Lightning-AI/pytorch-lightning#19335, submitted by @awaelchli