drop_last is not respected #442
Comments
This actually appears to be related to num_workers: when I change it from 24 to 4, there is no issue.
@robmarkcole, I have a quick question as I’m trying to understand the issue better.
@bhimrazy yes it should be dropped due to
@robmarkcole Could you write a simple reproducible example using only integers as data?
Or even just give us the dataset length, the batch_size, and the num_workers causing the issue.
I have a small dataset:
batch_size: 2
With
I log batch_size: 4
I have
and no errors. I log
Instead with
I log batch_size: 8
When I increase both to
I then get the error
Now if I set
I am logging the batch size and get
Note: with
Test index.json
{
  "chunks": [
    {
      "chunk_bytes": 19341730,
      "chunk_size": 6,
      "dim": null,
      "filename": "chunk-0-0.bin"
    },
    {
      "chunk_bytes": 20316676,
      "chunk_size": 6,
      "dim": null,
      "filename": "chunk-1-0.bin"
    },
    {
      "chunk_bytes": 20219153,
      "chunk_size": 6,
      "dim": null,
      "filename": "chunk-2-0.bin"
    },
    {
      "chunk_bytes": 20284337,
      "chunk_size": 6,
      "dim": null,
      "filename": "chunk-3-0.bin"
    }
  ],
  "config": {
    "chunk_bytes": 128000000,
    "chunk_size": null,
    "compression": null,
    "data_format": [
      "str",
      "tifffile",
      "tifffile"
    ],
    "data_spec": "[1, {\"type\": \"builtins.dict\", \"context\": \"[\\\"image_id\\\", \\\"mask\\\", \\\"image\\\"]\", \"children_spec\": [{\"type\": null, \"context\": null, \"children_spec\": []}, {\"type\": null, \"context\": null, \"children_spec\": []}, {\"type\": null, \"context\": null, \"children_spec\": []}]}]",
    "encryption": null,
    "item_loader": "PyTreeLoader"
  },
  "updated_at": "1735919317.2183607"
}
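For what it's worth, here is a rough sketch of why the worker count could matter for a dataset this small. It sums the chunk_size values from the index.json above (4 chunks of 6 items) and then splits the items evenly across workers. The even split and the helper names (`total_items`, `per_worker_counts`, `workers_with_partial_batch`) are assumptions made for illustration only; this is not litdata's actual chunk-association logic.

```python
import json

# Trimmed copy of the index.json above, keeping only the fields used here.
INDEX_JSON = """
{"chunks": [
  {"chunk_size": 6, "filename": "chunk-0-0.bin"},
  {"chunk_size": 6, "filename": "chunk-1-0.bin"},
  {"chunk_size": 6, "filename": "chunk-2-0.bin"},
  {"chunk_size": 6, "filename": "chunk-3-0.bin"}
]}
"""


def total_items(index):
    """Total number of items across all chunks."""
    return sum(chunk["chunk_size"] for chunk in index["chunks"])


def per_worker_counts(num_items, num_workers):
    """Hypothetical even split: the first (num_items % num_workers) workers get one extra item."""
    base, extra = divmod(num_items, num_workers)
    return [base + (1 if w < extra else 0) for w in range(num_workers)]


def workers_with_partial_batch(num_items, num_workers, batch_size):
    """Count workers whose share is not a multiple of batch_size."""
    counts = per_worker_counts(num_items, num_workers)
    return sum(1 for n in counts if n % batch_size != 0)


index = json.loads(INDEX_JSON)
n = total_items(index)
for workers in (4, 24):
    partial = workers_with_partial_batch(n, workers, batch_size=2)
    print(f"items={n} num_workers={workers} workers with a leftover batch: {partial}")
```

Under this assumed split, 4 workers each get 6 items, which divides cleanly into batches of 2, while 24 workers each get a single item, so every worker is left holding a short batch unless it is dropped per worker.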
Hey @robmarkcole, yes, you are right. After double-checking the logic for chunk association, it turns out to be wrong, but it will take some time to fix.
@robmarkcole Can you try this PR and let me know if it fixes the issue: https://github.com/Lightning-AI/litdata/pull/449/files?
🐛 Bug
I pass
I configure batch_size=2 and log the actual batch sizes received; the final batch has a size of 1.
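To make the expectation concrete, here is a minimal pure-Python sketch of the batch sizes I would expect with and without drop_last. The helper `batch_sizes` and the dataset length of 5 are hypothetical and chosen only for illustration; nothing here calls litdata.

```python
def batch_sizes(num_items, batch_size, drop_last):
    """Return the sizes of the batches produced from num_items items."""
    full, remainder = divmod(num_items, batch_size)
    sizes = [batch_size] * full
    # A short final batch is only emitted when drop_last is False.
    if remainder and not drop_last:
        sizes.append(remainder)
    return sizes


kept = batch_sizes(5, batch_size=2, drop_last=False)
dropped = batch_sizes(5, batch_size=2, drop_last=True)
print("drop_last=False:", kept)
print("drop_last=True: ", dropped)
```

With drop_last enabled, every logged batch should have exactly batch_size items, so a final batch of size 1 should never appear.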
To Reproduce
Steps to reproduce the behavior...
Code sample
Expected behavior
Additional context
litdata==0.2.34