🐛 Bug

`train_test_split` works correctly when asked to split a dataset with `splits=[0.1, 0.7, 0.2]`, but fails when asked for `splits=[0.1, 0.2, 0.7]`: the third split raises an error when it is indexed or iterated, at the lines marked in the script below.
To Reproduce

Try this script:

```python
import os
from litdata import optimize, train_test_split, StreamingDataset, StreamingDataLoader

x, y, z = train_test_split(streaming_dataset=StreamingDataset("output_dir"), splits=[0.1, 0.2, 0.7])

print(f"{len(x)=}")
print(f"{len(y)=}")
print(f"{len(z)=}")

print(f"{x[:]=}")
print(f"{y[:]=}")
print(f"{z[:]=}")  # this will raise error

x = StreamingDataLoader(x, batch_size=5)
y = StreamingDataLoader(y, batch_size=5)
z = StreamingDataLoader(z, batch_size=5)

print("-" * 80)
print("iterate X")
for _x in x:
    print(_x)

print("-" * 80)
print("iterate Y")
for _y in y:
    print(_y)

print("-" * 80)
print("iterate Z")
for _z in z:  # this will raise error
    print(_z)

print("-" * 80)
print("All done!")
```
Code used to create `output_dir` (run it before the script above):

```python
import os
from litdata import optimize, train_test_split, StreamingDataset


def compress(index):
    return (index, index ** 2)


optimize(
    fn=compress,
    inputs=list(range(100)),
    num_workers=4,
    output_dir="output_dir",
    chunk_bytes="64MB",
    mode="overwrite",
)
```
Expected behavior

It should work irrespective of the order of the split fractions.
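To make "irrespective of order" concrete: the size of each split should depend only on its own fraction, so reordering `splits` should only reorder the resulting lengths. The sketch below is a hypothetical helper (`split_indices` is not part of litdata, and it is not how `train_test_split` is implemented); it assumes each split is a contiguous block of item indices and uses the 100 items written by the `optimize()` call.

```python
from typing import List, Sequence


def split_indices(num_items: int, splits: Sequence[float]) -> List[range]:
    """Hypothetical helper, not litdata's API: cut range(num_items) into
    consecutive ranges whose sizes follow `splits`, in the order given."""
    if any(fraction < 0 for fraction in splits):
        raise ValueError("split fractions must be non-negative")
    if sum(splits) > 1 + 1e-9:
        raise ValueError("split fractions must sum to at most 1")

    ranges: List[range] = []
    start = 0
    for fraction in splits:
        # Each size depends only on its own fraction, so reordering
        # `splits` reorders the sizes but never changes them.
        # round(..., 9) absorbs float noise, e.g. a product that lands
        # just below a whole number before int() truncates it.
        size = int(round(num_items * fraction, 9))
        ranges.append(range(start, start + size))
        start += size
    return ranges


# Both orderings from the report, with 100 items as in the optimize() call.
for splits in ([0.1, 0.7, 0.2], [0.1, 0.2, 0.7]):
    print(splits, [len(r) for r in split_indices(100, splits)])
# [0.1, 0.7, 0.2] [10, 70, 20]
# [0.1, 0.2, 0.7] [10, 20, 70]
```

Under these assumptions, both calls cover all 100 items with no gaps or overlaps. Only the position of the 20-item and 70-item blocks changes.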
Environment

- Installation method (`conda`, `pip`, source): already installed on Lightning Studio (`pip install -e .`)
Additional context

It's happening because of a logic issue in `subsample_filenames_and_roi()`.
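As an illustration of what such a function has to get right, here is a hypothetical sketch (`subsample_chunks` is an invented name, and this is not litdata's code). It assumes items are stored in consecutive chunk files of known sizes and that a split is described by the chunk filenames it touches plus a per-chunk `(start, end)` region of interest (ROI). The key property is that each ROI comes from the overlap of the split's item range with the chunk's own range. The result therefore does not depend on which split came before it.

```python
from typing import List, Sequence, Tuple


def subsample_chunks(
    filenames: Sequence[str],
    chunk_sizes: Sequence[int],
    item_start: int,
    item_end: int,
) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Hypothetical helper: select the chunks overlapping the global item
    range [item_start, item_end) and the (start, end) region of each
    chunk that falls inside it."""
    selected_files: List[str] = []
    rois: List[Tuple[int, int]] = []
    offset = 0  # global index of the first item in the current chunk
    for name, size in zip(filenames, chunk_sizes):
        chunk_start, chunk_end = offset, offset + size
        # Overlap between this chunk and the requested range, if any.
        lo = max(chunk_start, item_start)
        hi = min(chunk_end, item_end)
        if lo < hi:
            selected_files.append(name)
            # ROI is stored relative to the first item of the chunk.
            rois.append((lo - chunk_start, hi - chunk_start))
        offset = chunk_end
    return selected_files, rois


# Assumed layout: 4 chunks of 25 items (the real layout depends on
# chunk_bytes and num_workers). Item ranges for splits=[0.1, 0.2, 0.7].
files = [f"chunk-{i}.bin" for i in range(4)]
sizes = [25, 25, 25, 25]
for start, end in [(0, 10), (10, 30), (30, 100)]:
    print(subsample_chunks(files, sizes, start, end))
# (['chunk-0.bin'], [(0, 10)])
# (['chunk-0.bin', 'chunk-1.bin'], [(10, 25), (0, 5)])
# (['chunk-1.bin', 'chunk-2.bin', 'chunk-3.bin'], [(5, 25), (0, 25), (0, 25)])
```

In this sketch, the large last split starts in the middle of a chunk (`(5, 25)` in `chunk-1.bin`). That partially consumed chunk is the kind of boundary case where order-dependent bookkeeping could go wrong, though the actual cause in litdata is not confirmed here.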