
train_test_split doesn't support split of 0.0 #182

Closed
robmarkcole opened this issue Jun 25, 2024 · 4 comments · Fixed by #187
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@robmarkcole
Contributor

🐛 Bug

The check 0 < _f <= 1 fails for a value of 0.0, so train_test_split raises a ValueError.

To Reproduce

Pass a value of 0.0 as a split to train_test_split:

train_test_split(dataset, splits=[0.0, 0.0, 1.0])

Expected behavior

I should be able to use a split of 0.0.

Environment

Master

@robmarkcole robmarkcole added bug Something isn't working help wanted Extra attention is needed labels Jun 25, 2024
@deependujha
Collaborator

deependujha commented Jun 26, 2024

Hi @robmarkcole, thanks for pointing out the issue.

I'm currently working on another issue. Once that PR is ready to merge, I'll try to fix this one too; I don't expect it to need much work.

@deependujha
Collaborator

Btw, why would someone even want a split of 0.0?

Something like [0.01, 0.01, 0.98] makes sense, and it works fine.

If I remember correctly, Luca added the condition that each split must be greater than 0 while reviewing the PR:

if not all(0 < _f <= 1 for _f in splits):
        raise ValueError("Each Split should be a float with each value in [0,1].")
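For a quick standalone reproduction, the quoted check can be wrapped in a small function. The name check_splits_current is hypothetical (the real check lives inside train_test_split); the body copies the condition and error message quoted above. Note that the message advertises [0,1] even though the strict 0 < _f comparison excludes 0.

```python
def check_splits_current(splits):
    # Same condition as the quoted check: every split must lie in (0, 1].
    if not all(0 < _f <= 1 for _f in splits):
        raise ValueError("Each Split should be a float with each value in [0,1].")


check_splits_current([0.01, 0.01, 0.98])  # accepted

try:
    check_splits_current([0.0, 0.0, 1.0])
except ValueError as err:
    # prints: rejected: Each Split should be a float with each value in [0,1].
    print(f"rejected: {err}")
```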

@robmarkcole
Contributor Author

I have a single dataset that I typically split at random. However, I sometimes also want to use it purely for testing, so the test weighting is 100%.

@deependujha
Collaborator

Okay, I was thinking that just changing the check to all(0 <= _f <= 1 for _f in splits) (< becomes <=) would do the job, but I also need to make some internal changes.
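The internals of train_test_split aren't shown in this thread, so the following is only a minimal sketch of what accepting 0.0 could involve, not the library's actual fix. The helpers check_splits_relaxed, split_sizes and random_split_indices are hypothetical. The sketch assumes the fractions must sum to 1, floors each share (so a 0.0 split always gets zero items), and hands any rounding remainder to the last split with a non-zero fraction. Downstream code then has to cope with empty subsets.

```python
import math
import random


def check_splits_relaxed(splits):
    # Hypothetical relaxed check: 0.0 is accepted, so each fraction lies in [0, 1].
    if not all(0 <= _f <= 1 for _f in splits):
        raise ValueError("Each split should be a float with each value in [0, 1].")
    # Assumption for this sketch: the fractions must add up to 1.
    if not math.isclose(sum(splits), 1.0):
        raise ValueError("Splits should sum to 1.")


def split_sizes(n_items, splits):
    # Hypothetical helper: convert fractions into item counts summing to n_items.
    check_splits_relaxed(splits)
    # Floor each share; a zero fraction always yields zero items.
    sizes = [math.floor(n_items * f) for f in splits]
    # Give the rounding remainder to the last split with a non-zero fraction
    # (at least one exists, because the fractions sum to 1).
    nonzero = [i for i, f in enumerate(splits) if f > 0]
    sizes[nonzero[-1]] += n_items - sum(sizes)
    return sizes


def random_split_indices(n_items, splits, seed=42):
    # Shuffle indices reproducibly, then slice them into consecutive chunks.
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    subsets, start = [], 0
    for size in split_sizes(n_items, splits):
        subsets.append(indices[start:start + size])  # may be an empty list
        start += size
    return subsets


# Test-only use case: every item ends up in the last (test) subset.
train, val, test = random_split_indices(10, [0.0, 0.0, 1.0])
print(len(train), len(val), len(test))  # prints: 0 0 10
```

With flooring like this, even small non-zero fractions such as [0.01, 0.01, 0.98] can produce empty subsets on a small dataset (10 items gives sizes [0, 0, 10]), which is one reason empty splits would need handling internally anyway.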

I'll try to fix it as soon as possible. Btw, if you've used train_test_split and run into any other issues, please mention them in this thread. It'll be easier to group them and fix them all at once.

2 participants