Handling empty feature table #11

Open
TimothyStephens opened this issue Jul 24, 2024 · 2 comments

@TimothyStephens

Hi,

Thank you for your work on whokaryote.

I have encountered a bug when whokaryote is run on very small MAGs for which no valid features are identified. A bit of an edge case, I know.
The error arises from predict_class.py, line 90.
The features DataFrame is empty, which causes predictions = loaded_rf.predict(features) to raise a ValueError:
ValueError: Found array with 0 sample(s) (shape=(0, 9)) while a minimum of 1 is required.

A workaround for this problem is to replace line 90 with the following:

    predictions = []
    if not features.empty:
        predictions = loaded_rf.predict(features)

I believe that it should preserve the normal behavior of whokaryote.
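
For illustration, here is a self-contained sketch of the same guard that runs without the trained model. `StubClassifier` and `predict_or_empty` are hypothetical names, not part of whokaryote; the stub only mimics the zero-sample failure reported above, and its labelling rule is a dummy. The sketch checks `len(features) == 0`, which also gives the row count for a pandas DataFrame.

```python
class StubClassifier:
    """Hypothetical stand-in for the loaded random forest; not whokaryote code."""

    def predict(self, rows):
        # Mimic the failure from the report: refuse input with zero samples.
        if len(rows) == 0:
            raise ValueError(
                "Found array with 0 sample(s) while a minimum of 1 is required."
            )
        # Dummy rule, only so the demo has something to return.
        return ["eukaryote" if sum(row) > 0 else "prokaryote" for row in rows]


def predict_or_empty(model, features):
    """Return model predictions, or an empty list when there are no rows."""
    if len(features) == 0:
        # Nothing to classify, so skip the call that would raise ValueError.
        return []
    return list(model.predict(features))


model = StubClassifier()
empty_result = predict_or_empty(model, [])  # no rows: guard returns []
small_result = predict_or_empty(model, [[0.2, 0.5], [-1.0, 0.1]])
```

Returning an empty list keeps the type of `predictions` iterable, so downstream code that loops over it should simply do nothing for an empty feature table.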

Thanks,
Tim.

@LottePronk
Owner

Hi Tim,

Thank you for using Whokaryote and for taking the time to look into this error.

I will look into the solution and implement it when I have time.

Just some things to keep in mind:
If the features dataframe is empty, whokaryote cannot make any predictions. Tiara should still be working though, and you can check the Tiara predictions in the featuretable output file.

I'm always curious about what people use Whokaryote for, as it may be useful to expand its functionality in the future. If I may ask, for what purpose are you running it on MAGs?

Kind regards,
Lotte

@TimothyStephens
Author

Hi Lotte,

Thanks for letting me know about Tiara.

My current use case is a snakemake workflow for assembling MAGs from all domains of life (prokaryotes, eukaryotes, and viruses). Because the possible genome sizes span all of these domains (viruses can have tiny genomes), some of the MAGs being considered are quite small, which is how I ran into this error.

Cheers,
Tim.
