Speed up input BCF validation #43

Open
fedarko opened this issue Aug 25, 2022 · 0 comments
Labels
performance gotta go fast

fedarko (Owner) commented Aug 25, 2022

Adding the "is bcf simple" checks made verifying the BCF take roughly 12-16 seconds on Bloom (my first measurement was ~15.32 seconds), as opposed to ~0.24 seconds a few days ago, before I added all those checks.

In my view, this tradeoff is 100% worth it -- better slow and correct than fast and wrong. But it'd be nice to speed things up.

Some ideas:

  1. If the input BCF was produced by strainFlye, just take a leap of faith and assume it's OK (only use strict validation on outside inputs)
  2. Depending on how many contigs there are in the dataset, parallelize the checks across contigs

... and there are probably other ideas that would also work.
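
A rough sketch of how ideas 1 and 2 could fit together, under several assumptions. It is not strainFlye's actual code. The header marker `##source=strainFlye`, the tuple layout `(contig, pos, ref, alts)`, and the rule inside `is_record_simple` are all hypothetical stand-ins for whatever the real "is bcf simple" checks do. Records are plain tuples here so the example runs without a BCF-reading library.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical header marker; the header strainFlye really writes may differ.
TRUSTED_SOURCE_PREFIX = "##source=strainFlye"


def produced_by_strainflye(header_lines):
    """Idea 1: True if any header line carries the (assumed) marker."""
    return any(line.startswith(TRUSTED_SOURCE_PREFIX) for line in header_lines)


def is_record_simple(record):
    """Stand-in for the real checks. Assumed rule: a single-nucleotide
    ref and exactly one single-nucleotide alt."""
    _contig, _pos, ref, alts = record
    return len(ref) == 1 and len(alts) == 1 and len(alts[0]) == 1


def check_contig(records):
    """Return positions of the records on one contig that fail the check."""
    return [rec[1] for rec in records if not is_record_simple(rec)]


def validate(header_lines, records, max_workers=4, trust_own_output=True):
    """Return {contig: [bad positions]}; an empty dict means the input passed."""
    # Idea 1: skip strict validation for files we (supposedly) wrote ourselves.
    if trust_own_output and produced_by_strainflye(header_lines):
        return {}

    # Idea 2: group records by contig, then check each contig concurrently.
    by_contig = defaultdict(list)
    for rec in records:
        by_contig[rec[0]].append(rec)
    contigs = sorted(by_contig)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(check_contig, (by_contig[c] for c in contigs))
        return {c: bad for c, bad in zip(contigs, results) if bad}
```

Two caveats. Trusting a header marker is only as safe as the marker is hard to forge or go stale, so a `trust_own_output=False` escape hatch seems worth keeping. And if the per-record checks are pure Python, threads may not help much because of the GIL; swapping in `concurrent.futures.ProcessPoolExecutor` would be the natural next thing to try, at the cost of pickling the per-contig data.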

fedarko added the performance gotta go fast label Aug 25, 2022