Performance improvement: Parallelize input parsing #10

Open
anergictcell opened this issue Sep 22, 2022 · 0 comments
Decouple the input parsing code from IO and run the parsing in multiple concurrent threads.
This is relatively easy for GenePred or RefGene input, where each row can be parsed on its own, but more difficult for GTF files. GTF parsing requires a shared HashMap because the rows in the input have no defined order.
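
To illustrate the GTF case, here is a minimal sketch, assuming Rust and only the standard library. The row format (`transcript_id<TAB>start<TAB>end`) is a simplified stand-in and not real GTF, and `parse_row` and `group_rows_parallel` are hypothetical names, not existing functions of this project. Each worker parses its chunk without holding the lock and only then merges the results into the shared map.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

/// Hypothetical, simplified exon: (start, end).
type Exon = (u64, u64);

/// Parses one simplified row "transcript_id<TAB>start<TAB>end".
/// This is NOT real GTF; it stands in for a row whose transcript
/// may show up anywhere in the input.
fn parse_row(line: &str) -> Option<(String, Exon)> {
    let mut fields = line.split('\t');
    let id = fields.next()?.to_string();
    let start = fields.next()?.parse().ok()?;
    let end = fields.next()?.parse().ok()?;
    Some((id, (start, end)))
}

/// Groups rows by transcript id using `num_threads` workers and one
/// shared, mutex-protected HashMap. Malformed rows are skipped.
fn group_rows_parallel(lines: &[&str], num_threads: usize) -> HashMap<String, Vec<Exon>> {
    let num_threads = num_threads.max(1);
    let shared: Mutex<HashMap<String, Vec<Exon>>> = Mutex::new(HashMap::new());
    // Ceiling division; at least 1 so that `chunks` never panics.
    let chunk_size = ((lines.len() + num_threads - 1) / num_threads).max(1);

    thread::scope(|scope| {
        for chunk in lines.chunks(chunk_size) {
            let shared = &shared;
            scope.spawn(move || {
                // Parse first, without holding the lock.
                let parsed: Vec<(String, Exon)> =
                    chunk.iter().filter_map(|l| parse_row(l)).collect();
                // Lock once per chunk to merge into the shared map.
                let mut map = shared.lock().unwrap();
                for (id, exon) in parsed {
                    map.entry(id).or_default().push(exon);
                }
            });
        }
    });

    let mut map = shared.into_inner().unwrap();
    // Threads finish in arbitrary order, so sort for a deterministic result.
    for exons in map.values_mut() {
        exons.sort_unstable();
    }
    map
}
```

Calling `group_rows_parallel(&lines, 4)` on a slice of rows returns one entry per transcript id, regardless of where its rows appeared in the input. An alternative that avoids the lock would be one local map per thread, merged after all threads have joined.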

Another open question is how many parallel threads to allow. Ideally, the caller specifies the number of threads rather than having it determined automatically.
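
One way a caller-specified thread count could look, as a sketch assuming Rust: the count is a required `NonZeroUsize` parameter, so the library never guesses and a value of zero cannot be passed. `Record`, `parse_line` and `parse_with_threads` are hypothetical names, and the two-column row is a simplified stand-in for a GenePred or RefGene line.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical parser output; a real parser would build a full transcript.
#[derive(Debug, PartialEq)]
struct Record {
    name: String,
    chrom: String,
}

/// Parses the first two tab-separated columns of a row.
/// Each row is self-contained, so no state is shared between threads.
fn parse_line(line: &str) -> Option<Record> {
    let mut fields = line.split('\t');
    Some(Record {
        name: fields.next()?.to_string(),
        chrom: fields.next()?.to_string(),
    })
}

/// Parses `lines` on at most `threads` worker threads, as chosen by the caller.
/// Input order is preserved; rows that fail to parse are skipped.
fn parse_with_threads(lines: &[&str], threads: NonZeroUsize) -> Vec<Record> {
    let n = threads.get();
    // Ceiling division; at least 1 so that `chunks` never panics.
    let chunk_size = ((lines.len() + n - 1) / n).max(1);

    thread::scope(|scope| {
        // Spawn all workers first so that they run concurrently.
        let handles: Vec<_> = lines
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .filter_map(|l| parse_line(l))
                        .collect::<Vec<Record>>()
                })
            })
            .collect();

        // Join in spawn order, which keeps the records in input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("parser thread panicked"))
            .collect()
    })
}
```

A caller would write, for example, `parse_with_threads(&lines, NonZeroUsize::new(4).unwrap())`. Callers that do want an automatic value can opt in themselves via `std::thread::available_parallelism()`, which keeps that decision out of the library.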