Suggestion : Run checkm on more than 1 MAG at a time #23

michoug · 2020-02-06T13:47:23Z

Hi,
I have a few hundred MAGs that I want to check taxonomically, and I noticed that checkm is run one MAG at a time, like all the other steps. However, the pplacer step in checkm takes quite some time, because it has to load the reference tree into memory again for every MAG it places. When I run checkm by itself, I think it is much faster to process all the MAGs in a single run.
With the default 8 cores, checkm uses only 2 cores at a time, and the other 6 sit idle once prodigal and mash have finished.
Would it be possible to implement this in the pipeline, or maybe to run 20-30 MAGs at once?
Best
Greg
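
To make the batching idea concrete, here is a minimal sketch of how MAGs could be grouped so that pplacer loads the tree once per batch rather than once per MAG. It is not the pipeline's actual code: the function names, the batch size of 25 (picked from the 20-30 range suggested above), the `.fa` extension, and the thread counts are all assumptions for illustration. The `lineage_wf` subcommand and the `-t`, `-x`, and `--pplacer_threads` options are part of the CheckM command line; since `lineage_wf` takes a directory of bins, each batch is staged into its own directory with symlinks.

```python
from pathlib import Path
from typing import Iterator, List, Sequence


def batch_mags(mags: Sequence[Path], batch_size: int = 25) -> Iterator[List[Path]]:
    """Yield consecutive batches of at most `batch_size` MAGs (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(mags), batch_size):
        yield list(mags[start:start + batch_size])


def stage_batch(batch: Sequence[Path], batch_dir: Path) -> None:
    """Symlink the MAGs of one batch into `batch_dir` so checkm sees them as one bin folder."""
    batch_dir.mkdir(parents=True, exist_ok=True)
    for mag in batch:
        link = batch_dir / mag.name
        # Skip links left over from a previous run instead of failing on them.
        if not (link.is_symlink() or link.exists()):
            link.symlink_to(mag.resolve())


def checkm_command(batch_dir: Path, out_dir: Path, threads: int = 8,
                   pplacer_threads: int = 1, extension: str = "fa") -> List[str]:
    """Build the argument list for one `checkm lineage_wf` run over a whole batch."""
    return [
        "checkm", "lineage_wf",
        "-t", str(threads),
        "--pplacer_threads", str(pplacer_threads),
        "-x", extension,
        str(batch_dir), str(out_dir),
    ]


if __name__ == "__main__":
    # Assumed example input: 60 MAG files named mag_0.fa ... mag_59.fa.
    mags = [Path(f"mag_{i}.fa") for i in range(60)]
    for index, batch in enumerate(batch_mags(mags)):
        # Only print the commands here; staging and running are left to the caller.
        cmd = checkm_command(Path(f"batch_{index}"), Path(f"checkm_{index}"))
        print(len(batch), " ".join(cmd))
```

Each command could then be passed to `subprocess.run(cmd, check=True)` after calling `stage_batch` for that batch. The pplacer thread count is kept separate from `-t` because pplacer's memory use grows with the number of threads, so it may need to stay lower than the total core count.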

abremges · 2020-02-12T08:44:14Z

Awesome! While on our infrastructure this never appeared to be a bottleneck, I can totally understand your reasoning and imagine that this will indeed speed up the CheckM classification step. Feel free to submit a pull request and I (or @AlphaSquad) will merge it into the master branch after confirming that everything works as intended. Thank you!

michoug changed the title ~~Suggestion : Run checkm on all bins at once~~ Suggestion : Run checkm on more than 1 MAG at a time Feb 6, 2020

abremges added the enhancement and good first issue labels Feb 12, 2020
