Assistance needed for running 10,500 Salmonella genomes through fastANI without running into a time out error. #142

brennenohunt · 2024-12-04T15:37:19Z

I am attempting to run fastANI on ~10,500 Salmonella genomes on the Texas A&M HPRC, but the job timed out at 21 days. Per the output file, only about 2,300 genomes had been run through fastANI, each taking 700-800 seconds on average. Another student ran 8,000 Bacteroides genomes on the same partition of the same cluster and completed the run in nine days, with each genome taking 200-300 seconds on average. Does anyone have recommendations for running my 10,500 genomes to completion, preferably in less than 21 days? The only difference between my script and hers is that I allotted 200 GB of memory and she allotted 350 GB. Any help would be much appreciated. I can be reached at [email protected]. Thank you!
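For scale, the figures in this post can be turned into a rough projection. This is only a back-of-the-envelope sketch: it assumes genomes are processed strictly one after another and that the per-genome cost stays at the midpoint of the reported 700-800 second range (750 s is an assumption, not a measured value).

```python
SECONDS_PER_DAY = 86_400

def days_needed(n_genomes, seconds_per_genome):
    """Wall-clock days if genomes are processed one at a time at a constant cost."""
    return n_genomes * seconds_per_genome / SECONDS_PER_DAY

# 750 s is the assumed midpoint of the 700-800 s range reported above.
observed = days_needed(2_300, 750)
projected = days_needed(10_500, 750)

print(f"{observed:.1f} days for 2,300 genomes")    # 20.0 days, close to the 21-day limit
print(f"{projected:.1f} days for 10,500 genomes")  # 91.1 days
```

Under these assumptions the full set would need roughly four to five times the 21-day limit if run as a single serial job.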

cjain7 · 2024-12-04T16:41:48Z

FastANI's runtime depends on several factors, including how similar the genomes in the given set are to each other. For example, comparing distant genomes is much faster because there are fewer k-mer matches to process.

Does this help?
https://github.com/ParBLiSS/FastANI?tab=readme-ov-file#parallelization
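The linked README section describes splitting the reference database into chunks and running the chunks as parallel processes. A minimal Python sketch of that idea follows. It is not the script shipped in the repository. The options `--ql`, `--rl`, `-t` and `-o` are FastANI's own; the file names, the chunk count of 20 and the thread count of 8 are hypothetical choices for illustration.

```python
import random
import shlex

def split_reference_list(paths, n_chunks, seed=0):
    """Randomly partition genome paths into n_chunks lists of near-equal size."""
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)
    # Striding over the shuffled list puts every path in exactly one chunk.
    return [shuffled[i::n_chunks] for i in range(n_chunks)]

def fastani_commands(query_list, chunk_list_files, threads=8):
    """Build one FastANI command per reference chunk (all queries vs. that chunk)."""
    commands = []
    for i, chunk_file in enumerate(chunk_list_files):
        args = ["fastANI", "--ql", query_list, "--rl", chunk_file,
                "-t", str(threads), "-o", f"fastani_chunk_{i}.tsv"]
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands

# Hypothetical genome paths standing in for the real collection.
genomes = [f"genomes/salmonella_{i:05d}.fna" for i in range(10_500)]
chunks = split_reference_list(genomes, n_chunks=20)

# Each chunk would be written to its own list file, one path per line.
chunk_files = [f"refs_chunk_{i}.txt" for i in range(len(chunks))]
commands = fastani_commands("all_genomes.txt", chunk_files)
for cmd in commands[:2]:
    print(cmd)
```

Each command could then be submitted as its own cluster job, so no single job has to cover all 10,500 references within the time limit. The query list stays complete in every command; only the reference side is divided.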

brennenohunt · 2024-12-04T16:54:55Z

I will be reviewing this with my professor. If we have any other issues, I will be sure to leave another comment. Thank you!

brennenohunt · 2024-12-04T20:10:31Z

We have a question regarding the script provided for parallelization. We are concerned that once the database is split, the chunks are run independently, and we want to be able to compare across all of them. Is there a way to combine them? Thanks!

cjain7 · 2024-12-10T12:46:52Z

The output won't be affected; you can try it once with a smaller set of genomes (e.g., 100).
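One way to carry out the small-scale check suggested here is to run the small set once in a single job and once in chunks, then compare the results. The sketch below assumes FastANI's tab-delimited output (query, reference, ANI, matched fragments, total query fragments) and treats each line as an independent record, so combining chunk outputs amounts to pooling their lines. The rows in the demo are made-up placeholders, not real results.

```python
import csv
import io

def read_rows(handle):
    """Read tab-delimited FastANI output into a set of row tuples."""
    return {tuple(rec) for rec in csv.reader(handle, delimiter="\t") if rec}

def merged_matches_single(chunk_handles, single_handle):
    """True if the pooled chunk outputs contain exactly the rows of the single run."""
    merged = set()
    for handle in chunk_handles:
        merged |= read_rows(handle)
    return merged == read_rows(single_handle)

# Placeholder rows for illustration only; real values come from FastANI.
chunk_a = io.StringIO("q1.fna\tr1.fna\t98.7\t1500\t1600\n")
chunk_b = io.StringIO("q1.fna\tr2.fna\t97.9\t1450\t1600\n")
single = io.StringIO(
    "q1.fna\tr2.fna\t97.9\t1450\t1600\n"
    "q1.fna\tr1.fna\t98.7\t1500\t1600\n"
)

result = merged_matches_single([chunk_a, chunk_b], single)
print(result)
```

With real data, the handles would be opened output files rather than `io.StringIO` objects. Rows are compared as sets because the order of lines in the pooled output depends on how the chunks were formed.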
