Performance Optimization with expected wait times #1953
-
How long loading takes varies wildly depending on what hardware you are running on, how many cores you have available, and how big your data is. Our largest loading project, of almost 1000 genomes, typically takes us 3 days to load on pretty robust hardware. Our smaller exome projects usually take a couple of hours. @bw2 might have some more detailed ideas about how long things run with some of the recommended configurations in the local install readme, but there's so much variation that it's really hard to say anything with certainty. There is a documented way to split the loading into 2 steps, if that is what you meant by breaking up long-running jobs; see the section immediately above this link: https://github.com/broadinstitute/seqr/blob/master/deploy/LOCAL_INSTALL.md#adding-a-loaded-dataset-to-a-seqr-project
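Since the numbers depend so heavily on your own setup, one practical option is to time each step yourself on a small dataset first. Here is a minimal sketch of that in Python, not anything from the seqr pipeline itself: `time_step`, `run_steps`, and the `STEPS` list are hypothetical names, and the commands are placeholders that you would swap for the actual loading commands from the local install readme.

```python
import subprocess
import sys
import time


def time_step(name, cmd):
    """Run one command and return (name, elapsed seconds, exit code)."""
    start = time.perf_counter()
    result = subprocess.run(cmd)
    elapsed = time.perf_counter() - start
    return name, elapsed, result.returncode


def run_steps(steps):
    """Run the steps in order, stopping at the first one that fails."""
    timings = []
    for name, cmd in steps:
        timing = time_step(name, cmd)
        timings.append(timing)
        if timing[2] != 0:
            break  # do not start the next step after a failure
    return timings


# Placeholder commands (assumption): replace each one with the real
# command for that loading step in your own install.
STEPS = [
    ("step 1 (placeholder)", [sys.executable, "-c", "print('step 1')"]),
    ("step 2 (placeholder)", [sys.executable, "-c", "print('step 2')"]),
]

if __name__ == "__main__":
    for name, elapsed, code in run_steps(STEPS):
        print(f"{name}: {elapsed:.1f}s (exit code {code})")
```

Timing the steps separately on a single trio should give you a rough baseline for your hardware before you commit to a full WGS or WES project, though the time will not necessarily scale linearly with the number of samples.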
-
I am not sure how long each step should take when uploading on-prem on a server. After upgrading to the latest pipeline runner container, I am getting a lot of output on stdout, and I am concerned about how long it should take to upload a trio VCF and what options there are for optimizing performance.
Describe the solution you'd like
I would like the documentation to include some info on the expected time bounds for uploading a typical WGS or WES sequencing project, along with suggestions for when this becomes a long-running job (e.g. overnight). Should Hail conversion and uploads take overnight? I would also like a way to break up the long-running VCF-to-Hail conversion for on-prem installs, and ways to control cost.