You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The strategy to be applied is to run several assemblers in parallel and then compare their results. If the assembly validation shows that the assembly isn't good enough, then we re-run the assembly adjusting the parameters. This trategy is the usual protocol used by bioinformaticians, as suggested by the paper "Ten steps to get started in Genome Assembly and Annotation".
Should we use a de novo or a reference-guided strategy for the genomes' assembly? Since we already have a reference genome of high quality, it seems unreasonable not to take advantage of it. However, it is known that a reference-guided assembly has some disadvantages, as "the resulting assemblies may contain some biases towards the used reference (...)" and as a consequence "(...) more diverged regions may not be reconstructed and missing". (Lischer, H., 2017)
Recommended by the "Ten steps (...) paper" and used by Marcela for assembly of L. fortunei nuclear genome
"Hybrid approach that has the computational efficiency of de Bruijn graph methods and the flexibility of overlap-based assembly strategies" (Zimin, A., 2013)
Requirements:
Linux system (maybe it works on other UNIX-like systems)
Mammalian genomes (up to 3Gb): 512Gb RAM, 32+ cores, 5Tb disk space
"We have shown that our extended reference-guided de novo assembly approach almost always outperforms the corresponding de novo assembly program even when a reference genome of a closely related species is used. The combination of reference mapping and de novo assembly provides a powerful strategy for genome assembly, as it combines the advantages of both approaches" (Lischer, H., 2017)
What assembler should we choose for the de novo assembly step?
The pipeline supports the following de novo assemblers:
AllPaths-LG (v 51279)
idba (v 1.1.1)
abyss-pe (v 1.5.2)
SOAPdenovo2 (v r240)
According to the reference paper, the best assemblies were obtained using the ALLPATHS-LG assembler: "The overall best assembly can be achieved with our reference-guided de novo assembly pipeline using ALLPATHS-LG. However, one should be aware that this is not an ultimate ranking. Other studies have shown that assemblers may perform quite differently on varying data sets and species" (Lischer, H., 2017)
Requirements:
The reference-guided strategy requires about 50% less RAM than the respective de novo assembly (see citation below). A estimate says one should need around 160 bytes of RAM per genome base, meaning around 256 GB RAM for the golden mussel genome (1,6 Gb). Then, a initial set of 0,5*256= 128 GB RAM could be tested.
"The reference-guided de novo with ALLPATHS-LG needs 16% less RAM than de novo assembly with ALLPATHS-LG. This is even more pronounced if the closer reference A. lyrata is used: only 109 GB memory is needed, which is less than half of the de novo assembly." Lischer, H., 2017
Runtime
"However, the lower memory requirements of the reference-guided de novo assembly approach comes with the cost of run time, which is much longer due to several de novo assembly and alignment steps." Lischer, H., 2017
There isn't a general recommendation, but one can run a MIRA command to estimate the resources required. It's called miramem (find out more at MIRA's manual)
The mapping assembly can be found at topic 6 on the manual
The text was updated successfully, but these errors were encountered:
General information
The strategy to be applied is to run several assemblers in parallel and then compare their results. If the assembly validation shows that the assembly isn't good enough, then we re-run the assembly adjusting the parameters. This trategy is the usual protocol used by bioinformaticians, as suggested by the paper "Ten steps to get started in Genome Assembly and Annotation".
Should we use a de novo or a reference-guided strategy for the genomes' assembly? Since we already have a reference genome of high quality, it seems unreasonable not to take advantage of it. However, it is known that a reference-guided assembly has some disadvantages, as "the resulting assemblies may contain some biases towards the used reference (...)" and as a consequence "(...) more diverged regions may not be reconstructed and missing". (Lischer, H., 2017)
Some good candidate de novo assemblers are:
Some good candidates for reference-guided assemblers are:
Heidi Lischer's pipeline
MIRA
The text was updated successfully, but these errors were encountered: