Quality_evaluation

Genome quality evaluation

  • Contiguity metrics: We used the script N50Stat_v3.pl to retrieve the reported contiguity metrics for each genome assembly: 'perl N50Stat_v3.pl -i inputgenome.fasta'

  • BUSCO evaluation: We used the script run_all_busco_v512_andclean.sh to create a job array that runs BUSCO for each genome assembly. Before running it, edit line 8 to point to the directory containing the genomes to evaluate (files with the extension ".fasta"). The script can also be modified to use compleasm, a reimplementation of BUSCO that uses miniprot to provide a faster and more accurate completeness assessment.

  • Merqury consensus quality (QV) and k-mer completeness: We ran Merqury with the script run_merqury.sh.

  • Contaminant filtering: We filtered contaminant sequences from the genome assemblies using the script filter_bacterial_scaffolds_andhuman.pl, which takes as input the output generated by our pipeline (LINK TO GITHUB).

  • Screening for duplicated scaffolds: We used the script run_all_funannotateclean.sh to create a job array that runs funannotate for each genome assembly.

  • Script to rename and retrieve the length for each scaffold: rename_assembly_scaffold.pl.
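
For readers unfamiliar with the contiguity metrics mentioned above, the following is a minimal Python sketch of how per-scaffold lengths and N50 can be computed from a FASTA file. It is not the implementation used in N50Stat_v3.pl or rename_assembly_scaffold.pl. The function names `scaffold_lengths` and `n50` are hypothetical and chosen only for illustration. N50 is taken here in its usual sense: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly.

```python
from typing import Dict, Iterable, List


def scaffold_lengths(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Return {scaffold_name: sequence_length} from the lines of a FASTA file.

    The scaffold name is the first whitespace-delimited token after '>'.
    """
    lengths: Dict[str, int] = {}
    name = None
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            # Fall back to a placeholder name if the header is empty.
            name = parts[0] if parts else f"unnamed_{len(lengths) + 1}"
            lengths[name] = 0
        elif name is not None:
            # Sequence lines may be wrapped, so accumulate across lines.
            lengths[name] += len(line)
    return lengths


def n50(lengths: List[int]) -> int:
    """Return the N50 of a list of scaffold lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first scaffold where the cumulative length
        # reaches at least half of the total assembly size.
        if 2 * running >= total:
            return length
    return 0
```

To try it on an assembly, one could open the file and pass the handle directly, for example `lengths = scaffold_lengths(open("inputgenome.fasta"))` followed by `n50(list(lengths.values()))`. Here `inputgenome.fasta` is the placeholder file name used in the command above.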