-
Contiguity metrics: We used the following command to compute the reported contiguity metrics for each genome assembly: 'perl N50Stat_v3.pl -i inputgenome.fasta'
-
BUSCO evaluation: We used the script run_all_busco_v512_andclean.sh to create a job array that runs BUSCO for each genome assembly. Before running it, edit line 8 to point to the directory containing the genome assemblies, which must have the extension ".fasta". The script can also be modified to run compleasm, a reimplementation of BUSCO that uses miniprot to provide a faster and more accurate completeness assessment.
-
Merqury consensus quality (QV) and k-mer completeness: run_merqury.sh.
-
Contaminant filtering: We filtered contaminant sequences from the genome assemblies using the output generated by our pipeline (LINK TO GITHUB) as input to the script filter_bacterial_scaffolds_andhuman.pl.
-
Screening for duplicated scaffolds: We used the script run_all_funannotateclean.sh to create a job array that runs funannotate for each genome assembly.
-
Script to rename and retrieve the length for each scaffold: rename_assembly_scaffold.pl.
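
As an illustration of the contiguity and scaffold-length steps listed above, here is a minimal Python sketch that parses a FASTA file, renames scaffolds, reports their lengths, and computes N50. It is not the logic of N50Stat_v3.pl or rename_assembly_scaffold.pl, whose internals are not described here. The function names, the "scaffold_" prefix, and the choice to number scaffolds by decreasing length are all assumptions made for the example. Only the N50 definition is the standard one.

```python
from typing import Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA lines into (name, sequence) pairs, keeping input order."""
    records: List[Tuple[str, str]] = []
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Keep only the first word of the header as the scaffold name.
            name = line[1:].split()[0]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def rename_scaffolds(
    records: List[Tuple[str, str]], prefix: str = "scaffold_"
) -> List[Tuple[str, str, int]]:
    """Return (new_name, old_name, length) rows, longest scaffold first.

    The naming scheme is hypothetical; adapt it to your own convention.
    """
    ordered = sorted(records, key=lambda rec: len(rec[1]), reverse=True)
    return [
        (f"{prefix}{i}", old_name, len(seq))
        for i, (old_name, seq) in enumerate(ordered, start=1)
    ]


def n50(lengths: Iterable[int]) -> int:
    """Length L such that scaffolds of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


# Toy input standing in for the contents of a .fasta file.
demo = [
    ">a first scaffold", "ACGT", "AC",
    ">b", "ACGTACGTAC",
    ">c", "ACG",
    ">d", "ACGTACGT",
]
records = read_fasta(demo)
table = rename_scaffolds(records)
for new_name, old_name, length in table:
    print(new_name, old_name, length, sep="\t")
print("N50:", n50(length for _, _, length in table))
```

To use it on a real assembly, pass an open file handle to read_fasta in place of the toy list, for example `read_fasta(open("inputgenome.fasta"))`.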