Genome quality evaluation

Contiguity metrics: We used the script N50Stat_v3.pl to compute the reported contiguity metrics for each genome assembly: 'perl N50Stat_v3.pl -i inputgenome.fasta'
BUSCO evaluation: We used the script run_all_busco_v512_andclean.sh to create a job array that runs BUSCO on each genome assembly. Before running it, edit line 8 to set the directory containing the genome assemblies with the extension ".fasta". The script can also be modified to use compleasm, a reimplementation of BUSCO that uses miniprot to provide a faster and more accurate completeness assessment.
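As an illustration of the main contiguity metric, the following is a minimal Python sketch of how N50 can be computed from a FASTA file. It is not the implementation used in N50Stat_v3.pl, whose full set of reported metrics is not described here; the function names `read_fasta_lengths` and `n50` are hypothetical.

```python
from typing import Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every sequence found in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # stays None until the first header line is seen
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input
```

For a real assembly, the two functions can be combined as `n50(read_fasta_lengths(open("inputgenome.fasta")))`.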
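The job-array pattern described above can be sketched as follows: each array task lists the ".fasta" files in the configured directory and picks the one matching its task index. This is a hedged illustration in Python, not the contents of run_all_busco_v512_andclean.sh; the function names are hypothetical, and the 1-based task index is an assumption (schedulers such as Slurm expose one through an environment variable like `SLURM_ARRAY_TASK_ID`).

```python
from pathlib import Path
from typing import List, Sequence


def list_assemblies(genome_dir: str, extension: str = ".fasta") -> List[Path]:
    """Return the assemblies in genome_dir in a stable (sorted) order."""
    return sorted(
        p for p in Path(genome_dir).iterdir()
        if p.is_file() and p.suffix == extension
    )


def assembly_for_task(assemblies: Sequence[Path], task_id: int) -> Path:
    """Map a 1-based array task index (an assumption) to one assembly."""
    if not 1 <= task_id <= len(assemblies):
        raise IndexError(f"task {task_id} is outside 1..{len(assemblies)}")
    return assemblies[task_id - 1]
```

Sorting the file list matters here: every task must see the assemblies in the same order, otherwise two tasks could process the same genome while another is skipped.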
Merqury consensus quality (QV) and k-mer completeness: run_merqury.sh.
Contaminant filtering: We filtered contaminant sequences from the genome assemblies with the script filter_bacterial_scaffolds_andhuman.pl, which takes the output generated by our pipeline (LINK TO GITHUB) as input.
Screening for duplicated scaffolds: We used the script run_all_funannotateclean.sh to create a job array that runs funannotate for each genome assembly.
Scaffold renaming and lengths: We used the script rename_assembly_scaffold.pl to rename each scaffold and retrieve its length.
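The renaming step can be illustrated with a short Python sketch that rewrites FASTA headers and records the length of each scaffold. It is not the logic of rename_assembly_scaffold.pl: the `scaffold_N` naming scheme and the function name `rename_scaffolds` are assumptions made for the example.

```python
from typing import Iterable, List, Tuple


def rename_scaffolds(
    lines: Iterable[str], prefix: str = "scaffold_"
) -> Tuple[List[str], List[Tuple[str, str, int]]]:
    """Rename FASTA records sequentially.

    Returns the renamed FASTA lines and a table of
    (new_name, old_name, length) tuples, one per scaffold.
    """
    renamed: List[str] = []
    table: List[Tuple[str, str, int]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the old header as the old name.
            parts = line[1:].split()
            old_name = parts[0] if parts else ""
            new_name = f"{prefix}{len(table) + 1}"
            table.append((new_name, old_name, 0))
            renamed.append(f">{new_name}")
        elif table:
            # Sequence line: add its length to the current scaffold.
            new_name, old_name, length = table[-1]
            table[-1] = (new_name, old_name, length + len(line))
            renamed.append(line)
    return renamed, table
```

Keeping the old name next to the new one in the table makes it possible to trace each renamed scaffold back to the original assembly.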