MinYS: Mine Your Symbiont by targeted genome assembly in symbiotic communities Guyomar C, Delage W, Legeai F, Mougel C, Simon JC, Lemaitre C BioRxiv 2019, doi:10.1101/2019.12.13.875021
Note: the scripts are intended to be run on the GenOuest computing cluster, but they can easily be adapted to other HPC systems.
Cloning this repository will generate a suitable directory structure:
- ./data contains reference genomes and paths to the datasets on the GenOuest cluster (see data/README.md).
- ./scripts contains the scripts to run the analyses. Some editing may be required to adapt them to your environment.
- By default, results will be stored in ./results. Note that this folder already contains the results shown in the paper, in case one only wants to reproduce the figures or tables without running all the experiments (see results/README.md).
When not working on the GenOuest cluster, pea aphid sequencing data must be downloaded from SRA, and the files of files (fof) must be updated with the resulting file paths. All SRA identifiers are given in the file data/sra_identifiers.tsv.
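Once the reads have been downloaded, each file of files has to point at the local copies. The sketch below shows one possible way to regenerate such a file, assuming the common one-path-per-line layout. The function name `make_fof`, the directory layout and the `.fastq`/`.fastq.gz` extensions are illustrative assumptions and not part of this repository, so compare with the existing files in data/files_of_files/ before relying on it.

```shell
# Hypothetical helper (not part of this repository): write a file of files
# listing every FASTQ file found under a download directory, one path per line.
# Assumption: the fof format is one read-file path per line.
make_fof() {
    reads_dir="$1"   # directory holding the downloaded reads
    fof_path="$2"    # file of files to create or overwrite
    find "$reads_dir" -type f \( -name '*.fastq' -o -name '*.fastq.gz' \) \
        | sort > "$fof_path"
}
```

For example, `make_fof /path/to/downloads/sample1 sample1.fof` (both paths are placeholders). Passing an absolute directory yields absolute paths in the resulting file.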
- MinYS:
conda install -c bioconda minys=1.1
- Megahit:
conda install -c bioconda megahit=1.1.2
- Metacompass: Metacompass is not available as a conda package. Please follow the Metacompass documentation. The experiments described in the paper were performed using the development version at commit 3d187c64324034b7d579e6b6cfe1b366ad94e7a6 (9/04/2019).
- Quast (assembly evaluation):
conda install -c bioconda quast=5.0.2
These scripts will submit targeted assembly jobs for each sample and write their output to the ./results/ folder.
All the MinYS jobs can be run by executing:
./scripts/MinYS/submit_all_minys.sh
The script will:
- read the files of files in data/files_of_files/,
- evaluate whether each one is a pool or an individual sequencing sample,
- and submit 4 MinYS jobs (one per reference genome).
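The steps above amount to a nested iteration over samples and reference genomes. The following is a dry-run sketch of that idea, not the actual content of submit_all_minys.sh: the function name `list_minys_jobs`, the `.fasta` extension and the two directory arguments are assumptions, and the pool-versus-individual check is left out because its criterion is not described here. It only prints the (sample, reference) pairs, each of which would correspond to one job.

```shell
# Dry-run sketch (hypothetical, submits nothing): print one line per
# (file of files, reference genome) pair, tab-separated.
list_minys_jobs() {
    fof_dir="$1"   # directory containing one file of files per sample
    ref_dir="$2"   # directory containing reference genomes (assumed *.fasta)
    for fof in "$fof_dir"/*; do
        [ -f "$fof" ] || continue
        for ref in "$ref_dir"/*.fasta; do
            [ -f "$ref" ] || continue
            printf '%s\t%s\n' "$(basename "$fof")" "$(basename "$ref" .fasta)"
        done
    done
}
```

With 4 reference genomes in the reference directory, this prints 4 lines per sample, matching the 4 jobs per sample mentioned above.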
For comparison with other approaches, the output of MinYS was further analyzed. This notably includes the enumeration and comparison of genomic paths extracted from the output gfa file, as described in the paper.
./scripts/MinYS/post_analysis.sh
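Before looking at genomic paths, it can be handy to check the size of an assembly graph. The sketch below counts segment (`S`) and link (`L`) records with awk, assuming the graph is in GFA 1 format. It is only a quick inspection and does not reproduce the path enumeration described in the paper; the function name `gfa_summary` is an illustrative assumption.

```shell
# Hypothetical helper: count segment (S) and link (L) records in a GFA 1 file.
# GFA 1 is tab-separated, with the record type in the first column.
gfa_summary() {
    awk -F'\t' '
        $1 == "S" { segments++ }
        $1 == "L" { links++ }
        END { printf "segments\t%d\nlinks\t%d\n", segments, links }
    ' "$1"
}
```

It is called with the path to a GFA file, for example `gfa_summary assembly.gfa` (placeholder file name).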
# run Metacompass
./scripts/Metacompass/submit_all_metacompass.sh
# contig filtering with blast using the different reference genomes.
./scripts/Metacompass/blast_metacompass_contigs.sh
# run Megahit
./scripts/Megahit/submit_all_megahit.sh
# contig filtering with blast using the different reference genomes.
./scripts/Megahit/blast_megahit_all.sh
To evaluate all the targeted assemblies obtained, the first step is to run Quast on all assembly runs:
./scripts/run_quast.sh
The tables and Figure 2 of the paper can be reproduced using R, as described in tables_and_figures.Rmd.
In the paper, we demonstrated the ability of MinYS to recover structural variants coexisting in a metagenomic sample. To do so, a synthetic dataset was produced in which simulated reads from a rearranged B. aphidicola genome were added to a real pea aphid re-sequencing sample, simulating the coexistence of two strains with structural variations in a metagenomic dataset (here, 20 deletions with sizes between 300 bp and 20 kb).
The whole protocol is described in scripts/strain_coexistence.md.