Sylvain Schmitt April 20, 2021
Singularity & Snakemake workflow to detect mutations with several alignment and mutation detection tools.
- Python ≥3.5
- Snakemake ≥5.24.1
- Golang ≥1.15.2
- Singularity ≥3.7.3
- This workflow
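You may want to check whether tools already installed meet the minimum versions above before installing anything. The following is a minimal Python sketch. It assumes version strings made of dotted integers such as `5.24.1`, so suffixes like `rc1` are not handled. The helpers `parse_version`, `meets_minimum` and `installed_version` are hypothetical and not part of detectMutations:

```python
import re
import shutil
import subprocess

# Hypothetical helpers, not part of detectMutations.
# Minimum versions taken from the requirements list above.
MINIMUMS = {"python": "3.5", "snakemake": "5.24.1", "go": "1.15.2", "singularity": "3.7.3"}


def parse_version(text):
    """Return the first dotted number in text (e.g. 'go version go1.15.8') as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError("no version number in %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found, minimum):
    """True if version `found` >= `minimum`; missing trailing parts count as 0."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(command):
    """Return the raw output of a version command, or None if the tool is not on PATH."""
    if shutil.which(command[0]) is None:
        return None
    return subprocess.check_output(command, stderr=subprocess.STDOUT,
                                   universal_newlines=True).strip()
```

For example, `meets_minimum(installed_version(["snakemake", "--version"]), MINIMUMS["snakemake"])` checks Snakemake, provided it is on `PATH`. The string formatting avoids f-strings so the sketch also runs on Python 3.5.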
```bash
# Python
sudo apt-get install python3.5
# Snakemake
sudo apt install snakemake # check that the packaged version is ≥5.24.1
# Golang
export VERSION=1.15.8 OS=linux ARCH=amd64 # change this as you need
wget -O /tmp/go${VERSION}.${OS}-${ARCH}.tar.gz https://dl.google.com/go/go${VERSION}.${OS}-${ARCH}.tar.gz && \
sudo tar -C /usr/local -xzf /tmp/go${VERSION}.${OS}-${ARCH}.tar.gz
echo 'export GOPATH=${HOME}/go' >> ~/.bashrc && \
echo 'export PATH=/usr/local/go/bin:${PATH}:${GOPATH}/bin' >> ~/.bashrc && \
source ~/.bashrc
# Singularity
mkdir -p ${GOPATH}/src/github.com/sylabs && \
cd ${GOPATH}/src/github.com/sylabs && \
git clone https://github.com/sylabs/singularity.git && \
cd singularity
git checkout v3.7.3
cd ${GOPATH}/src/github.com/sylabs/singularity && \
./mconfig && \
cd ./builddir && \
make && \
sudo make install
# detectMutations
git clone git@github.com:sylvainschmitt/detectMutations.git
cd detectMutations
```
Generate data using the generateMutations workflow:

```bash
git clone git@github.com:sylvainschmitt/generateMutations.git ../generateMutations
cd ../generateMutations
snakemake --use-singularity --cores 4
cd ../detectMutations
bash scripts/get_data.sh
```
Run the workflow locally:

```bash
snakemake -np # dry run
snakemake --dag | dot -Tsvg > dag/dag.svg # dag
snakemake --use-singularity --cores 4 # run
snakemake --use-singularity --cores 1 --verbose # debug
snakemake --report report.html # report
```
On a SLURM cluster with environment modules:

```bash
module purge ; module load bioinfo/snakemake-5.25.0 # for test on node
snakemake -np # dry run
sbatch job.sh ; watch 'squeue -u $USER' # run
less detMut.*.err # snakemake outputs, press Shift+F to follow
less detMut.*.out # snakemake outputs, press Shift+F to follow
snakemake --dag | dot -Tsvg > dag/dag.svg # dag
module purge ; module load bioinfo/snakemake-5.8.1 ; module load system/Python-3.6.3 # for report
snakemake --report report.html # report
module purge ; module load system/R-3.6.2 ; R # to build results
```
Index the reference genome and SNPs so that downstream tools can use them.
- Tools: BWA index
- Singularity: oras://registry.forgemia.inra.fr/gafl/singularity/bwa/bwa:latest
- Tools: samtools faidx
- Singularity: oras://registry.forgemia.inra.fr/gafl/singularity/samtools/samtools:latest
- Tools: gatk CreateSequenceDictionary
- Singularity: docker://broadinstitute/gatk:4.2.6.1
- Tools: gatk IndexFeatureFile
- Singularity: docker://broadinstitute/gatk:4.2.6.1
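The indexing steps above write companion files next to their inputs, and later rules rely on those files. The following minimal sketch lists the files you could expect after indexing. It assumes a `.fa` or `.fasta` reference and a SNP VCF that is either plain or bgzipped. The function name `expected_index_files` is hypothetical:

```python
from pathlib import Path

# Hypothetical helper, not part of detectMutations.
# Suffixes that `bwa index` writes next to the FASTA file.
BWA_SUFFIXES = [".amb", ".ann", ".bwt", ".pac", ".sa"]


def expected_index_files(reference, snps):
    """List the index files the four indexing steps are expected to create."""
    ref = Path(reference)
    files = [str(ref) + suffix for suffix in BWA_SUFFIXES]  # bwa index
    files.append(str(ref) + ".fai")  # samtools faidx
    files.append(str(ref.with_suffix(".dict")))  # gatk CreateSequenceDictionary
    # gatk IndexFeatureFile: .idx for a plain VCF, tabix .tbi for a bgzipped one
    files.append(snps + (".tbi" if snps.endswith(".gz") else ".idx"))
    return files
```

Comparing this list against the files on disk (for example with `Path(f).exists()`) is a quick way to see which indexing step has not run yet.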
Report read quality and trim reads.
- Tools: FastQC
- Singularity: docker://biocontainers/fastqc:v0.11.9_cv8
- Tools: Trimmomatic
- Singularity: oras://registry.forgemia.inra.fr/gafl/singularity/trimmomatic/trimmomatic:latest
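Trimmomatic removes low-quality bases from reads. Its SLIDINGWINDOW step, for instance, cuts a read once the average quality over a window drops below a threshold. The following simplified Python sketch illustrates that idea. It assumes Phred+33 quality strings. The window of 4 bases and threshold of 15 are example values, not necessarily this workflow's settings. This is not Trimmomatic's exact implementation:

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 encoding by default)."""
    return [ord(char) - offset for char in quality]


def sliding_window_trim(sequence, quality, window=4, threshold=15):
    """Simplified sliding-window trimming (illustrative, not Trimmomatic's code).

    Scans from the 5' end and cuts the read at the start of the first window
    whose mean quality is below `threshold`. Returns (sequence, quality).
    """
    scores = phred_scores(quality)
    keep = len(scores)
    for start in range(0, len(scores) - window + 1):
        if sum(scores[start:start + window]) < threshold * window:
            keep = start
            break
    return sequence[:keep], quality[:keep]
```

For example, `sliding_window_trim("ACGTACGTACGT", "IIIIIIII####")` keeps the first 7 bases. Here `I` encodes Q40 and `#` encodes Q2.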
Align reads against the reference, mark duplicates, and report alignment quality.
- Tools: BWA mem
- Singularity: oras://registry.forgemia.inra.fr/gafl/singularity/bwa/bwa:latest
- Tools: Samtools sort
- Singularity: oras://registry.forgemia.inra.fr/gafl/singularity/samtools/samtools:latest
- Tools: Samtools index
- Singularity: oras://registry.forgemia.inra.fr/gafl/singularity/samtools/samtools:latest
- Tools: gatk MarkDuplicates
- Singularity: docker://broadinstitute/gatk:4.2.6.1
- Tools: Samtools index
- Singularity: oras://registry.forgemia.inra.fr/gafl/singularity/samtools/samtools:latest
- Tools: Samtools mpileup
- Singularity: oras://registry.forgemia.inra.fr/gafl/singularity/samtools/samtools:latest
- Tools: Samtools stats
- Singularity: oras://registry.forgemia.inra.fr/gafl/singularity/samtools/samtools:latest
- Tools: QualiMap
- Singularity: docker://pegi3s/qualimap:2.2.1
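gatk MarkDuplicates flags reads that likely come from the same original DNA fragment, such as PCR duplicates. The following much simplified, single-end sketch shows the idea. Reads sharing a chromosome, 5' position and strand are grouped together. Within each group, every read except the one with the highest sum of base qualities is flagged. The real tool also handles read pairs, clipping and optical duplicates. The `Read` tuple is a hypothetical stand-in for BAM records:

```python
from collections import namedtuple

# Hypothetical, minimal stand-in for an aligned single-end read.
# five_prime: reference position of the read's 5' end
# (alignment start on the forward strand, alignment end on the reverse strand).
Read = namedtuple("Read", ["name", "chrom", "five_prime", "strand", "qualities"])


def mark_duplicates(reads):
    """Return the names of reads flagged as duplicates (simplified, single-end)."""
    best = {}  # (chrom, five_prime, strand) -> best-scoring read seen so far
    duplicates = set()
    for read in reads:
        key = (read.chrom, read.five_prime, read.strand)
        current = best.get(key)
        if current is None:
            best[key] = read
        elif sum(read.qualities) > sum(current.qualities):
            duplicates.add(current.name)  # previous best becomes a duplicate
            best[key] = read
        else:
            duplicates.add(read.name)  # ties keep the first read seen
    return duplicates
```

Strand is part of the key because two reads starting at the same position on opposite strands come from different fragments.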
Detect mutations.
- Tools: gatk Mutect2
- Singularity: docker://broadinstitute/gatk:4.2.6.1
- Tools: freebayes
- Singularity: oras://registry.forgemia.inra.fr/gafl/singularity/freebayes/freebayes:latest
- Tools: gatk HaplotypeCaller
- Singularity: docker://broadinstitute/gatk:4.2.6.1
- Tools: gatk GenotypeGVCFs
- Singularity: docker://broadinstitute/gatk:4.2.6.1
- Tools: Strelka2
- Singularity: docker://quay.io/wtsicgp/strelka2-manta
- Tools: VarScan
- Singularity: docker://alexcoppe/varscan
- Script: varscan2vcf.R
- Singularity: https://github.com/sylvainschmitt/singularity-template/releases/download/0.0.1/sylvainschmitt-singularity-tidyverse-Biostrings.latest.sif
- Tools: SomaticSniper
- Singularity: docker://lethalfang/somaticsniper:1.0.5.0
- Tools: MuSe
- Singularity: docker://opengenomics/muse:v0.1.1
- Tools: cp
- Tools: bedtools subtract
- Singularity: oras://registry.forgemia.inra.fr/gafl/singularity/bedtools/bedtools:latest
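The varscan2vcf.R script converts VarScan's tab-separated output to VCF. That script is not reproduced here. The following Python sketch is a hypothetical analogue under several assumptions:

- The input is SNV output from `varscan somatic`, with a header containing `chrom`, `position`, `ref`, `var` and `somatic_status`.
- Only rows with status `Somatic` are kept. This is an assumed filtering choice.
- Indels are not converted, because VarScan writes them in a different notation.

```python
import csv
import io

# Hypothetical converter, not the workflow's varscan2vcf.R.
VCF_HEADER = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]


def varscan_to_vcf(tsv_text, status="Somatic"):
    """Convert VarScan somatic SNV rows with the given somatic_status to minimal VCF text."""
    lines = list(VCF_HEADER)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if row["somatic_status"] != status:
            continue  # e.g. drop Germline or LOH rows
        lines.append("\t".join([
            row["chrom"], row["position"], ".",  # CHROM, POS, ID
            row["ref"], row["var"],               # REF, ALT
            ".", "PASS", ".",                     # QUAL, FILTER, INFO
        ]))
    return "\n".join(lines) + "\n"
```

Columns are read by header name, so any extra VarScan columns (read counts, p-values) are simply ignored.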
Combine quality information from QualiMap, Picard, Samtools, Trimmomatic, and FastQC (see previous steps) and assess call performance.
- Tools: MultiQC
- Singularity: oras://registry.forgemia.inra.fr/gafl/singularity/multiqc/multiqc:latest
- Script: evaluate_call.R
- Singularity: https://github.com/sylvainschmitt/singularity-template/releases/download/0.0.1/sylvainschmitt-singularity-tidyverse-Biostrings.latest.sif
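evaluate_call.R assesses the performance of the mutation calls. Because the reads are simulated with the generateMutations workflow, the true mutations should be known. Assuming that, one common way to score a caller is to match calls to true mutations on chromosome and position, then report precision, recall and F1. The following hypothetical Python sketch does this. It is not necessarily what evaluate_call.R computes:

```python
def call_performance(true_mutations, called_mutations):
    """Score calls against known mutations, both given as (chrom, position) pairs.

    Hypothetical helper, not the workflow's evaluate_call.R.
    """
    truth, calls = set(true_mutations), set(called_mutations)
    tp = len(truth & calls)  # called and real
    fp = len(calls - truth)  # called but not real
    fn = len(truth - calls)  # real but missed
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}
```

Matching on position alone ignores the alternative allele. A stricter comparison would add REF and ALT to each key.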
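To gather all tables in results/stats into a single stats.tsv, run R inside a Singularity container:

```bash
module load system/singularity-3.7.3
singularity pull https://github.com/sylvainschmitt/singularity-r-bioinfo/releases/download/0.0.3/sylvainschmitt-singularity-r-bioinfo.latest.sif
singularity shell sylvainschmitt-singularity-r-bioinfo.latest.sif
R
```

```r
library(tidyverse)
# read every table in results/stats and stack them into one
lapply(list.files("results/stats", full.names = TRUE), read_tsv) %>%
  bind_rows() %>%
  write_tsv("stats.tsv")
quit(save = "no")
```

Then leave the container with `exit`.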