Examples: running Bioboxes of taxonomic profilers and assessing their results with OPAL

The following examples show how to run Bioboxes of taxonomic profilers on different datasets, track their runtime and peak main memory usage, and automatically assess their results.

If you have already run a profiler and only want to assess its results, running opal.py is sufficient and you can skip these examples.

The taxonomic profilers assessed in these examples are:

| Taxonomic profiler | Biobox docker image |
|---|---|
| CommonKmers | stefanjanssen/docker_profiling_tools:commonkmers |
| FOCUS 0.31 adapted for CAMI | stefanjanssen/docker_profiling_tools:focus |
| Quikr | stefanjanssen/docker_profiling_tools:quickr |
| mOTU 1.1 | stefanjanssen/docker_profiling_tools:motu |
| Metaphlan 2.2.0 | stefanjanssen/docker_profiling_tools:metaphlan2 |
| Metaphyler 1.25 | stefanjanssen/docker_profiling_tools:metaphyler |
| TIPP 2.0.0 | stefanjanssen/docker_profiling_tools:tipp |

Comparing taxonomic profilers on the CAMI I high complexity dataset

  • Download the five gzipped FASTQ samples of the CAMI I high complexity dataset:
RH_S001__insert_270.fq.gz
RH_S002__insert_270.fq.gz
RH_S003__insert_270.fq.gz
RH_S004__insert_270.fq.gz
RH_S005__insert_270.fq.gz
  • Pull the Bioboxes of profilers:
docker pull stefanjanssen/docker_profiling_tools:commonkmers
docker pull stefanjanssen/docker_profiling_tools:focus
docker pull stefanjanssen/docker_profiling_tools:metaphlan2
docker pull stefanjanssen/docker_profiling_tools:metaphyler
docker pull stefanjanssen/docker_profiling_tools:quickr
docker pull stefanjanssen/docker_profiling_tools:tipp
docker pull stefanjanssen/docker_profiling_tools:motu
  • Download and extract the data required by CommonKmers:
wget --content-disposition "https://zenodo.org/record/1749272/files/CommonKmersData.tar.gz?download=1"
tar -xzf CommonKmersData.tar.gz
  • opal_workflow.py is OPAL's tool for running Bioboxes of profilers, measuring their runtime and maximum memory usage, and automatically assessing their results. To run it, you also need the gold standard file gs_cami_i_hc.profile and the Biobox YAML file biobox_cami_i_hc.yaml.

  • Run opal_workflow.py as follows, modifying the options to match your system's paths.

python3 ./opal_workflow.py \
stefanjanssen/docker_profiling_tools:commonkmers \
stefanjanssen/docker_profiling_tools:focus \
stefanjanssen/docker_profiling_tools:metaphlan2 \
stefanjanssen/docker_profiling_tools:metaphyler \
stefanjanssen/docker_profiling_tools:quickr \
stefanjanssen/docker_profiling_tools:tipp \
stefanjanssen/docker_profiling_tools:motu \
--labels "CommonKmers, FOCUS, Metaphlan, MetaPhyler, Quikr, TIPP, mOTU" \
--input_dir /path/to/gzipped/fastq/files \
--output_dir /path/to/output_dir \
--yaml /path/to/biobox_cami_i_hc.yaml \
--volume /path/to/CommonKmersData:/exchange/db:ro \
--gold_standard_file data/gs_cami_i_hc.profile \
--plot_abundances \
--desc "1st CAMI Challenge Dataset 3 CAMI high"

The output directory, output_dir in this example, will be created if it does not exist. It will contain the predictions of all profilers and OPAL's assessments.
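In the command above, the labels are listed in the same order as the Biobox images, so a mismatch is easy to introduce when a profiler is added or removed. The following is a minimal sketch, not part of OPAL, that builds the same call from a single mapping. The names PROFILERS and build_workflow_command are hypothetical, and only options that already appear in the command above are used.

```python
import shlex

# Label -> Biobox image, in the order used in the command above.
PROFILERS = {
    "CommonKmers": "stefanjanssen/docker_profiling_tools:commonkmers",
    "FOCUS": "stefanjanssen/docker_profiling_tools:focus",
    "Metaphlan": "stefanjanssen/docker_profiling_tools:metaphlan2",
    "MetaPhyler": "stefanjanssen/docker_profiling_tools:metaphyler",
    "Quikr": "stefanjanssen/docker_profiling_tools:quickr",
    "TIPP": "stefanjanssen/docker_profiling_tools:tipp",
    "mOTU": "stefanjanssen/docker_profiling_tools:motu",
}


def build_workflow_command(profilers, input_dir, output_dir, yaml_file,
                           gold_standard, volume=None, desc=None):
    """Return the opal_workflow.py call as an argument list (hypothetical helper)."""
    # Images first, then one comma-separated label per image, in the same order.
    cmd = ["python3", "./opal_workflow.py", *profilers.values()]
    cmd += ["--labels", ", ".join(profilers)]
    cmd += ["--input_dir", input_dir, "--output_dir", output_dir]
    cmd += ["--yaml", yaml_file]
    if volume:
        cmd += ["--volume", volume]
    cmd += ["--gold_standard_file", gold_standard, "--plot_abundances"]
    if desc:
        cmd += ["--desc", desc]
    return cmd


command = build_workflow_command(
    PROFILERS,
    input_dir="/path/to/gzipped/fastq/files",
    output_dir="/path/to/output_dir",
    yaml_file="/path/to/biobox_cami_i_hc.yaml",
    gold_standard="data/gs_cami_i_hc.profile",
    volume="/path/to/CommonKmersData:/exchange/db:ro",
    desc="1st CAMI Challenge Dataset 3 CAMI high",
)

# Print a shell-quoted version for inspection before running it.
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run, which avoids shell quoting issues with the label and description strings.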

Comparing taxonomic profilers on the CAMI II mouse gut dataset

  • Download the 64 short-read samples of the CAMI II mouse gut dataset from https://data.cami-challenge.org/participate. All files have the same name, so each one must be kept in its own sub-directory of a common root directory:
2017.12.29_11.37.26_sample_0/reads/anonymous_reads.fq.gz
2017.12.29_11.37.26_sample_1/reads/anonymous_reads.fq.gz
2017.12.29_11.37.26_sample_2/reads/anonymous_reads.fq.gz
...
2017.12.29_11.37.26_sample_63/reads/anonymous_reads.fq.gz
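Since all 64 files share one name, an incomplete download is easy to miss. The following minimal sketch lists the paths expected under the layout above and reports the ones that are absent. The function names and the root path are hypothetical, and the sample prefix is taken from the listing above.

```python
from pathlib import Path

SAMPLE_PREFIX = "2017.12.29_11.37.26_sample_"
NUM_SAMPLES = 64  # samples 0 to 63


def expected_read_files(root):
    """Return the 64 paths expected under the root directory."""
    root = Path(root)
    return [
        root / f"{SAMPLE_PREFIX}{i}" / "reads" / "anonymous_reads.fq.gz"
        for i in range(NUM_SAMPLES)
    ]


def missing_read_files(root):
    """Return the expected paths that do not exist as regular files."""
    return [path for path in expected_read_files(root) if not path.is_file()]


# Replace the placeholder with the root directory of your download.
missing = missing_read_files("/path/to/mouse_gut_root")
print(f"{len(missing)} of {NUM_SAMPLES} read files are missing")
```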

Comparing taxonomic profilers on the Human Microbiome Project Mock Community dataset

  • Download the FASTQ file of the staggered sample (accession SRX055381) from NCBI SRA (https://www.ncbi.nlm.nih.gov/sra) and compress it using gzip. You should end up with the following file:
SRR172903.fastq.gz
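If the gzip command-line tool is not available, the compression step can also be done with Python's standard library. This is a minimal sketch; the helper name gzip_file is hypothetical, and the uncompressed file is assumed to be named SRR172903.fastq.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(path):
    """Compress a file to <name>.gz next to it and return the new path."""
    src = Path(path)
    dst = src.with_name(src.name + ".gz")
    # Stream the data in chunks so large FASTQ files are not loaded into memory.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


# Example call (assumes SRR172903.fastq is in the current directory):
# gzip_file("SRR172903.fastq")
```

Unlike the gzip command with default options, this helper leaves the uncompressed file in place.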