Examples: running Bioboxes of taxonomic profilers and assessing their results with OPAL

The following examples show how to run Bioboxes of taxonomic profilers on different datasets, tracking their runtimes and maximum main memory usages, and automatically assessing their results.

If you have already run a profiler and want to assess its results, you only need to run opal.py and can probably skip these examples.

The assessed taxonomic profilers, in these examples, are:

Taxonomic profiler	Biobox docker image
CommonKmers	stefanjanssen/dockerprofilingtools:commonkmers
FOCUS 0.31 adapted for CAMI	stefanjanssen/dockerprofilingtools:focus
Quikr	stefanjanssen/dockerprofilingtools:quickr
mOTU 1.1	stefanjanssen/dockerprofilingtools:motu
Metaphlan 2.2.0	stefanjanssen/dockerprofilingtools:metaphlan2
Metaphyler 1.25	stefanjanssen/dockerprofilingtools:metaphyler
TIPP 2.0.0	stefanjanssen/dockerprofilingtools:tipp

Comparing taxonomic profilers on the CAMI I high complexity dataset

Download the 5 samples of the CAMI I high complexity dataset from https://data.cami-challenge.org/participate and save all files in the same directory. Your directory should contain:

RH_S001__insert_270.fq.gz
RH_S002__insert_270.fq.gz
RH_S003__insert_270.fq.gz
RH_S004__insert_270.fq.gz
RH_S005__insert_270.fq.gz

Pull the Bioboxes of profilers:

docker pull stefanjanssen/docker_profiling_tools:commonkmers
docker pull stefanjanssen/docker_profiling_tools:focus
docker pull stefanjanssen/docker_profiling_tools:metaphlan2
docker pull stefanjanssen/docker_profiling_tools:metaphyler
docker pull stefanjanssen/docker_profiling_tools:quickr
docker pull stefanjanssen/docker_profiling_tools:tipp
docker pull stefanjanssen/docker_profiling_tools:motu

CommonKmers uses a database that is not stored inside its Biobox. Download it from https://zenodo.org/record/1749272/files/CommonKmersData.tar.gz?download=1 (DOI: http://doi.org/10.5281/zenodo.1749272) and extract the files. Make sure to set the path to the files with option --volume for opal_workflow.py, as shown below.

wget --content-disposition https://zenodo.org/record/1749272/files/CommonKmersData.tar.gz?download=1
tar -xzf CommonKmersData.tar.gz

OPAL's tool to run Bioboxes of profilers, measure their run time and maximum memory usage, and automatically assess their results is opal_workflow.py. To run it, you also need the gold standard file gs_cami_i_hc.profile and the Biobox YAML file biobox_cami_i_hc.yaml.
Run opal_workflow.py as follows, modifying the options to match your system's paths.

python3 ./opal_workflow.py \
stefanjanssen/docker_profiling_tools:commonkmers \
stefanjanssen/docker_profiling_tools:focus \
stefanjanssen/docker_profiling_tools:metaphlan2 \
stefanjanssen/docker_profiling_tools:metaphyler \
stefanjanssen/docker_profiling_tools:quickr \
stefanjanssen/docker_profiling_tools:tipp \
stefanjanssen/docker_profiling_tools:motu \
--labels "CommonKmers, FOCUS, Metaphlan, MetaPhyler, Quikr, TIPP, mOTU" \
--input_dir /path/to/gzipped/fastq/files \
--output_dir /path/to/output_dir \
--yaml /path/to/biobox_cami_i_hc.yaml \
--volume /path/to/CommonKmersData:/exchange/db:ro \
--gold_standard_file data/gs_cami_i_hc.profile \
--plot_abundances \
--desc "1st CAMI Challenge Dataset 3 CAMI high"

The output directory, output_dir in this example, will be created if does not exist. It will contain the predictions of all profilers and OPAL's assessments.

Comparing taxonomic profilers on the CAMI II mouse gut dataset

Download the 64 short-read samples of the CAMI II mouse gut dataset from https://data.cami-challenge.org/participate. The files have the same name, but should be located in different sub-directories of the same root directory:

2017.12.29_11.37.26_sample_0/reads/anonymous_reads.fq.gz
2017.12.29_11.37.26_sample_1/reads/anonymous_reads.fq.gz
2017.12.29_11.37.26_sample_2/reads/anonymous_reads.fq.gz
...
2017.12.29_11.37.26_sample_63/reads/anonymous_reads.fq.gz

To run opal_workflow.py, you also need the gold standard file gs_cami_i_hc.profile and the Biobox YAML file biobox_cami_ii_mg.yaml.
Follow and adapt the other steps given above.

Comparing taxonomic profilers on the Human Microbiome Project Mock Community dataset

Download the FASTQ file of the staggered sample (accession SRX055381) from NCBI SRA (https://www.ncbi.nlm.nih.gov/sra) and compress it using gzip. You should have file:

SRR172903.fastq.gz

To run opal_workflow.py, you also need the gold standard file gs_hmp_mc.profile and the Biobox YAML file biobox_hmp_mc.yaml.
Follow and adapt the other steps given above.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

EXAMPLES.md

EXAMPLES.md

Examples: running Bioboxes of taxonomic profilers and assessing their results with OPAL

Comparing taxonomic profilers on the CAMI I high complexity dataset

Comparing taxonomic profilers on the CAMI II mouse gut dataset

Comparing taxonomic profilers on the Human Microbiome Project Mock Community dataset

Files

EXAMPLES.md

Latest commit

History

EXAMPLES.md

File metadata and controls

Examples: running Bioboxes of taxonomic profilers and assessing their results with OPAL

Comparing taxonomic profilers on the CAMI I high complexity dataset

Comparing taxonomic profilers on the CAMI II mouse gut dataset

Comparing taxonomic profilers on the Human Microbiome Project Mock Community dataset