A Python package for working with Bactopia
There are many subcommands available in Bactopia. Here is a brief description of each command:
Command | Description |
---|---|
bactopia-citations | Print out tools and citations used throughout Bactopia |
bactopia-datasets | Download optional datasets to supplement your analyses with Bactopia |
bactopia-download | Builds Bactopia environments for use with Nextflow. |
bactopia-prepare | Create a 'file of filenames' (FOFN) of samples to be processed by Bactopia |
bactopia-search | Query against ENA and SRA for public accessions to process with Bactopia |
bactopia-summary | Generate a summary table from the Bactopia results. |
Below is the `--help` output for each subcommand.
Usage: bactopia-citations [OPTIONS]
Print out tools and citations used throughout Bactopia
╭─ Options ──────────────────────────────────────────────────────────────────────╮
│ --version -V Show the version and exit. │
│ * --bactopia-path -b TEXT Directory where Bactopia repository is stored │
│ [required] │
│ --name -n TEXT Only print citation matching a given name │
│ --plain-text -p Disable rich formatting │
│ --help Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────╯
Usage: bactopia-datasets [OPTIONS] [UNKNOWN]...
Download optional datasets to supplement your analyses with Bactopia
╭─ Required Options ─────────────────────────────────────────────────────────────╮
│ * --bactopia-path TEXT Directory where Bactopia repository is stored [required] │
╰────────────────────────────────────────────────────────────────────────────────╯
╭─ Download Related Options ─────────────────────────────────────────────────────╮
│ --datasets_cache TEXT Base directory to download datasets to (Defaults to env │
│ variable BACTOPIA_CACHEDIR, a subfolder called datasets │
│ will be created) │
│ [default: ${HOME}/.bactopia] │
│ --force Force overwrite of existing pre-built environments. │
│ --max_retry INTEGER Maximum times to attempt creating Conda environment. │
│ (Default: 3) │
╰────────────────────────────────────────────────────────────────────────────────╯
╭─ Additional Options ───────────────────────────────────────────────────────────╮
│ --verbose Print debug related text. │
│ --silent Only critical errors will be printed. │
│ --version Show the version and exit. │
│ --help Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────╯
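The `--datasets_cache` help text above names three possible sources for the base directory. The following is a minimal Python sketch of one plausible resolution order, not Bactopia's actual code: the function name `resolve_datasets_cache` is hypothetical, and the assumption that an explicit command-line value wins over `BACTOPIA_CACHEDIR` is an illustration only.

```python
import os
from pathlib import Path


def resolve_datasets_cache(cli_value=None, environ=None):
    """Return the directory datasets would be written to (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    if cli_value:
        # Assumption: an explicit --datasets_cache value takes priority
        base = Path(cli_value)
    elif environ.get("BACTOPIA_CACHEDIR"):
        base = Path(environ["BACTOPIA_CACHEDIR"])
    else:
        # Documented default: ${HOME}/.bactopia
        base = Path(environ.get("HOME", str(Path.home()))) / ".bactopia"
    # The help text states that a subfolder called "datasets" is created
    return base / "datasets"
```

Passing a plain dictionary as `environ` makes the behaviour easy to check without touching the real environment.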
Usage: bactopia-download [OPTIONS] [UNKNOWN]...
Builds Bactopia environments for use with Nextflow.
╭─ Required Options ─────────────────────────────────────────────────────────────╮
│ * --bactopia-path TEXT Directory where Bactopia results are stored [required] │
╰────────────────────────────────────────────────────────────────────────────────╯
╭─ Build Related Options ────────────────────────────────────────────────────────╮
│ --envtype [conda|docker|singularity| The type of environment to │
│ all] build. │
│ [default: conda] │
│ --wf TEXT Build a environment for a │
│ the given workflow │
│ [default: bactopia] │
│ --condadir TEXT Directory to create Conda │
│ environments │
│ (NXF_CONDA_CACHEDIR env │
│ variable takes precedence) │
│ --use_conda Use Conda for building │
│ Conda environments instead │
│ of Mamba │
│ --singularity_cache TEXT Directory to download │
│ Singularity images │
│ (NXF_SINGULARITY_CACHEDIR │
│ env variable takes │
│ precedence) │
│ --singularity_pull_docker… Force conversion of Docker │
│ containers, instead │
│ downloading Singularity │
│ images directly │
│ --force_rebuild Force overwrite of │
│ existing pre-built │
│ environments. │
│ --max_retry INTEGER Maximum times to attempt │
│ creating Conda │
│ environment. (Default: 3) │
╰────────────────────────────────────────────────────────────────────────────────╯
╭─ Additional Options ───────────────────────────────────────────────────────────╮
│ --verbose Print debug related text. │
│ --silent Only critical errors will be printed. │
│ --version Show the version and exit. │
│ --help Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────────────────────────╮
│ --build-all Builds all environments for Bactopia workflows │
│ --build-nfcore Builds all nf-core related environments │
╰────────────────────────────────────────────────────────────────────────────────╯
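The `--max_retry` option above caps how many times an environment build is attempted. As a rough illustration of that idea only (the helper `build_with_retry` and the use of `RuntimeError` to signal a failed build are assumptions, not Bactopia's implementation), a bounded retry loop in Python could look like this:

```python
def build_with_retry(build, max_retry=3):
    """Call build() until it succeeds or max_retry attempts have been made.

    Returns a (result, attempts_used) tuple. `build` is any zero-argument
    callable that raises RuntimeError on failure (an assumption of this sketch).
    """
    last_error = None
    for attempt in range(1, max_retry + 1):
        try:
            return build(), attempt
        except RuntimeError as error:
            # Remember the failure and try again if attempts remain
            last_error = error
    raise RuntimeError(f"build failed after {max_retry} attempts") from last_error
```

Here `max_retry` counts total attempts, which matches the wording "Maximum times to attempt" in the help text.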
Usage: bactopia-prepare [OPTIONS]
Create a 'file of filenames' (FOFN) of samples to be processed by Bactopia
╭─ Required Options ─────────────────────────────────────────────────────────────╮
│ * --path -p TEXT Directory where FASTQ files are stored [required] │
╰────────────────────────────────────────────────────────────────────────────────╯
╭─ Matching Options ─────────────────────────────────────────────────────────────╮
│ --assembly-ext -a TEXT Extension of the FASTA assemblies [default: .fna.gz] │
│ --fastq-ext -f TEXT Extension of the FASTQs [default: .fastq.gz] │
│ --fastq-separator TEXT Split FASTQ name on the last occurrence of the │
│ separator │
│ [default: _] │
│ --pe1-pattern TEXT Designates difference first set of paired-end reads │
│ [default: [Aa]|[Rr]1|1] │
│ --pe2-pattern TEXT Designates difference second set of paired-end reads │
│ [default: [Bb]|[Rr]2|2] │
│ --merge Flag samples with multiple read sets to be merged by │
│ Bactopia │
│ --ont Single-end reads should be treated as Oxford Nanopore │
│ reads │
│ --recursive -r Directories will be traversed recursively │
│ --prefix TEXT Prefix to add to the path │
╰────────────────────────────────────────────────────────────────────────────────╯
╭─ Sample Information Options ───────────────────────────────────────────────────╮
│ --metadata TEXT Metadata per sample with genome size and species │
│ information │
│ --genome-size -gsize INTEGER Genome size to use for all samples │
│ --species -s TEXT Species to use for all samples (If available, can be │
│ used to determine genome size) │
│ --taxid TEXT Use the genome size of the Taxon ID for all samples │
╰────────────────────────────────────────────────────────────────────────────────╯
╭─ Additional Options ───────────────────────────────────────────────────────────╮
│ --examples Print example usage │
│ --verbose Increase the verbosity of output │
│ --silent Only critical errors will be printed │
│ --version -V Show the version and exit. │
│ --help Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────╯
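To make the matching options above more concrete, here is a small Python sketch of how file names might be grouped into samples using the default `--fastq-ext`, `--fastq-separator`, `--pe1-pattern`, and `--pe2-pattern` values. The function `group_fastqs` and its return shape are hypothetical; the real `bactopia-prepare` logic and FOFN columns may differ.

```python
import re
from collections import defaultdict


def group_fastqs(filenames, fastq_ext=".fastq.gz", separator="_",
                 pe1_pattern="[Aa]|[Rr]1|1", pe2_pattern="[Bb]|[Rr]2|2"):
    """Group FASTQ file names by sample (illustrative, not Bactopia's code)."""
    samples = defaultdict(dict)
    for name in sorted(filenames):
        if not name.endswith(fastq_ext):
            continue  # ignore files without the expected extension
        stem = name[: -len(fastq_ext)]
        # Split on the LAST occurrence of the separator, as the help text describes
        sample, found, suffix = stem.rpartition(separator)
        if found and re.fullmatch(pe1_pattern, suffix):
            samples[sample]["r1"] = name
        elif found and re.fullmatch(pe2_pattern, suffix):
            samples[sample]["r2"] = name
        else:
            # No recognizable read-pair suffix: treat the file as single-end
            samples[stem]["se"] = name
    return dict(samples)
```

For example, `group_fastqs(["s1_R1.fastq.gz", "s1_R2.fastq.gz", "s2.fastq.gz"])` pairs the two `s1` files under one sample and records `s2` as single-end.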
Usage: bactopia-search [OPTIONS]
Query against ENA and SRA for public accessions to process with Bactopia
╭─ Required Options ─────────────────────────────────────────────────────────────╮
│ * --query -q TEXT Taxon ID or Study, BioSample, or Run accession (can also be │
│ comma separated or a file of accessions) │
│ [required] │
╰────────────────────────────────────────────────────────────────────────────────╯
╭─ Query Options ────────────────────────────────────────────────────────────────╮
│ --exact-taxon Exclude Taxon ID descendants │
│ --limit -l INTEGER Maximum number of results (per query) to return │
│ [default: 1000000] │
│ --accession-limit -al INTEGER Maximum number of accessions to query at once │
│ [default: 5000] │
│ --biosample-subset INTEGER If a BioSample has multiple Experiments, maximum │
│ number to randomly select (0 = disabled) │
│ [default: 0] │
╰────────────────────────────────────────────────────────────────────────────────╯
╭─ Filtering Options ────────────────────────────────────────────────────────────╮
│ --min-base-count -mbc INTEGER Filters samples based on minimum base pair count │
│ (0 = disabled) │
│ [default: 0] │
│ --min-read-length -mrl INTEGER Filters samples based on minimum mean read length │
│ (0 = disabled) │
│ [default: 0] │
│ --min-coverage -mc INTEGER Filter samples based on minimum coverage (requires │
│ --genome_size, 0 = disabled) │
│ [default: 0] │
╰────────────────────────────────────────────────────────────────────────────────╯
╭─ Additional Options ───────────────────────────────────────────────────────────╮
│ --genome-size -gsize INTEGER Genome size to be used for all samples, and for │
│ calculating min coverage │
│ [default: 0] │
│ --outdir -o TEXT Directory to write output [default: ./] │
│ --prefix -p TEXT Prefix to use for output file names │
│ [default: bactopia] │
│ --force Overwrite existing reports │
│ --verbose Increase the verbosity of output │
│ --silent Only critical errors will be printed │
│ --version -V Show the version and exit. │
│ --help Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────╯
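The filtering options above can be read as three independent checks, each disabled when set to 0. The Python sketch below illustrates that reading. It assumes coverage is estimated as total base count divided by genome size (a common definition, not confirmed from Bactopia's source), and `passes_filters` is a hypothetical name.

```python
def passes_filters(base_count, mean_read_length, genome_size=0,
                   min_base_count=0, min_read_length=0, min_coverage=0):
    """Return True if a run passes every enabled filter (0 disables a filter)."""
    if min_base_count and base_count < min_base_count:
        return False
    if min_read_length and mean_read_length < min_read_length:
        return False
    if min_coverage:
        # The coverage filter needs a genome size, as the help text notes
        if not genome_size:
            raise ValueError("min_coverage requires a non-zero genome_size")
        # Assumption: coverage = sequenced bases / genome size
        if base_count / genome_size < min_coverage:
            return False
    return True
```

With a 5 Mbp genome, a run of 100,000,000 bases has an estimated coverage of 20x and passes `min_coverage=20`, while a run of 50,000,000 bases (10x) does not.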
Usage: bactopia-summary [OPTIONS]
Generate a summary table from the Bactopia results.
╭─ Required Options ─────────────────────────────────────────────────────────────╮
│ * --bactopia-path -b TEXT Directory where Bactopia results are stored [required] │
╰────────────────────────────────────────────────────────────────────────────────╯
╭─ Gold Cutoffs ─────────────────────────────────────────────────────────────────╮
│ --gold-coverage -gcov INTEGER Minimum amount of coverage required for Gold │
│ status │
│ [default: 100] │
│ --gold-quality -gqual INTEGER Minimum per-read mean quality score required │
│ for Gold status │
│ [default: 30] │
│ --gold-read-length -glen INTEGER Minimum mean read length required for Gold │
│ status │
│ [default: 95] │
│ --gold-contigs -gcontigs INTEGER Maximum contig count required for Gold │
│ status │
│ [default: 100] │
╰────────────────────────────────────────────────────────────────────────────────╯
╭─ Silver Cutoffs ───────────────────────────────────────────────────────────────╮
│ --silver-coverage -scov INTEGER Minimum amount of coverage required for │
│ Silver status │
│ [default: 50] │
│ --silver-quality -squal INTEGER Minimum per-read mean quality score │
│ required for Silver status │
│ [default: 20] │
│ --silver-read-length -slen INTEGER Minimum mean read length required for │
│ Silver status │
│ [default: 75] │
│ --silver-contigs -scontigs INTEGER Maximum contig count required for Silver │
│ status │
│ [default: 200] │
╰────────────────────────────────────────────────────────────────────────────────╯
╭─ Fail Cutoffs ─────────────────────────────────────────────────────────────────╮
│ --min-coverage -mincov INTEGER Minimum amount of coverage required to pass │
│ [default: 20] │
│ --min-quality -minqual INTEGER Minimum per-read mean quality score │
│ required to pass │
│ [default: 12] │
│ --min-read-length -minlen INTEGER Minimum mean read length required to pass │
│ [default: 49] │
│ --max-contigs INTEGER Maximum contig count required to pass │
│ [default: 500] │
│ --min-assembled-size INTEGER Minimum assembled genome size │
│ --max-assembled-size INTEGER Maximum assembled genome size │
╰────────────────────────────────────────────────────────────────────────────────╯
╭─ Additional Options ───────────────────────────────────────────────────────────╮
│ --outdir -o PATH Directory to write output [default: ./] │
│ --prefix -p TEXT Prefix to use for output files [default: bactopia] │
│ --force Overwrite existing reports │
│ --verbose Increase the verbosity of output │
│ --silent Only critical errors will be printed │
│ --version -V Show the version and exit. │
│ --help Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────╯
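The Gold, Silver, and Fail cutoffs above form a tiered check. The Python sketch below shows one way such a ranking could work using the default values from the help text. The labels `"pass"` and `"fail"`, the inclusive comparisons, and the names `Cutoff` and `rank_sample` are assumptions made for illustration; the assembled-size cutoffs are left out because they have no defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cutoff:
    coverage: int      # minimum coverage
    quality: int       # minimum per-read mean quality score
    read_length: int   # minimum mean read length
    contigs: int       # maximum contig count


# Defaults copied from the bactopia-summary help text
GOLD = Cutoff(coverage=100, quality=30, read_length=95, contigs=100)
SILVER = Cutoff(coverage=50, quality=20, read_length=75, contigs=200)
MINIMUM = Cutoff(coverage=20, quality=12, read_length=49, contigs=500)


def meets(cutoff, coverage, quality, read_length, contigs):
    """True when every metric satisfies the cutoff (inclusive, by assumption)."""
    return (coverage >= cutoff.coverage and quality >= cutoff.quality
            and read_length >= cutoff.read_length and contigs <= cutoff.contigs)


def rank_sample(coverage, quality, read_length, contigs):
    """Return the best tier a sample qualifies for, checking strictest first."""
    for label, cutoff in (("gold", GOLD), ("silver", SILVER), ("pass", MINIMUM)):
        if meets(cutoff, coverage, quality, read_length, contigs):
            return label
    return "fail"
```

Checking the strictest tier first means a sample is reported at the highest level for which all four metrics qualify; a single weak metric, such as a high contig count, is enough to drop it a tier.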
AllTheBacteria is a collection of nearly 2,000,000 bacterial genomes. Using available FASTQ files from the European Nucleotide Archive (ENA) and Sequence Read Archive (SRA), the genomes were assembled using Shovill and made publicly available by the Iqbal Lab.
To make it easy to use Bactopia Tools with assemblies from AllTheBacteria, `bactopia-atb-formatter` was created. This tool creates a directory structure that resembles the output of an actual Bactopia run.
Usage: bactopia-atb-formatter [OPTIONS]
Restructure All-the-Bacteria assemblies to allow usage with Bactopia Tools
╭─ Required Options ─────────────────────────────────────────────────────────────╮
│ * --path -p TEXT Directory where FASTQ files are stored [required] │
╰────────────────────────────────────────────────────────────────────────────────╯
╭─ Bactopia Directory Structure Options ─────────────────────────────────────────╮
│ --bactopia-dir -b TEXT The path you would like to place bactopia │
│ structure │
│ [default: bactopia] │
│ --publish-mode -m [symlink|copy] Designates plascement of assemblies will be │
│ handled │
│ [default: symlink] │
│ --recursive -r Traverse recursively through provided path │
│ --extension -e TEXT The extension of the assemblies e.g .fa,.fa.gz │
│ [default: .fa] │
╰────────────────────────────────────────────────────────────────────────────────╯
╭─ Additional Options ───────────────────────────────────────────────────────────╮
│ --verbose Increase the verbosity of output │
│ --silent Only critical errors will be printed │
│ --version -V Show the version and exit. │
│ --help Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────╯
To demonstrate the usage of `bactopia-atb-formatter`, we will use assemblies for Legionella pneumophila. The following steps will download the assemblies, build the Bactopia directory structure, and then run `legsta` via the Bactopia Tool.
First, we will download the Legionella pneumophila assemblies from AllTheBacteria. After downloading, we will extract them into a folder called `legionella-assemblies`. Within this folder, there will be a subdirectory for each tarball that was downloaded.
mkdir atb-legionella
cd atb-legionella
# Download the assemblies
wget https://ftp.ebi.ac.uk/pub/databases/AllTheBacteria/Releases/0.1/assembly/legionella_pneumophila__01.asm.tar.xz
wget https://ftp.ebi.ac.uk/pub/databases/AllTheBacteria/Releases/0.1/assembly/legionella_pneumophila__02.asm.tar.xz
# Extract the assemblies
mkdir legionella-assemblies
tar -C legionella-assemblies -xJf legionella_pneumophila__01.asm.tar.xz
tar -C legionella-assemblies -xJf legionella_pneumophila__02.asm.tar.xz
With the assemblies extracted, we can now create the Bactopia directory structure using `bactopia-atb-formatter`. Once complete, each assembly will have its own folder, named after the BioSample accession of the assembly.
# Create the Bactopia directory structure
bactopia atb-formatter --path legionella-assemblies --recursive
2024-03-22 14:30:07 INFO 2024-03-22 14:30:07:root:INFO - Setting up Bactopia directory structure (use --verbose to see more details) atb_formatter.py:129
2024-03-22 14:30:08 INFO 2024-03-22 14:30:08:root:INFO - Bactopia directory structure created at bactopia atb_formatter.py:134
INFO 2024-03-22 14:30:08:root:INFO - Total assemblies processed: 5393
Please note the usage of `--recursive`, which will traverse the `legionella-assemblies` directory to find all assemblies it contains. At this point, the `bactopia` directory structure has been created for 5,393 assemblies and is ready for use with Bactopia Tools.
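For readers curious about what this step does conceptually, the following Python sketch mimics the behaviour described above: find assemblies (optionally recursively), create one folder per assembly, and place the file there by symlink or copy. It is a simplified illustration under stated assumptions, not the tool's implementation. The function `format_assemblies` is hypothetical, the file stem is assumed to be the BioSample accession, and the real tool builds a deeper layout inside each sample folder than the flat one used here.

```python
import shutil
from pathlib import Path


def format_assemblies(path, bactopia_dir="bactopia", extension=".fa",
                      recursive=False, publish_mode="symlink"):
    """Create one folder per assembly and return how many were placed."""
    source = Path(path)
    pattern = f"*{extension}"
    # --recursive decides whether subdirectories are searched as well
    found = source.rglob(pattern) if recursive else source.glob(pattern)
    placed = 0
    for assembly in sorted(found):
        # Assumption: the file name minus its extension is the sample name
        sample = assembly.name[: -len(extension)]
        sample_dir = Path(bactopia_dir) / sample
        sample_dir.mkdir(parents=True, exist_ok=True)
        target = sample_dir / assembly.name
        if target.exists() or target.is_symlink():
            continue  # leave anything already in place untouched
        if publish_mode == "symlink":
            target.symlink_to(assembly.resolve())
        else:
            shutil.copy2(assembly, target)
        placed += 1
    return placed
```

Symlinking keeps the formatted directory small because the assemblies are not duplicated, which matters when thousands of genomes are involved; copying trades disk space for a directory that remains valid if the extracted tarballs are moved.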
As mentioned above, we will use legsta to analyze each of the Legionella pneumophila assemblies. To do this, we will use the legsta Bactopia Tool.
# Run legsta (please utilize Docker or Singularity only for reproducibility)
bactopia --wf legsta -profile singularity
Please note, for reproducibility, it is recommended to use Docker or Singularity with Bactopia Tools.
Upon completion, you should be met with something like the following:
[5d/d04297] process > BACTOPIATOOLS:LEGSTA:LEGSTA_MODULE (SAMN29911258) [100%] 5393 of 5393 ✔
[71/c63bf7] process > BACTOPIATOOLS:LEGSTA:CSVTK_CONCAT (legsta) [100%] 1 of 1 ✔
[16/833262] process > BACTOPIATOOLS:CUSTOM_DUMPSOFTWAREVERSIONS (1) [100%] 1 of 1 ✔
Bactopia Tools: `legsta` Execution Summary
---------------------------
Bactopia Version : 3.0.1
Nextflow Version : 23.10.1
Command Line : nextflow run /home/rpetit3/bactopia/main.nf --wf legsta --bactopia bactopia/ -profile singularity
Resumed : false
Completed At : 2024-03-22T15:09:54.959834620-06:00
Duration : 32m 51s
Success : true
Exit Code : 0
Error Report : -
Launch Dir : /home/rpetit3/test-legsta
WARN: Graphviz is required to render the execution DAG in the given format -- See http://www.graphviz.org for more info.
Completed at: 22-Mar-2024 15:09:55
Duration : 32m 52s
CPU hours : 5.2
Succeeded : 5'395
That's it! Now you can take advantage of any of the Bactopia Tools that utilize assemblies as inputs.
Your feedback is very valuable! If you run into any issues using Bactopia, have questions, or have ideas to improve Bactopia, I highly encourage you to submit them to the Issue Tracker.
Petit III RA, Read TD. Bactopia: a flexible pipeline for complete analysis of bacterial genomes. mSystems. 5 (2020). https://doi.org/10.1128/mSystems.00190-20
- Robert A. Petit III
- Twitter: @rpetit3