Merge branch 'main' into haplo_dev

aineniamh committed Nov 6, 2023
2 parents 7c91392 + bf41238 commit 167d534
Showing 18 changed files with 438 additions and 281 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/piranha.yml
@@ -26,5 +26,5 @@ jobs:
run: piranha --version
- name: Run piranha with test data
run: piranha -i piranha/test/pak_run/demultiplexed --verbose -b piranha/test/pak_run/barcodes01.csv -t 2 2>&1 | tee piranha.log
- name: Run piranha with all data
run: piranha -i piranha/test/pak_run/demultiplexed --verbose -b piranha/test/pak_run/barcodes.csv -t 2 2>&1 | tee piranha_all.log
- name: Run piranha in phylo mode
run: piranha -i piranha/test/pak_run/demultiplexed --verbose -b piranha/test/pak_run/barcodes.csv -t 2 -rp -ud -sd piranha/test/supp_data 2>&1 | tee piranha_phylo.log
24 changes: 21 additions & 3 deletions README.md
@@ -22,6 +22,9 @@ Any issues or feedback about the analysis or report please flag to this repository
## Installing via ARTIFICE GUI
- Download the release package for your machine from the [PiranhaGUI repository](https://github.com/polio-nanopore/piranhaGUI)

## Installing within EPI2ME
- Piranha is also compatible with the [EPI2ME](https://labs.epi2me.io/downloads/) workflow desktop application. This mode of running does not provide the same level of input checking and configurability as the ARTIFICE GUI or the command line, but is provided as an alternative interactive interface. Once EPI2ME and Docker have been installed, follow the instructions to [download a custom workflow](https://labs.epi2me.io/quickstart/) using the GitHub path `https://github.com/polio-nanopore/piranha`.

## Installation instructions (quick command line reference)

>You need to have Git, a version of conda (link to Miniconda [here](https://docs.conda.io/en/latest/miniconda.html)) and mamba installed to run the following commands.
@@ -364,17 +367,21 @@ and piranha will check which ones you have installed with your version of medaka

## Optional phylogenetics module **\*NEW FEATURE\***

Piranha allows the user to optionally run a phylogenetics module in addition to variant calling and consensus building. There are 3 additional dependencies needed if you wish to run this module:
Piranha allows the user to optionally run a phylogenetics module in addition to variant calling and consensus building. If you have previously installed piranha, there are 3 additional dependencies needed if you wish to run this module:
- IQTREE2
- mafft
- jclusterfunk
The latest environment file contains these dependencies, so to install them you can just update your environment (`conda env update -f environment.yml`) or run using the latest image for piranha GUI.

This module will cluster any consensus sequences generated during the run into reference groups (`Sabin1-related`, `Sabin2-related`, `Sabin3-related` or `WPV1`) and will ultimately build one maximum-likelihood phylogeny for each reference group with consensus sequences in a given sequencing run. To annotate the phylogeny with certain metadata from the barcodes.csv file, specify columns to include with `-pcol/--phylo-metadata-columns`.
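The grouping step can be sketched roughly as follows. This is a hypothetical illustration of the described behaviour, not piranha's actual internals, and the `(name, reference_group, sequence)` record layout is an assumption:

```python
from collections import defaultdict

# Reference groups named in the text above.
REFERENCE_GROUPS = {"Sabin1-related", "Sabin2-related", "Sabin3-related", "WPV1"}

def group_by_reference(records):
    """Group (name, reference_group, sequence) consensus records by reference group."""
    groups = defaultdict(list)
    for name, ref_group, seq in records:
        if ref_group in REFERENCE_GROUPS:
            groups[ref_group].append((name, seq))
    return dict(groups)
```

Each resulting group then feeds one maximum-likelihood tree build.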

Piranha then extracts any relevant reference sequences from the installed reference file (identified by having `display_name=Sabin1-related` in their sequence header, or whichever reference group the relevant phylogeny will be for).

An optional file of local sequences can be supplied to supplement the phylogenetic analysis with `-ss/--supplementary-sequences`. This file should be in FASTA format, but does not need to be aligned. To allow piranha to assign the sequences to the relevant phylogeny, this file should have the reference group annotated in the header in the format `display_name=Sabin1-related`, for example.
An optional set of local sequences can be supplied to supplement the phylogenetic analysis. To supply them to piranha, point to the correct directory using `-sd,--supplementary-datadir`. The sequence files should be in FASTA format, but do not need to be aligned. To allow piranha to assign the sequences to the relevant phylogeny, the sequence files should have the reference group annotated in the header in the format `display_name=Sabin1-related`, for example.
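Extracting that annotation from a FASTA header description is a simple field scan. A minimal sketch, with a hypothetical function name (not a piranha API):

```python
def parse_display_name(header):
    """Return the value of the display_name=... field in a FASTA header, or None."""
    for field in header.lstrip(">").split():
        if field.startswith("display_name="):
            return field.split("=", 1)[1]
    return None
```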

This supplementary sequence file can be accompanied with a csv metadata file (one row per supplementary sequence) (`-sm/--supplementary-metadata`) and this metadata can be included in the final report and annotated onto the phylogenies (`-smcol/--supplementary-metadata-columns`). By default, the metadata is matched to the FASTA sequence name with a column titled `sequence_name` but this header name can be configured by specifying `-smid/--supplementary-metadata-id-column`
These supplementary sequence files can be accompanied by csv metadata files (one row per supplementary sequence), and this metadata can be included in the final report and annotated onto the phylogenies (`-smcol/--supplementary-metadata-columns`). By default, the metadata is matched to the FASTA sequence name by a column titled `sequence_name`, but this header name can be configured by specifying `-smid/--supplementary-metadata-id-column`.

Piranha will iterate across the directory supplied and amalgamate the FASTA files, retaining any sequences with `display_name=X` in the header description, where X can be one of `Sabin1-related`, `Sabin2-related`, `Sabin3-related` or `WPV1`. It will then read in every csv file it detects in this directory and attempt to match any metadata to the gathered FASTA records. These will be added to the relevant phylogenies.
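The gather-and-match behaviour can be sketched as below. This is an illustrative approximation only: it assumes simple multi-line FASTA files and csv files with a header row, and the function name is hypothetical:

```python
import csv
from pathlib import Path

def gather_supplementary(datadir, id_column="sequence_name"):
    """Collect display_name-annotated FASTA records and match csv metadata by name."""
    records, metadata = {}, {}
    for fasta in sorted(Path(datadir).glob("*.fasta")):
        name = None
        for line in fasta.read_text().splitlines():
            if line.startswith(">"):
                header = line[1:]
                # Retain only sequences annotated with a display_name= group.
                name = header.split()[0] if "display_name=" in header else None
                if name:
                    records[name] = ""
            elif name:
                records[name] += line.strip()
    for table in sorted(Path(datadir).glob("*.csv")):
        with open(table, newline="") as f:
            for row in csv.DictReader(f):
                if row.get(id_column) in records:
                    metadata[row[id_column]] = row
    return records, metadata
```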

The phylogenetic pipeline is activated by running with the flag `-rp/--run-phylo`, which then triggers the following analysis steps:
- Amalgamate the newly generated consensus sequences for all barcodes into their respective reference groups.
@@ -388,6 +395,17 @@
- Annotate the tree newick files with the specified metadata (Default: just whether it's a new consensus sequence or not).
- Extract phylogenetic trees and embed in interactive report.
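The ordering of those per-reference-group steps can be expressed as a minimal runnable sketch. The `align`, `build_tree` and `annotate` callables here are stand-ins, not the real mafft, IQTREE2 or jclusterfunk invocations:

```python
def run_phylo_for_group(consensus, references, supplementary, align, build_tree, annotate):
    """Run the per-reference-group phylogenetic steps in the order described above."""
    seqs = list(consensus) + list(references) + list(supplementary)  # amalgamate
    alignment = align(seqs)        # stand-in for mafft alignment
    tree = build_tree(alignment)   # stand-in for IQTREE2 tree building
    return annotate(tree)          # stand-in for metadata annotation
```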

## Update local database **\*NEW FEATURE\***

If you supply a path to `-sd,--supplementary-datadir` for the phylogenetics module, you have the option of updating this data directory with the new consensus sequences generated during the piranha analysis. If you run with the `-ud,--update-local-database` flag, piranha will write out the new sequences and any accompanying metadata supplied into the directory provided.

The files written out will be in the format `runname.today.fasta` and `runname.today.csv`. For example, if your runname supplied is `MIN001` and today's date is `2023-11-05`, the files written will be:
- `MIN001.2023-11-05.fasta`
- `MIN001.2023-11-05.csv`
with the newly generated consensus sequences and accompanying metadata from that run.
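The naming scheme can be sketched as follows (hypothetical helper, not piranha's own function):

```python
from datetime import date

def database_filenames(runname, today=None):
    """Return the runname.today.{fasta,csv} pair written by -ud/--update-local-database."""
    today = today or date.today().isoformat()  # YYYY-MM-DD
    return f"{runname}.{today}.fasta", f"{runname}.{today}.csv"
```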

> *Note:* if supplying the supplementary directory to piranha on a subsequent run, your updated local database will be included in the phylogenetics. However, piranha will ignore any files with a `runname.today` pattern identical to the active run. For example, if your current run would produce files called `MIN001.2023-11-05.fasta` and `MIN001.2023-11-05.csv` and those files already exist in the supplementary data directory, they will be ignored. This is to avoid conflicts if piranha is run multiple times on the same data.

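That skip rule amounts to a prefix check on the filename (illustrative helper; the function name is an assumption):

```python
def should_skip(filename, runname, today):
    """Ignore database files whose runname.today pattern matches the active run."""
    return filename.startswith(f"{runname}.{today}.")
```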
## Output options

By default the output directory will be created in the current working directory and will be named `analysis-YYYY-MM-DD`, where YYYY-MM-DD is today's date. This output can be configured in a number of ways. For example, the prefix `analysis` can be overwritten by using the `-pre/--output-prefix new_prefix` flag (or `output_prefix: new_prefix` in a config file) and this will change the default behaviour to `new_prefix_YYYY-MM-DD`. It's good practice not to include spaces or special characters in your directory names.
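As a sketch, the directory name described above could be derived like this. Hypothetical helper; the separator difference (hyphen for the default prefix, underscore for a custom one) simply mirrors the examples as written in the text:

```python
from datetime import date

def output_dirname(prefix="analysis", today=None):
    """Default analysis-YYYY-MM-DD; a custom prefix gives new_prefix_YYYY-MM-DD."""
    today = today or date.today().isoformat()
    sep = "-" if prefix == "analysis" else "_"
    return f"{prefix}{sep}{today}"
```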
Binary file added demo.tar.gz
Binary file not shown.
35 changes: 26 additions & 9 deletions main.nf
@@ -1,24 +1,41 @@
process run_piranha {

publishDir "${params.out_dir}", mode: 'copy'
publishDir "${params.out_dir}", mode: 'copy', saveAs: { fn -> fn.replace("piranha_output/", "")}

container "${params.wf.container}"
container "${params.wf.container}:${workflow.manifest.version}"

input:
path barcodes_csv
path run_dir

output:
path "piranha_report.html"
path "barcode_reports"
path "detailed_run_report.csv"
path "published_data"
path "piranha_output/*"

script:
extra = ""
if ( params.config )
extra += " --config ${params.config}"
if ( params.output_intermediates )
extra += " --no-temp"
if ( params.min_map_quality )
extra += " --min-map-quality ${params.min_map_quality}"
if ( params.min_read_length )
extra += " --min-read-length ${params.min_read_length}"
if ( params.max_read_length )
extra += " --max-read-length ${params.max_read_length}"
if ( params.min_read_depth )
extra += " --min-read-depth ${params.min_read_depth}"
if ( params.min_read_pcent )
extra += " --min-read-pcent ${params.min_read_pcent}"
if ( params.primer_length )
extra += " --primer-length ${params.primer_length}"
if ( params.run_phylo )
extra += " --run-phylo"
if ( params.supplementary_datadir )
extra += " --supplementary-datadir ${params.supplementary_datadir}"
"""
piranha -b ${barcodes_csv} -i ${run_dir} -o piranha_output --tempdir piranha_tmp -t ${task.cpus}
mv piranha_output/* .
mv report.html piranha_report.html
piranha -b ${barcodes_csv} -i ${run_dir} -o piranha_output --tempdir piranha_tmp -t ${task.cpus} ${extra}
mv piranha_output/report.html piranha_output/piranha_report.html
"""

}
20 changes: 17 additions & 3 deletions nextflow.config
@@ -18,6 +18,21 @@ params {
run_dir = null

out_dir = "output"
output_intermediates = false
config = null

// Analysis options
min_map_quality = null
min_read_length = null
max_read_length = null
min_read_depth = null
min_read_pcent = null
min_aln_block = null
primer_length = null

// Phylo options
run_phylo = false
supplementary_datadir = null

// Other options
disable_ping = false
@@ -37,7 +52,6 @@ params {
]
agent = null
container = "polionanopore/piranha"
container_sha = "sha256:f91fd4880a848ee287de9f6dc59566739267103f5f82de3f0f11873b35b5c78a"
}
}

@@ -49,7 +63,7 @@ manifest {
description = 'Polio investigation resource automating nanopore haplotype analysis.'
mainScript = 'main.nf'
nextflowVersion = '>=23.04.2'
version = '1.1'
version = '1.2'
}


@@ -114,4 +128,4 @@ trace {

env {
PYTHONNOUSERSITE = 1
}
}
68 changes: 61 additions & 7 deletions nextflow_schema.json
@@ -4,7 +4,7 @@
"title": "polio-nanopore/piranha",
"description": "Polio investigation resource automating nanopore haplotype analysis.",
"url": "https://github.com/polio-nanopore/piranha",
"demo_url": "https://raw.githubusercontent.com/polio-nanopore/piranha/main/piranha/test/pak_run",
"demo_url": "https://raw.githubusercontent.com/polio-nanopore/piranha/main/demo.tar.gz",
"type": "object",
"definitions": {
"input_options": {
@@ -42,24 +42,75 @@
"output_options": {
"title": "Output Options",
"type": "object",
"fa_icon": "fas fa-terminal",
"description": "Parameters for saving and naming workflow outputs.",
"default": "",
"properties": {
"out_dir": {
"type": "string",
"format": "directory-path",
"default": "output",
"title": "Output folder name",
"description": "Directory for output of all user-facing files."
},
"output_intermediates": {
"type": "boolean",
"title": "Output intermediate files"
}
}
},
"analysis_options": {
"title": "Analysis Options",
"type": "object",
"fa_icon": "fas fa-terminal",
"description": "Define the thresholds for filtering and assembly.",
"properties": {
"config": {
"type": "string",
"format": "path",
"title": "Config file",
"description": "A config file with parameters for piranha."
},
"min_map_quality": {
"type": "integer",
"title": "Minimum mapping quality"
},
"min_read_length": {
"type": "integer",
"title": "Minimum read length"
},
"max_read_length": {
"type": "integer",
"title": "Maximum read length"
},
"min_read_depth": {
"type": "integer",
"title": "Minimum read depth required for consensus generation"
},
"min_read_pcent": {
"type": "number",
"title": "Minimum percentage of sample required for consensus generation"
},
"primer_length": {
"type": "integer",
"title": "Length of primer sequences to trim off start and end of reads"
}
}
},
"advanced_options": {
"title": "Advanced Options",
"phylo_options": {
"title": "Phylogenetic Options",
"type": "object",
"description": "Advanced options for configuring processes inside the workflow.",
"default": "",
"description": "Options for running the phylogenetics module.",
"properties": {
"run_phylo": {
"type": "boolean",
"title": "Run phylogenetics pipeline"
},
"supplementary_datadir": {
"type": "string",
"format": "path",
"title": "Supplementary data",
"description": "Path to directory containing supplementary sequence FASTA file and optional metadata to be incorporated into phylogenetic analysis."
}
}
},
"miscellaneous_options": {
Expand Down Expand Up @@ -93,7 +144,10 @@
"$ref": "#/definitions/output_options"
},
{
"$ref": "#/definitions/advanced_options"
"$ref": "#/definitions/analysis_options"
},
{
"$ref": "#/definitions/phylo_options"
},
{
"$ref": "#/definitions/miscellaneous_options"
2 changes: 1 addition & 1 deletion piranha/__init__.py
@@ -1,5 +1,5 @@
_program = "piranha"
__version__ = "1.1.1"
__version__ = "1.2"

__all__ = [
"input_parsing",
46 changes: 0 additions & 46 deletions piranha/analysis/lengthFilter.py

This file was deleted.
