Merge branch 'main' into haplo_dev

aineniamh committed Nov 6, 2023
2 parents 7c91392 + bf41238 commit 167d534
Showing 18 changed files with 438 additions and 281 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/piranha.yml
@@ -26,5 +26,5 @@ jobs:
run: piranha --version
- name: Run piranha with test data
run: piranha -i piranha/test/pak_run/demultiplexed --verbose -b piranha/test/pak_run/barcodes01.csv -t 2 2>&1 | tee piranha.log
- name: Run piranha with all data
run: piranha -i piranha/test/pak_run/demultiplexed --verbose -b piranha/test/pak_run/barcodes.csv -t 2 2>&1 | tee piranha_all.log
- name: Run piranha in phylo mode
run: piranha -i piranha/test/pak_run/demultiplexed --verbose -b piranha/test/pak_run/barcodes.csv -t 2 -rp -ud -sd piranha/test/supp_data 2>&1 | tee piranha_phylo.log
24 changes: 21 additions & 3 deletions README.md
@@ -22,6 +22,9 @@ Any issues or feedback about the analysis or report please flag to this repository
## Installing via ARTIFICE GUI
- Download the release package for your machine from the [PiranhaGUI repository](https://github.com/polio-nanopore/piranhaGUI)

## Installing within EPI2ME
- Piranha is also compatible with the [EPI2ME](https://labs.epi2me.io/downloads/) workflow desktop application. This mode of running does not provide the same level of input checking and configurability as the ARTIFICE GUI or the command line, but is provided as an alternative interactive interface. Once EPI2ME and Docker have been installed, follow the instructions to [download a custom workflow](https://labs.epi2me.io/quickstart/) using the GitHub path `https://github.com/polio-nanopore/piranha`.

## Installation instructions (quick command line reference)

>You need to have Git, a version of conda (link to Miniconda [here](https://docs.conda.io/en/latest/miniconda.html)) and mamba installed to run the following commands.
@@ -364,17 +367,21 @@ and piranha will check which ones you have installed with your version of medaka

## Optional phylogenetics module **\*NEW FEATURE\***

Piranha allows the user to optionally run a phylogenetics module in addition to variant calling and consensus building. There are 3 additional dependencies needed if you wish to run this module:
Piranha allows the user to optionally run a phylogenetics module in addition to variant calling and consensus building. If you have previously installed piranha, there are 3 additional dependencies needed if you wish to run this module:
- IQTREE2
- mafft
- jclusterfunk
The latest environment file contains these dependencies, so to install them you can just update your environment (`conda env update -f environment.yml`) or run using the latest image for piranha GUI.

This module will cluster any consensus sequences generated during the run into reference groups (`Sabin1-related`, `Sabin2-related`, `Sabin3-related` or `WPV1`) and will ultimately build one maximum-likelihood phylogeny for each reference group with consensus sequences in a given sequencing run. To annotate the phylogeny with certain metadata from the barcodes.csv file, specify columns to include with `-pcol/--phylo-metadata-columns`.
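The grouping step can be sketched roughly as follows. This is a hypothetical illustration of the described behaviour, not piranha's actual internals, and the `(name, reference_group, sequence)` record layout is an assumption:

```python
from collections import defaultdict

# Reference groups named in the text above.
REFERENCE_GROUPS = {"Sabin1-related", "Sabin2-related", "Sabin3-related", "WPV1"}

def group_by_reference(records):
    """Group (name, reference_group, sequence) consensus records by reference group."""
    groups = defaultdict(list)
    for name, ref_group, seq in records:
        if ref_group in REFERENCE_GROUPS:
            groups[ref_group].append((name, seq))
    return dict(groups)
```

Each resulting group then feeds one maximum-likelihood tree build.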

Piranha then extracts any relevant reference sequences from the installed reference file (identified by having `display_name=Sabin1-related` in their sequence header, or whichever reference group the relevant phylogeny will be for).

An optional file of local sequences can be supplied to supplement the phylogenetic analysis with `-ss/--supplementary-sequences`. This file should be in FASTA format, but does not need to be aligned. To allow piranha to assign the sequences to the relevant phylogeny, this file should have the reference group annotated in the header in the format `display_name=Sabin1-related`, for example.
An optional set of local sequences can be supplied to supplement the phylogenetic analysis. To supply them to piranha, point to the correct directory using `-sd,--supplementary-datadir`. The sequence files should be in FASTA format, but do not need to be aligned. To allow piranha to assign the sequences to the relevant phylogeny, the sequence files should have the reference group annotated in the header in the format `display_name=Sabin1-related`, for example.
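Extracting that annotation from a FASTA header description is a simple field scan. A minimal sketch, with a hypothetical function name (not a piranha API):

```python
def parse_display_name(header):
    """Return the value of the display_name=... field in a FASTA header, or None."""
    for field in header.lstrip(">").split():
        if field.startswith("display_name="):
            return field.split("=", 1)[1]
    return None
```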

This supplementary sequence file can be accompanied with a csv metadata file (one row per supplementary sequence) (`-sm/--supplementary-metadata`) and this metadata can be included in the final report and annotated onto the phylogenies (`-smcol/--supplementary-metadata-columns`). By default, the metadata is matched to the FASTA sequence name with a column titled `sequence_name` but this header name can be configured by specifying `-smid/--supplementary-metadata-id-column`
These supplementary sequence files can be accompanied by csv metadata files (one row per supplementary sequence), and this metadata can be included in the final report and annotated onto the phylogenies (`-smcol/--supplementary-metadata-columns`). By default, the metadata is matched to the FASTA sequence name by a column titled `sequence_name`, but this header name can be configured by specifying `-smid/--supplementary-metadata-id-column`.

Piranha will iterate across the directory supplied and amalgamate the FASTA files, retaining any sequences with `display_name=X` in the header description, where X can be one of `Sabin1-related`, `Sabin2-related`, `Sabin3-related` or `WPV1`. It will then read in every csv file it detects in this directory and attempt to match any metadata to the gathered FASTA records. These will be added to the relevant phylogenies.
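The gather-and-match behaviour can be sketched as below. This is an illustrative approximation only: it assumes simple multi-line FASTA files and csv files with a header row, and the function name is hypothetical:

```python
import csv
from pathlib import Path

def gather_supplementary(datadir, id_column="sequence_name"):
    """Collect display_name-annotated FASTA records and match csv metadata by name."""
    records, metadata = {}, {}
    for fasta in sorted(Path(datadir).glob("*.fasta")):
        name = None
        for line in fasta.read_text().splitlines():
            if line.startswith(">"):
                header = line[1:]
                # Retain only sequences annotated with a display_name= group.
                name = header.split()[0] if "display_name=" in header else None
                if name:
                    records[name] = ""
            elif name:
                records[name] += line.strip()
    for table in sorted(Path(datadir).glob("*.csv")):
        with open(table, newline="") as f:
            for row in csv.DictReader(f):
                if row.get(id_column) in records:
                    metadata[row[id_column]] = row
    return records, metadata
```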

The phylogenetic pipeline is activated by running with the flag `-rp/--run-phylo`, which then triggers the following analysis steps:
- Amalgamate the newly generated consensus sequences for all barcodes into their respective reference groups.
@@ -388,6 +395,17 @@
- Annotate the tree newick files with the specified metadata (Default: just whether it's a new consensus sequence or not).
- Extract phylogenetic trees and embed in interactive report.
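The ordering of those per-reference-group steps can be expressed as a minimal runnable sketch. The `align`, `build_tree` and `annotate` callables here are stand-ins, not the real mafft, IQTREE2 or jclusterfunk invocations:

```python
def run_phylo_for_group(consensus, references, supplementary, align, build_tree, annotate):
    """Run the per-reference-group phylogenetic steps in the order described above."""
    seqs = list(consensus) + list(references) + list(supplementary)  # amalgamate
    alignment = align(seqs)        # stand-in for mafft alignment
    tree = build_tree(alignment)   # stand-in for IQTREE2 tree building
    return annotate(tree)          # stand-in for metadata annotation
```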

## Update local database **\*NEW FEATURE\***

If you supply a path to `-sd,--supplementary-datadir` for the phylogenetics module, you have the option of updating this data directory with the new consensus sequences generated during the piranha analysis. If you run with the `-ud,--update-local-database` flag, piranha will write out the new sequences and any accompanying metadata supplied into the directory provided.

The files written out will be in the format `runname.today.fasta` and `runname.today.csv`. For example, if your runname supplied is `MIN001` and today's date is `2023-11-05`, the files written will be:
- `MIN001.2023-11-05.fasta`
- `MIN001.2023-11-05.csv`
with the newly generated consensus sequences and accompanying metadata from that run.
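The naming scheme can be sketched as follows (hypothetical helper, not piranha's own function):

```python
from datetime import date

def database_filenames(runname, today=None):
    """Return the runname.today.{fasta,csv} pair written by -ud/--update-local-database."""
    today = today or date.today().isoformat()  # YYYY-MM-DD
    return f"{runname}.{today}.fasta", f"{runname}.{today}.csv"
```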

> *Note:* if supplying the supplementary directory to piranha on a subsequent run, your updated local database will be included in the phylogenetics. However, piranha will ignore any files with a `runname.today` pattern identical to the active run. For example, if your current run would produce files called `MIN001.2023-11-05.fasta` and `MIN001.2023-11-05.csv` and those files already exist in the supplementary data directory, they will be ignored. This is to avoid conflicts if piranha is run multiple times on the same data.

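That skip rule amounts to a prefix check on the filename (illustrative helper; the function name is an assumption):

```python
def should_skip(filename, runname, today):
    """Ignore database files whose runname.today pattern matches the active run."""
    return filename.startswith(f"{runname}.{today}.")
```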
## Output options

By default the output directory will be created in the current working directory and will be named `analysis-YYYY-MM-DD`, where YYYY-MM-DD is today's date. This output can be configured in a number of ways. For example, the prefix `analysis` can be overwritten by using the `-pre/--output-prefix new_prefix` flag (or `output_prefix: new_prefix` in a config file) and this will change the default behaviour to `new_prefix_YYYY-MM-DD`. It's good practice not to include spaces or special characters in your directory names.
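As a sketch, the directory name described above could be derived like this. Hypothetical helper; the separator difference (hyphen for the default prefix, underscore for a custom one) simply mirrors the examples as written in the text:

```python
from datetime import date

def output_dirname(prefix="analysis", today=None):
    """Default analysis-YYYY-MM-DD; a custom prefix gives new_prefix_YYYY-MM-DD."""
    today = today or date.today().isoformat()
    sep = "-" if prefix == "analysis" else "_"
    return f"{prefix}{sep}{today}"
```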
Binary file added demo.tar.gz
Binary file not shown.
35 changes: 26 additions & 9 deletions main.nf
@@ -1,24 +1,41 @@
process run_piranha {

publishDir "${params.out_dir}", mode: 'copy'
publishDir "${params.out_dir}", mode: 'copy', saveAs: { fn -> fn.replace("piranha_output/", "")}

container "${params.wf.container}"
container "${params.wf.container}:${workflow.manifest.version}"

input:
path barcodes_csv
path run_dir

output:
path "piranha_report.html"
path "barcode_reports"
path "detailed_run_report.csv"
path "published_data"
path "piranha_output/*"

script:
extra = ""
if ( params.config )
extra += " --config ${params.config}"
if ( params.output_intermediates )
extra += " --no-temp"
if ( params.min_map_quality )
extra += " --min-map-quality ${params.min_map_quality}"
if ( params.min_read_length )
extra += " --min-read-length ${params.min_read_length}"
if ( params.max_read_length )
extra += " --max-read-length ${params.max_read_length}"
if ( params.min_read_depth )
extra += " --min-read-depth ${params.min_read_depth}"
if ( params.min_read_pcent )
extra += " --min-read-pcent ${params.min_read_pcent}"
if ( params.primer_length )
extra += " --primer-length ${params.primer_length}"
if ( params.run_phylo )
extra += " --run-phylo"
if ( params.supplementary_datadir )
extra += " --supplementary-datadir ${params.supplementary_datadir}"
"""
piranha -b ${barcodes_csv} -i ${run_dir} -o piranha_output --tempdir piranha_tmp -t ${task.cpus}
mv piranha_output/* .
mv report.html piranha_report.html
piranha -b ${barcodes_csv} -i ${run_dir} -o piranha_output --tempdir piranha_tmp -t ${task.cpus} ${extra}
mv piranha_output/report.html piranha_output/piranha_report.html
"""

}
20 changes: 17 additions & 3 deletions nextflow.config
@@ -18,6 +18,21 @@ params {
run_dir = null

out_dir = "output"
output_intermediates = false
config = null

// Analysis options
min_map_quality = null
min_read_length = null
max_read_length = null
min_read_depth = null
min_read_pcent = null
min_aln_block = null
primer_length = null

// Phylo options
run_phylo = false
supplementary_datadir = null

// Other options
disable_ping = false
@@ -37,7 +52,6 @@ params {
]
agent = null
container = "polionanopore/piranha"
container_sha = "sha256:f91fd4880a848ee287de9f6dc59566739267103f5f82de3f0f11873b35b5c78a"
}
}

@@ -49,7 +63,7 @@ manifest {
description = 'Polio investigation resource automating nanopore haplotype analysis.'
mainScript = 'main.nf'
nextflowVersion = '>=23.04.2'
version = '1.1'
version = '1.2'
}


@@ -114,4 +128,4 @@ trace {

env {
PYTHONNOUSERSITE = 1
}
}
68 changes: 61 additions & 7 deletions nextflow_schema.json
@@ -4,7 +4,7 @@
"title": "polio-nanopore/piranha",
"description": "Polio investigation resource automating nanopore haplotype analysis.",
"url": "https://github.com/polio-nanopore/piranha",
"demo_url": "https://raw.githubusercontent.com/polio-nanopore/piranha/main/piranha/test/pak_run",
"demo_url": "https://raw.githubusercontent.com/polio-nanopore/piranha/main/demo.tar.gz",
"type": "object",
"definitions": {
"input_options": {
@@ -42,24 +42,75 @@
"output_options": {
"title": "Output Options",
"type": "object",
"fa_icon": "fas fa-terminal",
"description": "Parameters for saving and naming workflow outputs.",
"default": "",
"properties": {
"out_dir": {
"type": "string",
"format": "directory-path",
"default": "output",
"title": "Output folder name",
"description": "Directory for output of all user-facing files."
},
"output_intermediates": {
"type": "boolean",
"title": "Output intermediate files"
}
}
},
"analysis_options": {
"title": "Analysis Options",
"type": "object",
"fa_icon": "fas fa-terminal",
"description": "Define the thresholds for filtering and assembly.",
"properties": {
"config": {
"type": "string",
"format": "path",
"title": "Config file",
"description": "A config file with parameters for piranha."
},
"min_map_quality": {
"type": "integer",
"title": "Minimum mapping quality"
},
"min_read_length": {
"type": "integer",
"title": "Minimum read length"
},
"max_read_length": {
"type": "integer",
"title": "Maximum read length"
},
"min_read_depth": {
"type": "integer",
"title": "Minimum read depth required for consensus generation"
},
"min_read_pcent": {
"type": "number",
"title": "Minimum percentage of sample required for consensus generation"
},
"primer_length": {
"type": "integer",
"title": "Length of primer sequences to trim off start and end of reads"
}
}
},
"advanced_options": {
"title": "Advanced Options",
"phylo_options": {
"title": "Phylogenetic Options",
"type": "object",
"description": "Advanced options for configuring processes inside the workflow.",
"default": "",
"description": "Options for running the phylogenetics module.",
"properties": {
"run_phylo": {
"type": "boolean",
"title": "Run phylogenetics pipeline"
},
"supplementary_datadir": {
"type": "string",
"format": "path",
"title": "Supplementary data",
"description": "Path to directory containing supplementary sequence FASTA file and optional metadata to be incorporated into phylogenetic analysis."
}
}
},
"miscellaneous_options": {
Expand Down Expand Up @@ -93,7 +144,10 @@
"$ref": "#/definitions/output_options"
},
{
"$ref": "#/definitions/advanced_options"
"$ref": "#/definitions/analysis_options"
},
{
"$ref": "#/definitions/phylo_options"
},
{
"$ref": "#/definitions/miscellaneous_options"
2 changes: 1 addition & 1 deletion piranha/__init__.py
@@ -1,5 +1,5 @@
_program = "piranha"
__version__ = "1.1.1"
__version__ = "1.2"

__all__ = [
"input_parsing",
46 changes: 0 additions & 46 deletions piranha/analysis/lengthFilter.py

This file was deleted.
