Poliovirus Investigation Resource Automating Nanopore Haplotype Analysis
Piranha is a tool developed to help standardise and streamline sequencing of poliovirus. It's been developed by members of the Rambaut group at the University of Edinburgh as part of the Poliovirus Sequencing Consortium. Piranha runs an end-to-end read-to-report analysis that produces distributable, interactive reports alongside analysed consensus data. By default, piranha will attempt to generate consensus genomes for populations of the sam It produces an overall report, summarising the entire sequencing run, as well as a sample-specific report. Samples with virus of interest, such as VDPV, are highlighted, and certain quality-control flags can alert the user if there are issues with the run (such as a failed negative or positive control, or identical sequences between samples that may be the result of contamination).
Please flag any issues or feedback about the analysis or report on this repository.
Note: piranha has been tested primarily on poliovirus VP1 sequencing data. There are alternative analysis modes in development (e.g. whole genome, panEV), but the authors recommend additional QC checks if using piranha beyond its established poliovirus VP1 pipeline.
See example report here
See example data here
- Download the release package for your machine from the ARTIFICE repository
You need to have Git, a version of conda (link to Miniconda here) and mamba installed to run the following commands.
Detailed installation instructions are given below. In brief, to install with mamba, run the following in a terminal:
git clone https://github.com/polio-nanopore/piranha.git
cd piranha
mamba env create -f environment.yml
conda activate piranha
pip install .
If this is your first time running piranha, you’ll need to clone the GitHub repository and install it. If you're running on a Mac machine (OS X) and have never used command line tools such as Git before, you may need to install them. They can be installed easily with the installer. Full instructions can be found here. Linux systems will have this installed already.
Double check you have git installed by typing the following into a terminal window:
git --version
You should see something like the following printed below:
git version 2.21.1 (Apple Git-122.3)
If you don't see this, follow the link to the install instructions above to install git. If you're using a Windows machine, it's possible to use Windows Subsystem for Linux, which should have git pre-installed.
You will also need to have a version of conda installed. We recommend the latest version of Miniconda, which can be accessed for download and install here.
I like to keep my repositories in the same place so they’re easy to find, so below I’m making a directory in my home directory (~) called repositories and moving into it with cd, which is short for "change directory".
(base) aine$ cd ~
(base) aine$ mkdir repositories
(base) aine$ cd repositories
Now we’re ready to clone the piranha GitHub repository. This creates a local copy of the piranha repository on your computer. It also retains a link back to the original repository, so if any updates are made (such as addition of new features or bug fixes), it's very easy to pull those changes down to your local machine and update your version of piranha. More information on git cloning can be found here.
Clone the piranha repository with:
(base) aine$ git clone https://github.com/polio-nanopore/piranha.git
(base) aine$ cd piranha
This directory remains linked to the original GitHub copy, so if you need to update it or get any changes you can do that from this location with the following command:
(base) aine$ git pull
We now need to create the piranha environment. First check whether you have mamba installed with the following command:
(base) aine$ mamba --version
If you see the message mamba not found then you should install mamba with:
(base) aine$ conda install mamba -c conda-forge
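Create the piranha environment from the environment.yml file in the repository:
(base) aine$ mamba env create -f environment.yml
Activate the piranha environment: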
(base) aine$ conda activate piranha
(piranha) aine$
The (piranha) in the prompt tells you that the piranha environment is activated.
Now you’ll need to install the piranha python package while you’re in the environment:
(piranha) aine$ pip install .
The "." refers to the current working directory (cwd), which should be the piranha repository. To double check you're in the correct directory, you can type pwd (print working directory).
(piranha) aine$ pwd
/localdisk/home/repositories/piranha
If you see a path printed like the one above, ending with piranha, you know you're in the correct directory.
Congratulations! You should now have piranha installed.
(piranha) aine$ piranha --version
This should return piranha and the version number installed:
piranha v1.0
If no errors have come up (such as messages saying "command not found"), you should now be ready to run piranha!
Sometimes there can be issues unrelated to the commands you've run.
- Firstly, check you're in the piranha environment. Your prompt should start with (piranha). If not, activate the piranha environment and check your install again.
Example:
(piranha) aine$ <- correct
(base) aine$ <- incorrect
- A common issue can be related to internet connectivity. If a download has failed because of a break in the connection, run through the commands again. mamba and conda can cache the files already downloaded, so often you can make progress even if internet connectivity is an issue.
- Similarly, if your machine does not have enough storage space, installation and download can fail. To solve this, try clearing some space on your machine and try again.
- If you're still having trouble installing via the command line, piranha can be installed using the ARTIFICE GUI (linked above).
- Please post an issue on the GitHub repository if there are further unresolved issues.
- Change directory to the piranha repository:
cd piranha
- Pull the latest changes from GitHub:
git pull
- Ensure you're in the piranha environment:
conda activate piranha
- Install the changes:
pip install .
If there has been a major change to piranha, it's possible the environment will need to be updated as follows:
- Change directory to the piranha repository:
cd piranha
- Pull the latest changes from GitHub:
git pull
- Ensure you're in the piranha environment:
conda activate piranha
- Update the environment:
mamba env update -f environment.yml
Piranha is also available now on bioconda and can be installed with
mamba install -c bioconda piranha-polio
Note: we recommend using piranha in the conda environment specified in the environment.yml file as per the instructions above. If you can't use conda for some reason, dependency details can be found in the environment.yml file.
piranha -i <demultiplexed read directory> -b <path/to/barcodes.csv>
At a minimum, the barcodes.csv file needs two columns: one for barcode information and one with the name you'd like your sample to be called. This is also where you can flag which samples are negative or positive controls. Barcode and sample names should be unique, and sample names shouldn't contain spaces or special characters. This is because the sample name gets incorporated into the output consensus fasta header, and spaces in a fasta header disrupt the sequence ID.
barcode,sample
barcode01,EDI001
barcode02,EDI002
barcode03,EDI003
barcode04,negative
barcode05,positive
You can also include additional information in your barcodes.csv. If you include a date column or an EPID column they'll automatically be included in the fasta header output too. Dates should always be in ISO format (YYYY-MM-DD) for metadata best practice. Piranha has a flag --all-metadata-to-header that will take any metadata fields in your barcodes.csv and append them to the final output file (separated by a | pipe symbol). Be aware that any odd characters or spaces in these fields will also get added to the fasta header and can interfere with downstream phylogenetics you might want to run (e.g. : and ; can cause issues with some tree-building or reading software).
barcode,sample,EPID,date
barcode01,EDI001,EPI111,2022-10-10
barcode02,EDI002,EPI112,2022-09-20
barcode03,EDI003,EPI113,2022-09-21
barcode04,negative,,
barcode05,positive,,
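The rules above (required columns, unique names, no spaces or special characters, ISO dates) can be checked before a run. The following is a minimal sketch, not part of piranha itself: the function name check_barcodes_csv and the SAFE_NAME pattern are illustrative assumptions about what counts as a "safe" sample name.

```python
import csv
import io
import re
from datetime import date

# Assumed "safe" characters for sample names (letters, digits, underscore,
# hyphen, full stop). This is an illustrative choice, not piranha's own rule.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_.-]+$")
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def check_barcodes_csv(text):
    """Return a list of human-readable problems found in barcodes.csv text."""
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = reader.fieldnames or []
    missing = [c for c in ("barcode", "sample") if c not in fieldnames]
    if missing:
        return [f"missing required column: {c}" for c in missing]

    problems = []
    seen = {"barcode": set(), "sample": set()}
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column in ("barcode", "sample"):
            value = row[column] or ""
            if value in seen[column]:
                problems.append(f"line {line_no}: duplicate {column} '{value}'")
            seen[column].add(value)
        sample = row["sample"] or ""
        if not SAFE_NAME.match(sample):
            problems.append(f"line {line_no}: unsafe sample name '{sample}'")
        date_value = row.get("date") or ""
        if date_value:
            try:
                if not ISO_DATE.match(date_value):
                    raise ValueError(date_value)
                date.fromisoformat(date_value)  # also rejects e.g. 2022-13-40
            except ValueError:
                problems.append(f"line {line_no}: date '{date_value}' is not YYYY-MM-DD")
    return problems
```

An empty list means no problems were found by these particular checks; it does not guarantee piranha will accept the file.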
Piranha is configured to make analysis as straightforward as possible for users running MinION sequencing. Piranha takes the output of guppy directly and looks for the demultiplexed read directory containing the barcodes you've specified in your barcodes.csv. Guppy outputs directories in the structure:
fastq_pass/
├── barcode01/
│ ├── FAT75518_pass_barcode01_6d172881_0.fastq.gz
│ ├── FAT75518_pass_barcode01_6d172881_1.fastq.gz
│ └── FAT75518_pass_barcode01_6d172881_2.fastq.gz
├── barcode02/
│ ├── FAT75518_pass_barcode02_6d172881_0.fastq.gz
│ ├── FAT75518_pass_barcode02_6d172881_1.fastq.gz
│ ├── FAT75518_pass_barcode02_6d172881_2.fastq.gz
│ ├── FAT75518_pass_barcode02_6d172881_3.fastq.gz
│ ├── FAT75518_pass_barcode02_6d172881_4.fastq.gz
│ └── FAT75518_pass_barcode02_6d172881_5.fastq.gz
├── barcode03/
│ ├── FAT75518_pass_barcode03_6d172881_0.fastq.gz
│ ├── FAT75518_pass_barcode03_6d172881_1.fastq.gz
│ └── FAT75518_pass_barcode03_6d172881_2.fastq.gz
└── barcode05/
  ├── FAT75518_pass_barcode05_6d172881_0.fastq.gz
  ├── FAT75518_pass_barcode05_6d172881_1.fastq.gz
  └── FAT75518_pass_barcode05_6d172881_2.fastq.gz
This is what piranha will look for. Point the software to the directory containing the different barcodeXX sub-directories and it will iterate within these to find the files. Piranha can accept fastq (fq) or fastq.gz files. It will only attempt to analyse the barcodes present in the input csv file.
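To preview which files would be picked up for a given read directory, the lookup described above can be approximated as follows. This is a sketch under assumptions, not piranha's implementation: the function name find_read_files and the exact suffix list are hypothetical.

```python
from pathlib import Path

# Suffixes assumed to correspond to "fastq (fq) or fastq.gz" in the text above.
READ_SUFFIXES = (".fastq", ".fq", ".fastq.gz")


def find_read_files(readdir, barcodes):
    """Map each requested barcode to the sorted read files in its sub-directory.

    Barcode directories that exist on disk but are not requested are ignored,
    mirroring the idea that only barcodes listed in the csv are analysed.
    """
    found = {}
    for barcode in barcodes:
        barcode_dir = Path(readdir) / barcode
        if not barcode_dir.is_dir():
            found[barcode] = []  # requested in the csv but absent on disk
            continue
        found[barcode] = sorted(
            path
            for path in barcode_dir.iterdir()
            if path.is_file() and path.name.endswith(READ_SUFFIXES)
        )
    return found
```

A barcode that maps to an empty list is worth checking before starting a run, since it usually means the read directory or the barcode name is wrong.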
Piranha has been preconfigured with defaults specific to the VP1 protocol developed by the Polio Sequencing Consortium. All command line arguments (full list below) can be configured either as command line flags when running piranha, or as snake case arguments in a yaml config file (which can then be supplied with the -c flag).
For example, you can supply a custom references file (piranha has a default one supplied, which you can access here) using the -r flag, or by pointing to it within the config file.
Example Case 1: command line argument
(piranha) aine$ piranha -i /localdisk/home/data/minion_run_1/fastq_pass -b /localdisk/home/data/minion_run_1/barcodes.csv -r /localdisk/home/custom_reference_file.fasta
This command shows an example piranha run, where everything is in the default settings except the reference file. In this case you point piranha to the read directory, the barcodes.csv file and your custom reference file.
Example Case 2: config file
Example config file (called config.yaml in current working directory). Actual example file can be found here.
readdir: /localdisk/home/data/minion_run_1/fastq_pass
barcodes_csv: /localdisk/home/data/minion_run_1/barcodes.csv
reference_sequences: /localdisk/home/custom_reference_file.fasta
Then to run piranha you can simply run the command below, and all the information in the file will be included in your run.
(piranha) aine$ piranha -c config.yaml
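The relationship between long-form flags and config keys (snake case of the flag name) can be illustrated with a small helper. This is only a sketch: flag_to_config_key and render_config are hypothetical names, and the rendering handles simple scalar values only, with no YAML quoting or escaping.

```python
def flag_to_config_key(flag):
    """Convert a long-form flag such as '--min-read-length' to 'min_read_length'."""
    if not flag.startswith("--"):
        raise ValueError(f"expected a long-form flag starting with '--', got {flag!r}")
    return flag[2:].replace("-", "_")


def render_config(options):
    """Render {long_flag: value} pairs as simple 'key: value' config lines."""
    return "\n".join(
        f"{flag_to_config_key(flag)}: {value}" for flag, value in options.items()
    )


print(render_config({
    "--readdir": "/localdisk/home/data/minion_run_1/fastq_pass",
    "--barcodes-csv": "/localdisk/home/data/minion_run_1/barcodes.csv",
    "--min-read-depth": 50,
}))
```

The printed lines use the same keys as the example config file above (readdir, barcodes_csv) plus min_read_depth.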
Piranha allows you to specify which samples are controls (positive or negative). If the sample name is negative or positive within the barcode csv file, piranha will automatically detect that these are your controls. See minimal example above for format.
You can override this if you would rather call your controls something else (like nc, my_fave_control etc.) with the flags -pc,--positive-control or -nc,--negative-control.
Example:
piranha -i path/to/fastq_pass -b barcodes.csv -pc Positive1 -nc "my negative control"
Alternatively you can supply this in a config file with the fields:
positive_control: Positive1
negative_control: Negative1
But you need to make sure that the fields match within barcodes.csv. Also note that, because I've put spaces in the sample name of the negative control in my command line example above, that name needs quotes around it or else the terminal won't interpret it as a single field. In general it's better to avoid spaces in sample names: if you get a consensus sequence out of piranha as a fasta file, record ids are defined as the field up to the first space, so you can lose information in downstream analysis software if you're not careful. It's best to avoid spaces (and also special characters like :, ; and |) when dealing with this kind of data that might have phylogenetics run on it.
Samples flagged as controls will appear in the report at the end in a separate table and will be flagged as either passing (row in table coloured green and a tick appears) or not passing (row in table coloured red and no tick appears). Piranha treats negative controls as passing if there are fewer than the configured minimum number of reads in the sample (Default: 50 reads) and positive controls as passing if there are more than the minimum number of reads in the sample for non-polio enterovirus (Default: 50 reads).
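The pass/fail rule for controls can be written out as a small function. This is a sketch of the rule as described in the paragraph above, not piranha's code: the function name control_passes is hypothetical, and it assumes that "reads in the sample" for a negative control means the total across all categories supplied.

```python
def control_passes(kind, read_counts, min_reads=50):
    """Decide whether a control passes, following the rule described above.

    kind: "negative" or "positive".
    read_counts: {category: number of reads}, e.g. {"NonPolioEV": 812}.
    min_reads: the configured minimum read count (50 in the text above).
    """
    if kind == "negative":
        # Assumption: the total over all supplied categories is what counts.
        return sum(read_counts.values()) < min_reads
    if kind == "positive":
        return read_counts.get("NonPolioEV", 0) > min_reads
    raise ValueError(f"unknown control type: {kind!r}")
```

Because the threshold is a parameter, the same function can be used to explore how a different minimum would change the outcome for a given run.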
Piranha takes nanopore sequencing reads (fastq) and matches them against a reference set of enterovirus VP1 sequences. This matching is currently performed with minimap2 (Li 2018), run in the mode configured for noisy nanopore data (-x map-ont). For more information on the details of what this pre-configured setting is, visit this link. In addition to these pre-configured settings, within piranha we currently only consider primary alignments (i.e. the best hit) for each read against the reference file, rather than all potential matches.
This reference set includes representatives of wild-type poliovirus 1, 2 and 3, Sabin-1, 2 and 3, and a selection of other enterovirus VP1 sequences.
IMPORTANT: Nanopore reads can have up to ~10% error rate and sporadic mapping to some references can occur at the read level, particularly when sequencing at great depth. The read counts in the piranha report must therefore be interpreted with caution and awareness of the underlying noise in the data. For example, low numbers of reads may map to WPV2 or WPV3 as they are present in the default database. This DOES NOT mean that WPV2 or WPV3 is present in your sample. Hits with greater than the minimum read count are highlighted in the report in the sample composition table (Table 2). If the read count passes the minimum thresholds (number of reads and percentage of sample) the reads will be taken forward for consensus building. It is only possible at the consensus level to get an accurate picture of that read population in your sample.
There are a number of default thresholds applied, which can be overwritten depending on your purposes.
-n,--min-read-length
-x, --max-read-length
The default read length filter accepts reads between 1000 and 1300 nucleotide bases in length.
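As an illustration of the length filter, the sketch below keeps reads whose length falls in the default range. It assumes both bounds are inclusive, which the text does not state; the function names are hypothetical.

```python
def within_length_range(sequence, min_length=1000, max_length=1300):
    """True if the read length lies in [min_length, max_length] (assumed inclusive)."""
    return min_length <= len(sequence) <= max_length


def filter_reads(reads, min_length=1000, max_length=1300):
    """Keep (name, sequence) pairs whose sequence passes the length filter."""
    return [
        (name, sequence)
        for name, sequence in reads
        if within_length_range(sequence, min_length, max_length)
    ]
```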
-d, --min-read-depth
-p, --min-read-pcent
These parameters set the minimum number of reads hitting a particular reference in the reference file (and the minimum percentage of reads within the sample) that are necessary to create a binned read group and attempt to make a consensus sequence for that particular sample. By default a minimum of 50 reads are necessary to build a consensus sequence and a minimum of 10% of the sample is required to be represented by that particular reference before it will attempt to create a consensus for this.
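The two thresholds work together: a reference bin needs both enough reads and a large enough share of the sample. A minimal sketch of that decision follows. It assumes "a minimum of" means greater than or equal to, and that the percentage is taken over all reads counted for the sample; neither detail is confirmed by the text, and bins_for_consensus is a hypothetical name.

```python
def bins_for_consensus(read_counts, min_read_depth=50, min_read_pcent=10):
    """Return the references whose read bins pass both thresholds.

    read_counts: {reference or category: number of reads} for one sample.
    """
    total = sum(read_counts.values())
    if total == 0:
        return []
    passing = []
    for reference, n_reads in read_counts.items():
        pcent = 100 * n_reads / total
        if n_reads >= min_read_depth and pcent >= min_read_pcent:
            passing.append(reference)
    return passing
```

For example, a bin with 60 reads in a sample of 1000 reads clears the depth threshold but only represents 6% of the sample, so it would not be taken forward under the defaults.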
We have set the minimum read depth to be 50 reads in order to attempt to make a consensus. Within piranha, we run minimap2 to map reads against the background reference panel (in a similar manner to RAMPART). The top hit within the background reference panel is reported, by default showing the "display_name" field. The categories displayed are:
- Sabin1-related
- Sabin2-related
- Sabin3-related
- WPV1
- WPV2
- WPV3
- NonPolioEV
- Unmapped
When sequencing samples at high depth, using mapping software on the raw nanopore reads (which are error prone) can lead to a certain level of noise. Hits above the minimum read depth threshold (Default >50) are highlighted in red in the final report. If the population of reads mapping to a particular reference successfully makes a consensus sequence at the end of the piranha pipeline, this is an indication of a genuine population of reads rather than noise.
Importantly, the reference that is hit within the background references file does not fully indicate what consensus sequence will be generated from the read population. Further phylogenetic analysis should be performed to confirm the identity of the sequence that you get.
Users can specify their own reference file or by default piranha will access the reference file packaged with the software.
The reference file must be in fasta format and, within that file, the first field must be the reference ID (without spaces). This reference file can include additional information in the following format:
>Poliovirus3-wt_JN812657 display_name=WPV3 species=Poliovirus3-wt cluster=Poliovirus3-wt
GGGGTGGACGATCTGATAACAGAA...
>Poliovirus3-Sabin_AY184221 display_name=Sabin3-related species=Sabin3-related cluster=Sabin3-related
GGTATTGAAGATTTGACTTCTGAA...
Notably, display_name allows multiple references to be aggregated together or anonymised within the final report. Current compatibility within piranha allows display_name fields to include: WPV1, WPV2, WPV3, Sabin1-related, Sabin2-related, Sabin3-related and NonPolioEV.
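When preparing a custom reference file, it can help to check that each header parses as expected. The sketch below splits a header line into the reference ID and its key=value annotations. The function names are hypothetical; the set of supported display names is taken from the list above.

```python
SUPPORTED_DISPLAY_NAMES = {
    "WPV1", "WPV2", "WPV3",
    "Sabin1-related", "Sabin2-related", "Sabin3-related",
    "NonPolioEV",
}


def parse_reference_header(header):
    """Split a fasta header into its ID and any key=value fields."""
    tokens = header.lstrip(">").split()
    if not tokens:
        raise ValueError("empty fasta header")
    record = {"id": tokens[0]}  # the first field is the reference ID
    for token in tokens[1:]:
        key, separator, value = token.partition("=")
        if separator:  # ignore tokens that are not key=value pairs
            record[key] = value
    return record


def has_supported_display_name(record):
    """True if the parsed header carries one of the display names listed above."""
    return record.get("display_name") in SUPPORTED_DISPLAY_NAMES
```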
- Gathers all read files for a given barcode together
- Filters these reads by length (Default only reads between 1000 and 1300 bases are included for further analysis)
- The reads are mapped against a panel of references. By default there is a reference panel included as part of piranha. It includes VP1 sequences from various wild-type polioviruses, reference Sabin-1, Sabin-2 and Sabin-3 sequences for identification of Sabin-like or VDPV polioviruses, and also a number of non-polio enterovirus reference sequences. A custom VP1 fasta file can be supplied with the -r flag.
- The read map files are parsed to assign each read to the closest virus sequence in the reference panel. This assigns each read a broad category of either Sabin-1 like, Sabin-2 like, Sabin-3 like, wild-type Poliovirus (1, 2, or 3), non-polio enterovirus, or unmapped.
- These broad category assignments are used to bin reads for further downstream analysis. Any bin with greater than the minimum read threshold (Default 50 reads, but can be customised) and minimum read percentage (default 10% of sample, but can be customised) is written out in a separate fastq file which will be used to generate the broad-category consensus sequence.
- For each bin, a consensus sequence is generated using medaka and variation information is calculated for each site in the alignment against the reference. This calculates the consensus variants within each sample.
- The variants that are flagged by medaka are assessed for read co-occurrence to tease apart variant haplotypes within the sample.
- For the entire run, and for each individual barcode/ sample, an interactive html report is generated summarising the information.
Piranha is configured by default to work with the nested amplification VP1 protocol developed by the Polio Sequencing Consortium. If you're testing another protocol or want to modify the default analysis behaviour you can configure thresholds and models with various command line arguments (the snake case of the long-form arguments can be supplied in an input config file, see example here).
Piranha uses medaka to generate consensus sequences. Medaka is software developed by Oxford Nanopore Technologies that explicitly deals with the error profile of nanopore data. Unlike Illumina sequencing, the error profile in sequencing reads generated by nanopore technology is not randomly distributed. Areas of low complexity, repetitive regions and particularly homopolymeric runs are lower in accuracy. Software like medaka uses machine learning methods to compensate for this non-random error profile, in contrast to traditional variant calling and consensus generation software. For a long time the leading variant calling software for nanopore sequencing has used machine learning methods (e.g. nanopolish and medaka). For detailed information about the internal workings of medaka, see this helpful tutorial.
By default the medaka model run is r941_min_high_g360. You should make sure to run the appropriate model for your data. Medaka model names take the format:
{pore}_{device}_{caller variant}_{caller version}
so the default model in use assumes you've run
- an R9.4.1 flow cell
- on a MinION (for a GridION you can leave it set to min; you only need to change min to prom if it was a PromethION run)
- in high accuracy mode (if you've run in fast mode this should be changed)
- with Guppy version 3.6.0
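The naming scheme can be unpacked programmatically. The following sketch splits a model name into the four parts described above. It assumes the name follows that four-part pattern exactly; model names that use a different layout will be rejected. MedakaModel and parse_model_name are hypothetical names.

```python
from typing import NamedTuple


class MedakaModel(NamedTuple):
    pore: str            # e.g. "r941" for an R9.4.1 flow cell
    device: str          # e.g. "min" (MinION/GridION) or "prom" (PromethION)
    caller_variant: str  # e.g. "high" for high accuracy mode
    caller_version: str  # e.g. "g360" for Guppy 3.6.0


def parse_model_name(name):
    """Split a '{pore}_{device}_{caller variant}_{caller version}' model name."""
    parts = name.split("_")
    if len(parts) != 4:
        raise ValueError(f"model name does not have four '_'-separated parts: {name!r}")
    return MedakaModel(*parts)


print(parse_model_name("r941_min_high_g360"))
```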
To see the available list of medaka models installed you can use the piranha command
(piranha) aine$ piranha --medaka-list-models
and piranha will check which ones you have installed with your version of medaka, print them to screen and exit. Medaka may lag behind the latest models of guppy available, so use the closest model you can to what you're running, but be aware that the exact model for your version of Guppy may not yet exist if you're on the cutting edge of Guppy versions. Conversely, if you are running a very old version there may also not be an exact medaka model for your data. The developers of medaka recommend running the correct model for best results. They also state:
Where a version of Guppy has been used without an exactly corresponding medaka model, the medaka model with the highest version equal to or less than the guppy version should be selected.
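The selection rule quoted above can be sketched as follows. To sidestep ambiguity in how version digits are packed into model names, this example takes the version for each candidate model as an explicit (major, minor, patch) tuple supplied by the user. The function name pick_model is hypothetical and the candidate list in the usage example is illustrative; check the output of piranha --medaka-list-models for what is actually installed.

```python
def pick_model(guppy_version, models):
    """Pick the model with the highest version equal to or less than guppy_version.

    guppy_version: (major, minor, patch) tuple, e.g. (3, 4, 5).
    models: {model_name: (major, minor, patch)} for candidate models that
            already match your pore, device and caller variant.
    Returns the chosen model name, or None if every model is newer.
    """
    eligible = {
        name: version for name, version in models.items() if version <= guppy_version
    }
    if not eligible:
        return None
    return max(eligible, key=eligible.get)


candidates = {  # illustrative candidates only
    "r941_min_high_g303": (3, 0, 3),
    "r941_min_high_g330": (3, 3, 0),
    "r941_min_high_g360": (3, 6, 0),
}
print(pick_model((3, 4, 5), candidates))
```

With these candidates, a Guppy 3.4.5 run selects the 3.3.0 model, since 3.6.0 is newer than the basecaller used.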
By default the output directory will be created in the current working directory and will be named analysis-YYYY-MM-DD, where YYYY-MM-DD is today's date. This output can be configured in a number of ways. For example, the prefix analysis can be overwritten by using the -pre/--output-prefix new_prefix flag (or output_prefix: new_prefix in a config file) and this will change the default behaviour to new_prefix_YYYY-MM-DD. It's good practice not to include spaces or special characters in your directory names.
To completely overwrite the output directory name (rather than just the prefix), you can run the -o/--outdir flag (or supply outdir in the config file).
If you're running multiple analyses from the same directory and not supplying new directory names, piranha will append an incrementing number to the end of the directory name so that the contents within the previous run output don't cause conflicts within the new run. To change this default behaviour and overwrite the previous directory, use the --overwrite flag. Note: this will wipe all the contents within the analysis-YYYY-MM-DD directory and re-populate it with the output of the new piranha run.
By default, piranha removes the intermediate files it produces during the analysis run. If you want to access the intermediate files (for example, the bam files, the read bins etc.) run with --no-temp and all intermediate files will be kept.
By default temporary files are stored in $TMPDIR, which then gets wiped when the process completes. If you want to store temp files elsewhere this can be done with --tempdir.
The output report can include some information about the sequencing run, the name of the individual running the report and the institute doing the sequencing. To give the report a specific run name (rather than the default title Nanopore sequencing report) supply the new name with the command line flag --runname or as runname: in the config file. Similarly, to enable the report to display the name of the user and institute, provide --username and --institute (or the equivalent fields within the config file).
The reports are available in English (default) and in French.
The header includes the title of the run (supply with --runname) and the date of report generation.
Optionally the report can also display user and institute information beneath the header (--username and --institute flags).
This table gives summary information about which viruses were identified within each sample and, for Sabin-related polioviruses, the number of mutations away from Sabin. It also has a link that allows you to download a particular consensus sequence. The rows can be sorted by each column in either ascending or descending order by clicking on the column header. By default, the table is sorted by sample name. It's possible to search the table by typing in the text box on the right-hand side. Clicking on the sample name within the table will redirect the user to the individual sample reports. Be aware this link works in situ, when the path to the sample report relative to the main output report has not changed. If the report is moved or distributed without the barcode reports, this link will no longer function.
Under the table header, there's a dropdown menu that gives options to export the displayed table (either by copying to the clipboard, as a csv or directly printing it). By clicking on rows within the table, it's possible to select the subset of rows you're interested in exporting and this is what will be sent to the clipboard, file or printer.
To the right, there is a button to download the detailed table. This additional table has a compiled set of data from the sequencing run. By default the information provided in this file is as follows.
First columns:
sample, barcode, EPID, institute,...
Then any additional fields supplied in the barcodes.csv for each sample (e.g. date of collection, date of sequencing)
Then information about each reference group in turn. For example, Sabin1-related:
Sabin1-related|closest_reference,Sabin1-related|num_reads,Sabin1-related|nt_diff_from_reference,Sabin1-related|pcent_match,Sabin1-related|classification,...
Finally a generic comments column that can be modified later (or prepopulated if provided in the barcodes.csv).
An example of this detailed file can be found here.
This table displays read counts for each sample that have mapped to each of the reference groups. Similar to Table 1, this table can be sorted, searched and exported.
This table will only appear if there are identical sequences in your sequencing run. It flags samples that contain consensus sequences that are completely identical. This may be a sign of contamination within the sequencing run, but doesn't necessarily mean this. This table serves as a prompt to investigate why these sequences may be identical.
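The kind of check behind this table can be sketched as grouping samples by their consensus sequence. This is an illustration rather than piranha's implementation: find_identical_consensus is a hypothetical name, and treating sequences case-insensitively is an assumption.

```python
from collections import defaultdict


def find_identical_consensus(records):
    """Group samples that share a completely identical consensus sequence.

    records: iterable of (sample_name, sequence) pairs.
    Returns a list of sorted sample-name lists, one per shared sequence.
    """
    samples_by_sequence = defaultdict(set)
    for sample, sequence in records:
        # Assumption: upper/lower case differences are not meaningful.
        samples_by_sequence[sequence.upper()].add(sample)
    return [
        sorted(samples)
        for samples in samples_by_sequence.values()
        if len(samples) > 1  # only sequences seen in more than one sample
    ]
```

Any group returned is a prompt to investigate, not proof of contamination, in line with the caveat above.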
This table appears if a negative and/or positive control is included in your sequencing run. Piranha will automatically detect controls if their sample name is negative or positive, or the user can specify the name of their controls with the command line flags or by providing them in a config file. If the control "passes" (i.e. has fewer than 50 reads for the negative control or has more than 100 reads in the NonPolioEV category for the positive control), the row will be coloured green and have a tick mark under the "Pass" column. If the controls fail, the row will be coloured red, which may be an indication of a failed sequencing run.
This report gives additional information specific to the sequences generated for each sample.
This table summarises the reference groups found within the sample. Clicking on the reference group within the table will link the user down to the appropriate section of the report.
All consensus sequences generated for this sample are available within this box in fasta format. The header of the fasta file by default is of the format:
>sample|barcode|reference_group|reference|number_nt_diffs_from_reference|variants
AAGTCAGTCATCAGCTGACT...CAGTGATCGAGCTAGTAT
Additional information can be detected and appended to this fasta header. By default, if a date column or EPID column is provided in the barcodes.csv file, this will be appended to the header. Piranha has a flag --all-metadata-to-header that will append all metadata fields provided in the input csv file to the header.
This fasta file can be highlighted and copied to the clipboard, or accessed as part of the "published_data" directory in the output directory of piranha.
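For downstream scripting, the pipe-separated header can be split back into named fields. The sketch below follows the default field order shown above and keeps any appended metadata as a raw list, since the exact layout of appended fields is not specified here. The function name is hypothetical, and the variants value used in the usage example is a placeholder, not real piranha output.

```python
HEADER_FIELDS = (
    "sample",
    "barcode",
    "reference_group",
    "reference",
    "number_nt_diffs_from_reference",
    "variants",
)


def parse_consensus_header(header):
    """Split a consensus fasta header into the default fields plus extras."""
    values = header.lstrip(">").strip().split("|")
    if len(values) < len(HEADER_FIELDS):
        raise ValueError(f"expected at least {len(HEADER_FIELDS)} fields, got {len(values)}")
    record = dict(zip(HEADER_FIELDS, values))
    # Anything after the default fields (e.g. EPID, date) is kept unparsed.
    record["extra"] = values[len(HEADER_FIELDS):]
    return record


example = ">EDI003|barcode03|Sabin3-related|Poliovirus3-Sabin_AY184221|2|variants_placeholder|EPI113|2022-09-21"
print(parse_consensus_header(example)["reference_group"])
```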
This section of the report exists for each reference group identified within the sample.
Firstly, a table summarises the number of mutations and what mutations have been identified relative to the closest reference.
The variation figure shows the noise within the data in contrast to the variants that get called within the sample. Each point represents the percentage alternative allele present at that site in the VP1 sequence. The baseline reference for Sabin-related sequences is the relevant Sabin reference, whereas for the other reference groups (wild-type poliovirus and non-polio enterovirus) the baseline reference is the consensus sequence generated. This means that for Sabin-related sequences, the differences relative to Sabin are highlighted and for the other read populations within the sample, differences within sample relative to the consensus of that read population are highlighted. The SNPs called by Medaka relative to the chosen baseline reference (either Sabin or consensus sequence) are highlighted in dark green and insertion-deletion mutations (if any) are highlighted in yellow. Masked variants are highlighted in red. A variant will be masked if it will cause a frameshift within the VP1 region (like an indel whose length is not a multiple of 3) or if it is part of a string of variants in very close proximity (highlighting an issue around that area).
This snipit plot (https://github.com/aineniamh/snipit) highlights the consensus differences relative to the reference.
This plot calculates the percentage of reads that share the variants called against the reference, and can give a noisy estimate of haplotypes present within the sample. The calculation only takes high-quality bases into account, so counts reflect the level of support rather than the level of coverage. For each variant, the percentage displayed is a function of the total high-quality bases at that site, and for each pair of variants the co-occurrence is the percentage of high-quality bases at each site in turn that share both variants. This figure can give an idea of whether mixed populations are present within the sample.
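A simplified version of the co-occurrence idea is sketched below. It is one interpretation of the description above, not piranha's calculation: it uses reads with a high-quality base at both sites as the denominator, and the read representation, the function name cooccurrence_pcent and the min_quality cut-off of 20 are all assumptions made for illustration.

```python
def cooccurrence_pcent(reads, variant_a, variant_b, min_quality=20):
    """Percentage of informative reads that carry both variants.

    reads: list of {site: (base, quality)} dicts, one per read.
    variant_a, variant_b: (site, alternative_base) tuples.
    A read is informative if it has a base of at least min_quality
    at both sites.
    """
    site_a, alt_a = variant_a
    site_b, alt_b = variant_b
    informative = 0
    both = 0
    for read in reads:
        if site_a not in read or site_b not in read:
            continue  # read does not cover both sites
        base_a, quality_a = read[site_a]
        base_b, quality_b = read[site_b]
        if quality_a < min_quality or quality_b < min_quality:
            continue  # low-quality base at one of the sites
        informative += 1
        if base_a == alt_a and base_b == alt_b:
            both += 1
    return 100 * both / informative if informative else 0.0
```

A value well below 100% for two called variants hints that they may sit on different haplotypes, which is the kind of signal the plot is meant to surface.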
usage:
piranha -c <config.yaml> [options]
piranha -i input.csv [options]
Input options:
-c CONFIG, --config CONFIG
Input config file in yaml format, all command line arguments can be passed via the config file.
-i READDIR, --readdir READDIR
Path to the directory containing fastq read files
-b BARCODES_CSV, --barcodes-csv BARCODES_CSV
CSV file describing which barcodes were used on which sample
-r REFERENCE_SEQUENCES, --reference-sequences REFERENCE_SEQUENCES
Custom reference sequences file.
-pc POSITIVE_CONTROL, --positive-control POSITIVE_CONTROL
Sample name of positive control. Default: `positive`
-nc NEGATIVE_CONTROL, --negative-control NEGATIVE_CONTROL
Sample name of negative control. Default: `negative`
Analysis options:
-m ANALYSIS_MODE, --analysis-mode ANALYSIS_MODE
Specify analysis mode to run. Options: `vp1`. Default: `vp1`
--medaka-model MEDAKA_MODEL
Medaka model to run analysis using. Default: r941_min_high_g360
--medaka-list-models List available medaka models and exit.
-n MIN_READ_LENGTH, --min-read-length MIN_READ_LENGTH
Minimum read length. Default: 1000
-x MAX_READ_LENGTH, --max-read-length MAX_READ_LENGTH
Maximum read length. Default: 1300
-d MIN_READ_DEPTH, --min-read-depth MIN_READ_DEPTH
Minimum read depth required for consensus generation. Default: 50
-p MIN_READ_PCENT, --min-read-pcent MIN_READ_PCENT
Minimum percentage of sample required for consensus generation. Default: 10
--all-metadata-to-header
Parse all fields from input barcode.csv file and include in the output fasta headers. Be aware spaces in metadata will disrupt the
record id, so avoid these.
Output options:
-o OUTDIR, --outdir OUTDIR
Output directory. Default: `analysis-2022-XX-YY`
-pub PUBLISHDIR, --publishdir PUBLISHDIR
Output publish directory. Default: `analysis-2022-XX-YY`
-pre OUTPUT_PREFIX, --output-prefix OUTPUT_PREFIX
Prefix of output directory & report name: Default: `analysis`
--datestamp DATESTAMP
Append datestamp to directory name when using <-o/--outdir>. Default: <-o/--outdir> without a datestamp
--overwrite Overwrite output directory. Default: append an incrementing number if <-o/--outdir> already exists
-temp TEMPDIR, --tempdir TEMPDIR
Specify where you want the temp stuff to go. Default: `$TMPDIR`
--no-temp Output all intermediate files. For development/ debugging purposes
Misc options:
--language LANGUAGE Report language. Options: English, French. Default: English
--runname RUNNAME Run name to appear in report. Default: Nanopore sequencing
--username USERNAME Username to appear in report. Default: no user name
--institute INSTITUTE
Institute name to appear in report. Default: no institute name
-t THREADS, --threads THREADS
Number of threads. Default: 1
--verbose Print lots of stuff to screen
-v, --version show program's version number and exit
-h, --help