Inverted Repeat Junction identifier for use with ONT (MinION) sequencing.
Mugio is intended to aid in the identification of potential inverted repeats in sequencing data from Oxford Nanopore Technologies (ONT) nanopore sequencing platforms (MinION, PromethION, GridION, etc.).
Preprint:
Pieter Spealman, Jaden Burrell, David Gresham. Nanopore sequencing undergoes catastrophic sequence failure at inverted duplicated DNA sequences. doi: https://doi.org/10.1101/852665
Mugio requires the following user-supplied data:
- FASTQ file generated by Albacore or Guppy
- Aligned SAM and BAM files generated by Minimap2
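If the alignments still need to be produced, one possible route is sketched here in Python 3. It only assembles the command lines for minimap2 and samtools (both real tools; samtools is already a requirement). The `map-ont` preset, the sort/index steps, and the names `alignment_commands`, `run_all`, `reference.fa`, `reads.fastq`, and `aligned` are illustrative assumptions, not settings documented by Mugio.

```python
import subprocess
from typing import List


def alignment_commands(reference: str, fastq: str, prefix: str) -> List[List[str]]:
    """Build commands that align ONT reads and produce a SAM and a BAM file.

    Hypothetical helper: the 'map-ont' preset and file naming are assumptions,
    not settings taken from Mugio.
    """
    sam = prefix + ".sam"
    bam = prefix + ".bam"
    return [
        # Align reads with the ONT preset, writing SAM (-a) to a file (-o).
        ["minimap2", "-ax", "map-ont", "-o", sam, reference, fastq],
        # Coordinate-sort the alignments into a BAM file.
        ["samtools", "sort", "-o", bam, sam],
        # Index the sorted BAM so it can be queried by region.
        ["samtools", "index", bam],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, raising on the first non-zero exit."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them; call run_all() to execute.
    for cmd in alignment_commands("reference.fa", "reads.fastq", "aligned"):
        print(" ".join(cmd))
```

The resulting `aligned.sam` and `aligned.bam` would then be passed to Mugio's `-s` and `-bam` options.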
Mugio requires the following programs:
- gzip 1.5+
- python 2.7+ or 3.6+
- samtools 1.6+
- bedtools 2.26.0+
- (optional) The --blast command requires blast+ 2.9.0+
Mugio requires the following Python packages:
- os
- numpy
- argparse
- scipy.stats
- random
- pickle
- math
- re
- subprocess
- pandas
- json
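Before running Mugio, it can help to confirm that the requirements above are present. The following is a minimal sketch for Python 3 (it uses `shutil.which`); it only checks that packages import and that programs are on `PATH`, not their versions, and it does not check the optional blast+ tools. The helper names are hypothetical and not part of Mugio.

```python
import importlib
import shutil

# Third-party packages from the list above; os, argparse, random, pickle,
# math, re, subprocess, and json are part of the Python standard library.
PYTHON_PACKAGES = ["numpy", "scipy.stats", "pandas"]

# Required command-line programs from the list above (versions not checked).
PROGRAMS = ["gzip", "samtools", "bedtools"]


def missing_python_packages(names):
    """Return the module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


def missing_programs(names):
    """Return the program names that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    print("Missing Python packages:", missing_python_packages(PYTHON_PACKAGES) or "none")
    print("Missing programs:", missing_programs(PROGRAMS) or "none")
```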
Mugio is a stand-alone Python script, so it can be run locally by simply downloading the script. Installation through git clone is the preferred method. To download:
git clone https://github.com/pspealman/mugio.git
To test the installation:
cd mugio
python mugio.py -demo
- Command --bprd (breakpoint retrieval and definition)
- Purpose: Identifies loci likely to be inverted repeat junctions associated with inverted duplications.
- Format:
python mugio.py -bprd -f <fastq_file> -s <sam_file> -bam <bam_file> -o <output_path_and/or_file_prefix>
- Demo:
python mugio.py -bprd -f demo/demo.fastq -s demo/demo.sam -bam demo/demo.bam -o demo_output/demo_bprd
- Results: Identified likely candidates are recorded in the output path as a BED file with the suffix '_bprd'. For example, with '-o demo_output/demo_bprd', the results are stored in 'demo_output/demo_bprd_bprd.bed'.
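For scripting around `-bprd`, a small Python 3 sketch can assemble the command exactly as in the Format line and derive the output file name from the `-o` prefix, following the naming rule stated in the Results line. Reading the result assumes the file is standard tab-separated BED whose first three columns are chromosome, start, and end; any additional columns Mugio writes are ignored here. All function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def bprd_command(fastq: str, sam: str, bam: str, prefix: str) -> List[str]:
    """Assemble the -bprd call in the order shown in the Format line."""
    return ["python", "mugio.py", "-bprd",
            "-f", fastq, "-s", sam, "-bam", bam, "-o", prefix]


def bprd_output_path(prefix: str) -> str:
    """Mugio writes its candidates to the -o prefix plus '_bprd.bed'."""
    return prefix + "_bprd.bed"


def parse_bed_lines(lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return (chrom, start, end) from the first three BED columns.

    Assumes standard tab-separated BED; blank, comment, track, and browser
    lines are skipped.
    """
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals
```

Usage might look like `subprocess.run(bprd_command(...), check=True)` followed by `with open(bprd_output_path(prefix)) as handle: junctions = parse_bed_lines(handle)`.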
- Command --evaluate
- Purpose: Calculates the correlation (Spearman's rho) between pre-breakpoint sequence length and post-breakpoint low-scoring region length.
- Format:
python mugio.py --evaluate [-bpf bprd.bed | -snf sniffles.vcf] -s <sam_file> -o <output_path_and/or_file_prefix>
- Demo:
python mugio.py --evaluate -bpf demo_output/demo_bprd_bprd.bed -f demo/demo.fastq -s demo/demo.sam -o demo/demo_bprd_lengths
- Results: For identified candidates that have closed low-Phred regions, trace figures are generated with the low-scoring regions marked. These figures are saved in the output path as subfolders named after the coordinates of the inverted repeat junctions.