Inverted Repeat Junction identifier for use with ONT (MinION) sequencing.
Mugio is intended to aid in the identification of potential inverted repeats in sequencing data from Oxford Nanopore Technologies (ONT) nanopore sequencing platforms (MinION, PromethION, GridION, etc.).
Nanopore sequencing undergoes catastrophic sequence failure at inverted duplicated DNA sequences. Pieter Spealman, Jaden Burrell, David Gresham. doi:
Mugio requires the following user-supplied data:
- FASTQ file generated by Albacore or Guppy
- Aligned BAM file generated by Minimap2
Mugio requires the following programs:
- gzip 1.5+
- python 2.7+ or 3.6+
- samtools 1.6+
- bedtools 2.26.0+
- (optional) The --blast command requires blast+ 2.9.0+
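Mugio requires the following Python packages (os, argparse, random, pickle, math, re, subprocess and json ship with the Python standard library):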
- os
- numpy
- argparse
- scipy.stats
- random
- pickle
- math
- re
- subprocess
- pandas
- json
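The hypothetical script below is one way to confirm, before a first run, that the programs and packages listed above are available. It is not part of Mugio: it only checks that each program is on the PATH and that each module imports, it does not check the minimum versions, and it assumes 'blastn' as the BLAST+ executable used by the optional --blast command. It relies on shutil.which, so it needs Python 3.

```python
"""Hypothetical pre-flight check for Mugio's prerequisites (not part of Mugio).

Checks presence only: programs on PATH and importable Python modules.
Minimum versions (samtools 1.6+, bedtools 2.26.0+, etc.) are NOT verified.
"""
import importlib
import shutil

REQUIRED_PROGRAMS = ["gzip", "samtools", "bedtools"]
# Assumption: 'blastn' stands in for the BLAST+ tools used by --blast.
OPTIONAL_PROGRAMS = ["blastn"]
REQUIRED_MODULES = ["os", "numpy", "argparse", "scipy.stats", "random",
                    "pickle", "math", "re", "subprocess", "pandas", "json"]


def find_missing():
    """Return a dict of missing required programs, optional programs and modules."""
    missing = {"programs": [], "optional_programs": [], "modules": []}
    for prog in REQUIRED_PROGRAMS:
        if shutil.which(prog) is None:
            missing["programs"].append(prog)
    for prog in OPTIONAL_PROGRAMS:
        if shutil.which(prog) is None:
            missing["optional_programs"].append(prog)
    for name in REQUIRED_MODULES:
        try:
            importlib.import_module(name)
        except ImportError:
            missing["modules"].append(name)
    return missing


if __name__ == "__main__":
    for kind, names in find_missing().items():
        print(kind + ":", ", ".join(names) if names else "none missing")
```

An empty "programs" and "modules" list means the required tools were found; a missing "blastn" only matters if you plan to use --blast.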
Mugio is a standalone Python script, so it can be run locally simply by downloading the script. Installation through git clone is the preferred method. To download:
git clone
To test the installation:
cd mugio
python -demo
- Command --bprd (breakpoint retrieval and definition)
- Purpose: Identifies loci likely to be inverted repeat junctions associated with inverted duplications.
- Format:
python -bprd -f <fastq_file> -s <sam_file> -bam <bam_file> -o <output_path_and/or_file_prefix>
- Demo:
python -bprd -f demo/demo.fastq -s demo/demo.sam -bam demo/demo.bam -o demo_output/demo_bprd
- Results: Likely candidates are written to the output path as a BED file with the suffix '_bprd'. For example, with '-o demo_output/demo_bprd', the results are stored in 'demo_output/demo_bprd_bprd.bed'.
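As a sketch of how the '_bprd' naming convention can be used downstream, the hypothetical helpers below build the output path from an -o prefix and parse the resulting BED file. They are not part of Mugio and assume only that the file uses the usual tab-separated BED layout (chromosome, start, end, then any extra columns); the exact columns Mugio writes beyond the first three are not described here, so they are passed through unnamed.

```python
"""Hypothetical helpers for the --bprd output (not part of Mugio).

Assumption: the file is tab-separated BED whose first three columns are
chrom, start, end; any further columns are returned untouched.
"""


def bprd_bed_path(output_prefix):
    """Return the BED path --bprd writes for a given -o value."""
    return output_prefix + "_bprd.bed"


def parse_bed_lines(lines):
    """Yield (chrom, start, end, extra_fields) for each data line of a BED file."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the standard BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        yield fields[0], int(fields[1]), int(fields[2]), fields[3:]
```

For the demo run, bprd_bed_path("demo_output/demo_bprd") gives 'demo_output/demo_bprd_bprd.bed'; opening that file and passing the handle to parse_bed_lines yields one tuple per candidate locus.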
- Command --evaluate
- Purpose: Calculates the correlation (Spearman's rho) between pre-breakpoint sequence length and post-breakpoint low-scoring region length.
- Format:
python --evaluate [-bpf bprd.bed | -snf sniffles.vcf] -f <fastq_file> -s <sam_file> -o <output_path_and/or_file_prefix>
- Demo:
python --evaluate -bpf demo_output/demo_bprd_bprd.bed -f demo/demo.fastq -s demo/demo.sam -o demo/demo_bprd_lengths
- Results: For candidates with closed low-Phred regions, trace figures are generated with the low-scoring regions marked. These are saved in the output path as subfolders named after the coordinates of each inverted repeat junction.
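To make the --evaluate statistic concrete, here is a minimal sketch of Spearman's rho written out in plain Python. It is not Mugio's implementation: it assumes the per-candidate pre-breakpoint lengths and post-breakpoint low-scoring region lengths have already been extracted and paired, and it gives tied values their average rank, the same convention scipy.stats.spearmanr uses.

```python
"""Sketch of Spearman's rho for paired length measurements (not Mugio's code).

Assumption: x[i] (pre-breakpoint length) and y[i] (post-breakpoint
low-scoring region length) describe the same candidate read.
"""
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last index of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    ssx = sum((a - mx) ** 2 for a in rx)
    ssy = sum((b - my) ** 2 for b in ry)
    if ssx == 0 or ssy == 0:
        # Every value in one input is tied, so the correlation is undefined.
        return float("nan")
    return cov / sqrt(ssx * ssy)
```

A rho near +1 means longer pre-breakpoint sequence goes with a longer low-scoring region after the breakpoint. With scipy installed, scipy.stats.spearmanr(x, y) returns the same rho together with a p-value.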