forked from pspealman/mugio

mugio

Inverted Repeat Junction identifier for use with ONT (MinION) sequencing.

Mugio is intended to aid in the identification of potential inverted repeats in sequencing data from Oxford Nanopore Technologies (ONT) nanopore sequencing platforms (MinION, PromethION, GridION, etc.).

Citation

Preprint:

Pieter Spealman, Jaden Burrell, David Gresham. "Nanopore sequencing undergoes catastrophic sequence failure at inverted duplicated DNA sequences." doi: https://doi.org/10.1101/852665

Requirements

Materials

Mugio requires the following user-supplied data:

  • FASTQ file generated by Albacore or Guppy
  • Aligned BAM file generated by Minimap2

Mugio requires the following programs:

  • gzip 1.5+
  • python 2.7+ or 3.6+
  • samtools 1.6+
  • bedtools 2.26.0+
  • (optional) The --blast command requires blast+ 2.9.0+

Mugio requires the following Python packages (numpy, scipy, and pandas are third-party; the others are part of the Python standard library):

  • os
  • numpy
  • argparse
  • scipy.stats
  • random
  • pickle
  • math
  • re
  • subprocess
  • pandas
  • json
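Before running Mugio, it can help to confirm that the requirements above are actually present. The following is a minimal sketch, not part of Mugio itself. It assumes Python 3 (it uses `importlib.util`), and it assumes that the programs are installed under their usual command names (`gzip`, `samtools`, `bedtools`). The function name `find_missing` is hypothetical. The sketch only checks presence and does not verify the minimum versions listed above.

```python
import importlib.util
import shutil

# Third-party packages from the list above; the remaining entries
# (os, argparse, random, pickle, math, re, subprocess, json) ship with Python.
THIRD_PARTY_PACKAGES = ["numpy", "scipy", "pandas"]

# Assumption: these are the usual executable names for the required programs;
# an installation may expose them under different names.
REQUIRED_PROGRAMS = ["gzip", "samtools", "bedtools"]


def find_missing(packages=THIRD_PARTY_PACKAGES, programs=REQUIRED_PROGRAMS):
    """Return (missing_packages, missing_programs) without importing anything."""
    # find_spec returns None when a top-level package cannot be located.
    missing_packages = [p for p in packages if importlib.util.find_spec(p) is None]
    # shutil.which returns None when no executable of that name is on PATH.
    missing_programs = [p for p in programs if shutil.which(p) is None]
    return missing_packages, missing_programs


if __name__ == "__main__":
    pkgs, progs = find_missing()
    print("missing packages:", pkgs or "none")
    print("missing programs:", progs or "none")
```

If the optional --blast command is needed, the blast+ executables could be added to the program list in the same way.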

Installation

Mugio is a standalone Python script, so it can be run locally simply by downloading the script. Installation through git clone is the preferred method. To download:

git clone https://github.com/pspealman/mugio.git

To test installation

cd mugio
python mugio.py -demo

Quick Start pipeline for inverted repeat junction identification and evaluation

  1. Command --bprd (breakpoint retrieval and definition)
  • Purpose: Identifies loci likely to be inverted repeat junctions associated with inverted duplications.
  • Format:
python mugio.py -bprd -f <fastq_file> -s <sam_file> -bam <bam_file> -o <output_path_and/or_file_prefix>
  • Demo:
python mugio.py -bprd -f demo/demo.fastq -s demo/demo.sam -bam demo/demo.bam -o demo_output/demo_bprd
  • Results: Likely candidates are recorded at the output path as a BED file with the suffix '_bprd'. For example, with '-o demo_output/demo_bprd' the results are stored in 'demo_output/demo_bprd_bprd.bed'.
  2. Command --evaluate
  • Purpose: Calculates the correlation (Spearman's rho) between pre-breakpoint sequence length and post-breakpoint low-scoring region length.
  • Format:
python mugio.py --evaluate [-bpf bprd.bed | -snf sniffles.vcf] -s <sam_file> -o <output_path_and/or_file_prefix>
  • Demo:
python mugio.py --evaluate -bpf demo_output/demo_bprd_bprd.bed -f demo/demo.fastq -s demo/demo.sam -o demo/demo_bprd_lengths
  • Results: Candidates that have closed low-Phred regions will have trace figures generated, with the low-scoring regions marked. These are saved under the output path in subfolders named after the coordinates of the inverted repeat junction.
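
To make the statistic reported by --evaluate concrete, here is a minimal, dependency-free sketch of Spearman's rho: the Pearson correlation of the ranks of two variables, with tied values given the mean of their ranks. This is an illustration only and is not Mugio's implementation. The function names are hypothetical, and the lengths are made-up numbers rather than results from any data set. Since scipy.stats is among the listed requirements, `scipy.stats.spearmanr` is a natural way to compute the same statistic in practice.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


# Made-up lengths in bases, for illustration only.
pre_breakpoint_lengths = [120, 340, 560, 800, 1500]
low_scoring_lengths = [90, 310, 280, 950, 1200]
rho = spearman_rho(pre_breakpoint_lengths, low_scoring_lengths)
print("Spearman's rho:", round(rho, 3))
```

A rho near 1 means that reads with longer pre-breakpoint sequence tend to have longer low-scoring regions after the breakpoint; a rho near 0 means the ranks show no monotonic relationship.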
