For a given mutation, query the reads that carry it from a BAM file, merge the reads that share the same positions, and report the unique count.
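The merge-and-count idea can be illustrated with a minimal sketch in plain Python. This is not MrBam's actual implementation: the function name `unique_mutated_count`, the `(start, end, base)` tuple layout, and the rule that a position group counts once if any of its reads carries the mutation are all assumptions made for illustration.

```python
from collections import defaultdict

def unique_mutated_count(reads, alt, drop_inconsistent=False):
    """Count position groups that support the alternate base.

    reads: iterable of (start, end, base) tuples, where base is the base
           observed at the mutation site (hypothetical layout).
    alt:   the mutated base to look for.
    """
    # Reads with identical start and end are treated as copies of one fragment.
    groups = defaultdict(list)
    for start, end, base in reads:
        groups[(start, end)].append(base)

    count = 0
    for bases in groups.values():
        # Optionally skip groups whose reads disagree at the site.
        if drop_inconsistent and len(set(bases)) > 1:
            continue
        # Assumption: one supporting read is enough for the group to count.
        if alt in bases:
            count += 1
    return count

example_reads = [
    (100, 250, "T"), (100, 250, "T"), (100, 250, "C"),  # one stack, inconsistent
    (120, 270, "T"),                                     # unique supporting read
    (130, 280, "C"),                                     # unique reference read
]
total = unique_mutated_count(example_reads, "T")
strict = unique_mutated_count(example_reads, "T", drop_inconsistent=True)
```

With these example reads, the strict variant skips the first stack because its reads disagree, which mirrors the loss of sensitivity that the `--drop-inconsist` option warns about.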
- Python 3.4+
- Pysam ($ pip install pysam)
$ python -m MrBam.main --help
usage: main.py [-h] [-c CFDNA] [-g GDNA] [-o OUTPUT] [-i INFO] [-q QUAL] [-s]
[-f] [-v] query
example:
$ MrBam sample.vcf --cfdna sample_cfdna.bam -o sample_MrBam.vcf --simple
positional arguments:
query vcf file contains mutations to query
optional arguments:
-h, --help show this help message and exit
-c, --cfdna CFDNA bam file contains cfdna reads info. There must be a
corresponding .bai file in the same directory
-g, --gdna GDNA bam file contains gdna reads info. There must be a
corresponding .bai file in the same directory
-o, --output OUTPUT output vcf file. Will be overwritten if already exists
--skip SKIP skip the first N lines
-q, --qual QUAL drop bases whose quality is less than this (default:
25)
-s, --simple annotate less information into vcf output
-f, --fast do not infer origin read size by CIGAR, it can be
faster and consume less memory.
--drop-inconsist drop different reads stack at the same position. This
decreases sensitivity.
--dropXA drop reads that has XA tag (multiple alignment)
-m, --mismatch-limit MISMATCH_LIMIT
if set, drop reads that has more mismatches than the
limit. requires a 'MD' or a 'NM' tag to be present.
-v, --verbos output debug info
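How the read-level filters above might combine can be sketched as follows. This is a hypothetical illustration, not MrBam's code: `keep_read`, its parameters, and the plain-dict representation of alignment tags are assumptions, only the `NM` tag is consulted for the mismatch limit, and raising an error when that tag is missing is a guess at what "requires" means.

```python
def keep_read(tags, base_quality, min_qual=25, drop_xa=False, mismatch_limit=None):
    """Decide whether a read passes the filters (hypothetical helper).

    tags:         dict of alignment tags, e.g. {"NM": 1, "XA": "..."}
    base_quality: quality of the base at the mutation site
    """
    # --qual: drop bases whose quality is below the threshold.
    if base_quality < min_qual:
        return False
    # --dropXA: drop reads that carry an XA (multiple alignment) tag.
    if drop_xa and "XA" in tags:
        return False
    # --mismatch-limit: drop reads with more mismatches than the limit.
    if mismatch_limit is not None:
        if "NM" not in tags:
            # Assumption: a missing tag is an error rather than a silent pass.
            raise ValueError("mismatch limit needs an NM tag in this sketch")
        if tags["NM"] > mismatch_limit:
            return False
    return True

passes_default = keep_read({}, 30)
fails_quality = keep_read({}, 24)
fails_xa = keep_read({"XA": "chr1,+100,50M,1;"}, 30, drop_xa=True)
fails_mismatch = keep_read({"NM": 3}, 30, mismatch_limit=2)
```

With pysam, the same information would typically come from `read.has_tag("XA")`, `read.get_tag("NM")`, and `read.query_qualities` on an aligned segment.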
#sample  option     bam_size(MB)  vcf_lines  CPU_time(s)  Memory(MB)
Sam3     (default)  194           14978      147          1116
Sam3     --fast     194           14978      129          27
Sam2     (default)  655           33702      500          3162
Sam2     --fast     655           33702      417          28
Sam1     (default)  1620          113066     5952         8377
Sam1     --fast     1620          113066     5785         34
Sam4     (default)  2338          648336     49067        9912
Sam4     --fast     2338          648336     60393        36
- CPU_time is user + sys
- Memory may vary according to system memory pressure
- Tested on Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
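To compare the two modes at a glance, the figures from the table can be turned into default-to-fast ratios. The short sketch below does only that arithmetic; the names `BENCHMARKS` and `fast_mode_ratios` are made up for illustration, and the numbers are copied from the table above.

```python
# (CPU_time in seconds, Memory in MB), copied from the benchmark table.
BENCHMARKS = {
    "Sam3": {"default": (147, 1116), "fast": (129, 27)},
    "Sam2": {"default": (500, 3162), "fast": (417, 28)},
    "Sam1": {"default": (5952, 8377), "fast": (5785, 34)},
    "Sam4": {"default": (49067, 9912), "fast": (60393, 36)},
}

def fast_mode_ratios(benchmarks):
    """Return {sample: (cpu_ratio, memory_ratio)} as default divided by --fast."""
    ratios = {}
    for sample, runs in benchmarks.items():
        cpu_default, mem_default = runs["default"]
        cpu_fast, mem_fast = runs["fast"]
        ratios[sample] = (cpu_default / cpu_fast, mem_default / mem_fast)
    return ratios

ratios = fast_mode_ratios(BENCHMARKS)
for sample, (cpu_ratio, mem_ratio) in ratios.items():
    print(f"{sample}: CPU x{cpu_ratio:.2f}, memory x{mem_ratio:.0f}")
```

A ratio above 1 means `--fast` used less of that resource. In these measurements the memory ratio is large for every sample, while the CPU ratio stays close to 1 and falls below 1 for Sam4, where the `--fast` run took more CPU time than the default run.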