HapTree-X is a computational tool that phases various kinds of next-generation sequencing data. Currently, it supports whole-genome, whole-exome, 10X Genomics and RNA-seq data. It is especially powerful on RNA-seq data as it can utilize allelic imbalance to better phase genic regions.
HapTree-X binaries are available for Linux and macOS under releases.
To build HapTree-X from scratch, you will need the latest version of Seq compiler and llc (typically shipped with LLVM).
Then, issue make
to build HapTree-X. The resulting binary will be located in build/haptreex
.
Basic usage is:
haptreex -v [VCF file with variants to be phased]
-d [indexed SAM, BAM or CRAM file with the aligned reads]
-o [output file]
For RNA-seq data, use:
haptreex -v [VCF file with variants to be phased]
-r [indexed SAM, BAM or CRAM file with the aligned reads]
-g [GTF file compatible with the provided SAM/BAM/CRAM]
-o [output file]
-g
parameter is optional. However, its inclusion will result in better phases.
HapTree-X can also phase both DNA and RNA-seq samples at the same time. An example would be:
haptreex -v [VCF file with variants to be phased]
-r [indexed RNA-seq SAM, BAM or CRAM file with the aligned reads]
-d [indexed WGS/WXS SAM, BAM or CRAM file with the aligned reads]
-g [GTF file compatible with the provided SAM/BAM/CRAM]
-o [output file]
If you want to phase 10X genomics samples, pass --10x
flag to HapTree-X.
Finally, HapTree-X can be run in multi-threaded mode. To enable it, set the OMP_NUM_THREADS
variable to the desired number of threads.
An example would be:
OMP_NUM_THREADS=4 haptreex -v ...
HapTree-X's output follows the HapCUT output format convention. The output file will contain the set of phased haplotype blocks in a list format where the beginning of each block starts with BLOCK
and the end of each block is indicated by *****
.
Each line in between contains 5 tab-delimited fields, which are in order:
- Line number in the VCF file (ignoring header lines) that contain the het-SNP
- Phase of the het-SNP corresponding to the first digit in 0|1 or 1|0
- Phase of the het-SNP corresponding to the second digit in 0|1 or 1|0
- Chromosome name
- Chromosome position
Experimental notebook and the scripts used to generate the relevant paper data are located in paper/ directory.
For questions or issues, either open GitHub issue or contact us at:
- Ibrahim Numanagić (inumanag at uvic dot canada)
- Lillian Zhang (lillianz at mit dot education)