reintroducing inputting haplotypes to kalis vignette
ryanchrist committed Sep 23, 2024
1 parent 7f9252b commit 9749c4b
Showing 1 changed file with 52 additions and 0 deletions.
52 changes: 52 additions & 0 deletions vignettes/Reading_Haplotype_Data.Rmd
@@ -0,0 +1,52 @@
---
title: "Reading Haplotype Data"
author: "Louis Aslett & Ryan Christ"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Reading Haplotype Data}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
`kalis` calculates pairwise genetic distances at loci of interest for a set of phased haplotypes stored in an `L x N` matrix `H`, where `H[l,i] = 1` if haplotype `i` carries the alternative allele at locus `l` and `H[l,i] = 0` if it carries the reference allele. For efficiency, `kalis` loads and stores `H` internally as a matrix of bits.
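
As a rough illustration of this encoding, the sketch below builds a toy `H` with `L = 2` loci and `N = 8` haplotypes in base `R` and packs each locus into a single byte. The toy values and the use of `packBits` are assumptions made for illustration only; they are not the layout `kalis` uses internally.

```{r}
# Toy haplotype matrix: rows are loci (L = 2), columns are haplotypes (N = 8).
# Entries are 1 for the alternative allele and 0 for the reference allele.
H <- matrix(
  c(0L, 1L, 0L, 0L, 1L, 0L, 1L, 1L,
    1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L),
  nrow = 2, byrow = TRUE
)

# Haplotype 2 carries the alternative allele at locus 1.
H[1, 2]

# Illustration only: pack the 8 alleles at each locus into one raw byte.
packed <- lapply(seq_len(nrow(H)), function(l) packBits(H[l, ], type = "raw"))

# Each locus now occupies 1 byte instead of 8 integers.
length(packed[[1]])
```

Storing one bit per allele rather than one integer per allele is what makes a bit matrix compact for large `N`.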

*For all of the data input types below, it is assumed that the haplotypes have already been phased, as required to run `kalis`.*

# Reading from BCF/VCF

To read phased haplotypes stored in a compressed or uncompressed BCF or VCF, the file must first be converted to HAP/SAMPLE/LEGEND format. For example, for a given compressed VCF, we simply call `bcftools` as follows.

```{r, engine = 'bash', eval = FALSE}
# -h (--haplegendsample) takes an output prefix, here "my", followed by the input file
bcftools convert -h my my.vcf.gz
```

Then, from `R`, we read in the haplotypes by calling

```{r load.data, results = "hide", message=FALSE, eval=FALSE}
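# Attach kalis; library() stops with an error if the package is missing
library(kalis)
# Load the converted haplotypes into the kalis cache
CacheHaplotypes("my.hap.gz")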
```

See <http://samtools.github.io/bcftools/bcftools.html#convert> for more details.

For more efficient reading, `CacheHaplotypes` will look for the `my.legend` file that was produced by `bcftools` in the same directory as `my.hap.gz`, so it is worthwhile keeping the `.legend` files.
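
Before caching, it can be useful to confirm that a legend file actually sits next to the haplotype file. The following is a minimal sketch in base `R`; `legend_path_for` is a hypothetical helper written for this example (not part of `kalis`), and checking for a gzip-compressed `.legend.gz` variant is an assumption about how the conversion tool names its output. It only checks file names and says nothing about how `CacheHaplotypes` itself locates the file.

```{r, eval = FALSE}
# Hypothetical helper: derive the expected legend path from a .hap.gz path.
legend_path_for <- function(hap_file) {
  prefix <- sub("\\.hap\\.gz$", "", basename(hap_file))
  file.path(dirname(hap_file), paste0(prefix, ".legend"))
}

hap_file <- "my.hap.gz"
legend_file <- legend_path_for(hap_file)

# Assumption: the legend may also have been written gzip-compressed.
candidates <- c(legend_file, paste0(legend_file, ".gz"))

if (!any(file.exists(candidates))) {
  message("No legend file found next to ", hap_file)
}
```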

`bcftools` can read from many different formats into BCF/VCF, making it an easy tool for conversion into HAP/SAMPLE/LEGEND format.


`kalis` <http://hgdownload.cse.ucsc.edu/gbdb/hg19/1000Genomes/phase3/>

`ALL.chr21.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.hap.gz`




