This repo holds the analysis scripts, notes, and summaries for the Genome in a Bottle (GIAB) Tandem Repeat Benchmark.
English, A.C., Dolzhenko, E., Ziaei Jam, H. et al. Analysis and benchmarking of small and large genomic variants across tandem repeats. Nat Biotechnol (2024). https://doi.org/10.1038/s41587-024-02225-z
The current release (v1.0) of the benchmark can be found here.
Included in the above tarball is a README.md, which can be viewed here.
This README describes the files contained in the benchmark and explains how to compare your caller's results against it using Truvari. Note that Truvari is under active development as part of the GIAB TR project, so a manual installation of the latest development branch is recommended; the README explains this in detail.
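As a rough illustration of the headline numbers such a comparison reports, the sketch below computes precision, recall, and F1 from match counts. It assumes Truvari-style bookkeeping, where true positives are tallied separately on the benchmark side (TP-base) and the comparison side (TP-comp). The `BenchCounts` class and `summarize` function are hypothetical helpers, not part of Truvari's API; use the benchmark README for the actual comparison commands.

```python
from dataclasses import dataclass


@dataclass
class BenchCounts:
    """Match counts in the style Truvari reports (hypothetical field names)."""
    tp_base: int  # benchmark (base) variants matched by a comparison call
    tp_comp: int  # comparison calls matched to a benchmark variant
    fp: int       # comparison calls with no benchmark match
    fn: int       # benchmark variants with no comparison match


def summarize(c: BenchCounts) -> dict:
    """Compute precision, recall, and F1, returning 0.0 when a denominator is zero."""
    # Precision is measured on the comparison side: matched calls / all calls.
    comp_total = c.tp_comp + c.fp
    precision = c.tp_comp / comp_total if comp_total else 0.0
    # Recall is measured on the benchmark side: matched variants / all variants.
    base_total = c.tp_base + c.fn
    recall = c.tp_base / base_total if base_total else 0.0
    # F1 is the harmonic mean of precision and recall.
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not results from this benchmark).
    print(summarize(BenchCounts(tp_base=90, tp_comp=88, fp=12, fn=10)))
```

Counting matches separately on each side matters for tandem repeats, where one comparison call can correspond to several benchmark records (or vice versa), so TP-base and TP-comp need not be equal.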
The full adotto tandem-repeat catalog and the pVCF of variants from 86 haplotype-resolved assemblies, both created as part of this work, are available via zenodo.org. Details of the files and download links are below.
| Dataset | Current Zenodo Version | Details |
|---|---|---|
| Catalog | v1.2.1 | Link |
| Variants | v0.1 | Link |
This project has several main sub-parts, each contained in its own sub-directory. Raw data too large to keep on GitHub will be made available and documented so that users can find it and know where to place it within a clone of this repo to run each sub-part of the analysis.
- slides - GIAB team meeting slides
- manuscript - Summary and plotting workflows for the publication
- metadata - Data descriptor files (e.g. download paths of inputs used or sample ancestry information)
- regions - Identification of tandem-repeat regions of a reference
- variants - Calling variants from long-read haplotype-resolved assemblies
- benchmark - Scripts for consolidating regions and variants to create the benchmark