ukb_wgs_mapping_500k

This repo contains a map of the genomic coordinates for each WGS VCF block of the 500k WGS UK Biobank release, and the code used to create it.

The R script "create_wgs_block_map.R" contains a single function that takes a table ("n_chunks_chr_size.csv") as input. The input table has one row per chromosome; the content of each column is described in the commented lines at the beginning of the script.

Each block contains 20 kb of sequence. The R script creates a table ("WGS_500k_block_map.tsv") that maps each WGS block to the exact coordinates it covers.
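The idea of deriving block coordinates from a fixed block size can be sketched as follows. This is an illustrative Python sketch, not the code in "create_wgs_block_map.R". The function name `block_map`, the 0-based block index, and the 1-based inclusive coordinates are assumptions made for the example; check the R script and "WGS_500k_block_map.tsv" for the conventions actually used.

```python
BLOCK_SIZE = 20_000  # 20 kb per block, as stated in this README


def block_map(chrom, chrom_size, block_size=BLOCK_SIZE):
    """Return (chrom, block_index, start, end) tuples covering one chromosome.

    Assumptions (hypothetical, for illustration only):
    - block indices start at 0
    - coordinates are 1-based and inclusive
    - the last block is truncated at the chromosome end
    """
    # Integer ceiling division: number of blocks needed to cover the chromosome
    n_blocks = (chrom_size + block_size - 1) // block_size
    rows = []
    for i in range(n_blocks):
        start = i * block_size + 1
        end = min((i + 1) * block_size, chrom_size)
        rows.append((chrom, i, start, end))
    return rows


# Toy chromosome of 50 kb (not a real chromosome size)
for row in block_map("chrT", 50_000):
    print(row)
```

With the toy 50 kb chromosome, this yields three blocks, the last one covering only the remaining 10 kb.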

WARNING: this is a quick and efficient way to derive the WGS block map without accessing each chunk file. Chunk boundaries are instead inferred from the number of chunks on DNAnexus and the chromosome size. The QC steps performed (e.g. comparing the expected number of chunks with the actual number) did not reveal any exceptions such as those found in the 200k WGS release. However, the approach is not robust to exceptions that cancel each other out and leave the average at 20 kb (e.g. a 25 kb chunk "balanced" by a 15 kb chunk). For what it's worth, I have used this map to retrieve more than 10,000 variants, and all were in the chunk specified by the map, so I am fairly confident in its reliability.
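A minimal Python sketch of the chunk-count check and of a position-to-block lookup is shown here for illustration. It is not taken from the R script. The function names, the input dictionary layout, the 0-based block index, and the 1-based positions are all assumptions. Note that a matching count cannot detect offsetting exceptions of the kind described in the warning.

```python
BLOCK_SIZE = 20_000  # 20 kb per block, as stated in this README


def expected_n_blocks(chrom_size, block_size=BLOCK_SIZE):
    """Number of fixed-size blocks needed to cover a chromosome."""
    return (chrom_size + block_size - 1) // block_size  # integer ceiling


def block_index(pos, block_size=BLOCK_SIZE):
    """0-based block index for a 1-based position (both conventions assumed)."""
    if pos < 1:
        raise ValueError("position must be >= 1")
    return (pos - 1) // block_size


def count_mismatches(chrom_info, block_size=BLOCK_SIZE):
    """Return {chrom: (expected, observed)} where the chunk counts disagree.

    chrom_info is a hypothetical mapping {chrom: (chrom_size, observed_n_chunks)}.
    """
    mismatches = {}
    for chrom, (size, observed) in chrom_info.items():
        expected = expected_n_blocks(size, block_size)
        if expected != observed:
            mismatches[chrom] = (expected, observed)
    return mismatches


# Toy values, not real chromosome sizes or chunk counts
toy = {"chrA": (50_000, 3), "chrB": (40_000, 3)}
print(count_mismatches(toy))   # {'chrB': (2, 3)}
print(block_index(20_000))     # 0
print(block_index(20_001))     # 1
```

An empty result from `count_mismatches` only shows that the total number of chunks is consistent with a 20 kb block size; spot-checking retrieved variants against the map, as described above, is a useful complement.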

fmazzarotto/ukb_wgs_mapping_500k
