Analysis of Genetic Data 1: Inferring Population Structure

During this short workshop, we will apply simple numeric techniques such as principal components analysis (PCA) to investigate human genetic diversity and population structure from large-scale genetic data sets. We will investigate how large genetic data sets are commonly represented in computer files, and we will use popular command-line tools such as PLINK to prepare the "raw" genetic data for analysis. We will use R to compute principal components from the genetic data and visualize the results. This workshop is mainly intended to develop practical computing skills for researchers working with genetic data; concepts such as "genotype" and "allele frequency" will not be explained. This will be a hands-on workshop and we will do "live coding" throughout, so please bring your laptop!

Attendees will: (1) work through the steps of a basic population structure analysis in human genetics, starting with the "raw" source data, and ending with a visualization of population structure estimated from the genetic data; (2) understand how large genetic data sets are commonly represented in computer files; (3) use command-line tools (e.g., PLINK) to manipulate the genetic data; (4) use R to compute principal components, and visualize the results of PCA; (5) learn through "live coding."
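To give a rough feel for what computing principal components from genotypes involves, here is a minimal sketch. The workshop itself does this step in R on the 1000 Genomes data; the Python below is only an illustration. The genotype matrix is made up, the function names `center_columns` and `first_pc` are hypothetical, and power iteration stands in for whatever PCA routine the workshop code actually uses.

```python
import math

def center_columns(G):
    """Subtract each SNP's mean genotype from its column."""
    n, p = len(G), len(G[0])
    means = [sum(row[j] for row in G) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in G]

def first_pc(G, iters=200):
    """Approximate the first PC (one score per sample, unit length)
    by power iteration on the sample-by-sample matrix X X^T."""
    X = center_columns(G)
    n = len(X)
    # n x n matrix of inner products between centered samples.
    K = [[sum(a * b for a, b in zip(X[i], X[k])) for k in range(n)]
         for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start
    for _ in range(iters):
        w = [sum(K[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Made-up genotypes (0, 1 or 2 copies of an allele), NOT real data.
# Rows are samples, columns are SNPs; the first three samples and the
# last three samples were constructed to look like two groups.
genotypes = [
    [0, 0, 1, 2, 2],
    [0, 1, 0, 2, 2],
    [0, 0, 0, 2, 1],
    [2, 2, 1, 0, 0],
    [2, 1, 2, 0, 0],
    [2, 2, 2, 0, 1],
]
pc1 = first_pc(genotypes)
```

In this toy example the first three samples get PC1 scores of one sign and the last three get scores of the opposite sign, which is the kind of separation a PCA plot of population structure makes visible.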

Prerequisites

This hands-on workshop assumes participants are already familiar with R and a UNIX-like shell environment. An RCC user account is recommended, but not required. Guest access to the RCC cluster will be available in class to those with no RCC account. All participants must bring a laptop running Mac, Linux, or Windows on which they have administrative privileges.

Notes on data files

1kg.pop describes the population labels used in the 1000 Genomes data. This information comes from Supplementary Table 1 of the most recent 1000 Genomes paper (Nature, 2015, doi:10.1038/nature15393).
omni_samples.20141118.panel was downloaded from this FTP location: ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/hd_genotype_chip
20140625_related_individuals.txt was downloaded from this FTP location: ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502. This file gives information about the 31 genotype samples that were found to be closely related. The columns in the file from left to right are: (1) sample; (2) population; (3) gender; and (4) reason for exclusion.
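As an illustration of the four-column layout described above, the following sketch reads such a file into a list of records and collects the sample IDs to exclude. It assumes the file is tab-delimited and has no header line; neither detail is stated here, so check the actual file first. The function name `read_related`, the short key `reason`, and the two example rows are placeholders, not real entries.

```python
import csv
import io

# Column order as described in the notes above.
COLUMNS = ["sample", "population", "gender", "reason"]

def read_related(handle):
    """Return one dict per non-empty row, keyed by the column names.
    Assumes tab-delimited text with no header line."""
    reader = csv.reader(handle, delimiter="\t")
    return [dict(zip(COLUMNS, row)) for row in reader if row]

# Placeholder rows for demonstration only, NOT taken from the real file.
example = io.StringIO(
    "SAMPLE_A\tPOP1\tfemale\tSibling\n"
    "SAMPLE_B\tPOP2\tmale\tParent\n"
)
related = read_related(example)

# Set of sample IDs that a later filtering step could drop.
excluded_ids = {row["sample"] for row in related}
```

With the real file, `open("20140625_related_individuals.txt", newline="")` would take the place of the `io.StringIO` object.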

Other information

This workshop attempts to apply elements of the Software Carpentry approach. See also this article. Please also take a look at the Code of Conduct, and the license information.
To generate PDFs of the slides from the R Markdown source, run make slides.pdf in the docs directory. For this to work, you will need to install the rmarkdown package in R, as well as the packages used in slides.Rmd. For more details, see the Makefile.

Credits

These materials were developed by Peter Carbonetto at the University of Chicago. Thank you to Matthew Stephens for his support and guidance.

