This site contains course materials for SISG Module 17: WGS Data Analysis, June 12-14, 2024.
- Instructors: Laura Raffield and Matthew Conomos
This module will provide an introduction to analyzing genotype data generated from whole genome sequencing (WGS). It will focus on extensions of standard GWAS analyses (e.g. rare-variant association tests) and “post-GWAS” follow-up analyses (e.g. conditional analysis, fine-mapping), and how WGS may improve results or be best utilized for these analyses; methods that incorporate variant annotation information will be highlighted.
Methods and examples will be informed by the instructors’ experience in large human genetics consortia (e.g. TOPMed), and, therefore, will focus on analyzing human data, but may be applicable/extendable to other organisms. A basic introduction to cloud computing will be provided, and students will perform hands-on exercises on a genomic analysis cloud platform.
After attending this module, participants will be able to:
- Understand how to perform association analyses for rare variants measured in WGS data using aggregate tests
- Access variant annotation resources and understand how to incorporate annotation information into analyses to improve power and inform results
- Understand the theory of, and how and when to perform, various “post-GWAS” follow-up analyses
- Leverage multi-ancestry WGS data
- Appreciate the utility of existing genomic analysis cloud platforms and get hands-on experience with cloud computing on one of these platforms
Course material will be presented through lectures. Slides for lectures are linked in the schedule below.
Many of the lectures will be followed by hands-on tutorials/exercises. Students are encouraged to work through the tutorials together. Afterwards, the instructors will walk through the tutorials and lead a discussion.
To run the tutorials, log in to NHLBI BioData Catalyst powered by Seven Bridges with your username and password. We will use this platform for live demonstrations during the course.
- You will retain access to the Seven Bridges platform, including your SISG Project with all of the course materials even after the course ends. The SISG24 Workshop billing group will remain available to you for a short period of time, after which you will need to set up another payment method to run analyses. You can request pilot cloud credits ($500 worth) from BioData Catalyst. Additionally, there is guidance available for writing BioData Catalyst cloud costs into your grant proposal budget.
All of the R code and data can also be downloaded from the GitHub repository from which this site is built, and then run on your local machine. Download the complete workshop data and tutorials: https://github.com/UW-GAC/SISG_2024/archive/main.zip
NOTE: All times are Eastern Daylight Time (GMT-04:00)
Wednesday, June 12th
Time | Topic | Lecture | Tutorials/Exercises |
---|---|---|---|
1:30pm-1:35pm | Introduction | Slides | |
1:35pm-3:00pm | Intro to Cloud Computing for WGS Data Analysis; Intro to GDS Tutorial | Slides | .Rmd | .html |
3:00pm-3:30pm | Coffee Break | ||
3:30pm-5:00pm | GWAS | Slides | .Rmd | .html |
Extra | Population Structure and Relatedness Tutorial | .Rmd | .html |
Thursday, June 13th
Time | Topic | Lecture | Tutorials/Exercises |
---|---|---|---|
8:30am-10:00am | GWAS: Advanced Model Extensions | Slides | .Rmd | .html |
Extra | GENESIS Model Explorer Tutorial | .Rmd | .html | |
10:00am-10:30am | Coffee Break | ||
10:30am-12:00pm | Leveraging Multi-Ancestry Data: Lecture | Slides | |
12:00pm-1:30pm | Lunch Break | ||
1:30pm-3:00pm | Leveraging Multi-Ancestry Data: LD Exercise; Locus Zoom and Conditional Analysis Tutorials | .docx | NEJM 2020 | Nature 2021 | KEY .Rmd | .html |
3:00pm-3:30pm | Coffee Break | ||
3:30pm-5:00pm | Variant Annotation: Part 1; Annotation Explorer Tutorial | Slides | .Rmd | .html |
5:00pm-6:00pm | Tutorial Session |
Friday, June 14th
Time | Topic | Lecture | Tutorials/Exercises |
---|---|---|---|
8:30am-10:00am | Variant Annotation: Part 2; UCSC Genome Browser and FAVOR Tutorial | Slides | .docx | chr16 SNPS | KEY |
10:00am-10:30am | Coffee Break | ||
10:30am-12:00pm | Multi-Variant Association Tests | Slides | .Rmd | .html |
12:00pm-1:30pm | Lunch Break | ||
1:30pm-3:00pm | STAAR | Slides | .Rmd | .html |
3:00pm-3:30pm | Coffee Break | ||
3:30pm-5:00pm | Recent Findings and Resources for WGS Analysis | Slides |
A detailed tutorial and relevant R scripts for the STAAR pipeline are available at https://github.com/xihaoli/STAARpipeline-Tutorial.
If you are new to R, you might find the following material helpful:
- Introduction to R materials from SISG Module 3
- Graphics with ggplot2
- Data manipulation with dplyr