diff --git a/README.md b/README.md
index 3f2998c6..68aca99a 100644
--- a/README.md
+++ b/README.md
@@ -35,9 +35,9 @@ TA Office Hours Location: Zoom (see Slack) or the first floor of the [Molecular
 | 11 | Nov 3 | Phil Bradley | [Modeling/machine learning in Python (continued)](lectures/lecture11) |
 | 12 | Nov 8 | Rasi Subramaniam | [Data analysis using R/tidyverse](lectures/lecture12/) |
 | 13 | Nov 10 | Rasi Subramaniam | [Data analysis using R/tidyverse (continued)](lectures/lecture13/) |
-| 14 | Nov 15 | Rasi Subramaniam | [Biological sequences and annotations in Bioconductor](lectures/lecture14/) |
-| 15 | Nov 17 | Gavin Ha | [Introduction to sequencing data](lectures/lecture15/) |
-| 16 | Nov 22 | Gavin Ha | [Genomic data in R](lectures/lecture16/) |
+| 14 | Nov 15 | Gavin Ha | [Introduction to sequencing data](lectures/lecture14/) |
+| 15 | Nov 17 | Gavin Ha | [Genomic data in R](lectures/lecture15/) |
+| 16 | Nov 22 | Rasi Subramaniam | [Biological sequences and annotations in Bioconductor](lectures/lecture16/) |
 | 17 | Nov 29 | Maggie Russell | [Immune repertoire sequencing and analysis](lectures/lecture17/) |
 | 18 | Dec 1 | Manu Setty | [Single-cell RNA-seq analysis](lectures/lecture18/) |
 | 19 | Dec 6 | Manu Setty | [Single-cell RNA-seq analysis (continued)](lectures/lecture19/) |
diff --git a/lectures/lecture15/MCB536_lecture15_IntroSeqData.pdf b/lectures/lecture14/MCB536_lecture15_IntroSeqData.pdf
similarity index 100%
rename from lectures/lecture15/MCB536_lecture15_IntroSeqData.pdf
rename to lectures/lecture14/MCB536_lecture15_IntroSeqData.pdf
diff --git a/lectures/lecture14/README.md b/lectures/lecture14/README.md
index 96b2fe59..c6d80385 100644
--- a/lectures/lecture14/README.md
+++ b/lectures/lecture14/README.md
@@ -1,4 +1,30 @@
-# Lecture 14: Working with biological sequences and annotations using `Bioconductor`
+# Lecture 15: Introduction to Sequencing Data Analysis
 
+Now that we have a basic grasp of concepts surrounding data management, manipulation, and visualization, we're ready to start focusing on some of the more specialized data encountered in computational biology research. Sequencing of nucleic acids is almost ubiquitous in biological research. In this lecture, we will introduce some common resources for depositing and retrieving sequence data generated by consortium efforts and independent laboratories. We will introduce concepts and practical steps of querying, inspecting, and visualizing sequence data. Then, we will cover the types of genomic variation and common tools used to predict these from sequencing data.
 
-We will learn how to work with DNA sequences and transcript annotations. Open [lecture14.ipynb](./lecture14.ipynb) in VSCode. Make sure to select the `kernel` for `R` so that you can execute `R` code. You should have already set this up following the software installation instructions [here](../../software/README.md).
\ No newline at end of file
+This lecture focuses on concepts surrounding genome sequence data and their associated workflows. This lecture will include demonstrations and student exercises. We will dive into details of sequencing data and formats, as well as outputs for specific sequencing analysis commands. There will also be materials included as a resource for your future reference.
+
+## Learning objectives
+
+- Identify common databases and file formats used for sequence data
+- Describe the steps involved in processing and analyzing sequence data to predict different types of genomic variants
+- Recognize common tools (databases and software) used to assess variation in genomic data
+
+## Class materials
+
+Outline of [content from the slides](MCB536_lecture15_IntroSeqData.pdf):
+
+1. Sequence data
+- Databases and online resources for sequence data
+- Learn the common sequence data file formats
+
+2. Tools for sequencing data
+- Tools to query, inspect, visualize an aligned sequence file (demo + exercise)
+- Learn the contents of sequence data files (demo + exercise)
+- Learn to generate sequencing metrics and to process sequence data (demo + exercise)
+
+## Before the class
+
+1. Please be able to locate files for in-class exercises. Data and examples shown in `lecture 15` and [lecture 16](../lecture16/) are available on Fred Hutch filesystem at `/fh/fast/subramaniam_a/tfcb` and on [DropBox](https://www.dropbox.com/sh/zoitjnobgp7l7c2/AABBIpTQcNA4lWYOFnV5dlMKa?dl=0). For both `lecture 15` and `lecture 16`, you will need to download these files onto your laptop.
+
+2. Please install [Integrative Genomics Viewer (IGV)](https://software.broadinstitute.org/software/igv/).
diff --git a/lectures/lecture16/Lecture16_GenomicData.Rmd b/lectures/lecture15/Lecture16_GenomicData.Rmd
similarity index 100%
rename from lectures/lecture16/Lecture16_GenomicData.Rmd
rename to lectures/lecture15/Lecture16_GenomicData.Rmd
diff --git a/lectures/lecture16/Lecture16_GenomicData.html b/lectures/lecture15/Lecture16_GenomicData.html
similarity index 100%
rename from lectures/lecture16/Lecture16_GenomicData.html
rename to lectures/lecture15/Lecture16_GenomicData.html
diff --git a/lectures/lecture16/Lecture16_Rsamtools.Rmd b/lectures/lecture15/Lecture16_Rsamtools.Rmd
similarity index 100%
rename from lectures/lecture16/Lecture16_Rsamtools.Rmd
rename to lectures/lecture15/Lecture16_Rsamtools.Rmd
diff --git a/lectures/lecture16/Lecture16_Rsamtools.html b/lectures/lecture15/Lecture16_Rsamtools.html
similarity index 100%
rename from lectures/lecture16/Lecture16_Rsamtools.html
rename to lectures/lecture15/Lecture16_Rsamtools.html
diff --git a/lectures/lecture16/Lecture16_VariantCalls.Rmd b/lectures/lecture15/Lecture16_VariantCalls.Rmd
similarity index 100%
rename from lectures/lecture16/Lecture16_VariantCalls.Rmd
rename to lectures/lecture15/Lecture16_VariantCalls.Rmd
diff --git a/lectures/lecture16/Lecture16_VariantCalls.html b/lectures/lecture15/Lecture16_VariantCalls.html
similarity index 100%
rename from lectures/lecture16/Lecture16_VariantCalls.html
rename to lectures/lecture15/Lecture16_VariantCalls.html
diff --git a/lectures/lecture16/MCB536_lecture16_GenomicDataInR.pdf b/lectures/lecture15/MCB536_lecture16_GenomicDataInR.pdf
similarity index 100%
rename from lectures/lecture16/MCB536_lecture16_GenomicDataInR.pdf
rename to lectures/lecture15/MCB536_lecture16_GenomicDataInR.pdf
diff --git a/lectures/lecture15/README.md b/lectures/lecture15/README.md
index c6d80385..a4676096 100644
--- a/lectures/lecture15/README.md
+++ b/lectures/lecture15/README.md
@@ -1,30 +1,65 @@
-# Lecture 15: Introduction to Sequencing Data Analysis
+# Lecture 16: Genomic data in R
 
-Now that we have a basic grasp of concepts surrounding data management, manipulation, and visualization, we're ready to start focusing on some of the more specialized data encountered in computational biology research. Sequencing of nucleic acids is almost ubiquitous in biological research. In this lecture, we will introduce some common resources for depositing and retrieving sequence data generated by consortium efforts and independent laboratories. We will introduce concepts and practical steps of querying, inspecting, and visualizing sequence data. Then, we will cover the types of genomic variation and common tools used to predict these from sequencing data.
+This lecture will unite the last lecture's content on genomic analysis with our previous coding in R. The packages we'll use this week are from [Bioconductor](http://bioconductor.org), a collection of software specifically designed for genomic analysis in R.
 
-This lecture focuses on concepts surrounding genome sequence data and their associated workflows. This lecture will include demonstrations and student exercises. We will dive into details of sequencing data and formats, as well as outputs for specific sequencing analysis commands. There will also be materials included as a resource for your future reference.
+## Lecture Notes
+[Lecture slides](./MCB536_lecture16_GenomicDataInR.pdf)
 
 ## Learning objectives
 
-- Identify common databases and file formats used for sequence data
-- Describe the steps involved in processing and analyzing sequence data to predict different types of genomic variants
-- Recognize common tools (databases and software) used to assess variation in genomic data
+Genome variant analysis (Background)
+- Types of genomic variation
+- Tools to predict genomic variations
+- Learn the common file formats for variation data
+- Databases and online resources for human variation data
 
-## Class materials
+Genomic Data (hands-on tutorials)
+- Use Bioconductor packages to work with genomic data in R
+- Load, inspect, and query genomic data (BED/SEG, BAM, VCF files)
+- Identify and annotate genomic variants
 
-Outline of [content from the slides](MCB536_lecture15_IntroSeqData.pdf):
+## Before the class
 
-1. Sequence data
-- Databases and online resources for sequence data
-- Learn the common sequence data file formats
+We will be working through some tutorials directly on your laptop using R Studio.
 
-2. Tools for sequencing data
-- Tools to query, inspect, visualize an aligned sequence file (demo + exercise)
-- Learn the contents of sequence data files (demo + exercise)
-- Learn to generate sequencing metrics and to process sequence data (demo + exercise)
+### 1. Install the R packages
 
-## Before the class
+- Tutorial is tested for R-4.0.3
+- You should run [this script in VSCode](../../software/genomic_data.R) to ensure all Bioconductor packages are installed.
+  ```
+  ## start R session ##
+  R
+  ## run this command within R session ##
+  source("../../software/genomic_data.R")
+  ```
+- This script will install the following packages:
+  - `Rsamtools`: querying BAM files
+  - `VariantAnnotation`: reading VCF files
+  - `GenomicRanges`: manipulating genomic data
+  - `plyranges`: fast & easy tool for mannipulating GRanges
+
+### 2. Class materials: R Markdown files containing the tutorials
+
+- If you have not done so already, update your local copy of the class repository from GitHub. You should have a directory (`lecture16`) containing the following three RMarkdown tutorials:
+  - [Lecture16_GenomicData.Rmd](Lecture16_GenomicData.Rmd): store genomic data as objects, assess genomic ranges, apply operations on genomic data
+  - [Lecture16_Rsamtools.Rmd](Lecture16_Rsamtools.Rmd): load and query sequencing data; compute “pile-up” statistics at genomic loci to identify genomic variants
+  - [Lecture16_VariantCalls.Rmd](Lecture16_VariantCalls.Rmd): load and assess variant (vcf) data
+
+### 3. Install R Extension in VSCode
+
+- `Extensions` (on left panel) > Type in search bar: `"R Extension"` > Select `R Extension for Visual Studio Code` by Yuki Ueda
+- The extension page should look something like this: https://marketplace.visualstudio.com/items?itemName=Ikuyadeu.r
+
+### 4. Install `pandoc`
 
-1. Please be able to locate files for in-class exercises. Data and examples shown in `lecture 15` and [lecture 16](../lecture16/) are available on Fred Hutch filesystem at `/fh/fast/subramaniam_a/tfcb` and on [DropBox](https://www.dropbox.com/sh/zoitjnobgp7l7c2/AABBIpTQcNA4lWYOFnV5dlMKa?dl=0). For both `lecture 15` and `lecture 16`, you will need to download these files onto your laptop.
+- To knit R Markdown files, you'll need the R Extension as well as `pandoc`.
+- Install `pandoc` outside of VScode by downloading the installer here: https://pandoc.org/installing.html
+
+### 5. Class materials: Genomic and sequencing data for the tutorials
 
-2. Please install [Integrative Genomics Viewer (IGV)](https://software.broadinstitute.org/software/igv/).
+- Please download all data files found in [this folder](https://www.dropbox.com/sh/zoitjnobgp7l7c2/AABBIpTQcNA4lWYOFnV5dlMKa?dl=0) and add them to your `lecture16` directory. The files should have the following filenames:
+  - `BRCA.genome_wide_snp_6_broad_Level_3_scna.seg`
+  - `BRCA_IDC_cfDNA.bam`
+  - `BRCA_IDC_cfDNA.bam.bai`
+  - `GIAB_highconf_v.3.3.2.vcf.gz` (if this file was automatically uncompressed on your computer, resulting in a file named `GIAB_highconf_v.3.3.2.vcf`, look in your Trash folder to find the original file ending in `gz`)
+  - `GIAB_highconf_v.3.3.2.vcf.gz.tbi`
diff --git a/lectures/lecture16/README.md b/lectures/lecture16/README.md
index a4676096..96b2fe59 100644
--- a/lectures/lecture16/README.md
+++ b/lectures/lecture16/README.md
@@ -1,65 +1,4 @@
-# Lecture 16: Genomic data in R
+# Lecture 14: Working with biological sequences and annotations using `Bioconductor`
 
-This lecture will unite the last lecture's content on genomic analysis with our previous coding in R. The packages we'll use this week are from [Bioconductor](http://bioconductor.org), a collection of software specifically designed for genomic analysis in R.
 
-## Lecture Notes
-[Lecture slides](./MCB536_lecture16_GenomicDataInR.pdf)
-
-## Learning objectives
-
-Genome variant analysis (Background)
-- Types of genomic variation
-- Tools to predict genomic variations
-- Learn the common file formats for variation data
-- Databases and online resources for human variation data
-
-Genomic Data (hands-on tutorials)
-- Use Bioconductor packages to work with genomic data in R
-- Load, inspect, and query genomic data (BED/SEG, BAM, VCF files)
-- Identify and annotate genomic variants
-
-## Before the class
-
-We will be working through some tutorials directly on your laptop using R Studio.
-
-### 1. Install the R packages
-
-- Tutorial is tested for R-4.0.3
-- You should run [this script in VSCode](../../software/genomic_data.R) to ensure all Bioconductor packages are installed.
-  ```
-  ## start R session ##
-  R
-  ## run this command within R session ##
-  source("../../software/genomic_data.R")
-  ```
-- This script will install the following packages:
-  - `Rsamtools`: querying BAM files
-  - `VariantAnnotation`: reading VCF files
-  - `GenomicRanges`: manipulating genomic data
-  - `plyranges`: fast & easy tool for mannipulating GRanges
-
-### 2. Class materials: R Markdown files containing the tutorials
-
-- If you have not done so already, update your local copy of the class repository from GitHub. You should have a directory (`lecture16`) containing the following three RMarkdown tutorials:
-  - [Lecture16_GenomicData.Rmd](Lecture16_GenomicData.Rmd): store genomic data as objects, assess genomic ranges, apply operations on genomic data
-  - [Lecture16_Rsamtools.Rmd](Lecture16_Rsamtools.Rmd): load and query sequencing data; compute “pile-up” statistics at genomic loci to identify genomic variants
-  - [Lecture16_VariantCalls.Rmd](Lecture16_VariantCalls.Rmd): load and assess variant (vcf) data
-
-### 3. Install R Extension in VSCode
-
-- `Extensions` (on left panel) > Type in search bar: `"R Extension"` > Select `R Extension for Visual Studio Code` by Yuki Ueda
-- The extension page should look something like this: https://marketplace.visualstudio.com/items?itemName=Ikuyadeu.r
-
-### 4. Install `pandoc`
-
-- To knit R Markdown files, you'll need the R Extension as well as `pandoc`.
-- Install `pandoc` outside of VScode by downloading the installer here: https://pandoc.org/installing.html
-
-### 5. Class materials: Genomic and sequencing data for the tutorials
-
-- Please download all data files found in [this folder](https://www.dropbox.com/sh/zoitjnobgp7l7c2/AABBIpTQcNA4lWYOFnV5dlMKa?dl=0) and add them to your `lecture16` directory. The files should have the following filenames:
-  - `BRCA.genome_wide_snp_6_broad_Level_3_scna.seg`
-  - `BRCA_IDC_cfDNA.bam`
-  - `BRCA_IDC_cfDNA.bam.bai`
-  - `GIAB_highconf_v.3.3.2.vcf.gz` (if this file was automatically uncompressed on your computer, resulting in a file named `GIAB_highconf_v.3.3.2.vcf`, look in your Trash folder to find the original file ending in `gz`)
-  - `GIAB_highconf_v.3.3.2.vcf.gz.tbi`
+We will learn how to work with DNA sequences and transcript annotations. Open [lecture14.ipynb](./lecture14.ipynb) in VSCode. Make sure to select the `kernel` for `R` so that you can execute `R` code. You should have already set this up following the software installation instructions [here](../../software/README.md).
\ No newline at end of file
diff --git a/lectures/lecture14/data/tumor_suppresssors.fasta b/lectures/lecture16/data/tumor_suppresssors.fasta
similarity index 100%
rename from lectures/lecture14/data/tumor_suppresssors.fasta
rename to lectures/lecture16/data/tumor_suppresssors.fasta
diff --git a/lectures/lecture14/data/tumor_suppresssors.gff3 b/lectures/lecture16/data/tumor_suppresssors.gff3
similarity index 100%
rename from lectures/lecture14/data/tumor_suppresssors.gff3
rename to lectures/lecture16/data/tumor_suppresssors.gff3
diff --git a/lectures/lecture14/lecture14.ipynb b/lectures/lecture16/lecture14.ipynb
similarity index 100%
rename from lectures/lecture14/lecture14.ipynb
rename to lectures/lecture16/lecture14.ipynb