The work-in-progress dashboard can be found here: https://lstmemery.shinyapps.io/yeastomics/
Yeasts (Saccharomyces cerevisiae) are used to produce some of our most cherished foods and drinks (e.g., bread, wine, and beer). They are also used in many other biotechnology applications, such as pharmaceutical and biomass production.
Yeasts are great model organisms because of their small, simple genome of approximately 6,000 genes. As single-celled organisms, they also make good models for transcriptome analyses, since gene expression is homogeneous across cells.
As part of the Vancouver-based hackathon hackseq19, our project examined yeast transcriptome data scraped from the web. The data consist of gene expression changes in yeast strains treated with various stimuli, such as heat, phenol lysis, and ethanol. As a team, we cleaned and analyzed the data and communicated the results of our exploration through an interactive dashboard and a blog presenting our methodologies and code snippets.
The data in this project include gene expression values for 92 yeast strains treated with various stimuli. RNA expression levels are normalized to TPM (transcripts per million) using a default normalization procedure. The data are stored in the `data` folder:
- The `SC_expression.csv` file contains gene expression values for the yeast strains in the experiments.
- The `labels.csv` files describe each gene's validation status and its molecular function (MF), cellular component (CC), and biological process (BP) annotations.
- The `conditions_annotation.csv` file describes the yeast strains and experimental conditions.
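The expression values ship already normalized, so no conversion is needed to use them. For readers unfamiliar with TPM, here is a minimal sketch of the standard formula: each gene's read count is divided by its transcript length, and each sample is then rescaled so that its length-normalized values sum to one million. The raw count matrix, gene lengths, and the function name `counts_to_tpm` are hypothetical; this is not the procedure used to produce `SC_expression.csv`.

```python
import numpy as np

def counts_to_tpm(counts: np.ndarray, lengths: np.ndarray) -> np.ndarray:
    """Convert a genes x samples matrix of raw read counts to TPM (hypothetical helper).

    counts  : shape (n_genes, n_samples), raw read counts
    lengths : shape (n_genes,), transcript lengths (any unit; it cancels out)
    """
    # Length-normalize: reads per unit of transcript length for each gene.
    rate = counts / lengths[:, np.newaxis]
    # Rescale each sample (column) so its length-normalized values sum to one million.
    return rate / rate.sum(axis=0, keepdims=True) * 1e6

# Toy example: 3 genes x 2 samples (made-up numbers for illustration only).
counts = np.array([[100, 200],
                   [300, 300],
                   [ 50,  10]], dtype=float)
lengths = np.array([1000, 2000, 500], dtype=float)

tpm = counts_to_tpm(counts, lengths)
print(tpm.sum(axis=0))  # each column sums to one million, up to floating-point rounding
```

Because every sample sums to the same total, TPM values can be compared across samples in a way raw counts cannot.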
This project is inspired by the open-source yeast-omics dataset shared as a Kaggle competition. The original data can be found here and was scraped from here.
To unravel the genetic mechanisms involved in yeast stress adaptation, we built a visualization platform that allows scientists to explore transcriptome data interactively. The visualization methods we implemented include dimensionality reduction strategies and unsupervised and supervised clustering of the transcriptome across all experimental conditions.
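As an illustration of one dimensionality reduction step, the sketch below projects experimental conditions onto principal components using NumPy's SVD. It assumes a genes x conditions TPM matrix and a log2(TPM + 1) transform; both choices, the function name `pca_conditions`, and the random stand-in data are assumptions, not the dashboard's actual pipeline.

```python
import numpy as np

def pca_conditions(tpm: np.ndarray, n_components: int = 2):
    """Project conditions (columns of a genes x conditions TPM matrix) onto principal components.

    Hypothetical helper. Returns (scores, explained_variance_ratio).
    """
    # Log-transform to damp the influence of very highly expressed genes,
    # then transpose so each row is one condition.
    x = np.log2(tpm + 1.0).T
    # Center each gene so the components capture variation between conditions.
    x = x - x.mean(axis=0, keepdims=True)
    # Thin SVD: rows of vt are principal axes in gene space.
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    # Condition coordinates on the first n_components axes.
    scores = u[:, :n_components] * s[:n_components]
    var = s ** 2
    return scores, var[:n_components] / var.sum()

# Random data standing in for a real matrix (~6,000 genes x 10 conditions).
rng = np.random.default_rng(0)
tpm = rng.gamma(shape=2.0, scale=50.0, size=(6000, 10))
scores, ratio = pca_conditions(tpm)
print(scores.shape)  # (10, 2): one 2-D point per condition
```

The resulting two-column scores can be drawn as a scatter plot, with one point per condition; conditions with similar transcriptional responses should land close together.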
- Introduction to Data Science with R
- RNA-seq analysis tutorials in R and differential expression analysis
- Mastering Shiny
- Data Manipulation and Visualization using R
- RNA-seq data analysis workflow
- Setting up reproducible projects in R, package management for reproducible R code, and reproducibility best practices
- Yeast pathway analysis in R
- Additional resources from UCSF Data Science Initiative
- Package development, maintenance, documentation, and peer-review
- Shiny Apps for Transcriptome Visualizations
- RNA-seq Shiny app options
- limma/Glimma/edgeR analyses
- Bioinformatics data skills
Noushin Nabavi, Matthew Emery, Alexander Morin, Zuhaib Ahmed, Casey Engstrom, Sedat Demiriz, Shinta Thio, Saelin Bjornson, Siddharth Raghuvanshi, Chris Rider
Team collaboration took place in the #p02-yeast channel on Slack.