- Bioinformatics Training at HBC (Harvard Chan Bioinformatics Core)
- R for Data Science (Garrett Grolemund and Hadley Wickham)
- Illumina DRAGEN Bio-IT Platform
- QIAGEN OmicSoft
The Genotype-Tissue Expression (GTEx) project is an ongoing effort to build a comprehensive public resource to study tissue-specific gene expression and regulation. Samples were collected from 54 non-diseased tissue sites across nearly 1000 individuals, primarily for molecular assays including WGS, WES, and RNA-Seq. Remaining samples are available from the GTEx Biobank. The GTEx Portal provides open access to data including gene expression, QTLs, and histology images.
The Cancer Genome Atlas (TCGA), a landmark cancer genomics program, molecularly characterized over 20,000 primary cancer and matched normal samples spanning 33 cancer types. This joint effort between the National Cancer Institute and the National Human Genome Research Institute began in 2006, bringing together researchers from diverse disciplines and multiple institutions.
Over the next dozen years, TCGA generated over 2.5 petabytes of genomic, epigenomic, transcriptomic, and proteomic data. The data, which has already led to improvements in our ability to diagnose, treat, and prevent cancer, will remain publicly available for anyone in the research community to use.
The International Cancer Genome Consortium (ICGC), established in 2007, aimed to define the genomes of 25,000 primary untreated cancers (the 25K Initiative). The ICGC solved numerous data governance, ethical, and logistical challenges to make global genomic data sharing for cancer possible, providing the international community with comprehensive genomic data for many cancer types.
The Cancer Cell Line Encyclopedia (CCLE) project is a collaboration between the Broad Institute and the Novartis Institutes for Biomedical Research, together with its Genomics Institute of the Novartis Research Foundation. Its aims are to conduct a detailed genetic and pharmacologic characterization of a large panel of human cancer models, to develop integrated computational analyses that link distinct pharmacologic vulnerabilities to genomic patterns, and to translate cell line integrative genomics into cancer patient stratification. The CCLE provides public access to genomic data, analysis, and visualization for over 1100 cell lines.
While many genetic variants have been associated with risk for human diseases, how these variants affect gene expression in various cell types remains largely unknown. To address this gap, the DICE (Database of Immune Cell Expression, Expression quantitative trait loci (eQTLs) and Epigenomics) project was established. Considering all human immune cell types and conditions studied, we identified cis-eQTLs for a total of 12,254 unique genes, which represent 61% of all protein-coding genes expressed in these cell types. Strikingly, a large fraction (41%) of these genes showed a strong cis-association with genotype only in a single cell type. We also found that biological sex is associated with major differences in immune cell gene expression in a highly cell-specific manner. These datasets will help reveal the effects of disease risk-associated genetic polymorphisms on specific immune cell types, providing mechanistic insights into how they might influence pathogenesis.
BLUEPRINT is a high-impact FP7 project aiming to produce a blueprint of haematopoietic epigenomes. Our goal is to apply highly sophisticated functional genomics analysis on a clearly defined set of primarily human samples from healthy and diseased individuals, and to provide at least 100 reference epigenomes to the scientific community. This resource-generating activity will be complemented by research into blood-based diseases, including common leukaemias and autoimmune disease (Type 1 Diabetes), by discovery and validation of epigenetic markers for diagnostic use, and by epigenetic target identification. This may eventually lead to the development of novel and more individualised medical treatments.
Gene expression data (Count and TPM): http://fantom.gsc.riken.jp/5/datafiles/latest/extra/gene_level_expression/
ImmuCo is a database of gene Co-expression and Correlation in Immune cells. The current version includes expression data for a total of 20,283 human and 20,963 mouse genes from the Affymetrix Human Genome U133 Plus 2.0 and Mouse Genome 430 2.0 microarrays, respectively, enabling co-expression and correlation analysis between any two genes. These arrays are from 11 human and 7 mouse cell types, including 18 human and 10 mouse cell groups. The signal values used for analysis were derived using the MAS 5.0 algorithm. Co-expression is reflected by the signal values and detection calls, whereas expression correlation and strength are reflected by a Pearson correlation coefficient. A scatter plot of the signal values is also provided to graphically illustrate the extent of correlation. ImmuCo supports queries by gene symbol (or alias), Entrez Gene ID, and probe set ID.
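To make the correlation measure concrete, here is a minimal sketch of a Pearson correlation coefficient computed between the signal values of two genes across the same set of arrays. It is not ImmuCo's own code; the function name `pearson` and the signal values below are hypothetical and chosen only for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical signal values for two genes measured on the same four arrays.
gene_a = [120.0, 340.0, 560.0, 810.0]
gene_b = [60.0, 170.0, 280.0, 405.0]  # exactly half of gene_a on every array

r = pearson(gene_a, gene_b)
print(f"Pearson r = {r:.3f}")
```

A value near 1 indicates that the two genes rise and fall together across arrays, a value near -1 indicates opposite behaviour, and a value near 0 indicates no linear relationship.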
RefDIC is an open resource compendium of quantitative mRNA/Protein profile data obtained from microarray and 2DE-gel based proteome experiments specifically for immune cells. You can easily retrieve various aspects of mRNA/Protein profiles of immune cells from RefDIC. We will extend and update the contents of RefDIC on a regular basis to offer an information platform that fully exploits the power of genomics in immunology. This database has been constructed with continuous support, valuable research input and feedback comments from our colleagues at RIKEN Research Center for Allergy and Immunology (RCAI). We greatly appreciate their thoughtful collaboration on the RefDIC project.
We isolated twelve different types of human leukocytes from peripheral blood and bone marrow, treated them to induce activation and/or differentiation, and profiled their gene expression before and after treatment.
The twelve cell types are:
B cells, CD14+ cells, CD4+ CD45RO+ CD45RA- T cells, CD4+ T cells, CD8+ T cells, IgG/IgA memory B cells, IgM memory B cells, Monocytes, NK cells, Neutrophils, Plasma cells from bone marrow, and Plasma cells from PBMC.
Gene expression across species and biological conditions (62 species, 3,699 studies, 121,342 assays)
Novel curated dataset comprising nearly all human public, primary bulk samples in the NCBI’s Sequence Read Archive
Thanks to technological advances in genomics, transcriptomics, proteomics, metabolomics, and related fields, projects that generate a large number of measurements of the properties of cells, tissues, model organisms, and patients are becoming commonplace in biomedical research. In addition, curation projects are making great progress mining biomedical literature to extract and aggregate decades worth of research findings into online databases. Such projects are generating a wealth of information that potentially can guide research toward novel biomedical discoveries and advances in healthcare. To facilitate access to and learning from biomedical Big Data, we created the Harmonizome: a collection of information about genes and proteins from 114 datasets provided by 66 online resources.
Search the multi-organism collection of genome-wide gene expression data obtained from publicly available sources like GEO, ArrayExpress, and SRA. The data has been processed uniformly and normalized using a set of standardized pipelines curated by the Childhood Cancer Data Lab (CCDL).
Bgee is a database to retrieve and compare gene expression patterns in multiple animal species, produced from multiple data types (RNA-Seq, Affymetrix, in situ hybridization, and EST data) and from multiple data sets (including GTEx data)
A database containing 1,126 well-annotated autoantigens, determined by text-mining and manual curation [10]. AAgAtlas database 1.0 provides a user-friendly interface to conveniently browse, retrieve, and download the list of autoantigens and their associated diseases.
Open Targets is an innovative, large-scale, multi-year, public-private partnership that uses human genetics and genomics data for systematic drug target identification and prioritisation.
Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes).
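The core idea can be sketched as a running-sum statistic over a ranked gene list. The example below is a simplified, unweighted illustration and not the GSEA software itself: it omits the correlation-based weighting of hits and the permutation testing used to assess significance. The function name `enrichment_score` and the gene identifiers are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (simplified sketch).

    Walk down the ranked list, stepping up when a gene is in the set and
    down when it is not; return the running-sum value that deviates most
    from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step sizes are scaled so the running sum returns to zero at the end.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical list of genes ranked from most up-regulated to most down-regulated.
ranked = ["G1", "G2", "G3", "G4", "G5", "G6"]

top_score = enrichment_score(ranked, {"G1", "G2"})     # set sits at the top
bottom_score = enrichment_score(ranked, {"G5", "G6"})  # set sits at the bottom
print(top_score, bottom_score)
```

A positive score means the set is concentrated toward the top of the ranking and a negative score means it is concentrated toward the bottom; in practice the score is only interpretable alongside a significance estimate.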
VDJdb is a curated database of T-cell receptor (TCR) sequences with known antigen specificities. The primary goal of VDJdb is to facilitate access to existing information on T-cell receptor antigen specificities, i.e. the ability to recognize certain epitopes in certain MHC contexts.
- nSolve
- NACHO