10. Gene Expression Analysis - HDFn - GSE84144.Rmd

---
title: "Gene Expression Analysis"
subtitle: "Human Dermal Fibroblasts, neonatal (HDFn) | GSE84144 | Necroptosis, Ferroptosis & Pyroptosis"
author: 
  - Mark Edward M. Gonzales^[De La Salle University, Manila, Philippines, gonzales.markedward@gmail.com]
  - Dr. Anish M.S. Shrestha^[De La Salle University, Manila, Philippines, anish.shrestha@dlsu.edu.ph]
output: html_notebook
---

## I. Preliminaries

### Loading libraries

```{r, warning=FALSE, message=FALSE}
library("tidyverse")
library("tibble")
library("msigdbr")
library("ggplot2")
library("ensembldb")
library("purrr")
library("magrittr")
library("matrixStats")
library("dplyr")
library("grex")
library("gplots")
library("RColorBrewer")
library("illuminaHumanv4.db")
```

### Constants
```{r}
DATA_DIR <- "data/HDFn/"
```

## Loading the Expression Data

The expression data are taken from this study: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE84144

Download the RNA-seq normalized counts matrices (one matrix per replicate) from:
- https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM2227697
- https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM2227698
- https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM2227699

```{r}
hdfn.expression1 <- read.delim(paste0(DATA_DIR, "GSE84144-hdfn-1.tsv"), as.is = TRUE, header = TRUE, row.names = 1)
hdfn.expression1 <- rownames_to_column(hdfn.expression1, "ID_REF")
hdfn.expression1 <- hdfn.expression1[c("ID_REF", "VALUE")]

hdfn.expression2 <- read.delim(paste0(DATA_DIR, "GSE84144-hdfn-2.tsv"), as.is = TRUE, header = TRUE, row.names = 1)
hdfn.expression2 <- rownames_to_column(hdfn.expression2, "ID_REF")
hdfn.expression2 <- hdfn.expression2[c("ID_REF", "VALUE")]

hdfn.expression3 <- read.delim(paste0(DATA_DIR, "GSE84144-hdfn-3.tsv"), as.is = TRUE, header = TRUE, row.names = 1)
hdfn.expression3 <- rownames_to_column(hdfn.expression3, "ID_REF")
hdfn.expression3 <- hdfn.expression3[c("ID_REF", "VALUE")]

hdfn.expression3
```

Merge the gene expression data for the three replicates into one data frame.

```{r}
hdfn.expression <- left_join(hdfn.expression1, hdfn.expression2, join_by(ID_REF))
hdfn.expression <- left_join(hdfn.expression, hdfn.expression3, join_by(ID_REF))
hdfn.expression <- hdfn.expression %>% rename("VALUE.x" = "Replicate 1", "VALUE.y" = "Replicate 2", "VALUE" = "Replicate 3")
hdfn.expression
```

Map the Illumina probe IDs to Ensembl accessions.

```{r}
illumina_to_ensembl = data.frame(gene_id=unlist(mget(x = hdfn.expression[["ID_REF"]], envir = illuminaHumanv4ENSEMBL)))
illumina_to_ensembl <- rownames_to_column(illumina_to_ensembl, "ID_REF")
illumina_to_ensembl
```

```{r}
hdfn.expression <- left_join(hdfn.expression, illumina_to_ensembl)
hdfn.expression
```

## Exploratory Data Analysis

We load the gene sets from RCDdb: https://pubmed.ncbi.nlm.nih.gov/39257527/

```{r}
RCDdb <- "data/RCDdb/"
```

### Necroptosis

Load the gene set.

```{r}
genes <- read.csv(paste0(RCDdb, "Necroptosis.csv"))
genes$gene_id <- cleanid(genes$gene_id)
genes <- distinct(genes, gene_id, .keep_all = TRUE)
genes <- subset(genes, gene_id != "")
genes
```

Get the normalized expression data for the genes in the gene set.

```{r}
tpm.df <- hdfn.expression %>% dplyr::filter(gene_id %in% genes$gene_id)
tpm.df <- left_join(tpm.df, genes %>% dplyr::select(gene_id, gene), by = c("gene_id" = "gene_id"))
tpm.df <- distinct(tpm.df, gene, .keep_all = TRUE)
rownames(tpm.df) <- tpm.df$gene
tpm.df <- subset(tpm.df, select = -c(gene_id, ID_REF, gene) )
tpm.df <- tpm.df[ order(row.names(tpm.df)), , drop = FALSE]
tpm.df
```

Plot the results.

```{r, fig.height=30, fig.width=10}
tpm.matrix <- as.matrix(tpm.df)
heatmap.2(tpm.matrix, srtCol=360, cellnote = tpm.matrix, dendrogram="none", Colv=FALSE, Rowv=FALSE,
          col=brewer.pal(n = 9, name = "BuPu")[5:9], trace="none", key = FALSE, lwid=c(0.1,4), lhei=c(0.1,4),
          cexCol=1, cexRow=0.75, symm = TRUE)
```
### Ferroptosis

Load the gene set.

```{r}
genes <- read.csv(paste0(RCDdb, "Ferroptosis.csv"))
genes$gene_id <- cleanid(genes$gene_id)
genes <- distinct(genes, gene_id, .keep_all = TRUE)
genes <- subset(genes, gene_id != "")
genes
```

Get the normalized expression data for the genes in the gene set.

```{r}
tpm.df <- hdfn.expression %>% dplyr::filter(gene_id %in% genes$gene_id)
tpm.df <- left_join(tpm.df, genes %>% dplyr::select(gene_id, gene), by = c("gene_id" = "gene_id"))
tpm.df <- distinct(tpm.df, gene, .keep_all = TRUE)
rownames(tpm.df) <- tpm.df$gene
tpm.df <- subset(tpm.df, select = -c(gene_id, ID_REF, gene) )
tpm.df <- tpm.df[ order(row.names(tpm.df)), , drop = FALSE]
tpm.df
```

Plot the results.

```{r, fig.height=150, fig.width=10}
tpm.matrix <- as.matrix(tpm.df)
heatmap.2(tpm.matrix, srtCol=360, cellnote = tpm.matrix, dendrogram="none", Colv=FALSE, Rowv=FALSE,
          col=brewer.pal(n = 9, name = "BuPu")[5:9], trace="none", key = FALSE, lwid=c(0.1,4), lhei=c(0.1,4),
          cexCol=1, cexRow=0.75, symm = TRUE)
```

### Pyroptosis

Load the gene set.

```{r}
genes <- read.csv(paste0(RCDdb, "Pyroptosis.csv"))
genes$gene_id <- cleanid(genes$gene_id)
genes <- distinct(genes, gene_id, .keep_all = TRUE)
genes <- subset(genes, gene_id != "")
genes
```

Get the normalized expression data for the genes in the gene set.

```{r}
tpm.df <- hdfn.expression %>% dplyr::filter(gene_id %in% genes$gene_id)
tpm.df <- left_join(tpm.df, genes %>% dplyr::select(gene_id, gene), by = c("gene_id" = "gene_id"))
tpm.df <- distinct(tpm.df, gene, .keep_all = TRUE)
rownames(tpm.df) <- tpm.df$gene
tpm.df <- subset(tpm.df, select = -c(gene_id, ID_REF, gene) )
tpm.df <- tpm.df[ order(row.names(tpm.df)), , drop = FALSE]
tpm.df
```

Plot the results.

```{r, fig.height=20, fig.width=10}
tpm.matrix <- as.matrix(tpm.df)
heatmap.2(tpm.matrix, srtCol=360, cellnote = tpm.matrix, dendrogram="none", Colv=FALSE, Rowv=FALSE,
          col=brewer.pal(n = 9, name = "BuPu")[5:9], trace="none", key = FALSE, lwid=c(0.1,4), lhei=c(0.1,4),
          cexCol=1, cexRow=0.75, symm = TRUE)
```