---
title: "Create Esets"
author: "Evan Henrich and Helen Miller"
output:
  html_document:
    toc: true
    toc_float: true
    df_print: paged
params:
  outputDir: "/share/files/HIPC/IS2/@files/data/html_outputs"
  dataCacheDir: "/share/files/HIPC/IS2/@files/data"
  timestamp: ""
---
# Overview
This vignette pulls all expression sets (`ExpressionSet` objects) from the ImmuneSpace portal, www.immunespace.org, and saves them as R objects (.rds files) for later processing.
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, include = TRUE)
suppressPackageStartupMessages({
  library(ImmuneSignatures2) # vaccine map loaded as `vaccines`
  library(ImmuneSpaceR)
  library(Biobase)
  library(data.table)
})
# Output variables
outputDir <- params$outputDir
dataCacheDir <- params$dataCacheDir
if (!dir.exists(outputDir)) dir.create(outputDir, recursive = TRUE)
if (!dir.exists(dataCacheDir)) dir.create(dataCacheDir, recursive = TRUE)
timeStamp <- params$timestamp
```
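The `params` declared in the YAML header can be overridden when the document is rendered. The following is a minimal sketch, assuming the file is saved as `pull_esets.Rmd` in the working directory and that a date-based prefix is wanted for `timestamp`. The paths and the prefix format are placeholders chosen for illustration, not the project's actual settings. The block is a plain fence, so it is not evaluated when the document is knit.

```r
# Hypothetical render-time overrides for the parameters declared in the YAML header.
render_params <- list(
  outputDir    = file.path(tempdir(), "html_outputs"), # placeholder path
  dataCacheDir = file.path(tempdir(), "data"),         # placeholder path
  timestamp    = format(Sys.Date(), "%Y_%m_%d_")       # assumed prefix format
)

rmd_file <- "pull_esets.Rmd" # assumed file name

# Render only when the file and the rmarkdown package are available.
if (file.exists(rmd_file) && requireNamespace("rmarkdown", quietly = TRUE)) {
  rmarkdown::render(
    input      = rmd_file,
    params     = render_params,
    output_dir = render_params$outputDir
  )
}
```

Because `timestamp` is pasted directly in front of each cached file name, a trailing separator such as `_` keeps the resulting names readable. The default empty string gives unprefixed names.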
```{r create-connection}
con <- CreateConnection("IS2", onTest = FALSE)
```
# Load metadata
Load metadata about samples and participants.
```{r load-meta-data}
demographics <- con$getDataset("demographics")
geneExpressionFiles <- con$getDataset("gene_expression_files", original_view = TRUE)
featureAnnotationMap <- getTable(con, "microarray", "fasMap", showHidden = TRUE)
featureAnnotation <- getTable(con, "microarray", "FeatureAnnotationSet", showHidden = TRUE)
```
```{r map-meta-data-shared-GE-and-response}
sharedMetaDataFile <- file.path(dataCacheDir, paste0(timeStamp, "sharedMetaData.rds"))
sharedMetaData <- addStudy(demographics)
sharedMetaData <- addArmAccession(sharedMetaData, geneExpressionFiles)
sharedMetaData <- addVaccineFields(sharedMetaData, vaccines)
sharedMetaData <- filterOutNoVaccineSamples(sharedMetaData)
sharedMetaData <- addGeBatchName(sharedMetaData)
sharedMetaData <- addIrpBatchName(sharedMetaData)
sharedMetaData <- addSDY1325metadata(sharedMetaData)
sharedMetaData <- imputeAge(sharedMetaData)
saveRDS(sharedMetaData, file = sharedMetaDataFile)
write_data_metadata(file.path(dataCacheDir, "dataset_metadata.csv"),
                    data_path = sharedMetaDataFile,
                    dataset_name = "sharedMetaData.rds")
```
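The caching pattern used above (build a file name from the timestamp prefix, then `saveRDS()`) can be tried in isolation. This is a small sketch with a toy table. The directory, the prefix value, and the column names are all hypothetical, and it does not call `write_data_metadata()`.

```r
# Toy stand-ins for the real cache directory and timestamp parameter.
cache_dir  <- file.path(tempdir(), "is2_cache_demo") # hypothetical cache directory
time_stamp <- "2020_01_01_"                          # hypothetical prefix; "" gives unprefixed names
if (!dir.exists(cache_dir)) dir.create(cache_dir, recursive = TRUE)

# Toy metadata table (hypothetical columns and values).
toy_meta <- data.frame(
  participant_id = c("SUB1.1", "SUB2.1"),
  age_imputed    = c(31, 45),
  stringsAsFactors = FALSE
)

# Same naming scheme as the chunk above: prefix + fixed file name.
meta_file <- file.path(cache_dir, paste0(time_stamp, "sharedMetaData.rds"))
saveRDS(toy_meta, file = meta_file)

# Reading the file back returns the same object.
restored <- readRDS(meta_file)
round_trip_ok <- identical(restored, toy_meta)
```

Downstream code that reads the cache needs the same prefix to find the file, so the `timestamp` value has to be passed consistently between vignettes.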
# Immune Response Data Retrieval
```{r prepare-immune-response-data}
immdata_filename <- file.path(dataCacheDir, paste0(timeStamp, "immdata_all.rds"))
assays <- c("hai", "neut_ab_titer", "elisa")
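# simplify = FALSE guarantees a named list with one table per assay
immdata_all <- sapply(assays, simplify = FALSE, USE.NAMES = TRUE, function(assay) {
  dt <- con$getDataset(assay, original_view = TRUE)
  dt <- dt[, -"lsid"]
  if (assay == "elisa") {
    # Append the `sdy1370_elisa` records to the ELISA data
    dt <- rbind(dt, sdy1370_elisa)
  }
  dt$assay <- assay
  dt <- correctHrs(dt)
  dt <- createUniqueIdColumn(dt)
  # Return the assay data joined to the shared metadata
  merge(dt, sharedMetaData, by = c("participant_id", "study_accession", "arm_accession"))
})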
saveRDS(immdata_all, file = immdata_filename)
write_data_metadata(file.path(dataCacheDir, "dataset_metadata.csv"),
                    data_path = immdata_filename,
                    dataset_name = "immdata_all.rds")
```
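The per-assay pattern in the chunk above (tag each table with its assay name, then join it to the shared metadata on three key columns) can be illustrated with base R data frames. This is a toy sketch: the participant IDs, accessions, and the `vaccine_type` column are made up, and base `merge()` stands in for the `data.table` method used in the real chunk.

```r
# Toy shared metadata (hypothetical values).
meta <- data.frame(
  participant_id  = c("P1", "P2"),
  study_accession = c("SDY1", "SDY1"),
  arm_accession   = c("ARM1", "ARM2"),
  vaccine_type    = c("Inactivated", "Inactivated"),
  stringsAsFactors = FALSE
)

# Toy assay tables; P3 has no matching metadata row.
raw <- list(
  hai = data.frame(
    participant_id  = c("P1", "P2", "P3"),
    study_accession = "SDY1",
    arm_accession   = c("ARM1", "ARM2", "ARM3"),
    value_preferred = c(40, 80, 160),
    stringsAsFactors = FALSE
  ),
  elisa = data.frame(
    participant_id  = "P1",
    study_accession = "SDY1",
    arm_accession   = "ARM1",
    value_preferred = 1.5,
    stringsAsFactors = FALSE
  )
)

keys <- c("participant_id", "study_accession", "arm_accession")

# simplify = FALSE keeps the result a list named by assay.
by_assay <- sapply(names(raw), simplify = FALSE, USE.NAMES = TRUE, function(assay) {
  dt <- raw[[assay]]
  dt$assay <- assay
  merge(dt, meta, by = keys) # inner join: rows without metadata are dropped
})

rows_per_assay <- sapply(by_assay, nrow)
```

Since `merge()` performs an inner join by default, any participant missing from the metadata (here `P3`) silently disappears from the result. Comparing row counts before and after the join is a cheap way to notice that.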
# Gene Expression Data Retrieval
Gene expression matrices in ImmuneSpace are created per cohort and cell type (cohort * cell_type), and each matrix is quantile normalized and log-transformed separately. The .rds file containing the list of ExpressionSets needed for downstream analysis is approximately 1 GB, so it is cached.
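To make "normalized separately" concrete, here is a generic base R sketch of quantile normalization applied independently to two toy matrices. It is not ImmuneSpace's implementation: the function name `quantile_normalize`, the `log2(x + 1)` transform, and the order of the two steps are assumptions made for illustration.

```r
# Quantile-normalize the columns (samples) of a numeric matrix:
# every column ends up with the same set of values, assigned by rank.
quantile_normalize <- function(x) {
  stopifnot(is.matrix(x), nrow(x) > 1, !anyNA(x))
  ranks     <- apply(x, 2, rank, ties.method = "first")
  reference <- rowMeans(apply(x, 2, sort)) # mean of the k-th smallest values
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(x)
  out
}

set.seed(1)
# Two toy cohorts on different intensity scales (made-up data).
raw_a <- matrix(rexp(20, rate = 0.01), nrow = 5,
                dimnames = list(paste0("gene", 1:5), paste0("a", 1:4)))
raw_b <- matrix(rexp(15, rate = 0.001), nrow = 5,
                dimnames = list(paste0("gene", 1:5), paste0("b", 1:3)))

# Each matrix is transformed and normalized on its own, never pooled.
norm_by_cohort <- lapply(
  list(cohortA = raw_a, cohortB = raw_b),
  function(m) quantile_normalize(log2(m + 1))
)
```

Within a cohort, all samples share one distribution after this step, but nothing aligns the distributions across cohorts. That is why downstream analysis has to treat these as within-study normalized data.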
```{r extract-within-study-normalized-gene-expression-data}
geMatrices <- con$cache$GE_matrices
# Removing cohorts from gene expression data:
# SDY1370 - BCell and TCell, since others are PBMC / WholeBlood
# SDY1325 - low-dose and subcutaneous PS, different vaccination method, not related
# SDY1364 - intraDermal, different vaccination method
# SDY180 - Saline cohorts did not receive stimulation
rmCohorts <- "cell|Subcutaneous|LowIntraMuscular|IntraDermal|Saline"
geMatrices <- geMatrices[grep(rmCohorts, geMatrices$name, invert = TRUE), ]
esetsFile <- file.path(dataCacheDir, paste0(timeStamp, "IS2_esets.rds"))
esets <- lapply(
  geMatrices$name,
  con$getGEMatrix,
  outputType = "normalized",
  annotation = "latest")
names(esets) <- geMatrices$name
saveRDS(esets, file = esetsFile)
write_data_metadata(file.path(dataCacheDir, "dataset_metadata.csv"),
                    data_path = esetsFile,
                    dataset_name = "IS2_esets.rds")
```
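The cohort filter above relies on `grep(..., invert = TRUE)`, which returns the indices of the names that do not match the pattern. The sketch below applies the same pattern to a toy table. The matrix names are invented for illustration and may not follow the portal's real naming scheme.

```r
# Toy table of matrix names (hypothetical; real names come from con$cache$GE_matrices).
ge <- data.frame(
  name = c(
    "SDY1370_Bcell_Young",
    "SDY1370_PBMC_Young",
    "SDY1364_IntraDermal",
    "SDY180_Saline",
    "SDY180_WholeBlood_Fluzone"
  ),
  stringsAsFactors = FALSE
)

rm_pattern <- "cell|Subcutaneous|LowIntraMuscular|IntraDermal|Saline"

# Keep only rows whose name does NOT match any alternative in the pattern.
kept <- ge[grep(rm_pattern, ge$name, invert = TRUE), , drop = FALSE]

# grep() is case-sensitive by default: "cell" matches "Bcell" but not "BCell".
matches_lower <- grepl("cell", "SDY1370_Bcell_Young")
matches_upper <- grepl("cell", "SDY1370_BCell_Young")
```

Two details are worth keeping in mind. First, the match is case-sensitive, so a cohort spelled with a capital "C" would slip through unless `ignore.case = TRUE` is set. Second, `invert = TRUE` is safer than negative indexing with `-grep(...)`, which selects zero rows when nothing matches.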
There are `r nrow(geMatrices)` matrices in the IS2 virtual study.
```{r check-extracted-ge-data}
results <- testExtractedGEData(esets)
if (!all(unlist(results))) {
  stop("Normalized matrices do not meet dim and NA value expectations")
}
```
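The internals of `testExtractedGEData()` are not shown in this vignette. Going only by the error message, a check of this kind might look at matrix dimensions and missing values. The sketch below is a hypothetical stand-in on plain matrices (the helper `check_matrix` and its thresholds are invented here). For real ExpressionSets, the matrix would first be extracted with `Biobase::exprs()`.

```r
# Hypothetical per-matrix check: has rows and columns, and contains no NA values.
check_matrix <- function(m, min_rows = 1L, min_cols = 1L) {
  is.matrix(m) && nrow(m) >= min_rows && ncol(m) >= min_cols && !anyNA(m)
}

# Toy inputs: one clean matrix and one with a missing value.
toy_matrices <- list(
  good   = matrix(1:6, nrow = 3),
  has_na = matrix(c(1, NA, 3, 4), nrow = 2)
)

toy_results <- lapply(toy_matrices, check_matrix)

# Report which matrices failed instead of only signalling that something failed.
failed <- names(toy_results)[!unlist(toy_results)]
if (length(failed) > 0) {
  message("Failed checks: ", paste(failed, collapse = ", "))
}
```

Naming the failing matrices makes it easier to trace a problem back to a specific cohort than a single all-or-nothing `stop()`.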