Open4Gene

Hurdle Model-based Method for Peak-to-Gene Linkage Analysis.

Main features of Open4Gene

Accounting for excess zeros in single-nucleus RNA data based on a two-component mixture Hurdle model
Modeling linkages between peak open chromatin (ATAC) and gene expression (RNA) using regression model with covariates
Flexible specification of analysis using cells from a given cell type of interest, each cell type or all cells

How to build & install in R (>= 4.1.0)

devtools::install_github("hbliu/Open4Gene")
library(Open4Gene)

Run Open4Gene with example data

Load example data (./Open4Gene/inst/extdata/Open4Gene.Test.Data)

load(system.file("extdata", "Open4Gene.Test.Data", package = "Open4Gene"))

ls()
# "ATAC.Counts", "RNA.Counts", "Gene.Annotation", "Meta.Data", "Peak.Gene"

head(Peak.Gene)
#Peak                   Gene
#chr5-39400433-39402082 DAB2
#chr5-39400433-39402082 DAB2

Preparing the object for Open4Gene analysis

Open4Gene.obj <- CreateOpen4GeneObj(RNA = RNA.Counts,
                                    ATAC = ATAC.Counts,
                                    Meta.data = Meta.Data,
                                    Peak2Gene.Pairs = Peak.Gene,
                                    Covariates = c("lognCount_RNA","percent.mt"),
                                    Celltypes = "Cell_Type")

Run Open4Gene analysis

Open4Gene.obj <- Open4Gene(object = Open4Gene.obj,
                          Celltype = "All", # Other options: Cell type name, e.g., "PT"; or "Each" to analyze each cell type
                          Binary = FALSE,   # Binarize ATAC data if binary = TRUE
                          MinNum.Cells = 5)   # Minimal number of cells with both RNA > 0 and ATAC > 0 for association test

Output Open4Gene result

write.table(Open4Gene.obj@Res, file = "Open4Gene.obj.All.res.txt", sep="\t", col.names=TRUE, row.names=FALSE, quote=FALSE)

Output

On output, Open4Gene.obj@Res provides the following values for each gene~peak pair:

Peak
Gene
Celltype (Cell type name, "All" means all cell in the input data)
TotalCellNum (Total number of cells used for the analysis)
ExpressCellNum (Number of cells expressing given gene, RNA read count > 1)
OpenCellNum (Number of cells with open chromatin in given peak, ATAC read count > 1)
hurdle.res.zero.beta (beta value of zero hurdle model, binomial with logit link) *
hurdle.res.zero.se (standard error value of zero hurdle model)
hurdle.res.zero.z (z value of zero hurdle model)
hurdle.res.zero.p (p value of zero hurdle model) *
hurdle.res.count.beta (beta value of count model, truncated negbin with log link) *
hurdle.res.count.se (standard error value of count model)
hurdle.res.count.z (z value of count model)
hurdle.res.count.p (p value of count model) *
hurdle.AIC (Akaike information criterion of hurdle model)
hurdle.BIC (Bayesian information criterion of hurdle model)
spearman.rho (Spearman's rank correlation coefficient between RNA and ATAC)
spearman.p (Spearman's p value between RNA and ATAC)

You may need columns with (*) for the downstream analysis.

Input

This section describes how to prepare the input files for Open4Gene. Open4Gene needs following input data and parameters:

RNA [dgCMatrix] Sparse matrix of scRNAseq read count, gene in row and cell in column
ATAC [dgCMatrix] Sparse matrix of scATACseq read count, gene in row and cell in column
Meta.data [data.frame] Meta data table with covariates; with cell ID in the rownames
Covariates [character] Assign covariates that are needed for the analysis
Celltypes [character] Assign celltype column from Meta.data
Peak2Gene.Pairs [data.frame] Dataframe that contains Peak~Gene pairs to analyze using Open4Gene
Gene.Annotation [GRanges] Gene annotation, e.g. EnsDb.Hsapiens.v75
Peak2Gene.Dis [integer] Distance (peak to gene body), default is 100000 (bp)

Data preparation

1. RNA and ATAC read count matrix

Code for extracting count matrix and cell information from Seurat object with both RNA and ATAC assays:

RNA.Counts <- Seurat.object@assays$RNA@counts
ATAC.Counts <- Seurat.object@assays$ATAC@counts
Meta.Data <- Seurat.object@meta.data

Note here that, the cell IDs from different matrix should match with each other, e.g. columns of RNA matrix, columns of ATAC matrix, and rows of Meta.Data.

2. Meta.Data

This is a dataframe that contains cell information from the single cell RNA and ATAC, as following.

	orig.ident	lognCount_RNA	percent.mt	Cell_Type
HK2888_GTTTAACCAGCTCAAC-1	HK2888	6.61	5.70	PT
HK2888_GGCTAGTGTCATGCCC-1	HK2888	7.66	2.36	Injured_PT
HK2888_GGCTAGTGTAAGCTCA-1	HK2888	7.49	0.04	LOH

Note here that, the cell IDs should be provided in the rownames. Make and check it using

Meta.Data <- Seurat.object@meta.data
head(rownames(Meta.Data))

3. Covariates

The covariates are used for the regression analysis. Features of cells in metadata can be used as covariates, e.g. lognCount_RNA, percent.mt.

4. Peak2Gene.Pairs

This is a dataframe that contains Peak~Gene pairs for Open4Gene, Peak (first column) and Gene (second column), as following.

Peak	Gene
chr5-39400433-39402082	DAB2
chr5-39369336-39370159	DAB2

5. Preparing the object for Open4Gene analysis using Gene.Annotation

Open4Gene can pick up the Peak2Gene.Pairs based on peak and gene distance (Peak2Gene.Dis) based on the input data. Here, the Gene.Annotation is a gene annotation in GRanges object, e.g. EnsDb.Hsapiens.v75. Code for preparing object for Open4Gene analysis using Gene.Annotation of EnsDb.Hsapiens.v75.

library(EnsDb.Hsapiens.v75)
Gene.Annotation <- genes(EnsDb.Hsapiens.v75)
Open4Gene.obj <- CreateOpen4GeneObj(RNA = RNA.Counts, ATAC = ATAC.Counts, Meta.data = Meta.Data,
                            Gene.Annotation = Gene.Annotation,
                            Peak2Gene.Dis = 100000,
                            Covariates = c("lognCount_RNA","percent.mt"), 
                            Celltypes = "Cell_Type")

Then, Open4Gene.obj will be inputed to Run Open4Gene analysis.

Run Open4Gene

1. Run Open4Gene for a given cell type, e.g. PT

Open4Gene.obj <- Open4Gene(object = Open4Gene.obj,
                          Celltype = "PT",  
                          Binary = FALSE,
                          MinNum.Cells = 5)

2. Run Open4Gene for each cell type

Open4Gene.obj <- Open4Gene(object = Open4Gene.obj,
                          Celltype = "Each",  
                          Binary = FALSE,
                          MinNum.Cells = 5)

3. Run Open4Gene using all cells

Open4Gene.obj <- Open4Gene(object = Open4Gene.obj,
                          Celltype = "All",  
                          Binary = FALSE,
                          MinNum.Cells = 5)

Warning

It is time-consuming to run genome-wide peak-to-gene linkage analysis using Open4Gene. Open4Gene analysis on 3000 pairs takes about 5 hours. To perform genome-wide analysis, we recommend using Peak2Gene.Pairs to control the number of pairs analyzed in each chunk.

Contact

For any question, you are welcome to report your issue in Github or contact us [email protected] and [email protected]

Name		Name	Last commit message	Last commit date
Latest commit History 61 Commits
R		R
inst/extdata		inst/extdata
man		man
vignettes		vignettes
DESCRIPTION		DESCRIPTION
NAMESPACE		NAMESPACE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Open4Gene

Main features of Open4Gene

How to build & install in R (>= 4.1.0)

Run Open4Gene with example data

Output

Input

Data preparation

Run Open4Gene

Warning

Contact

About

Releases 1

Packages

Languages

hbliu/Open4Gene

Folders and files

Latest commit

History

Repository files navigation

Open4Gene

Main features of Open4Gene

How to build & install in R (>= 4.1.0)

Run Open4Gene with example data

Output

Input

Data preparation

Run Open4Gene

Warning

Contact

About

Resources

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages