Model Terminology
Sean edited this page Aug 31, 2018 · 2 revisions
The celda package is a reference implementation of several Bayesian hierarchical models for clustering single-cell RNA-seq data. Our group uses specific terminology for the data being modeled and for parts of the models themselves. Issues, questions, and code contributions should use the terminology below:
- Cell Population: a specific cluster of cells; one cell cluster label amongst those returned from celda_C / celda_CG
- Gene Module: a specific cluster of genes; one gene cluster amongst those returned from celda_G / celda_CG
We use various shorthand terms in the code that implements the celda models. Each is explained below:
- C = Cell
- S or s = Sample
- G = Gene
- TS = Transcriptional State
- CP = Cell Population
- n = counts of transcripts
- m = counts of cells
- K = Total number of cell populations
- L = Total number of transcriptional states
- nM = Number of cells
- nG = Number of genes
- nS = Number of samples
All n.* variables contain counts of transcripts
- n.CP.by.TS = Number of counts in each Cell Population per Transcriptional State
- n.TS.by.C = Number of counts in each Transcriptional State per Cell
- n.CP.by.G = Number of counts in each Cell Population per Gene
- n.by.G = Number of counts per gene (i.e. rowSums)
- n.by.TS = Number of counts per Transcriptional State
All m.* variables contain counts of cells
- m.CP.by.S = Number of cells in each Cell Population per Sample
- nG.by.TS = Number of genes in each Transcriptional State
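To make the naming scheme concrete, here is a minimal sketch in plain Python (not the package's R code) that builds each summary from a toy genes-by-cells count matrix. The label vectors `z`, `y`, and `s`, the underscore spellings of the variable names, the 0-based labels, and the row/column orientation (first name indexes rows) are all assumptions made for illustration; they are not taken from the celda source.

```python
# Toy counts matrix: rows are genes (G), columns are cells (C).
counts = [
    [5, 0, 3, 1],
    [2, 1, 0, 0],
    [0, 4, 1, 2],
]

# Hypothetical label vectors (0-based; names are illustrative only).
z = [0, 1, 0, 1]  # cell population (CP) assigned to each cell
y = [0, 0, 1]     # transcriptional state (TS) assigned to each gene
s = [0, 0, 0, 1]  # sample (S) each cell came from

K = 2                # total number of cell populations
L = 2                # total number of transcriptional states
nG = len(counts)     # number of genes
nM = len(counts[0])  # number of cells
nS = len(set(s))     # number of samples

# n.by.G: transcript counts per gene (row sums).
n_by_G = [sum(row) for row in counts]

# n.* tables: accumulate each transcript count under its labels.
n_TS_by_C = [[0] * nM for _ in range(L)]
n_CP_by_G = [[0] * nG for _ in range(K)]
n_CP_by_TS = [[0] * L for _ in range(K)]
for g in range(nG):
    for c in range(nM):
        n = counts[g][c]
        n_TS_by_C[y[g]][c] += n
        n_CP_by_G[z[c]][g] += n
        n_CP_by_TS[z[c]][y[g]] += n

# n.by.TS: transcript counts per transcriptional state.
n_by_TS = [sum(row) for row in n_TS_by_C]

# m.CP.by.S: number of cells in each cell population per sample.
m_CP_by_S = [[0] * nS for _ in range(K)]
for c in range(nM):
    m_CP_by_S[z[c]][s[c]] += 1

# nG.by.TS: number of genes in each transcriptional state.
nG_by_TS = [y.count(t) for t in range(L)]

print("n.CP.by.TS:", n_CP_by_TS)
print("m.CP.by.S:", m_CP_by_S)
```

Every n.* table is a different aggregation of the same transcripts, so each one sums to the total of the counts matrix, while m.CP.by.S sums to the number of cells.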