Output Data Structure
Sean edited this page Mar 28, 2017 · 3 revisions
Specifically, for the cell-gene clustering model:
Variable | Description |
---|---|
z | Gene cluster assignments from the final iteration of Gibbs sampling |
complete.z | History of gene cluster assignments across all iterations of Gibbs sampling |
z.stability | [0,1] measure of stability for the gene clustering chain |
complete.z.stability | History of z.stability over all iterations of Gibbs sampling |
z.prob | Probability of each cluster assignment |
y | Cell cluster assignments from the final iteration of Gibbs sampling |
complete.y | History of cell cluster assignments across all iterations of Gibbs sampling |
y.stability | [0,1] measure of stability for the cell clustering chain |
complete.y.stability | History of y.stability over all iterations of Gibbs sampling |
completeLogLik | Log-likelihood of all gene and cell cluster assignments over all iterations of Gibbs sampling |
finalLogLik | Log-likelihood of final gene and cell cluster assignments |
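
To make the relationship between the "final" and "complete" fields concrete, here is a minimal sketch in Python (used only for illustration; celda itself is an R package). It assumes that `z` and `y` are simply the last entries of `complete.z` and `complete.y`, and that `finalLogLik` is the last entry of `completeLogLik`; the latter is an assumption, not something the table states. The helper name `summarize_chain` and the toy values are hypothetical. The stability and `z.prob` fields are left out because the table does not say how they are computed.

```python
# Illustrative only: a plain-Python mirror of some fields in the table above.
# summarize_chain and all values below are hypothetical, not part of celda.

def summarize_chain(complete_z, complete_y, complete_log_lik):
    """Derive the final-iteration fields from the per-iteration histories."""
    return {
        "z": complete_z[-1],            # gene clusters, final iteration
        "complete.z": complete_z,       # gene clusters, every iteration
        "y": complete_y[-1],            # cell clusters, final iteration
        "complete.y": complete_y,       # cell clusters, every iteration
        "completeLogLik": complete_log_lik,
        # Assumption: the final log-likelihood is the last history entry.
        "finalLogLik": complete_log_lik[-1],
    }


# Toy history: 3 Gibbs iterations, 4 genes, 3 cells.
complete_z = [[1, 1, 2, 2], [1, 2, 2, 2], [1, 2, 2, 1]]
complete_y = [[1, 2, 1], [1, 2, 2], [1, 2, 2]]
complete_log_lik = [-120.5, -101.2, -98.7]

result = summarize_chain(complete_z, complete_y, complete_log_lik)
print(result["z"], result["y"], result["finalLogLik"])
```
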
Ideally, what we get back should be a list of mcmclist objects, each made up of mcmc objects. Each list item is an mcmclist for one of the models that was run (e.g. k=4, l=10 versus k=5, l=10), and each mcmc object in that mcmclist holds the Gibbs sampling information for one chain of that model (phew!). The output should also contain the parameters used to run celda, as well as additional information on model performance (the complete log-likelihood, etc.).
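
The nesting described above (models, then chains within each model, plus run parameters) could be sketched roughly as follows. This is a Python illustration of the shape only, not a proposal for the actual R implementation. The class names `Chain`, `ModelRun`, and `CeldaResult`, the `iterations` parameter, and all numbers are hypothetical, and treating the final log-likelihood as the last history entry is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Chain:
    """One Gibbs sampling chain (plays the role of one mcmc object)."""
    complete_log_lik: List[float]

    @property
    def final_log_lik(self) -> float:
        # Assumption: the final log-likelihood is the last history entry.
        return self.complete_log_lik[-1]


@dataclass
class ModelRun:
    """All chains for one model setting (plays the role of one mcmclist)."""
    params: Dict[str, int]  # e.g. {"k": 4, "l": 10}
    chains: List[Chain] = field(default_factory=list)


@dataclass
class CeldaResult:
    """Top-level output: run parameters plus one ModelRun per model."""
    run_params: Dict[str, object]
    models: List[ModelRun] = field(default_factory=list)


result = CeldaResult(
    run_params={"iterations": 3},  # hypothetical run setting
    models=[
        ModelRun({"k": 4, "l": 10},
                 [Chain([-120.5, -101.2, -98.7]),
                  Chain([-130.0, -110.4, -99.9])]),
        ModelRun({"k": 5, "l": 10},
                 [Chain([-118.0, -100.1, -97.3])]),
    ],
)

# Walk the nesting: report the best final log-likelihood for each model.
for model in result.models:
    best = max(chain.final_log_lik for chain in model.chains)
    print(model.params, best)
```

Keeping the run parameters at the top level and the per-model settings on each `ModelRun` means a chain never has to repeat information shared by its siblings.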
There are a couple of gotchas to note off the bat that will make designing a well-behaved data structure annoying:
- Different models have different outputs. The gene, cell, and gene*cell clustering models all behave differently, with the last one returning clustering information in both gene and cell space. I'm not 100% sure of the best way to structure the output given that.