After Cell Ranger or BD Rhapsody mapping
obs:
  index # cell id
  sample
  cell_type # human-readable name
  organism # ?
  tissue # ?
mod:
  # gene expression
  rna:
    layers:
      counts
      velocity_spliced # ?
      velocity_unspliced # ?
    var:
      index # feature_id, preferably an Ensembl ID
      feature_name
  # Antibody Capture
  prot:
    layers:
      counts
    var:
      index # feature_id
      feature_name # associated protein names
  # IR receptor data
  vdj:
    obsm:
      vdj_t
      vdj_b
  # Custom Capture
  custom:
    X: # raw counts
uns:
  sample_info: # dictionary of data frames, every data frame has a 'sample_id' column
    cellranger: h5attributes(h5)
  qc: # data frame with columns:
    sample_id # corresponds to .obs["sample_id"]
    component_id # the component that generated these qc values, e.g. mapping/cellranger_count
    category # 10x example [Cells, Library], BD example [Sequencing Quality, Library Quality, ...]
    group_name # example 'ABC_1'
    metric_name
    metric_value # numerical values, example 1000, 0.1 -- strip % signs
  param_log: # list of dicts
    - pipeline_id
      component_id
      component_version
      id
      params: { input: ..., output: ..., arg1: ..., arg2: ... } # store only the base names of files, not their full paths
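One way to make the layout above checkable is to compare the fields a file actually contains against the fields the specification names. The sketch below is only an illustration under assumptions: it uses a plain nested dict as a stand-in for a MuData object, the names `EXPECTED_AFTER_MAPPING` and `missing_fields` are hypothetical, and the fields marked with `# ?` in the layout are treated as optional.

```python
# Hypothetical, simplified description of the slots expected after mapping.
# Keys mirror the layout in this issue; fields marked "?" there are left out.
EXPECTED_AFTER_MAPPING = {
    "rna": {"layers": ["counts"], "var": ["feature_name"]},
    "prot": {"layers": ["counts"], "var": ["feature_name"]},
    "vdj": {"obsm": ["vdj_t", "vdj_b"]},
}


def missing_fields(modalities, expected=EXPECTED_AFTER_MAPPING):
    """Return a sorted list of 'mod.slot.field' paths absent from `modalities`.

    `modalities` is a plain nested dict standing in for a MuData object:
    {modality: {slot: iterable of field names}}. Modalities that are not
    present at all are skipped, since not every run captures every modality.
    """
    missing = []
    for mod_name, slots in expected.items():
        if mod_name not in modalities:
            continue
        for slot, fields in slots.items():
            present = set(modalities[mod_name].get(slot, ()))
            for field in fields:
                if field not in present:
                    missing.append(f"{mod_name}.{slot}.{field}")
    return sorted(missing)


example = {
    "rna": {"layers": ["counts"], "var": ["feature_name"]},
    "prot": {"layers": [], "var": ["feature_name"]},
}
print(missing_fields(example))  # ['prot.layers.counts']
```

With a real file, the nested dict would be filled from the corresponding MuData slots; that step is left out here to keep the sketch free of dependencies.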
After single sample RNA
mod:
  rna:
    obs:
      doublet_prob
      doublet_score
      doublet_bool
      <standard names for scanpy calculate qc metrics>
    var:
      <standard names for scanpy calculate qc metrics>
    layers:
      ambient_corrected_counts
After integration RNA
New fields:
mod:
  rna:
    obs:
      cluster
    obsm:
      X_pca
      X_integrated
      X_umap
    obsp:
      connectivities
      distances
    uns:
      neighbors: # for compatibility with umap
        connectivities_key
        distances_key
        params: { ... }
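The `neighbors` entry exists so that downstream steps such as UMAP can find the graph matrices: `connectivities_key` and `distances_key` name the entries in `obsp` that hold them. A minimal sketch of that lookup, using plain dicts in place of the real `uns` and `obsp` slots; the helper name `resolve_neighbor_keys` and the fallback to the default key names are assumptions for illustration.

```python
def resolve_neighbor_keys(uns, obsp, neighbors_key="neighbors"):
    """Look up which obsp entries hold the neighbor graph.

    `uns` and `obsp` are plain dicts standing in for the slots of the
    rna modality. Raises KeyError if a referenced matrix is missing.
    """
    meta = uns[neighbors_key]
    # Assumption: fall back to the conventional key names when unset.
    conn_key = meta.get("connectivities_key", "connectivities")
    dist_key = meta.get("distances_key", "distances")
    for key in (conn_key, dist_key):
        if key not in obsp:
            raise KeyError(f"obsp has no entry named {key!r}")
    return conn_key, dist_key


uns = {
    "neighbors": {
        "connectivities_key": "connectivities",
        "distances_key": "distances",
        "params": {},  # left empty here; the spec does not list its contents
    }
}
obsp = {"connectivities": object(), "distances": object()}  # placeholder matrices
print(resolve_neighbor_keys(uns, obsp))  # ('connectivities', 'distances')
```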
After annotation
Since annotation could be used across modalities, it should be possible to write its output to the root of the MuData object.
obsm:
  annotation_scvi: # data frame with the predictions and scores?
  annotation_bbknn: # data frame
  # all in one: with just the predictions?
  annotation:
    prediction_scvi
    prediction_bbknn
    ...
uns:
  ...?
WIP!
Logging QC metrics
uns:
  sample_info: # dictionary of data frames, every data frame has a 'sample_id' column
    cellranger: h5attributes(h5)
  qc: # data frame with columns:
    sample_id # corresponds to .obs["sample_id"]
    component_id # the component that generated these qc values, e.g. mapping/cellranger_count
    category # 10x example [Cells, Library], BD example [Sequencing Quality, Library Quality, ...]
    group_name # example 'ABC_1'
    metric_name
    metric_value # numerical values, example 1000, 0.1 -- strip % signs
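To illustrate the shape of one row of the `qc` data frame, here is a small sketch that builds a row as a dict and strips `%` signs from the metric value, as the column comment asks. The helper names `parse_metric_value` and `qc_row` are hypothetical, the sample and metric names in the example are made up, and the sketch assumes the number is kept as reported (a value of `95.5%` becomes `95.5`, not `0.955`); the specification does not say which is intended.

```python
QC_COLUMNS = (
    "sample_id",
    "component_id",
    "category",
    "group_name",
    "metric_name",
    "metric_value",
)


def parse_metric_value(raw):
    """Convert a reported metric to a float, dropping any '%' sign.

    Assumption: the value is not rescaled, only the sign is removed.
    """
    if isinstance(raw, (int, float)):
        return float(raw)
    return float(str(raw).replace("%", "").strip())


def qc_row(sample_id, component_id, category, group_name, metric_name, raw_value):
    """Build one qc row as a dict keyed by the column names in the spec."""
    values = (
        sample_id,
        component_id,
        category,
        group_name,
        metric_name,
        parse_metric_value(raw_value),
    )
    return dict(zip(QC_COLUMNS, values))


# Illustrative values only; the sample and metric names are made up.
row = qc_row("sample_1", "mapping/cellranger_count", "Cells", "ABC_1",
             "Valid Barcodes", "95.5%")
print(row["metric_value"])  # 95.5
```

A list of such rows can be turned into a data frame in one step, and every row carries the `sample_id` that links it back to `.obs["sample_id"]`.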
Logging execution
uns:
  param_log: # list of dicts
    - pipeline_id
      component_id
      component_version
      id
      params: { input: ..., output: ..., arg1: ..., arg2: ... } # store only the base names of files, not their full paths
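A possible way to build one `param_log` entry while keeping only file base names is sketched below. The function names and the `file_keys` argument are hypothetical: the sketch assumes the caller knows which parameters hold file paths, since the specification does not say how they are identified. The pipeline name, version, and id in the example are made up.

```python
import os


def base_names_only(params, file_keys=("input", "output")):
    """Return a copy of `params` where file-valued entries keep only their base name.

    Assumption: `file_keys` lists the parameters that hold file paths.
    """
    cleaned = dict(params)
    for key in file_keys:
        value = cleaned.get(key)
        if isinstance(value, str):
            cleaned[key] = os.path.basename(value)
    return cleaned


def param_log_entry(pipeline_id, component_id, component_version, run_id, params):
    """Build one dict for the uns['param_log'] list."""
    return {
        "pipeline_id": pipeline_id,
        "component_id": component_id,
        "component_version": component_version,
        "id": run_id,
        "params": base_names_only(params),
    }


# Illustrative values only.
entry = param_log_entry(
    "example_pipeline",
    "mapping/cellranger_count",
    "0.1.0",
    "run_1",
    {"input": "/data/run1/sample.h5mu", "output": "/results/out.h5mu", "arg1": 5},
)
print(entry["params"])  # {'input': 'sample.h5mu', 'output': 'out.h5mu', 'arg1': 5}
```

Appending each entry to the list as a component runs would give a record of which components touched the file, without recording machine-specific paths.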
This is a first attempt at deriving a data format specification.
Once we figure out some of the APIs, we could include these in our config.vsh.yaml definitions (similar to https://github.com/openproblems-bio/openproblems-v2/tree/main/src/label_projection/api).
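The specification is organised by pipeline stage: after Cell Ranger or BD Rhapsody mapping, after single sample RNA, after multi sample RNA, and after integration RNA. The multi sample and integration stages list only the new fields they add.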