Cell Decoder: Decoding cell identity with multi-scale explainable deep learning

Package: `celldecoder`

We created the python package called celldecoder that that decoding cell identity from gene expressions by explicitly modeling the multi-scale biological interactions, i.e., genes, pathways, and biological processes.

Requirements

Python >= 3.8
torch == 2.0.1
torch-geometric == 2.3.1
CUDA 11.7

Create environment

conda create -n celldecoder python=3.8
conda activate celldecoder
conda install pytorch=2.0.1 cudatoolkit=11.7 -c pytorch
pip install torch_geometric
pip install torch_scatter torch_sparse torch_cluster -f https://data.pyg.org/whl/torch-2.0.0+cu117.html

Installation

The celldecoder python package is in the folder celldecoder. You can simply install it from the root of this repository using

pip install .

Environment has been tested

environment.yml

Usage

Step 1: Constructing the model

Domain knowledge

Building the model requires the use of prior knowledge. Ensure the creation of 2 directories at the software's location: /data/ppi and /data/reactome. The corresponding data can be downloaded from https://figshare.com/articles/dataset/PPI_data/24921831.

hierarchy: gene-pathway mapping & hierarchy pathway information
ppi: protein-protein interactions network

from celldecoder.data import interactions,hierarchy
import json
#hierarchy
n_layers = 3 # n layers of the model
reactome = hierarchy.hierarchy_layer(species='HSA') #HSA: human,MMU:mouse  
layers = reactome.get_layers(n_levels=n_layers)
ref_adata.uns['hierarchy'] = json.dumps(layers) 
query_adata.uns['hierarchy'] = json.dumps(layers)
#ppi
ref_adata = interactions.data_mapping_ppi(ref_adata,ppi_data) #ppi_data: ppi network

Input:

data: an AnnData object of reference data and query data （checkout reference and query have the same feature）
ppi_data : pre-prepared ppi networks data

Domain knowledge files
./data/ppi: human & mouse processed ppi data
./data/reactome: hierarchy pathway information

Step 2: Training the model

import celldecoder
dataset = "./data/hBone/hBone_ref_adata.h5ad"
device_id = 1
log_dir = f"./log/{dataset}"
# Train the cell decoder using the specified dataset `dataset` on the device `device_id`,
# logging to the specified directory `log_dir`, and using the `cell_label` parameter to specify the cell type label.
celldecoder.train(dataset=dataset, device_id=device_id, log_dir=log_dir, cell_label="cell_type")

Input:

dataset: dataset name
log_dir : logging directory
device_id: gpu device id

See other arguments in celldecoder/config.py

Output:

./log_dir/args.json : configuration file
./log_dir/best.pth : best checkpoint weights

See command output for validation metrics.

Step 3: Prediect by the model

import celldecoder
device_id = 1
log_dir = f"./log/hBone"
dataset = "./data/hBone/hBone_query_adata.h5ad"
fn_process = "processed-test"
predict_type = 'cell'
cells = celldecoder.predict(dataset = dataset, device_id = device_id ,log_dir = log_dir, fn_process = fn_process, predict_type = predict_type)

Input:

dataset: dataset name
log_dir : logging directory
device_id: gpu device id

Output:

See command output for test metrics.

Step 4: Output embeddings by the model

import celldecoder
device_id = 1
log_dir = f"./log/hBone"
dataset = "./data/hBone/hBone_query_adata.h5ad"
fn_process = "processed-test"
celldecoder.embed(dataset = dataset, device_id = device_id ,log_dir = log_dir, out_embed = "output", fn_process = fn_process)

Input:

dataset: dataset name
log_dir : logging directory
device_id: gpu device id

Output:

Embeddings of the cells

Step 5: Output feature explanations by the model

import celldecoder
device_id = 1
log_dir = f"./log/hBone"
dataset = "./data/hBone/hBone_query_adata.h5ad"
fn_process = "processed-test"
celldecoder.explain_feature(dataset = dataset, device_id = device_id ,log_dir = log_dir, explain_method = "grad", fn_process = fn_process)

Input:

dataset: dataset name
log_dir : logging directory
device_id: gpu device id

Output:

Feature explanations of the cells

Step 6: Output ppi explanations by the model

import celldecoder
device_id = 1
log_dir = f"./log/hBone"
dataset = "./data/hBone/hBone_query_adata.h5ad"
fn_process = "processed-test"
exp_dict ={
    "correlation": 0,
    "multi_atten": 1,
    "train_sample_gt": 0,
    "ce_loss_gt": 0,
    "exp_train_epochs": 100,
    "exp_lr": 0.01,
}
celldecoder.explain_ppi(dataset = dataset, device_id = device_id ,log_dir = log_dir, fn_process = fn_process, **exp_dict)

Input:

dataset: dataset name
log_dir : logging directory
device_id: gpu device id

Output:

PPI explanations of the cells

Example Demo:

Guided Tutorial

Cite Cell Decoder:

Zhu et al. Decoding cell identity with multi-scale explainable deep learning. bioRxiv 2024.02.05.578922

[https://doi.org/10.1101/2024.02.05.578922]

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
celldecoder		celldecoder
resource		resource
test		test
LICENSE		LICENSE
readme.md		readme.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Cell Decoder: Decoding cell identity with multi-scale explainable deep learning

Package: `celldecoder`

Requirements

Create environment

Installation

Environment has been tested

Usage

Step 1: Constructing the model

Domain knowledge

Input:

Domain knowledge files

Step 2: Training the model

Input:

Output:

Step 3: Prediect by the model

Input:

Output:

Step 4: Output embeddings by the model

Input:

Output:

Step 5: Output feature explanations by the model

Input:

Output:

Step 6: Output ppi explanations by the model

Input:

Output:

Example Demo:

Cite Cell Decoder:

About

Releases 1

Packages

Contributors 3

Languages

License

PHOENIXcenter/CellDecoder

Folders and files

Latest commit

History

Repository files navigation

Cell Decoder: Decoding cell identity with multi-scale explainable deep learning

Package: celldecoder

Requirements

Create environment

Installation

Environment has been tested

Usage

Step 1: Constructing the model

Domain knowledge

Input:

Domain knowledge files

Step 2: Training the model

Input:

Output:

Step 3: Prediect by the model

Input:

Output:

Step 4: Output embeddings by the model

Input:

Output:

Step 5: Output feature explanations by the model

Input:

Output:

Step 6: Output ppi explanations by the model

Input:

Output:

Example Demo:

Cite Cell Decoder:

About

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Contributors 3

Languages

Package: `celldecoder`

Packages