Introduction
Leukemia (a cancer of the blood) has multiple types with varying degrees of malignancy and proliferation. MPAL (Mixed Phenotype Acute Leukemia) is a rare form that shares characteristics with other types of leukemia. This makes it harder to distinguish between leukemia types and to choose the associated therapies. Because MPAL shares characteristics with several leukemia types, it is important to identify and study the regulatory programs specific to MPAL.
One approach to identifying these regulatory programs is to combine complementary data from different modalities, e.g. gene expression from RNA sequencing, peak accessibility from ATAC sequencing, cell-surface protein abundance from CITE-seq, and mutations from whole-genome or whole-exome sequencing.
- Import RNA gene expression data and peaks from ATAC-seq data
- From the peaks, gene activity scores are computed so that accessibility is on a scale similar to the gene expression data for downstream analysis
- Expression and peak data are each normalized and reduced in dimension separately using iterative latent semantic indexing (LSI)
- Clustering is performed and cell types are identified using marker genes
- The RNA gene expression and peaks of the disease cells are pre-processed in the same way as those of the healthy cells
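The gene-activity and LSI steps above can be sketched in Python with NumPy. This is a minimal, hypothetical illustration rather than this project's code: the function names, the 100 kb window, the exponential distance weight with a 5 kb scale, the 10,000 scale factor and the particular TF-IDF variant are all assumptions. Only a single LSI round is shown; iterative LSI repeats it on features re-selected from preliminary clusters.

```python
import numpy as np

def gene_activity_scores(peak_counts, peak_centers, gene_tss, window=100_000, scale=5_000.0):
    """Hypothetical distance-weighted gene activity score (one chromosome)."""
    # Distance from every gene TSS to every peak midpoint: (n_genes, n_peaks).
    dist = np.abs(peak_centers[None, :] - gene_tss[:, None])
    # Peaks inside the window contribute with an exponentially decaying weight (assumed model).
    weights = np.where(dist <= window, np.exp(-dist / scale), 0.0)
    scores = np.asarray(peak_counts, dtype=float) @ weights.T  # (n_cells, n_genes)
    # Scale each cell to 10,000 and log-transform, as is commonly done for RNA.
    totals = scores.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0
    return np.log1p(scores / totals * 1e4)

def tfidf_lsi(counts, n_components=10):
    """One round of TF-IDF normalization followed by truncated SVD (LSI)."""
    counts = np.asarray(counts, dtype=float)
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    idf = np.log1p(counts.shape[0] / np.maximum((counts > 0).sum(axis=0), 1.0))
    tfidf = np.log1p(tf * idf * 1e4)
    u, s, _ = np.linalg.svd(tfidf, full_matrices=False)
    # Cell coordinates in LSI space; the first component often tracks
    # sequencing depth and is frequently dropped downstream.
    return u[:, :n_components] * s[:n_components]

# Toy data (random counts, not real measurements).
rng = np.random.default_rng(0)
peaks = rng.poisson(0.5, size=(20, 50))
centers = np.sort(rng.integers(0, 1_000_000, size=50))
tss = np.array([100_000, 500_000, 900_000])
ga = gene_activity_scores(peaks, centers, tss)
emb = tfidf_lsi(peaks, n_components=5)
print(ga.shape, emb.shape)  # (20, 3) (20, 5)
```

The same `tfidf_lsi` call can be applied to the gene expression matrix; in practice each modality is reduced separately, as described above.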
A UMAP embedding is created, and healthy hematopoiesis is validated using the protein abundance of cluster of differentiation (CD) markers, including CD3D, CD14, CD19 and CD8A.
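One simple way to check the CD-marker validation is to ask, for each marker, which cluster has the highest mean abundance. The sketch below assumes a cells × markers abundance matrix and per-cell cluster labels; the function name, cluster names and abundance values are hypothetical.

```python
import numpy as np

def top_cluster_per_marker(abundance, clusters, markers):
    """Return, for each marker, the cluster with the highest mean abundance."""
    abundance = np.asarray(abundance, dtype=float)
    clusters = np.asarray(clusters)
    labels = np.unique(clusters)
    # Mean abundance of every marker within every cluster: (n_clusters, n_markers).
    means = np.vstack([abundance[clusters == lab].mean(axis=0) for lab in labels])
    return {m: str(labels[i]) for m, i in zip(markers, means.argmax(axis=0))}

markers = ["CD3D", "CD14", "CD19"]
clusters = ["T", "T", "Mono", "Mono", "B", "B"]
abundance = [[9, 1, 0], [8, 0, 1], [1, 7, 0], [0, 9, 1], [1, 0, 8], [0, 1, 9]]
print(top_cluster_per_marker(abundance, clusters, markers))
# {'CD3D': 'T', 'CD14': 'Mono', 'CD19': 'B'}
```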
To identify compartments of disease cells that may be healthy-like, the disease cells are reduced with LSI and then projected onto the healthy UMAP.
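The projection can be sketched by fitting LSI on the healthy cells and reusing the healthy IDF weights and SVD loadings to place disease cells in the same space. Placement on the healthy UMAP is approximated here by averaging the UMAP coordinates of each disease cell's k nearest healthy neighbors in LSI space. This kNN placement is a swapped-in stand-in, not necessarily the method used in this project (transforming with a fitted UMAP model is a common alternative); the function names, k and the toy data are assumptions.

```python
import numpy as np

def _tfidf(counts, idf):
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    return np.log1p(tf * idf * 1e4)

def fit_lsi(healthy_counts, n_components=10):
    """Fit TF-IDF + SVD on healthy cells; return the IDF weights and loadings."""
    counts = np.asarray(healthy_counts, dtype=float)
    idf = np.log1p(counts.shape[0] / np.maximum((counts > 0).sum(axis=0), 1.0))
    _, _, vt = np.linalg.svd(_tfidf(counts, idf), full_matrices=False)
    return idf, vt[:n_components].T  # loadings: (n_features, n_components)

def project_lsi(counts, idf, loadings):
    """Project cells into the healthy LSI space using the healthy IDF and loadings."""
    return _tfidf(np.asarray(counts, dtype=float), idf) @ loadings

def knn_place_on_umap(ref_lsi, ref_umap, query_lsi, k=5):
    """Place each query cell at the mean UMAP position of its k nearest reference cells."""
    dist = np.linalg.norm(query_lsi[:, None, :] - ref_lsi[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return ref_umap[nearest].mean(axis=1)

rng = np.random.default_rng(1)
healthy = rng.poisson(1.0, size=(30, 40))
disease = rng.poisson(1.0, size=(10, 40))
idf, loadings = fit_lsi(healthy, n_components=5)
healthy_lsi = project_lsi(healthy, idf, loadings)
disease_lsi = project_lsi(disease, idf, loadings)
healthy_umap = rng.normal(size=(30, 2))  # stand-in for a real healthy UMAP
disease_umap = knn_place_on_umap(healthy_lsi, healthy_umap, disease_lsi, k=5)
print(disease_umap.shape)  # (10, 2)
```

Reusing the healthy IDF and loadings, rather than refitting LSI on the combined data, keeps the healthy coordinates fixed so disease cells are read against an unchanged reference.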
Each cluster is labeled as its background cell type (healthy-like) if more than 70% of its cells come from healthy samples, or as disease-like if more than 70% of its cells come from disease samples. This separates compartments that resemble a particular healthy cell type from compartments that are specific to the disease.
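The 70% rule can be written as a small function over per-cell cluster labels and a per-cell disease flag. Clusters that pass neither threshold are labeled "unassigned" here; that label, like the function name, is an assumption rather than something stated above.

```python
from collections import Counter

def classify_compartments(cluster_labels, is_disease, threshold=0.7):
    """Label each cluster healthy-like, disease-like or unassigned by sample origin."""
    totals = Counter(cluster_labels)
    disease = Counter(c for c, d in zip(cluster_labels, is_disease) if d)
    result = {}
    for cluster, n in totals.items():
        n_disease = disease[cluster]
        n_healthy = n - n_disease
        if n_healthy / n > threshold:
            result[cluster] = "healthy-like"  # background cell type dominates
        elif n_disease / n > threshold:
            result[cluster] = "disease-like"
        else:
            result[cluster] = "unassigned"  # hypothetical label for mixed clusters
    return result

labels = ["A"] * 10 + ["B"] * 10 + ["C"] * 10
flags = [False] * 9 + [True] + [True] * 8 + [False] * 2 + [True] * 5 + [False] * 5
print(classify_compartments(labels, flags))
# {'A': 'healthy-like', 'B': 'disease-like', 'C': 'unassigned'}
```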
Within each compartment identified above, differentially expressed genes are computed between the background healthy cells and the overlaid disease cells.
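A per-gene comparison within one compartment can be sketched with a two-sided Wilcoxon rank-sum test (normal approximation with tie correction) followed by Benjamini-Hochberg correction. The choice of test, the pseudocount in the log2 fold change and all function names are assumptions; the project may use a different differential-expression method.

```python
import math
import numpy as np

def _average_ranks(x):
    # 1-based ranks; tied values share their average rank. Also returns tie-group sizes.
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    start = np.cumsum(counts) - counts  # number of strictly smaller values
    return (start + (counts + 1) / 2.0)[inverse], counts

def rank_sum_pvalue(a, b):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected)."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    ranks, ties = _average_ranks(np.concatenate([a, b]))
    u = ranks[:n1].sum() - n1 * (n1 + 1) / 2.0
    tie_term = float(np.sum(ties.astype(float) ** 3 - ties))
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return 1.0  # all values identical
    z = (u - n1 * n2 / 2.0) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2.0))

def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    q = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(q, 1.0)
    return out

def differential_expression(healthy, disease, pseudocount=1.0):
    """Per-gene p-values, q-values and log2 fold changes (disease vs healthy)."""
    healthy = np.asarray(healthy, dtype=float)
    disease = np.asarray(disease, dtype=float)
    pvals = [rank_sum_pvalue(healthy[:, g], disease[:, g]) for g in range(healthy.shape[1])]
    lfc = np.log2((disease.mean(axis=0) + pseudocount) / (healthy.mean(axis=0) + pseudocount))
    return np.array(pvals), benjamini_hochberg(pvals), lfc

# Toy compartment: gene 0 is up in disease cells, gene 1 is unchanged.
rng = np.random.default_rng(2)
healthy = rng.poisson([1.0, 1.0], size=(50, 2))
disease = rng.poisson([8.0, 1.0], size=(50, 2))
p, q, lfc = differential_expression(healthy, disease)
print(q.round(4), lfc.round(2))
```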
The differentially expressed genes are tested for enrichment of KEGG pathways using clusterProfiler.
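Over-representation analysis of the kind clusterProfiler performs is based on a one-sided hypergeometric test. The sketch below reproduces that test in plain Python on a toy universe; the gene identifiers and the "toy_pathway" gene sets are made up and are not KEGG data, and the function names are hypothetical. Across many pathways, a multiple-testing correction such as Benjamini-Hochberg would still be applied.

```python
from math import comb

def hypergeom_upper_tail(n_universe, n_pathway, n_de, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric(universe size, pathway size, DE draws)."""
    upper = min(n_pathway, n_de)
    hits = sum(comb(n_pathway, k) * comb(n_universe - n_pathway, n_de - k)
               for k in range(n_overlap, upper + 1))
    return hits / comb(n_universe, n_de)

def enrich(de_genes, pathways, universe):
    """Rank gene sets by over-representation among the DE genes."""
    universe = set(universe)
    de = set(de_genes) & universe
    results = []
    for name, genes in pathways.items():
        genes = set(genes) & universe
        overlap = len(de & genes)
        p = hypergeom_upper_tail(len(universe), len(genes), len(de), overlap)
        results.append((name, overlap, p))
    return sorted(results, key=lambda r: r[2])

universe = [f"G{i}" for i in range(1, 21)]
de_genes = ["G1", "G2", "G3", "G4"]
pathways = {"toy_pathway_A": ["G1", "G2", "G3", "G5"],
            "toy_pathway_B": ["G10", "G11", "G12"]}
for name, overlap, p in enrich(de_genes, pathways, universe):
    print(name, overlap, round(p, 4))
# toy_pathway_A 3 0.0134
# toy_pathway_B 0 1.0
```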
Transcription factors (TFs) from clusters that are conserved across samples are identified.
These TFs are linked to genes using a peak-to-gene linkage method, such as one based on canonical correlation analysis (CCA).
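A simplified, correlation-based version of peak-to-gene linkage can be sketched as follows: across matched groups of cells (for example, pseudobulks of cells paired between modalities after a CCA-based alignment), each peak is correlated with each gene whose TSS lies within a distance window. This is not the CCA procedure itself; the 250 kb window, the r >= 0.5 cutoff, the function name and the toy data are assumptions.

```python
import numpy as np

def peak_to_gene_links(peak_acc, gene_expr, peak_centers, gene_tss, window=250_000, min_r=0.5):
    """Return (peak_index, gene_index, pearson_r) for nearby, positively correlated pairs.

    peak_acc:  (n_groups, n_peaks) accessibility per matched cell group
    gene_expr: (n_groups, n_genes) expression for the same groups, same row order
    """
    def zscore(m):
        m = np.asarray(m, dtype=float)
        sd = m.std(axis=0)
        sd[sd == 0] = np.inf  # constant features get z = 0 and never pass min_r
        return (m - m.mean(axis=0)) / sd

    zp, zg = zscore(peak_acc), zscore(gene_expr)
    r = zp.T @ zg / zp.shape[0]  # (n_peaks, n_genes) Pearson correlations
    near = np.abs(peak_centers[:, None] - gene_tss[None, :]) <= window
    return [(int(p), int(g), float(r[p, g]))
            for p, g in zip(*np.nonzero(near & (r >= min_r)))]

groups = np.arange(10, dtype=float)
peak_acc = np.column_stack([2 * groups + 1, groups])  # peak 1 is correlated but far away
gene_expr = groups[:, None]
peak_centers = np.array([1_000_000, 5_000_000])
gene_tss = np.array([1_100_000])
print(peak_to_gene_links(peak_acc, gene_expr, peak_centers, gene_tss))
# a single link: peak 0 to gene 0, with r close to 1
```

Only positive correlations are kept in this sketch; the linked genes of interest would then be those near peaks that carry the TF motifs identified above.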