Introduction

Irzam Sarfraz edited this page Jun 16, 2023 · 4 revisions

Motivation

Leukemia, a cancer of the blood, has multiple types with varying degrees of malignancy and proliferation. Mixed phenotype acute leukemia (MPAL) is a rare form that shares characteristics with other types of leukemia, which makes it harder to distinguish from them and to select the appropriate therapy. Because of these shared characteristics, it is important to identify and study the regulatory programs that are specific to MPAL.

One approach to identifying these regulatory programs is to combine complementary data from different modalities, e.g. gene expression from RNA-sequencing, peak accessibility from ATAC-sequencing, cell-surface protein abundance from CITE-sequencing, and mutations from whole genome or exome sequencing.

Proposed workflow

1. Establish healthy reference (hematopoiesis)

For healthy cells:

  1. Import RNA gene expression data and peaks from ATAC-seq data
  2. For peaks, compute gene activity scores so that they are on a scale similar to the gene expression data for downstream analysis
  3. Normalize the expression and peak data separately and reduce the dimensionality of each using iterative latent semantic indexing (LSI)
  4. Perform clustering and identify cell types using marker genes
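
The normalization and dimensionality reduction in step 3 can be sketched as a single round of TF-IDF weighting followed by a truncated SVD. This is a minimal illustration in Python with NumPy, not the code used in this project: the helper name `lsi_fit`, the log-scaled TF-IDF variant, the `scale` factor, and the toy matrix are all assumptions. The iterative form of LSI repeats this procedure on the most variable features found after a preliminary clustering, which is left out here.

```python
import numpy as np

def lsi_fit(counts, n_components=2, scale=1e4):
    """One round of LSI on a features x cells count matrix (hypothetical helper).

    Assumes every feature and every cell has at least one count.
    """
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[1]
    # Term frequency: each cell (column) is scaled by its total counts.
    tf = counts / counts.sum(axis=0, keepdims=True)
    # Inverse document frequency: features with high totals across cells get less weight.
    idf = n_cells / counts.sum(axis=1)
    tfidf = np.log1p(tf * idf[:, None] * scale)
    # Truncated SVD: keep only the leading components.
    u, s, vt = np.linalg.svd(tfidf, full_matrices=False)
    k = n_components
    embedding = vt[:k].T * s[:k]  # cells x components
    return {
        "idf": idf,
        "loadings": u[:, :k],
        "singular_values": s[:k],
        "embedding": embedding,
    }

# Toy matrix: 4 features (genes or peaks) x 4 cells.
toy = np.array([[5, 0, 3, 0], [0, 4, 0, 6], [2, 2, 1, 1], [7, 0, 5, 1]])
model = lsi_fit(toy, n_components=2)
print(model["embedding"].shape)  # (4, 2)
```

The returned `idf` weights and `loadings` are kept so that other cells can later be embedded in the same space.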

For MPAL cells:

  1. Pre-process RNA gene expression and peaks in the same way as for the healthy cells

2. Validate healthy hematopoiesis

A UMAP is computed and healthy hematopoiesis is validated using the protein abundance of cluster of differentiation (CD) markers, including CD3D, CD14, CD19 and CD8A.

3. Project disease over healthy

To identify compartments in the disease cells that may be healthy-like, the disease cells are reduced with LSI and then projected onto the healthy UMAP.
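
One way to picture this projection: the disease cells are weighted with the IDF values learned from the healthy cells, multiplied by the healthy SVD loadings, and then placed on the healthy UMAP. The sketch below is illustrative Python with NumPy under several assumptions: the function names are hypothetical, the UMAP coordinates are made-up placeholders, and averaging the coordinates of the nearest healthy neighbors is a simple stand-in for a UMAP library's own transform step, not the method used in this project.

```python
import numpy as np

def tfidf(counts, idf, scale=1e4):
    """Log-scaled TF-IDF of a features x cells matrix using given IDF weights."""
    counts = np.asarray(counts, dtype=float)
    tf = counts / counts.sum(axis=0, keepdims=True)
    return np.log1p(tf * idf[:, None] * scale)

def project_cells(query_counts, idf, loadings):
    """Embed query cells with the reference IDF weights and SVD loadings."""
    return tfidf(query_counts, idf).T @ loadings  # cells x components

def place_on_reference_umap(query_emb, ref_emb, ref_umap, k=3):
    """Average the UMAP coordinates of the k nearest reference cells."""
    # Squared Euclidean distance between every query and reference cell.
    d = ((query_emb[:, None, :] - ref_emb[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return ref_umap[nearest].mean(axis=1)

# Healthy reference: 4 features x 4 cells (toy data).
healthy = np.array([[5, 0, 3, 0], [0, 4, 0, 6], [2, 2, 1, 1], [7, 0, 5, 1]])
idf = healthy.shape[1] / healthy.sum(axis=1)
u, s, vt = np.linalg.svd(tfidf(healthy, idf), full_matrices=False)
loadings = u[:, :2]
ref_emb = project_cells(healthy, idf, loadings)
# Placeholder UMAP coordinates for the healthy cells (not computed here).
ref_umap = np.array([[0.0, 0.0], [5.0, 5.0], [0.5, 0.2], [4.8, 5.1]])

# Disease cells measured on the same 4 features (toy data).
disease = np.array([[4, 1], [0, 5], [2, 2], [6, 0]])
disease_emb = project_cells(disease, idf, loadings)
disease_umap = place_on_reference_umap(disease_emb, ref_emb, ref_umap, k=1)
print(disease_umap.shape)  # (2, 2)
```

The key design point is that nothing is re-fitted on the disease cells: they inherit the healthy feature weights and axes, so their positions are directly comparable with the healthy reference.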

4. Identify developmental compartments from disease

Each cluster is labeled as a background (healthy) cell type if more than 70% of its cells come from healthy samples, or as disease-like if more than 70% of its cells come from disease samples. This yields compartments that are either healthy-like, belonging to a particular cell type, or disease-like.
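
The 70% rule can be written as a short function. This is a hedged sketch in plain Python: the function name, the `"healthy"` and `"disease"` origin labels, and the `"unassigned"` label for clusters that pass neither threshold are assumptions, since the text above does not say how such clusters are handled.

```python
from collections import Counter

def classify_compartments(clusters, origins, threshold=0.7):
    """Label each cluster by the share of healthy vs. disease cells it contains.

    clusters: cluster id per cell; origins: "healthy" or "disease" per cell.
    """
    totals = Counter(clusters)
    healthy = Counter(c for c, o in zip(clusters, origins) if o == "healthy")
    labels = {}
    for cluster, n in totals.items():
        healthy_frac = healthy[cluster] / n
        disease_frac = (n - healthy[cluster]) / n
        if healthy_frac > threshold:
            labels[cluster] = "healthy-like"
        elif disease_frac > threshold:
            labels[cluster] = "disease-like"
        else:
            # Assumption: the source does not define this case.
            labels[cluster] = "unassigned"
    return labels

clusters = ["A"] * 10 + ["B"] * 5 + ["C"] * 2
origins = (["healthy"] * 8 + ["disease"] * 2      # A: 80% healthy
           + ["healthy"] * 1 + ["disease"] * 4    # B: 80% disease
           + ["healthy"] * 1 + ["disease"] * 1)   # C: 50/50
print(classify_compartments(clusters, origins))
# {'A': 'healthy-like', 'B': 'disease-like', 'C': 'unassigned'}
```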

5. Validate projection technique

6. Pathogenic differential gene expression

Within each of the previously identified compartments, differentially expressed genes are computed between the background healthy cells and the overlaid disease cells.
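
As an illustration of what such a comparison involves, the sketch below computes a log2 fold change and a permutation p-value per gene for one compartment. The statistical test is not specified above, so the permutation test here is only a stand-in; the function name, the pseudocount, and the toy values are likewise assumptions.

```python
import numpy as np

def differential_expression(disease, healthy, n_perm=1000, pseudocount=1.0, seed=0):
    """Per-gene log2 fold change and permutation p-value (disease vs. healthy).

    disease, healthy: genes x cells matrices of normalized expression.
    """
    disease = np.asarray(disease, dtype=float)
    healthy = np.asarray(healthy, dtype=float)
    log2fc = np.log2((disease.mean(axis=1) + pseudocount)
                     / (healthy.mean(axis=1) + pseudocount))
    observed = np.abs(disease.mean(axis=1) - healthy.mean(axis=1))

    # Shuffle cell labels and count how often the shuffled difference
    # is at least as large as the observed one.
    pooled = np.hstack([disease, healthy])
    n_disease = disease.shape[1]
    rng = np.random.default_rng(seed)
    exceed = np.zeros(pooled.shape[0])
    for _ in range(n_perm):
        perm = rng.permutation(pooled.shape[1])
        diff = np.abs(pooled[:, perm[:n_disease]].mean(axis=1)
                      - pooled[:, perm[n_disease:]].mean(axis=1))
        exceed += diff >= observed
    pvals = (exceed + 1) / (n_perm + 1)
    return log2fc, pvals

# Toy compartment: gene 0 is higher in disease, gene 1 is unchanged.
disease = np.array([[10, 12, 11, 13], [5, 5, 5, 5]])
healthy = np.array([[1, 2, 1, 2], [5, 5, 5, 5]])
log2fc, pvals = differential_expression(disease, healthy)
```

In practice the p-values would also be corrected for multiple testing across genes before selecting the differentially expressed set.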

7. Pathway enrichment analysis

The differentially expressed genes are tested for enrichment of KEGG pathways using clusterProfiler.
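
Over-representation analysis of this kind rests on a hypergeometric test: given a gene universe, how surprising is the overlap between the differentially expressed genes and a pathway's members? The plain-Python sketch below shows that calculation only. It does not call clusterProfiler or KEGG, and the function names, gene identifiers, and pathway names are hypothetical.

```python
from math import comb

def hypergeom_pvalue(overlap, n_de, n_pathway, n_universe):
    """P(X >= overlap) when n_de genes are drawn from a universe of n_universe
    genes, of which n_pathway belong to the pathway."""
    upper = min(n_de, n_pathway)
    hits = sum(comb(n_pathway, i) * comb(n_universe - n_pathway, n_de - i)
               for i in range(overlap, upper + 1))
    return hits / comb(n_universe, n_de)

def enrich(de_genes, pathways, universe):
    """Return a p-value per pathway for over-representation among de_genes."""
    universe = set(universe)
    de = set(de_genes) & universe
    results = {}
    for name, members in pathways.items():
        members = set(members) & universe
        results[name] = hypergeom_pvalue(len(de & members), len(de),
                                         len(members), len(universe))
    return results

universe = [f"g{i}" for i in range(20)]
pathways = {
    "pathway_A": ["g0", "g1", "g2", "g3", "g4"],       # hypothetical gene sets
    "pathway_B": ["g10", "g11", "g12", "g13", "g14"],
}
de_genes = ["g0", "g1", "g2", "g3"]
print(enrich(de_genes, pathways, universe))
```

Here four of the five `pathway_A` genes are differentially expressed, giving a small p-value, while `pathway_B` has no overlap and a p-value of 1. Raw p-values like these are normally adjusted for the number of pathways tested.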

8. Identify transcription factors from conserved peaks

Transcription factors (TFs) are identified from peaks in clusters that are conserved across samples.

9. Link transcription factors to genes

These TFs are linked to genes using a peak-to-gene linkage method such as canonical correlation analysis (CCA).
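
To give a sense of what a peak-to-gene link measures, the sketch below computes the Pearson correlation between each peak's accessibility and each gene's expression across a shared set of samples. It is a simplified stand-in and not CCA itself: it assumes the two modalities have already been matched sample by sample (for example by an alignment step), and the function name and toy values are hypothetical.

```python
import numpy as np

def peak_gene_correlation(peaks, genes):
    """Pearson correlation of every peak with every gene.

    peaks: n_peaks x n_samples accessibility; genes: n_genes x n_samples expression.
    Columns must refer to the same samples in both matrices.
    Rows with zero variance are not handled and would produce NaN.
    """
    def zscore(m):
        m = np.asarray(m, dtype=float)
        return (m - m.mean(axis=1, keepdims=True)) / m.std(axis=1, keepdims=True)

    n_samples = np.shape(peaks)[1]
    # Mean product of z-scores equals the Pearson correlation.
    return zscore(peaks) @ zscore(genes).T / n_samples

peaks = [[1, 2, 3, 4], [4, 3, 2, 1]]   # two peaks across four samples
genes = [[2, 4, 6, 8]]                 # one gene across the same samples
corr = peak_gene_correlation(peaks, genes)
# corr has shape (2, 1): the first peak tracks the gene, the second opposes it.
```

Peaks that correlate strongly with a gene and contain a TF's binding motif are the candidates for linking that TF to the gene.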