The SCMMIB project provides a benchmark workflow for evaluating the usability, accuracy, robustness and scalability of single-cell multimodal integration algorithms. It covers 65 single-cell multimodal integration methods derived from 40 algorithms, spanning DNA, RNA, protein and spatial multi-omics modalities, for paired integration, unpaired diagonal integration and unpaired mosaic integration.
This folder contains the `scmmib` package for computing the SCMMIB benchmark evaluation metrics, the benchmark datasets and algorithms, and the reproducibility code for the manuscript figures of the Stage 2 project. The scripts and datasets analyzed in Stage 1 are archived here. The registered Stage 1 manuscript is available at the Nature Methods Registered Report Figshare.
Our website for visualizing the benchmark results is available at SCMMIB_website.
For reproducibility of the benchmark methods, metrics and visualization, we provide a GitHub repository, SCMMIB_pipeline.
We developed a Python package, `scmmib`, based on the `scanpy` pipeline. It adapts several integration metrics from the `scib` and `scglue` packages and extends them to different single-cell multimodal integration tasks.
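As an illustration of the kind of metric involved, `scglue` provides FOSCTTM (fraction of samples closer than the true match) for paired data: for each cell, it measures what fraction of cells from the other modality lie closer in the joint embedding than that cell's true partner. The sketch below is a plain NumPy re-implementation written for this README; the name `foscttm` here is our own, it is not the `scmmib` API, and we do not claim that `scmmib` computes the metric exactly this way.

```python
import numpy as np


def foscttm(x: np.ndarray, y: np.ndarray) -> float:
    """Fraction Of Samples Closer Than the True Match, averaged over both modalities.

    x, y: (n_cells, n_dims) embeddings of two modalities in which row i of x
    and row i of y come from the same cell. Lower is better (0 = perfect).
    Illustrative sketch only, not the scmmib or scglue implementation.
    """
    if x.shape != y.shape:
        raise ValueError("x and y must have the same shape (paired cells)")
    # Pairwise Euclidean distances: d[i, j] = ||x_i - y_j||
    d = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    true_dist = np.diag(d)
    # For each x_i: share of y_j strictly closer than its true partner y_i
    frac_x = (d < true_dist[:, None]).mean(axis=1)
    # For each y_j: share of x_i strictly closer than its true partner x_j
    frac_y = (d < true_dist[None, :]).mean(axis=0)
    return float((frac_x.mean() + frac_y.mean()) / 2)
```

Because it builds the full n-by-n distance matrix, this version is only practical for modest cell numbers; large datasets would need a chunked computation.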
The `knn_smooth` function in the `scmmib` package was adapted from a public kNN-smoothing method (knn_smoothing paper, and GitHub).
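The idea behind kNN smoothing is to replace each cell's counts with the aggregate of its own counts and those of its nearest neighbors, which reduces sparsity and technical noise. Below is a simplified, single-step NumPy sketch under our own assumptions: it keeps the median library-size normalization and the Freeman-Tukey transform of the original knn_smoothing method, but skips its PCA step and its iterative schedule of increasing k. `knn_smooth_simple` is a hypothetical name and does not reflect the signature of `scmmib`'s `knn_smooth`.

```python
import numpy as np


def knn_smooth_simple(counts: np.ndarray, k: int) -> np.ndarray:
    """Single-step kNN smoothing (simplified sketch, not scmmib.knn_smooth).

    counts: (n_cells, n_genes) raw counts. Each cell's counts are replaced by
    the sum of its own counts and those of its k nearest neighbors.
    """
    n_cells = counts.shape[0]
    if not 0 <= k < n_cells:
        raise ValueError("k must be in [0, n_cells)")
    # Normalize every cell to the median library size (guarding empty cells)
    lib = counts.sum(axis=1).astype(float)
    target = np.median(lib)
    safe_lib = np.where(lib > 0, lib, 1.0)
    norm = counts / safe_lib[:, None] * target
    # Freeman-Tukey variance-stabilizing transform: sqrt(x) + sqrt(x + 1)
    ft = np.sqrt(norm) + np.sqrt(norm + 1)
    # Euclidean distances between transformed profiles (no PCA in this sketch)
    d = np.linalg.norm(ft[:, None, :] - ft[None, :, :], axis=-1)
    # Force each cell to rank itself first, even if it has exact duplicates
    np.fill_diagonal(d, -1.0)
    nn = np.argsort(d, axis=1, kind="stable")[:, : k + 1]
    # Aggregate the raw counts over each neighborhood (self + k neighbors)
    return counts[nn].sum(axis=1)
```

For example, with three cells `[[10, 0], [9, 1], [0, 10]]` and `k=1`, the first two cells become each other's neighbors and both are smoothed to `[19, 1]`.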
User tutorials and API documentation are available online at https://scmmib.readthedocs.io/en/latest/.
The `scmmib` package also includes a simplified summary-table visualization tool in R, `plot_scmmib_table.r`.
Dependencies:
- Python >= 3.8, with `scib`, `scglue` and `scanpy`, for the `scmmib` Python package.
- R >= 3, with `dplyr`, `scales`, `ggimage`, `ggplot2` and `cowplot`, for the `plot_scmmib_table.r` R tool.
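Before installing, one quick way to confirm that the Python side of these requirements is met is to look the packages up with the standard library's `importlib.util.find_spec`. This is a small helper of our own, not part of `scmmib`; the function names are hypothetical, and the R packages still have to be checked from within R.

```python
import importlib.util
import sys

# Python packages listed above as requirements of scmmib.
REQUIRED = ("scib", "scglue", "scanpy")


def missing_dependencies(packages=REQUIRED):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


def check_environment(packages=REQUIRED):
    """Raise if the interpreter is too old or any required package is missing."""
    if sys.version_info < (3, 8):
        raise RuntimeError(f"Python >= 3.8 required, found {sys.version.split()[0]}")
    missing = missing_dependencies(packages)
    if missing:
        raise ImportError("missing dependencies: " + ", ".join(missing))
```

Calling `check_environment()` in the target environment raises an error naming anything that is missing, and returns silently otherwise.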
- Prepare the environment.
  - Option 1: install the dependencies with pip. For example, the Python dependencies can be installed with:
    ```bash
    # pip install scib scglue scanpy  # install only the main dependencies into an existing environment
    pip install -r pip_requirement.txt  # install all Python dependencies with pinned versions
    ```
  - Option 2 (stable): create a new conda environment with the mixed dependencies. The conda tool (e.g. Miniconda) can be installed from the Anaconda website. Then create and activate the conda environment:
    ```bash
    conda env create -f scmmib_env.yml
    conda activate scmmib
    ```
- Install the `scmmib` package:
  ```bash
  # download SCMMIB
  git clone https://github.com/bm2-lab/SCMMI_Benchmark
  # enter the cloned folder (the name is case-sensitive on Linux)
  cd SCMMI_Benchmark
  pip install .
  ```
- Test the installation in Python:
  ```python
  import scmmib
  ```
- A known bug may occur when computing the graph LISI metrics: `FileNotFoundError: [Errno 2] No such file or directory: '/tmp/lisi_svo3el2i/graph_lisi_indices_0.txt'`. The related GitHub issue in the `scib` project is here, along with a possible solution.
`plot_scmmib_table.r` is a simplified summary-table visualization tool adapted from the `funkyheatmap` package and the `scib_knit_table` function in `scib`. We simplified it because these two tools require complex input formats with numerous restrictions.
[Figure: demo output of `plot_scmmib_table.r`]
`plot_scmmib_table.r` can be used on its own, with a simple R `data.frame` as input. All summary figures were generated with `plot_scmmib_table.r`.
We provide a demo notebook and a reference manual for using `plot_scmmib_table.r`.
More examples can be found in the figure reproducibility code.
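Summary tables of this kind usually start from per-metric scores that have been put on a common scale. One common convention in benchmarking summaries is to min-max scale each metric across methods and average the scaled values into an overall score. The sketch below illustrates that convention only; it assumes every metric is higher-is-better and every method has every metric, the function names are hypothetical, and the aggregation actually used for the SCMMIB figures may differ.

```python
from statistics import mean
from typing import Dict, List, Tuple


def min_max_scale(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; if all methods tie, give each 1.0 (an arbitrary choice)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def summarize(scores: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """scores[method][metric] -> [(method, overall score)] sorted best first."""
    methods = sorted(scores)
    metrics = sorted({metric for per_method in scores.values() for metric in per_method})
    scaled: Dict[str, List[float]] = {method: [] for method in methods}
    for metric in metrics:
        column = [scores[method][metric] for method in methods]
        for method, value in zip(methods, min_max_scale(column)):
            scaled[method].append(value)
    overall = [(method, mean(values)) for method, values in scaled.items()]
    return sorted(overall, key=lambda item: item[1], reverse=True)
```

The resulting rows could then be exported (for example, as CSV) and read into R as a `data.frame`; the exact columns expected by `plot_scmmib_table.r` are described in its reference manual.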
All datasets analyzed in the SCMMIB study are listed below. Details of these datasets are given in our Stage 1 manuscript. The processed datasets are available in a public Figshare repository.
Dataset name | Multi-omics | Batches | Species | Number of cells | Sample/tissue type |
---|---|---|---|---|---|
BMMC Multiome | scRNA + scATAC | 12 donors from 4 sites | Human | 69,249 | bone marrow mononuclear cells |
BMMC CITE-seq | scRNA + ADT | 12 donors from 4 sites | Human | 90,261 | bone marrow mononuclear cells |
HSPC Multiome | scRNA + scATAC | 4 donors of 5 time points | Human | 105,942 | hematopoietic stem and progenitor cells |
HSPC CITE-seq | scRNA + ADT | 4 donors of 5 time points | Human | 70,988 | hematopoietic stem and progenitor cells |
SHARE-seq skin | scRNA + scATAC | - | Mouse | 34,774 | skin |
COVID19 CITE-seq | scRNA + ADT | 143 donors | Human | 781,123 | peripheral blood immune cells |
10X PBMC | scRNA + scATAC | 2 samples | Human | 15,021 | peripheral blood immune cells |
10X Mouse Brain | scRNA + scATAC | 2 replicates for 2 samples | Mouse | 12,138 | brain |
Human white blood cell | scRNA + ADT | 8 donors of 3 time points | Human | 161,764 | white blood cell |
10X NSCLC | scRNA + ADT | 2 replicates | Human | 15,618 | NSCLC |
10X kidney cancer | scRNA + ADT | 7 donors | Human | 20,974 | kidney |
Lymph node spatial | spatial + scRNA + ADT | 2 samples | Human | 6,843 | lymph node |
Thymus spatial | spatial + scRNA + ADT | 4 samples | Mouse | 17,824 | thymus |
Spleen SPOTS | spatial + scRNA + ADT | 2 samples | Mouse | 5,336 | spleen |
All benchmark methods analyzed in the SCMMIB study are listed below. Details of these methods are available in our Registered Report Stage 1 manuscript in the Figshare folder.
Our Stage 1 manuscript, "Benchmarking single-cell multi-modal data integrations", is publicly available in the Nature Methods Registered Report Figshare folder.
Our Stage 2 manuscript has been submitted.
Fu, Shaliu; Wang, Shuguang; Si, Duanmiao; Li, Gaoyang; Gao, Yawei; Liu, Qi (2024). Benchmarking single-cell multi-modal data integrations. figshare. Journal contribution. https://doi.org/10.6084/m9.figshare.26789572.v1
SCMMIB project processed datasets. figshare. Dataset. https://doi.org/10.6084/m9.figshare.27161451.v2