CSI-Microbes-analysis

This repository contains part of the workflows for reproducing the results from the bioRxiv paper "scRNA-seq analysis of colon and esophageal tumors uncovers abundant microbial reads in myeloid cells undergoing proinflammatory transcriptional alterations" by Welles Robinson, Josh Stone, Fiorella Schischlik, Billel Gasmi, Michael Kelly, Charlie Seibert, Kimia Dadkhah, E. Michael Gertz, Joo Sang Lee, Kaiyuan Zhu, Lichun Ma, Xin Wang, S. Cenk Sahinalp, Rob Patro, Mark D.M. Leiserson, Curtis Harris, Alejandro A. Schäffer, and Eytan Ruppin. Specifically, it contains the workflows that analyze microbial reads from 10x and Smart-seq2 scRNA-seq datasets to identify microbial taxa that are differentially abundant or differentially present. Before running this code, the microbial reads must be identified using the CSI-Microbes-identification repository. The code in this repository was written by Welles Robinson and Fio Schischlik and alpha-tested by Alejandro Schäffer.

Requirements

This workflow has been tested on macOS Mojave (10.14.6) and Linux (Biowulf). Every step requires at least 10 GB of RAM except Figure 5A, which requires 30 GB. The workflow expects conda to be installed. For installation instructions, see the conda installation documentation.
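A quick way to confirm the conda prerequisite before continuing is sketched below. The helper name `have_cmd` is hypothetical and not part of this repository; it only checks that a command is on the PATH.

```shell
# Hypothetical helper (not part of this repository):
# succeeds if the named command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd conda; then
    # Print the installed conda version.
    conda --version
else
    echo "conda not found; install it before continuing" >&2
fi
```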

Software Installation

Installing the software should take less than 30 minutes. Installation involves downloading the codebase and setting up the environment; the estimate excludes the time needed to unzip the files, which depends on the OS. There are two ways to download the codebase. To reproduce the key results from our paper, we recommend downloading the latest version of CSI-Microbes-analysis from Zenodo, which contains the intermediate files generated using CSI-Microbes-identification. The intermediate files for a given dataset are located in the <dataset_of_interest>/raw directory. For example, the intermediate files needed to reproduce Aulicino2018 are in Aulicino2018/raw.
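After unpacking the Zenodo download, it may be worth confirming that the intermediate files for a dataset are in place. The sketch below assumes it is run from the top-level CSI-Microbes-analysis directory; `check_raw` is a hypothetical helper, not part of this repository, and it only checks that `<dataset>/raw` exists and is non-empty.

```shell
# Hypothetical helper (not part of this repository):
# report whether <dataset>/raw exists and contains at least one entry.
check_raw() {
    dataset="$1"
    if [ -d "$dataset/raw" ] && [ -n "$(ls -A "$dataset/raw" 2>/dev/null)" ]; then
        echo "found: $dataset/raw"
    else
        echo "missing: $dataset/raw"
        return 1
    fi
}

# Example: check the dataset named in this README.
check_raw Aulicino2018 || true
```

If the directory is reported as missing, the codebase was probably cloned from GitHub rather than downloaded from Zenodo.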

The second way to download the codebase is to clone the GitHub repository as shown below (the clone does not contain the intermediate files). The instructions below assume that you have an SSH key associated with your GitHub account. If you do not, you can generate a new SSH key and associate it with your GitHub username by following these instructions.

git clone git@github.com:ruppinlab/CSI-Microbes-analysis.git

Once the codebase is downloaded, you need to create the conda environment (you need to perform this step only once unless you explicitly delete the conda environment).

cd CSI-Microbes-analysis
conda env create -f envs/CSI-Microbes-analysis.yaml
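Because the environment only needs to be created once, it can be useful to check whether it already exists before running `conda env create` again. The sketch below assumes that `conda env list` prints each named environment in the first column and that the environment is called CSI-Microbes-env, as stated in this README; `env_listed` is a hypothetical helper, not part of this repository.

```shell
env_name="CSI-Microbes-env"

# Hypothetical helper (not part of this repository):
# reads `conda env list`-style output on stdin and succeeds
# if the first column of any line equals the given name.
env_listed() {
    awk '{print $1}' | grep -qxF -- "$1"
}

if command -v conda >/dev/null 2>&1; then
    if conda env list | env_listed "$env_name"; then
        echo "$env_name already exists; no need to create it again"
    else
        echo "$env_name not found; run: conda env create -f envs/CSI-Microbes-analysis.yaml"
    fi
fi
```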

Finally, you need to activate the newly created conda environment (all of the commands below assume that the conda environment CSI-Microbes-env is active).

conda activate CSI-Microbes-env
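Since every later command assumes the environment is active, a small guard can catch a forgotten activation. This sketch relies on the `CONDA_DEFAULT_ENV` variable that conda sets when an environment is activated; `is_env_active` is a hypothetical helper, not part of this repository.

```shell
expected_env="CSI-Microbes-env"

# Hypothetical helper (not part of this repository):
# succeeds if the currently active conda environment has the given name.
is_env_active() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if is_env_active "$expected_env"; then
    echo "$expected_env is active"
else
    echo "$expected_env is not active; run: conda activate $expected_env"
fi
```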

Software Dependencies

CSI-Microbes-analysis depends on the following software packages, which are installed via the conda channels conda-forge, bioconda, and defaults: dplyr (1.0.5)^REF, ggforce (0.3.3)^REF, ggplot2 (3.3.3)^REF, ggpubr (0.4.0)^REF, rpy2 (3.4.4)^REF, scater (1.16.0)^REF, scran (1.16.0)^REF, SingleCellExperiment (1.10.1)^REF, Snakemake (6.2.1)^REF, and Seurat (4.0.1)^REF.

Reproducing key results and figures from the paper

The reproduction of key results and figures from the paper requires intermediate files generated by CSI-Microbes-identification and available for download from Zenodo.

Reproducing results from Aulicino2018

To reproduce the results from Aulicino2018^REF, you first need to be in the Aulicino2018 directory.

cd Aulicino2018