STtools is a software package designed to process spatial transcriptomics (ST) data from various platforms, including Seq-Scope, SlideSeq, and VISIUM. The STtools pipeline includes preprocessing of raw sequence reads, alignment, collapsing barcodes into grids, clustering cell types, and high-resolution analysis with a sliding window strategy. STtools leverages many existing software tools for single-cell and spatial transcriptomic analysis, such as STARsolo, Seurat, BayesSpace, and seqtk.
We recommend running STtools on a Linux operating system (e.g. Ubuntu 18.04). See Installation for the software tools required to run STtools.
## clone the repository
git clone https://github.com/seqscope/STtools.git
cd STtools
## install required python packages
python -m pip install -r requirements.txt
## download example data and decompress
gdown https://drive.google.com/uc?id=1e0u57Yu_fVKFvs-UA7WYfj-vgm8Nd2y4
unzip STtools_example_data.zip
## create output directory and set environment variables
mkdir out
export STHOME=$(pwd)
export STDATA=$STHOME/STtools_example_data ## directory containing data
export STOUT=$STHOME/out ## output directory
export SEQTKPATH=/path/to/seqtk/bin ## path that contains seqtk binary
export STARPATH=/path/to/STAR/bin ## path that contains STAR binary
export GENOMEINDEX=/path/to/STAR/index ## path that contains STAR index
## UNCOMMENT the lines below if you need to build the STAR index yourself for the example data
## mkdir -p $STHOME/STtools_example_data/geneIndex/STARIndex
## $STARPATH/STAR --runThreadN 6 --runMode genomeGenerate --genomeDir $STHOME/STtools_example_data/geneIndex/STARIndex \
## --genomeFastaFiles $STHOME/STtools_example_data/geneIndex/mm10.fasta \
## --sjdbGTFfile $STHOME/STtools_example_data/geneIndex/mm10.gtf --sjdbOverhang 99
## export GENOMEINDEX=$STDATA/geneIndex/STARIndex/
## run STtools - steps A1 to V2
python3 $STHOME/sttools.py --run-all --STtools $STHOME \
--first-fq $STDATA/stepA_extractCoordinates/liver-MiSeq-tile2106-sub-R1.fastq.gz \
--second-fq1 $STDATA/stepA_align/liver_tile2106_sub_R1.fastq.gz \
--second-fq2 $STDATA/stepA_align/liver_tile2106_sub_R2.fastq.gz \
--outdir $STOUT --genome $GENOMEINDEX --star-path $STARPATH --seqtk-path $SEQTKPATH \
--seqscope1st 'HiSeq' --clustering False --lane-tiles 1_2106 \
--binsize 300 --window 150 -l 20 -o 'Sample' -c 2
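Before launching the command above, it can help to confirm that the exported variables point at real locations. The following is a minimal sketch, not part of STtools: the variable names are the ones exported in the commands above, while the function name `check_environment` and the assumption that the binaries are named `seqtk` and `STAR` directly inside `SEQTKPATH` and `STARPATH` are illustrative.

```python
import os
from pathlib import Path

# Variable names taken from the export lines in the quick-start commands.
DIRECTORY_VARS = ["STHOME", "STDATA", "STOUT", "SEQTKPATH", "STARPATH", "GENOMEINDEX"]
# Assumption: each tool directory directly contains a binary with this name.
BINARIES = {"SEQTKPATH": "seqtk", "STARPATH": "STAR"}


def check_environment(env):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    for name in DIRECTORY_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
            continue
        if not Path(value).is_dir():
            problems.append(f"{name}={value} is not a directory")
            continue
        binary = BINARIES.get(name)
        if binary and not os.access(Path(value) / binary, os.X_OK):
            problems.append(f"{binary} not found or not executable in {value}")
    return problems


if __name__ == "__main__":
    for problem in check_environment(os.environ):
        print("WARNING:", problem)
```

Run in the same shell session as the `export` lines, the script prints one warning per problem and nothing when all checks pass.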
The STtools package has flexible options that let the user run all steps, specific steps, or consecutive steps. Several examples from various scenarios are given below for illustration.
The image below illustrates the overall workflow for STtools.
There are 8 steps in total. Each step takes input from either the raw data or the outputs of previous steps. A brief explanation of each step follows:
- Step A1 takes fastq.gz files as input and outputs spatial coordinates (.txt files) and the whitelist used for STARsolo alignment in the current working directory.
- Step A2 takes barcode info and the spatial coordinates file to generate a barcode/HDMI density plot, which can be compared with HE images to estimate the tissue boundary.
- Step A3 takes the valid barcodes (whitelist.txt), the 2nd-seq fastq.gz files, and the STAR indices of the reference genome as input to run STARsolo alignment; this step outputs a digital expression matrix (DGE).
- Step C1 takes the DGE from Step A3 and outputs a Seurat object with the DGE collapsed into simple square grids.
- Step C2 takes the DGE from Step A3 and outputs a Seurat object with the DGE collapsed into square grids using the sliding window strategy.
- Step C3 takes the RDS files from Step C1 and Step C2 as input, performs dimension reduction and clustering, and conducts reference mapping with the simple square grids as query.
- Step V1 takes the DGE (Velocyto format) from Step A3 and generates subcellular plots showing the pattern of spliced/unspliced reads.
- Step V2 takes the DGE (Velocyto format) from Step C3 and generates UMAP, spatial clustering, and feature violin plots.
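The inputs and outputs listed above imply an order in which steps must run. The sketch below records those dependencies in a small table and works out which steps have to be completed before a given step. It is an illustration only: `UPSTREAM` and `steps_required` are hypothetical names rather than part of STtools, and the edges from A1 to A2 and A3 assume that the coordinates and whitelist they consume are the A1 outputs.

```python
# Direct upstream steps for each step, transcribed from the step descriptions.
# Assumption: A2 and A3 read the coordinate/whitelist files written by A1.
UPSTREAM = {
    "A1": [],
    "A2": ["A1"],
    "A3": ["A1"],
    "C1": ["A3"],
    "C2": ["A3"],
    "C3": ["C1", "C2"],
    "V1": ["A3"],
    "V2": ["C3"],
}


def steps_required(target):
    """Return the target step and everything upstream of it, in a runnable order."""
    ordered = []

    def visit(step):
        # Visit parents first so they appear before the step that needs them.
        for parent in UPSTREAM[step]:
            visit(parent)
        if step not in ordered:
            ordered.append(step)

    visit(target)
    return ordered


print(steps_required("C3"))  # ['A1', 'A3', 'C1', 'C2', 'C3']
```

A table like this is mainly useful when running specific steps rather than the full pipeline, because it shows which earlier outputs must already exist.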
A Linux operating system is necessary to run the STtools package. You also need to install the following software tools and libraries/modules before using this package.
- STAR>=2.7.5c (Click for instructions to install STAR)
- seqtk (Click for instructions to install seqtk)
- R>=4.0.0 (STtools will install packages automatically if not installed. Please refer to the list of packages)
- Python>=3.0 (STtools will install modules automatically if not installed. Please refer to the list of modules)
- perl (Click for instructions to install perl)
- pigz (Click for instructions to install pigz)
To install STtools, please run:
git clone https://github.com/seqscope/STtools.git
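Whether the tools listed above can be found on `PATH` can be checked with a few lines of Python. This is a sketch under assumptions: the executable names (in particular `Rscript` standing in for R) are guesses at how each requirement is installed, and `missing_tools` is a hypothetical helper, not an STtools function.

```python
import shutil

# Assumed executable names for the listed requirements; Rscript stands in for R.
REQUIRED_TOOLS = ["STAR", "seqtk", "Rscript", "perl", "pigz"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be located on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    for tool in missing_tools():
        print(f"{tool} not found on PATH")
```

Tools installed outside `PATH`, such as STAR or seqtk binaries that are later passed through `--star-path` and `--seqtk-path`, will be reported as missing even though STtools can still use them.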
- Seq-Scope example data for each step can be found at example data 1; please download the zip files. For each step, the example input data is stored in the corresponding subdirectory.
- VISIUM digital expression data and spatial coordinates are available at example data 2.
- SlideSeq digital expression data and spatial coordinates are available at example data 3.
Please refer to data formats for an illustration of required input data format for each step.
Here are some useful external links:
- To generate gene index for STARsolo alignment: https://hbctraining.github.io/Intro-to-rnaseq-hpc-O2/lessons/03_alignment.html
- Multimodal reference mapping: https://satijalab.org/seurat/articles/multimodal_reference_mapping.html
- Incorporate transgenes into the alignment: Please modify the gtf and fasta files according to https://github.com/igordot/genomics/blob/master/workflows/ref-genome-gfp.md before generating the genome index in STAR.
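Adding a transgene to the reference generally means appending its sequence to the genome FASTA and a matching annotation record to the GTF. The sketch below builds both records for a single-exon transgene. It is an assumption-laden illustration rather than the linked workflow itself: `transgene_records` is a hypothetical helper, and the `transgene` source field and the attribute set are illustrative choices that may need adjusting to match the rest of your GTF.

```python
def transgene_records(name, sequence, line_width=60):
    """Build a FASTA record and a one-exon GTF line for a transgene contig."""
    # Normalise the sequence: drop whitespace and use upper case.
    sequence = "".join(sequence.split()).upper()
    chunks = [sequence[i:i + line_width] for i in range(0, len(sequence), line_width)]
    fasta = f">{name}\n" + "\n".join(chunks) + "\n"

    # GTF has nine tab-separated fields; the exon spans the whole contig (1-based).
    attributes = f'gene_id "{name}"; transcript_id "{name}"; gene_name "{name}";'
    fields = [name, "transgene", "exon", "1", str(len(sequence)), ".", "+", ".", attributes]
    gtf = "\t".join(fields) + "\n"
    return fasta, gtf


if __name__ == "__main__":
    # Toy sequence for illustration only, not a real transgene.
    fasta, gtf = transgene_records("GFP", "ATGGTGAGCAAGGGCGAGGAG")
    print(fasta, gtf, sep="")
```

The two returned strings would be appended to the genome FASTA and GTF files respectively before generating the genome index in STAR.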