Spatial Proteomics Simulation

This repository contains the code to simulate properties of spatial proteomics such as resolution and throughput in spatial transcriptomics datasets created using 10X Genomics' Visium.

The Visium data should be organized by tissue: the files for each tissue sample should live inside their own directory. A path to the directory containing all tissue directories should be provided to 1_process_raw_h5_files.py. If pathologist annotations are also available (as is the case in the publication by Arora et al. (2023)), the pipeline should begin with 0_process_annotations.R. Otherwise each the pipeline should be run in consecutive order, beginning with 1_process_raw_h5_files.py. This will create modified .h5 expression files which simulate the characteristics of spatial proteomics. They can then be run through traditional transcriptomics pipelines to determind whether spatial proteomics is currently capable of achieving the same levels of biological success as spatial proteomics.

Simulating Different Resolutions

The script 2_simulate_resolution.py creates versions of the gene expression matrix that collapse n spots together to simulate the larger voxel sizes common in spatial proteomics using laser capture microdissection. This script produces matrices collapsed by 3, 4, and 9 spots. In order to simulate the depth of spatial proteomics data found in recent publications (Zhu et al., 2023), the script drops expression counts found only in 2 or fewer of the collapsed spots.

The following figure visualizes the Visium data collapsed by 4 spots. If pathologist annotations are provided, the dots will be colored by annotation.

The following two figures compare simulated proteome depth before and after dropping expression counts found in 2 or fewer spots.

Prerequisites

You should have Conda installed on your system. If you haven't installed Conda, you can download it from https://docs.conda.io/en/latest/miniconda.html.

If you are working with a code repository, make sure to clone it to your local machine:

git clone https://github.com/zacheliason/spatial_proteomics_simulation.git
cd scp

Environment Setup

Navigate to the directory containing the YAML file for the Conda environment (scp_envv.yml).
Open your command prompt or terminal and execute the following command to create and activate the Conda environment from the YAML file:

conda env create -f scp_env.yml
conda activate scp

Running the Pipeline

Each script needs to be configured with the correct root directory. Then each can be run in consecutive order:

python3 1_process_raw_h5_files.py
python3 2_simulate_resolution.py
python3 3_simulate_sampling.py
python3 4_return_to_h5.py

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
Seurat_Data		Seurat_Data
public		public
replication		replication
scripts		scripts
.DS_Store		.DS_Store
.gitignore		.gitignore
0_get_pathologist_annotations.R		0_get_pathologist_annotations.R
1_process_raw_h5_files.py		1_process_raw_h5_files.py
2_simulate_resolution.py		2_simulate_resolution.py
3_simulate_sampling.py		3_simulate_sampling.py
4_return_to_h5.py		4_return_to_h5.py
README.md		README.md
organize_tissue_files.py		organize_tissue_files.py
scp_env.yml		scp_env.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Spatial Proteomics Simulation

Simulating Different Resolutions

Prerequisites

Environment Setup

Running the Pipeline

About

Releases

Packages

Languages

zacheliason/spatial_proteomics_simulation

Folders and files

Latest commit

History

Repository files navigation

Spatial Proteomics Simulation

Simulating Different Resolutions

Prerequisites

Environment Setup

Running the Pipeline

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages