
cancerbits/ressler2024_neobcc


Code repository for the analysis of single-cell RNA-seq, TCR- and BCR-Seq data presented in Ressler JM et al., 2024

Maud Plaschka and Florian Halbritter, St. Anna Children's Cancer Research Institute (CCRI), Vienna, Austria

Repository structure

Set up the environment

project.Dockerfile defines the environment used to carry out all experiments

config.yaml is used to set paths

bash/ holds shell scripts to build and run the docker image, and to parse the config file

Scripts to run the analysis

run_ressler2024.R can be run to reproduce the figures of Ressler et al. (2024). It renders every notebook template in notebook_template/

notebook_template/ holds the R Markdown documents for the individual steps of the project, each corresponding to one of the figures of the manuscript.

notebook/ holds the .html and .md reports for the individual steps of the project, one per manuscript figure, each generated from the corresponding notebook_template.
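The rendering loop described above might look roughly like the following minimal R sketch. It assumes the templates are plain .Rmd files and uses rmarkdown::render(); the function name render_all_templates and the choice to render each notebook in a fresh environment are illustrative assumptions, not taken from run_ressler2024.R.

```r
# Hypothetical sketch: render every .Rmd template in notebook_template/
# into notebook/. The real run_ressler2024.R may differ (e.g. extra params).
render_all_templates <- function(template_dir = "notebook_template",
                                 output_dir = "notebook",
                                 render_fun = rmarkdown::render) {
  # Collect the templates in a stable, alphabetical order
  templates <- sort(list.files(template_dir, pattern = "\\.Rmd$",
                               full.names = TRUE))
  if (length(templates) == 0) {
    stop("No .Rmd templates found in ", template_dir)
  }
  dir.create(output_dir, showWarnings = FALSE, recursive = TRUE)
  for (tpl in templates) {
    # A fresh environment per notebook keeps objects created in one
    # template from leaking into the next
    render_fun(tpl, output_dir = output_dir, envir = new.env())
  }
  invisible(templates)
}
```

Calling render_all_templates() with the default folders renders everything; passing a stub as render_fun is a cheap way to check which templates would be picked up without running the analysis.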

Data and metadata

metadata/ holds custom gene set definitions required for the analysis. We recommend cloning the GitHub repository to avoid any path confusion. Alternatively, the paths can be changed at the beginning of run_ressler2024.R.

data/ should hold the Seurat objects containing the pre-processed

  • single-cell RNA-seq data

  • TCR-Seq data

  • BCR-Seq data.

Please note that these files need to be requested and/or downloaded before running the analysis; see the instructions below.

Reproducing the results

To reproduce the figures of the manuscript, the paths in config.yaml that start with "/path/to/" have to be set, as well as the paths in run_ressler2024.R.
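Before running anything, it can help to check that no placeholder is left. Below is a minimal base-R sketch; the helper name find_unset_paths is hypothetical, and the YAML keys in the usage are made up for illustration. It lists the lines of config.yaml that still contain "/path/to/", ignoring comment lines, and needs no YAML parser because it only scans the text.

```r
# Hypothetical helper: report lines that still contain the "/path/to/"
# placeholder. Lines starting with '#' (comments) are ignored.
find_unset_paths <- function(config_file = "config.yaml") {
  lines <- readLines(config_file, warn = FALSE)
  is_comment <- grepl("^\\s*#", lines)
  hits <- which(grepl("/path/to/", lines, fixed = TRUE) & !is_comment)
  # Return the line numbers together with the offending text
  data.frame(line = hits, text = trimws(lines[hits]),
             stringsAsFactors = FALSE)
}

# Usage: stop early if any placeholder remains
# unset <- find_unset_paths()
# if (nrow(unset) > 0) print(unset)
```

The same scan can be pointed at run_ressler2024.R, although this README does not say whether the paths there use the same placeholder.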

To achieve high reproducibility, we suggest starting with the pre-processed Seurat object available on GEO. We recommend cloning this GitHub repository and saving the Seurat object in the data subdirectory.
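Once the object is in data/, it can be loaded with readRDS(). The sketch below assumes the GEO download is a single .rds file; the actual file name and format are not stated here, and load_seurat_object is a hypothetical helper that only adds clearer error messages around readRDS().

```r
# Hypothetical helper, assuming the GEO object is one .rds file in data/.
load_seurat_object <- function(data_dir = "data") {
  rds <- list.files(data_dir, pattern = "\\.rds$", ignore.case = TRUE,
                    full.names = TRUE)
  if (length(rds) == 0) {
    stop("No .rds file found in '", data_dir,
         "'. Download the pre-processed object from GEO first.")
  }
  if (length(rds) > 1) {
    stop("Several .rds files found in '", data_dir, "': ",
         paste(basename(rds), collapse = ", "),
         ". Pass the intended file to readRDS() directly.")
  }
  # readRDS() restores the object; Seurat must be installed to work with it
  readRDS(rds)
}
```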

If you would like to start from the raw data, you can request access on EGA. Please note that before being able to run run_ressler2024.R, you will have to go through a few pre-processing steps:

Build the references for transfer of annotations

Tabula Sapiens Skin reference:

Run INIT_TS_Skin_reference.Rmd

Website

Publication

Human PBMC Azimuth reference:

Run bash/get_ref_human_pbmc.sh

Website

Publication

The INIT file allows you to pre-process and annotate the raw single-cell data. The pre-processing includes:

  • Ambient RNA correction with SoupX per sample: raw counts are corrected with SoupX using a manual list of specific genes.

  • Seurat workflow: corrected counts are merged into a single Seurat object, filtered and normalized (SCTransform).

  • Clustering: after normalization, Ig genes are removed from the variable genes to avoid multiple B and plasma cell clusters driven by random variable chains.

  • Reference mapping: the single-cell data are mapped to the Tabula Sapiens Skin reference and the Azimuth PBMC reference using Azimuth reference mapping.
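The Ig-gene filtering step of the clustering could be sketched as follows. The regular expression is an assumption: by default it drops immunoglobulin heavy, kappa and lambda variable (V) genes such as IGHV3-23, matching the stated motivation (clusters driven by variable chains). The exact gene list used in the INIT file is not given here, and drop_ig_genes is a hypothetical helper name.

```r
# Hypothetical helper: remove Ig variable-region genes (IGHV*, IGKV*, IGLV*)
# from a vector of gene symbols. Pass a broader pattern such as "^IG[HKL]"
# to remove all heavy, kappa and lambda Ig genes instead.
drop_ig_genes <- function(genes, pattern = "^IG[HKL]V") {
  genes[!grepl(pattern, genes)]
}

# Assumed usage on a Seurat object `obj` after SCTransform, before PCA:
# VariableFeatures(obj) <- drop_ig_genes(VariableFeatures(obj))
```

Filtering the variable features rather than the count matrix keeps the Ig genes available for later plots while removing their influence on PCA and clustering.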

This results in the fully annotated Seurat object, which is the input of every other Rmd file.

BCR- and TCR-Seq data

Please note that due to confidentiality and the possibility of identifying patients from genomic data, BCR- and TCR-Seq data are only available upon request on EGA.

By default, plots requiring BCR- or TCR-Seq data are not run. The code is, however, available in the reports. You can override this default by setting the accessibility parameter to "unlock" at the top of the run_ressler2024.R script:

##########################################################################################
accessibility <- "unlock"
##########################################################################################
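One way such a switch can gate the repertoire code is sketched below. Only the "unlock" value comes from this README; the helper repertoire_enabled and its use are hypothetical. Anything other than exactly "unlock" keeps the BCR-/TCR-Seq sections disabled, matching the default described above.

```r
# Hypothetical helper: TRUE only when the switch is exactly "unlock".
# identical() also returns FALSE for NULL, NA or vectors of length > 1.
repertoire_enabled <- function(accessibility) {
  identical(accessibility, "unlock")
}

if (repertoire_enabled(accessibility = "unlock")) {
  message("BCR-/TCR-Seq sections will be run")
} else {
  message("BCR-/TCR-Seq sections are skipped")
}
```

In an R Markdown template, the returned logical could be passed to knitr's eval chunk option (eval = repertoire_enabled(accessibility)), so the code is still shown in the report but not executed.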

Links

Paper:

Data files:

Raw data files are available at the European Genome-phenome Archive (EGA).

Counts are available on the GEO platform.
