Code repository for the analysis of single-cell RNA-seq, TCR- and BCR-Seq data presented in Ressler JM et al., 2024
Maud Plaschka and Florian Halbritter, St. Anna Children's Cancer Research Institute (CCRI), Vienna, Austria
• project.Dockerfile defines the environment used to carry out all experiments
• config.yaml is used to set paths
• bash/ holds shell scripts to build and run the docker image, and to parse the config file
• run_ressler2024.R can be run to reproduce the figures of Ressler et al. (2024). It renders every notebook template in notebook_template/.
• notebook_template/ holds R Markdown documents for the individual steps of the project, one for each figure of the manuscript.
• notebook/ holds the .html and .md reports for the individual steps of the project, one for each figure of the manuscript, generated from the corresponding template in notebook_template/.
• metadata/ holds custom gene set definitions required for the analysis. We recommend cloning the GitHub repository to avoid any path confusion. Alternatively, paths can be changed at the beginning of run_ressler2024.R.
• data should hold the Seurat objects containing the pre-processed:
- single-cell data
- TCR-Seq data
- BCR-Seq data
Please note that these files need to be requested and/or downloaded before running the analysis; see the instructions below.
Paths in the config.yaml file starting with "/path/to/" will have to be set, as well as the paths in run_ressler2024.R, to reproduce the figures of the manuscript.
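As a quick check before running anything, the placeholder paths that still need to be set can be listed with a few lines of base R. This is only a minimal sketch: `find_placeholder_lines` is a hypothetical helper that is not part of this repository, and the config keys in the example are made up for illustration.

```r
# Sketch: list the lines of a config file that still contain the "/path/to/" placeholder.
# `find_placeholder_lines` is a hypothetical helper, not part of this repository.
find_placeholder_lines <- function(lines, placeholder = "/path/to/") {
  hits <- grepl(placeholder, lines, fixed = TRUE)
  data.frame(
    line = which(hits),
    text = trimws(lines[hits]),
    stringsAsFactors = FALSE
  )
}

# Example with made-up config entries (the key names are assumptions).
example_config <- c(
  "out_root: /path/to/output",
  "threads: 4",
  "data_root: /home/user/ressler2024/data"
)
unset <- find_placeholder_lines(example_config)
print(unset)

# On the real file, assuming config.yaml sits in the working directory:
# unset <- find_placeholder_lines(readLines("config.yaml"))
```

An empty result would suggest that no "/path/to/" placeholder is left in the file; the paths in run_ressler2024.R still have to be checked separately.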
To achieve high reproducibility, we suggest starting with the pre-processed Seurat object available on GEO.
We recommend cloning this GitHub repository and saving the Seurat object in the data subdirectory.
If you would like to start from the raw data, you can request access on EGA.
Please note that before being able to run run_ressler2024.R, you will have to go through a few pre-processing steps:
- Run INIT_TS_Skin_reference.Rmd
- Run bash/get_ref_human_pbmc.sh
The INIT file allows you to pre-process and annotate the raw single-cell data. The pre-processing includes:
- Ambient RNA correction using SoupX: for each sample, raw counts are corrected using SoupX with a manual list of specific genes.
- Seurat workflow: corrected counts are merged into a single Seurat object, filtered and normalized (SCTransform).
- Clustering: after normalization, Ig genes are removed from the variable genes to avoid multiple B and plasma cell clusters driven by random variable chains.
- Reference mapping: single-cell data are mapped to the Tabula Sapiens Skin reference and the Azimuth PBMC reference using Azimuth reference mapping.
This results in the fully annotated Seurat object, which is the input of every other Rmd file.
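The Ig gene removal in the clustering step can be illustrated on a plain vector of gene names. The sketch below is not the code used in the INIT file: the regular expression and the helper name `remove_ig_genes` are assumptions, and the repository may rely on a different pattern or an explicit gene list.

```r
# Sketch: drop immunoglobulin V/D/J segment genes from a vector of variable gene names.
# The pattern is an assumption; the repository may use a different pattern or gene list.
ig_pattern <- "^IG[HKL][VDJ]"

# Hypothetical helper: keep only the features that do not match the Ig pattern.
remove_ig_genes <- function(features, pattern = ig_pattern) {
  features[!grepl(pattern, features)]
}

# Toy input: four Ig segment genes and four unrelated genes (two of them start with "IG").
variable_genes <- c("IGHV3-23", "IGKV1-5", "IGLV2-14", "IGHJ4",
                    "CD3E", "MS4A1", "IGF1", "IGFBP3")
filtered <- remove_ig_genes(variable_genes)
print(filtered)

# With a Seurat object `obj` (assumed name), the same filter could be applied as:
# VariableFeatures(obj) <- remove_ig_genes(VariableFeatures(obj))
```

Anchoring the pattern at the start of the gene symbol and requiring the chain letter (H, K or L) keeps unrelated genes such as IGF1 among the variable genes.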
Please note that due to confidentiality and possible patient identification with genomic data, BCR- and TCR-Seq data are only available upon request on EGA.
By default, plots requiring BCR- or TCR-Seq data are not run.
The code is nevertheless available in the reports.
You can change this default by setting the accessibility parameter to "unlock" at the top of the run_ressler2024.R script:
##########################################################################################
accessibility <- "unlock"
##########################################################################################
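How such a switch can gate the restricted plots is sketched below. This illustrates the general idea only: how run_ressler2024.R and the notebook templates actually consume the parameter is not shown here, and `is_unlocked` is a hypothetical helper.

```r
# Sketch: turn the `accessibility` parameter into a logical flag.
# How the repository's scripts use the parameter is an assumption here.
is_unlocked <- function(accessibility) {
  # TRUE only for the exact value "unlock"; any other value keeps restricted plots off.
  identical(accessibility, "unlock")
}

accessibility <- "unlock"
run_restricted <- is_unlocked(accessibility)

if (run_restricted) {
  message("Plots requiring BCR-/TCR-Seq data will be run.")
} else {
  message("Plots requiring BCR-/TCR-Seq data will be skipped.")
}
```

In an R Markdown document, a logical flag like `run_restricted` could be passed to the knitr `eval` chunk option so that the restricted chunks are shown but not executed.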
Raw data files are available at the European Genome-phenome Archive (EGA).
Counts are available on GEO.