Workflow for the analysis of mRNA-seq data for Yan et al., "An endogenous small RNA-binding protein safeguards prime editing" (in press).
The workflow is written using Snakemake and Quarto.
Dependencies are installed using Bioconda where possible.
The workflow consists of two parts: one written in Snakemake, the other composed of Quarto notebooks.
Here, we create two workflows to work with the two subsets of data separately: HEK3_1TtoA and PRNP_6GtoT. Run the workflow in each directory separately.
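Running the same command in each directory can be wrapped in a small shell helper. The sketch below is only an illustration: the function name `run_in_each` is hypothetical and not part of this repository, and it assumes the current directory is the repository root containing both HEK3_1TtoA and PRNP_6GtoT.

```shell
# Run one command inside each analysis directory, stopping at the first failure.
# run_in_each is a hypothetical helper; the directory names are the two data
# subsets named in this README.
run_in_each() {
  for dir in HEK3_1TtoA PRNP_6GtoT; do
    # The subshell keeps the caller's working directory unchanged.
    (cd "$dir" && "$@") || return 1
  done
}

# Example (run from the repository root):
#   run_in_each snakemake --use-conda --cores 1
```

Because the helper stops at the first non-zero exit status, a failure in one subset is not hidden by a later successful run in the other.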
The workflows use the publicly available rna-seq-star-deseq2 workflow. Citation: https://doi.org/10.5281/zenodo.4737358
- Clone workflow into working directory
    git clone <repository> <dir>
    cd <dir>
- Download input data (or skip and use demo-data)
  Copy the fastq files into the data directory.
- Edit the configuration as needed (not needed if using demo-data)
    # Edit location of fastq files
    nano HEK3_1TtoA/config/units.yaml
    nano PRNP_6GtoT/config/units.yaml
    # Generally, these can remain unchanged
    nano HEK3_1TtoA/config/samples.yaml
    nano PRNP_6GtoT/config/samples.yaml
    nano HEK3_1TtoA/config/config.yaml
    nano PRNP_6GtoT/config/config.yaml
- Install Snakemake and Snakedeploy
    mamba create -c conda-forge -c bioconda --name snakemake snakemake snakedeploy
    mamba activate snakemake
- Set up workflow-specific resources
  - Modify workflow/profiles/default/config.yaml to ensure rules have the required resources to run on the cluster.
- Run the workflow (using cluster options is recommended)
    snakemake --use-conda --cores 1
- Generate a report
    snakemake --report report.zip
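Before committing cluster time to the run step above, it can help to preview the planned jobs with Snakemake's dry-run mode. The following is a minimal sketch, not part of the repository: the function name `run_workflow` is hypothetical, and it assumes the `snakemake` environment is active and the current directory is one of the two workflow directories.

```shell
# Preview the planned jobs, then execute them only if the preview succeeds.
# run_workflow is a hypothetical helper; its first argument is the number of
# cores (defaults to 1, matching the command in the steps above).
run_workflow() {
  cores="${1:-1}"
  snakemake --use-conda --cores "$cores" --dry-run || return 1
  snakemake --use-conda --cores "$cores"
}

# Example (run inside HEK3_1TtoA or PRNP_6GtoT):
#   run_workflow 4
```

The dry run lists the jobs Snakemake would execute without running them, so configuration mistakes such as a wrong fastq path tend to surface before any job starts.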
The Quarto notebooks utilize R and are run separately.
- Run the workflows as above
- Load the R project ./pe-mrna-seq-diffexp.Rproj in RStudio.
- This project uses renv to keep track of installed packages. Install renv if not installed and load dependencies with renv::restore().
- Load the Quarto notebook ./mrna-seq-venn-diag.qmd and run all of the cells or use the "Render" button in RStudio.
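The notebook steps above can probably also be run from a terminal instead of RStudio. This is a hedged sketch only: the function name `render_notebook` is hypothetical, and it assumes that `Rscript` and the `quarto` command-line tool are on the PATH, that renv is already installed, and that the current directory is the project root so renv activates for the project.

```shell
# Restore the renv library, then render the notebook without opening RStudio.
# render_notebook is a hypothetical helper; it assumes the current directory
# is the project root (the directory containing the .Rproj file).
render_notebook() {
  notebook="${1:-mrna-seq-venn-diag.qmd}"
  # prompt = FALSE avoids an interactive confirmation during restore.
  Rscript -e 'renv::restore(prompt = FALSE)' || return 1
  quarto render "$notebook"
}

# Example:
#   render_notebook
```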