Workflow for the analysis of small RNA-seq data for Yan et al., "An endogenous small RNA-binding protein safeguards prime editing" (in press).
The workflow is written using Snakemake and Quarto.
Dependencies are installed using Bioconda where possible.
The workflow consists of two parts: one written in Snakemake, the other composed of Quarto notebooks.
-
Clone workflow into working directory
git clone <repository> <dir>
cd <dir>
-
Download input data (or skip and use demo-data)
Copy the fastq files into the data directory.
-
Edit the configuration as needed (not needed if using demo-data)
# Edit location of fastq files
nano config/units.yaml
# Generally, these can remain unchanged
nano config/samples.yaml
nano config/config.yaml
-
Install dependencies into isolated environment
conda env create -n <project> --file environment.yaml
-
Activate environment
conda activate <project>
-
Execute main workflow (using cluster options is recommended)
snakemake --cores 1
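Before a full run, it can help to confirm that the input files are in place and to preview the jobs Snakemake would schedule. The following is a minimal sketch, not part of the repository: the helper names `count_fastq` and `run_workflow` are hypothetical, and the set of file extensions checked is an assumption. `--dry-run` is a standard Snakemake option that lists the planned jobs without executing them.

```shell
#!/bin/sh
# Hypothetical helper functions; not part of the repository.

# Count FASTQ files under a directory (default: data).
# The extensions checked here are an assumption about how inputs are named.
count_fastq() {
  find "${1:-data}" -type f \( -name '*.fastq' -o -name '*.fastq.gz' \
    -o -name '*.fq' -o -name '*.fq.gz' \) 2>/dev/null | wc -l | tr -d ' '
}

# Preview the jobs with a dry run, then execute only if the dry run succeeds.
# The first argument is the number of cores (default: 1).
run_workflow() {
  cores="${1:-1}"
  snakemake --cores "$cores" --dry-run && snakemake --cores "$cores"
}
```

After sourcing these definitions, `count_fastq data` prints the number of FASTQ files found, and `run_workflow 1` performs the dry run followed by the real run.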
The Quarto notebooks utilize R and are run separately.
-
Run the workflow as above
-
Load the R project pe-small-rna-seq-analysis.Rproj in RStudio.
-
This project uses renv to keep track of installed packages. Install renv if not installed and load dependencies with renv::restore().
-
Load one of the Quarto notebooks below and either run all of the cells or use the "Render" button in RStudio.
biotype-comparison.qmd
fragment-size-distributions.qmd
alignment_statistics.qmd
coverage-plots.qmd
three-prime-quantification.qmd
-
Some of the notebooks use parameters to generate a few different versions of the plots. If Quarto and all of the required R packages are installed, you can use the render_quarto_reports.sh script to render all of the Quarto notebooks.
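If the script needs adapting, rendering every notebook amounts to a loop over the files listed above. The following is a minimal sketch and not the contents of render_quarto_reports.sh; the `NOTEBOOKS` variable, the `render_all` function and the `DRY_RUN` switch are assumptions made for illustration. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical sketch; the repository's render_quarto_reports.sh may differ.

# Notebook names as listed in this README.
NOTEBOOKS="biotype-comparison.qmd fragment-size-distributions.qmd alignment_statistics.qmd coverage-plots.qmd three-prime-quantification.qmd"

# Render each notebook in turn.
# With DRY_RUN=1 (the default) the commands are printed, not executed.
render_all() {
  for nb in $NOTEBOOKS; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "quarto render $nb"
    else
      # Stop at the first notebook that fails to render.
      quarto render "$nb" || return 1
    fi
  done
}

render_all
```

Run it with `DRY_RUN=0` to render the notebooks. For the parameterized notebooks, Quarto accepts overrides in the form `quarto render <notebook> -P <name>:<value>`; the parameter names depend on each notebook and are not shown here.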