Snakemake workflow: smallRNA Pipeline

Workflow for the analysis of small RNA-seq data for Yan et al., "An endogenous small RNA-binding protein safeguards prime editing" (in press).

The workflow is written using Snakemake and Quarto.

Dependencies are installed using Bioconda where possible.

The workflow consists of two parts: a Snakemake pipeline and a set of Quarto notebooks.

Snakemake workflow

Setup environment and run workflow

  1. Clone workflow into working directory

    git clone <repository> <dir>
    cd <dir>
  2. Download input data (or skip and use demo-data)

    Copy the FASTQ files into the data directory

  3. Edit the configuration as needed (skip this step if using demo-data)

    # Edit location of fastq files
    nano config/units.yaml
    # Generally, these can remain unchanged 
    nano config/samples.yaml
    nano config/config.yaml
  4. Install dependencies into isolated environment

    conda env create -n <project> --file environment.yaml
  5. Activate environment

    conda activate <project>
  6. Execute main workflow (using cluster options is recommended)

    snakemake --cores 1
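
Before the full run in step 6, a dry run can confirm that Snakemake resolves the inputs named in the configuration. The following is a minimal sketch and is not part of the repository: `run_workflow` and the `SNAKEMAKE` override variable are hypothetical names introduced here, while `--cores` and `--dry-run` are standard Snakemake options.

```shell
# Hypothetical wrapper (not part of the repository): perform a dry run first,
# then execute the workflow only if the dry run succeeds.
# SNAKEMAKE may be set to point at a different executable (assumption).
run_workflow() {
  local cores="${1:-1}"
  local smk="${SNAKEMAKE:-snakemake}"

  # --dry-run lists the jobs that would run without executing them.
  "$smk" --cores "$cores" --dry-run || {
    echo "dry run failed; check the files under config/ and the data directory" >&2
    return 1
  }

  # Real run with the requested number of cores.
  "$smk" --cores "$cores"
}
```

With the environment activated, `run_workflow 4` would run the dry run and then the workflow on four cores. Calling it without an argument falls back to a single core, as in step 6.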

Quarto notebooks

The Quarto notebooks utilize R and are run separately.

  1. Run the workflow as above

  2. Open the R project file pe-small-rna-seq-analysis.Rproj in RStudio.

  3. This project uses renv to keep track of installed packages. Install renv if not installed and load dependencies with renv::restore().

  4. Open one of the Quarto notebooks below and either run all of the cells or use the "Render" button in RStudio.

    • biotype-comparison.qmd
    • fragment-size-distributions.qmd
    • alignment_statistics.qmd
    • coverage-plots.qmd
    • three-prime-quantification.qmd
  5. Some of the notebooks use parameters to generate a few different versions of the plots. If Quarto and all of the required R packages are installed, you can use the render_quarto_reports.sh script to render all of the Quarto notebooks.
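
For rendering outside RStudio, the notebooks listed above can also be passed one at a time to `quarto render`. The sketch below only illustrates that loop and is not the content of render_quarto_reports.sh: `render_notebooks`, `NOTEBOOKS`, and the `QUARTO` override variable are hypothetical names, and the sketch does not set any of the notebook parameters mentioned in step 5.

```shell
# Notebook file names as listed in step 4.
NOTEBOOKS="biotype-comparison.qmd fragment-size-distributions.qmd alignment_statistics.qmd coverage-plots.qmd three-prime-quantification.qmd"

# Hypothetical helper (not the repository's script): render each notebook
# found in the current directory with default parameters.
# QUARTO may be set to point at a different executable (assumption).
render_notebooks() {
  local quarto="${QUARTO:-quarto}"
  local nb failed=0
  for nb in $NOTEBOOKS; do
    if [ ! -f "$nb" ]; then
      echo "skipping missing notebook: $nb" >&2
      continue
    fi
    # Keep going after a failure, but report it through the exit status.
    "$quarto" render "$nb" || failed=1
  done
  return "$failed"
}
```

Run `render_notebooks` from the directory that contains the `.qmd` files, after `renv::restore()` has installed the R packages. A nonzero exit status means at least one notebook failed to render.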