Commit

Merge pull request #1 from Princeton-LSI-ResearchComputing/update-readme
Update README.md to include information on Quarto notebooks
lparsons authored Apr 21, 2023
2 parents 938d481 + e7656c8 commit 6588a2f
Showing 2 changed files with 31 additions and 94 deletions.
125 changes: 31 additions & 94 deletions README.md
@@ -1,22 +1,22 @@
# Snakemake workflow: smallRNA Pipeline

[![Snakemake](https://img.shields.io/badge/snakemake-≥6.6.1-brightgreen.svg)](https://snakemake.bitbucket.io)
[![Build Status](https://travis-ci.org/troycomi/adamson_smRNA.svg?branch=master)](https://travis-ci.org/snakemake-workflows/adamson_smRNA)
[![Snakemake](https://img.shields.io/badge/snakemake-≥7.9.0-brightgreen.svg)](https://snakemake.github.io/)

Workflow for analysis of smRNA
Workflow for the analysis of small RNA-seq data for Yan et al., "An endogenous
small RNA-binding protein safeguards prime editing" (in press).

This is the template for a new Snakemake workflow. Replace this text with a
comprehensive description, covering the purpose and domain.
The workflow is written using [Snakemake](https://snakemake.github.io/) and
[Quarto](https://quarto.org/).

Insert your code into the respective folders, i.e. `scripts`, `rules` and
`envs`. Define the entry point of the workflow in the `Snakefile` and the main
configuration in the `config.yaml` file.
Dependencies are installed using [Bioconda](https://bioconda.github.io/) where
possible.

The workflow is written using [Snakemake](https://snakemake.readthedocs.io/).
The workflow consists of two pieces: one written in Snakemake, the other
composed of Quarto notebooks.

Dependencies are installed using [Bioconda](https://bioconda.github.io/) where possible.
## Snakemake workflow

## Setup environment and run workflow
### Setup environment and run workflow

1. Clone workflow into working directory

@@ -47,98 +47,35 @@ Dependencies are installed using [Bioconda](https://bioconda.github.io/) where p
source activate <project>
```

6. Execute workflow
6. Execute main workflow

```bash
snakemake -n
snakemake --cores 1
```
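
The two commands above can be combined so that the real run only starts when the dry run succeeds. The following is a minimal sketch, not part of the repository: the `run_workflow` function and the `SNAKEMAKE` override variable are hypothetical names introduced here for illustration.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Command to invoke. Overridable (hypothetical variable) so the sketch can be
# tried out with a stub when Snakemake is not installed.
SNAKEMAKE="${SNAKEMAKE:-snakemake}"

# Dry-run first, then execute with the requested number of cores (default 1).
run_workflow() {
  local cores="${1:-1}"
  # -n prints the planned jobs without executing anything.
  "$SNAKEMAKE" -n || { echo "dry run failed; not executing" >&2; return 1; }
  "$SNAKEMAKE" --cores "$cores"
}

# Usage: run_workflow 4
```

Keeping the dry run as a gate means a misconfigured sample sheet or missing input is reported before any job is started.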

7. Investigate results
## Quarto notebooks

After successful execution, you can create a self-contained interactive
HTML report with all results via:

```bash
snakemake --report report.html
```

This report can, e.g., be forwarded to your collaborators.

## Running workflow on `gen-comp1`

```bash
snakemake --cluster-config cluster_config.cetus.yaml \
--drmaa " \
--cpus-per-task={cluster.n} \
--mem={cluster.memory} \
--qos={cluster.qos}" \
--use-conda -w 60 -rp -j 1000
```

## Generating the R report

Currently, the R report has to be run separately.
The Quarto notebooks utilize [R](https://www.r-project.org/) and are run
separately.

1. Run the workflow as above

2. Load the Rproject `adamson-smallRNA.Rproj` in RStudio

3. Load `exogenous-rna-profiles.Rmd`, run all cells, and save to create the
`exogenous_targets_paired_read_summary.tsv` as well as the report
`exogenous-rna-profiles.nb.html`

### R packages

1. If needed, install the `renv` package

```r
install.packages("renv")
```

2. Load required packages

```r
renv::activate()
```

## Advanced

The following recipe provides established best practices for running and
extending this workflow in a reproducible way.

1. [Fork](https://help.github.com/en/articles/fork-a-repo) the repository to a
personal or lab account.

2. [Clone](https://help.github.com/en/articles/cloning-a-repository) the fork
to the desired working directory for the concrete project/run on your
machine.

3. [Create a new
branch](https://git-scm.com/docs/gittutorial#_managing_branches) (the
project-branch) within the clone and switch to it. The branch will contain
any project-specific modifications (e.g. to configuration, but also to
code).

4. Modify the config, and any necessary sheets (and probably the workflow) as
needed.

5. Commit any changes and push the project-branch to your fork on GitHub.

6. Run the analysis.

7. Optional: Merge back any valuable and generalizable changes to the [upstream
repository](https://github.com/troycomi/adamson_smRNA) via a [**pull
request**](https://help.github.com/en/articles/creating-a-pull-request).
This would be **greatly appreciated**.

8. Optional: Push results (plots/tables) to the remote branch on your fork.
2. Load the Rproject `pe-small-rna-seq-analysis.Rproj` in RStudio.

9. Optional: Create a self-contained workflow archive for publication along
with the paper (snakemake --archive).
3. This project uses
[`renv`](https://rstudio.github.io/renv/articles/renv.html) to keep track of
installed packages. Install `renv` if not installed and load dependencies
with `renv::restore()`.
10. Optional: Delete the local clone/workdir to free space.
4. Load one of the Quarto notebooks below and run all of the cells, or use the
"Render" button in RStudio.
## Testing
* `biotype-comparison.qmd`
* `fragment-size-distributions.qmd`
* `alignment_statistics.qmd`
* `coverage-plots.qmd`
Test cases are in the subfolder `.test`. They should be executed via
continuous integration with Travis CI.
5. Some of the notebooks use parameters to generate a few different versions of
the plots. If Quarto and all of the required R packages are installed, you
can use the `render_quarto_reports.sh` script to render all of the Quarto
notebooks.
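
For readers who prefer the command line to RStudio, the notebook steps above can be sketched with the Quarto CLI. This is an illustrative sketch only and does not reproduce the contents of `render_quarto_reports.sh`, which are not shown here. The `render_all` function and the `DRY_RUN` variable are hypothetical names; `quarto render` and its `-P name:value` option for passing parameters are part of the Quarto CLI. Package dependencies are assumed to have been restored beforehand, for example with `Rscript -e 'renv::restore()'`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Notebooks named in the README.
notebooks=(
  biotype-comparison.qmd
  fragment-size-distributions.qmd
  alignment_statistics.qmd
  coverage-plots.qmd
)

# Render every notebook. Extra arguments are passed through to
# `quarto render` (for example: -P name:value to set a notebook parameter).
# With DRY_RUN=1 the commands are printed instead of executed.
render_all() {
  local nb cmd
  for nb in "${notebooks[@]}"; do
    cmd=(quarto render "$nb" "$@")
    if [[ "${DRY_RUN:-0}" == "1" ]]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}

# Preview the commands without running Quarto.
DRY_RUN=1 render_all
```

Calling `render_all` without `DRY_RUN=1` from the project directory renders each notebook in turn. Which parameters each notebook accepts is defined in the notebooks themselves, so any parameter name passed with `-P` has to be taken from there.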
File renamed without changes.