exaR - APA target calling through exon crawling

Introduction

exaR is a robust computational approach to quantify alternative poly(A) site usage from traditional mRNA-seq datasets.

Usage

Run Snakemake pipeline:

bash run_snakemake.sh <working directory> <config>.yaml [<snakemake parameter>]

Checkout the Installation instructions

Input data

The following inputs are provided by a config file

PolyA database: a 6-column BED file of single nucleotide coordinates defining new 3' ends of transcript isoforms, that need to be integrated into the genome annotation.
Genome annotation: The is the entire genome annotation (gtf) containing the full annotation, including at least gene, transcript, exon and CDS information. From this the exon segments are derived and the PolyA sites from the PolyA database are integrated.
Sample sheet: exaR uses DEXseq for identifying differential 3'UTR segments. Here the samplesheet needs to contains columns 'name' and 'condition'. For 'condition' the labels ctrl and cond are required. Make sure that ctrl comes first.
Alignments: These files need to be in BAM format. Can be produces using snakePipes mRNA-seq workflow

Config file:

All fields are required:

# used as prefix
project_name: utr3_quantification
# directory with bam files
bam_dir: bam_files/
# directory with samplessheets, tsv format and suffix
samplesheets_dir: samplesheets/
# reference annotation
annotation: dm6_ensembl96.gtf
# DEXseq path:
DEXseq_path: <DEXseq installation path>/DEXseq
# PolyA database path
polya_database: <polyA database>.bed
# params for breakpoint filtering
min_distance: 100
padj_cutoff: 0.05

Output

utr3_quantification/
├── Annotation
│   ├── annotation.segments.gff
│   ├── Breakpoints_pooled.merged_downstream_breakpoint.gff
│   ├── Breakpoints_pooled.merged_intervals.gff
│   ├── log
│   │   ├── Breakpoints_pooled.log
│   │   ├── exon_segmentation.log
│   │   └── Segments_split.log
│   ├── Segments_split.breakpoints_selected.gff
│   ├── Segments_split.gff
│   ├── Segments_split.nodes_selected.gff
│   └── Segments_split.saf
├── APA_targets
│   ├── <sample comparison>.APA_targets.gff
│   ├── <sample comparison>.APA_targets.locus.gff
│   ├── <sample comparison>.APA_targets.tsv
│   ├── <sample comparison>.segments_split.dexseq.gff
│   └── log
│       └── <sample comparison>.APA_targets.log
├── config.yaml
├── DEXseq
│   ├── <sample comparison>.segments_split.dexseq.tsv
│   └── log
│       └── <sample comparison>.segments_split.dexseq.log
└── featureCount
    ├── log
    │   └── utr3_quantification.segments_split.featureCounts.log
    ├── utr3_quantification.segments_split.featureCounts.tsv
    └── utr3_quantification.segments_split.featureCounts.tsv.summary

DEXSeq quanitification of each node/segment: <sample comparison>.segments_split.dexseq.tsv
Differential APA table: <sample comparison>.APA_targets.tsv
Differential APA regions: <sample comparison>.APA_targets.locus.gff
Segments after modification integrating PolyA database: Segments_split.gff

<sample comparison> is the filename of the samplesheet (cropped tsv extension=

Installation

The installation through conda can take several hours and - especially the R packages - can be installed manually as well.

Manual setup (Recommended)

The manual setup consists of three steps

Setup conda environment and install libraries
Install R bioconductor packages
Setup DEXseq installation path

Setup conda environment and install libraries

If installing the following packages fails, bioconductor install method can deal with R/3.5.2 packages. Check for more: https://www.bioconductor.org/install/

conda create -n exaR  
conda activate exaR
conda install -c conda-forge r-readr r-base r-dplyr r-stringr r-tibble r-ggplot2 r-reshape2 r-pheatmap r-janitor r-optparse
conda install -c bioconda htseq snakemake subread

Install R bioconductor packages

Once the conda setup is done, you can manually install the following bioconductor packages:

Setup DEXseq installation path

Find path of dexseq_prepare_annotation.py

Extract libPaths from R

R -e '.libPaths()'

and replace <DEXseq installation path> config entry in

[...]
# DEXseq path:
DEXseq_path: <DEXseq installation path>/DEXseq
[...]

From conda.yaml

conda env create -f exaR.yaml

This conda environment config installs

Python + HTseq
R-base + related packages from CRAN, bioconductor *
- Tested with R/4.0.3
subread for featureCounts
snakemake
- Tested with python>=3.5

Contributors

Dr. Barbara Hummel

Michael Rauer

License

GNU GPL license (v3)

Name		Name	Last commit message	Last commit date
Latest commit History 39 Commits
tools		tools
LICENSE		LICENSE
README.md		README.md
Snakefile		Snakefile
example_files.tar.gz		example_files.tar.gz
run_snakemake.sh		run_snakemake.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

exaR - APA target calling through exon crawling

Introduction

Usage

Input data

Config file:

Output

Installation

Manual setup (Recommended)

Setup conda environment and install libraries

Install R bioconductor packages

Setup DEXseq installation path

From conda.yaml

Contributors

License

About

Releases

Packages

Contributors 2

Languages

License

hilgers-lab/apa_target_caller

Folders and files

Latest commit

History

Repository files navigation

exaR - APA target calling through exon crawling

Introduction

Usage

Input data

Config file:

Output

Installation

Manual setup (Recommended)

Setup conda environment and install libraries

Install R bioconductor packages

Setup DEXseq installation path

From conda.yaml

Contributors

License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages