exaR is a robust computational approach to quantify alternative poly(A) site usage from traditional mRNA-seq datasets.
Run Snakemake pipeline:
bash run_snakemake.sh <working directory> <config>.yaml [<snakemake parameter>]
Check out the installation instructions below.
The following inputs are provided via a config file:
- PolyA database: a 6-column BED file of single-nucleotide coordinates defining new 3' ends of transcript isoforms that need to be integrated into the genome annotation.
- Genome annotation: the complete genome annotation (GTF), including at least gene, transcript, exon and CDS information. The exon segments are derived from it, and the PolyA sites from the PolyA database are integrated.
- Sample sheet: exaR uses DEXSeq to identify differential 3'UTR segments. The sample sheet needs to contain the columns 'name' and 'condition'. For 'condition', the labels ctrl and cond are required. Make sure that ctrl comes first.
- Alignments: these files need to be in BAM format. They can be produced using the snakePipes mRNA-seq workflow.
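The input requirements above can be checked before starting the pipeline. The following is a minimal sketch, not part of exaR: the function names are hypothetical, the sample sheet is assumed to be tab-separated with a header row, "ctrl comes first" is interpreted as "the first data row is a ctrl sample", and PolyA sites are assumed to be stranded ('+' or '-').

```python
import csv
import io


def check_polya_bed(handle):
    """Return a list of problems found in a 6-column, single-nucleotide BED file."""
    problems = []
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_no, row in enumerate(reader, start=1):
        # Skip blank lines and BED comment/header lines.
        if not row or row[0].startswith(("#", "track", "browser")):
            continue
        if len(row) != 6:
            problems.append(f"line {line_no}: expected 6 columns, found {len(row)}")
            continue
        try:
            start, end = int(row[1]), int(row[2])
        except ValueError:
            problems.append(f"line {line_no}: start/end are not integers")
            continue
        # BED is 0-based and half-open: one nucleotide means end - start == 1.
        if end - start != 1:
            problems.append(f"line {line_no}: interval is {end - start} nt long, expected 1")
        # Assumption: PolyA sites are stranded, so '.' is reported as a problem.
        if row[5] not in ("+", "-"):
            problems.append(f"line {line_no}: strand must be '+' or '-', found {row[5]!r}")
    return problems


def check_samplesheet(handle):
    """Return a list of problems found in a tab-separated sample sheet."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = {"name", "condition"} - set(reader.fieldnames or [])
    if missing:
        return [f"missing column(s): {', '.join(sorted(missing))}"]
    conditions = [row["condition"] or "" for row in reader]
    problems = []
    unexpected = sorted(set(conditions) - {"ctrl", "cond"})
    if unexpected:
        problems.append(f"unexpected condition label(s): {', '.join(map(repr, unexpected))}")
    for label in ("ctrl", "cond"):
        if label not in conditions:
            problems.append(f"no sample labelled {label!r}")
    # Assumption: "ctrl comes first" means the first data row is a ctrl sample.
    if conditions and conditions[0] != "ctrl":
        problems.append("the first sample is not labelled 'ctrl'")
    return problems


if __name__ == "__main__":
    # Hypothetical example content; replace with open("<file>") handles.
    sheet = io.StringIO("name\tcondition\nwt_rep1\tctrl\nko_rep1\tcond\n")
    bed = io.StringIO("chr2L\t7528\t7529\tsite_1\t0\t+\n")
    print(check_samplesheet(sheet), check_polya_bed(bed))
```

Both functions return an empty list when no problem is found, so they can be used as a quick gate in a wrapper script.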
All fields are required:
# used as prefix
project_name: utr3_quantification
# directory with bam files
bam_dir: bam_files/
# directory with sample sheets (TSV format, .tsv suffix)
samplesheets_dir: samplesheets/
# reference annotation
annotation: dm6_ensembl96.gtf
# DEXseq path:
DEXseq_path: <DEXseq installation path>/DEXseq
# PolyA database path
polya_database: <polyA database>.bed
# params for breakpoint filtering
min_distance: 100
padj_cutoff: 0.05
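Since all fields are required, it can help to check a config for missing entries before running the pipeline. The sketch below is an illustration only, not part of exaR: the function names are hypothetical, and the parser deliberately handles only flat `key: value` lines as shown above (it is not a full YAML parser, and it treats everything after a `#` as a comment).

```python
REQUIRED_FIELDS = (
    "project_name",
    "bam_dir",
    "samplesheets_dir",
    "annotation",
    "DEXseq_path",
    "polya_database",
    "min_distance",
    "padj_cutoff",
)


def read_flat_config(text):
    """Parse top-level 'key: value' lines; comments and blank lines are skipped."""
    config = {}
    for line in text.splitlines():
        # Simplification: anything after '#' is treated as a comment.
        line = line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        config[key.strip()] = value.strip()
    return config


def missing_fields(config):
    """Return the required fields that are absent or empty, in a fixed order."""
    return [field for field in REQUIRED_FIELDS if not config.get(field)]


if __name__ == "__main__":
    example = "project_name: utr3_quantification\nbam_dir: bam_files/\n"
    print(missing_fields(read_flat_config(example)))
```

For anything beyond a flat config, a proper YAML library would be the safer choice.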
utr3_quantification/
├── Annotation
│   ├── annotation.segments.gff
│   ├── Breakpoints_pooled.merged_downstream_breakpoint.gff
│   ├── Breakpoints_pooled.merged_intervals.gff
│   ├── log
│   │   ├── Breakpoints_pooled.log
│   │   ├── exon_segmentation.log
│   │   └── Segments_split.log
│   ├── Segments_split.breakpoints_selected.gff
│   ├── Segments_split.gff
│   ├── Segments_split.nodes_selected.gff
│   └── Segments_split.saf
├── APA_targets
│   ├── <sample comparison>.APA_targets.gff
│   ├── <sample comparison>.APA_targets.locus.gff
│   ├── <sample comparison>.APA_targets.tsv
│   ├── <sample comparison>.segments_split.dexseq.gff
│   └── log
│       └── <sample comparison>.APA_targets.log
├── config.yaml
├── DEXseq
│   ├── <sample comparison>.segments_split.dexseq.tsv
│   └── log
│       └── <sample comparison>.segments_split.dexseq.log
└── featureCount
    ├── log
    │   └── utr3_quantification.segments_split.featureCounts.log
    ├── utr3_quantification.segments_split.featureCounts.tsv
    └── utr3_quantification.segments_split.featureCounts.tsv.summary
- DEXSeq quantification of each node/segment:
<sample comparison>.segments_split.dexseq.tsv
- Differential APA table:
<sample comparison>.APA_targets.tsv
- Differential APA regions:
<sample comparison>.APA_targets.locus.gff
- Segments after modification integrating PolyA database:
Segments_split.gff
<sample comparison> is the filename of the sample sheet without the .tsv extension.
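To make the naming rule concrete, the following sketch derives <sample comparison> from a sample sheet filename and lists where the main result files would be expected, following the directory layout shown above. The function name and the sample sheet name `wt_vs_ko.tsv` are hypothetical and only serve as an illustration.

```python
from pathlib import Path


def expected_outputs(project_name, samplesheet):
    """Map a sample sheet to the main result files of its comparison."""
    # <sample comparison> is the sample sheet filename without the .tsv extension.
    comparison = Path(samplesheet).stem
    root = Path(project_name)
    return {
        "dexseq_table": root / "DEXseq" / f"{comparison}.segments_split.dexseq.tsv",
        "apa_table": root / "APA_targets" / f"{comparison}.APA_targets.tsv",
        "apa_regions": root / "APA_targets" / f"{comparison}.APA_targets.locus.gff",
        "segments": root / "Annotation" / "Segments_split.gff",
    }


if __name__ == "__main__":
    # Hypothetical sample sheet name.
    outputs = expected_outputs("utr3_quantification", "samplesheets/wt_vs_ko.tsv")
    for label, path in outputs.items():
        print(label, path.as_posix(), sep="\t")
```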
Installation through conda can take several hours. The packages, especially the R packages, can also be installed manually.
The manual setup consists of three steps:
- Set up the conda environment and install libraries
- Install R Bioconductor packages
- Set up the DEXSeq installation path
If installing the following packages fails, the Bioconductor install method can handle R/3.5.2 packages. For more information, see https://www.bioconductor.org/install/
conda create -n exaR
conda activate exaR
conda install -c conda-forge r-readr r-base r-dplyr r-stringr r-tibble r-ggplot2 r-reshape2 r-pheatmap r-janitor r-optparse
conda install -c bioconda htseq snakemake subread
Once the conda setup is done, you can manually install the following Bioconductor packages:
Find the path of dexseq_prepare_annotation.py
Extract the library paths from R
R -e '.libPaths()'
and replace <DEXseq installation path> in the config entry:
[...]
# DEXseq path:
DEXseq_path: <DEXseq installation path>/DEXseq
[...]
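Searching the library paths by hand can be tedious when R reports several of them. The sketch below looks for dexseq_prepare_annotation.py under a list of library paths. It is an illustration only: the function name is hypothetical, and it assumes the script sits in a `python_scripts` subfolder of the installed package directory, which may differ in your installation.

```python
from pathlib import Path

SCRIPT = "dexseq_prepare_annotation.py"


def find_dexseq_script(lib_paths):
    """Return the first copy of SCRIPT found under the given R library paths, or None."""
    for lib in lib_paths:
        # Assumption: <library path>/<package>/python_scripts/<script>
        for hit in sorted(Path(lib).glob(f"*/python_scripts/{SCRIPT}")):
            return hit
    return None


if __name__ == "__main__":
    # Paste the paths printed by: R -e '.libPaths()'
    hit = find_dexseq_script(["<R library path>"])
    if hit is None:
        print("script not found")
    else:
        # parents[2] is the library path that contains the package directory.
        print("script:", hit)
        print("library path:", hit.parents[2])
```

The printed library path is a candidate for <DEXseq installation path>. Verify it against the location of the script before editing the config.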
conda env create -f exaR.yaml
This conda environment config installs:
- Python + HTSeq
- R-base + related packages from CRAN and Bioconductor
  - Tested with R/4.0.3
- subread for featureCounts
- snakemake
  - Tested with python>=3.5
Dr. Barbara Hummel
Michael Rauer
GNU GPL license (v3)