PREP User Manual

General Description

RNA editing is a source of transcriptomic diversity, mainly in non-coding regions, and is altered in cancer. Recent studies demonstrated that A-to-I RNA editing events are manifested at the proteomic level and contribute to protein heterogeneity in cancer. Given somatic RNA-editing mutations as input, PREP identifies RNA-editing-based peptides and evaluates their potential immunogenicity. For detailed information, please refer to the citation.

Dependencies

Hardware:

PREP is currently tested on x86_64 on Ubuntu 16.04.

Required software:

Python 2.7
NetMHCpan 4.0
Variant Effect Predictor (VEP)
BWA
STAR
samtools
Optitype
GATK 3.8
Picard tools
Java 8
kallisto
trimmomatic
vcftools
blast
tabix
gawk
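
Before running the pipeline, it can help to confirm that the command-line tools are visible on your `PATH`. The following is a minimal sketch, not part of PREP: the executable names in `ASSUMED_EXECUTABLES` are assumptions about how the tools are typically installed, and tools that PREP locates through config.yaml (or that ship as Java archives) are not covered by this check.

```python
from __future__ import print_function

try:
    from shutil import which  # Python 3.3+
except ImportError:
    from distutils.spawn import find_executable as which  # Python 2.7

# Assumed executable names; adjust them to match your installation.
ASSUMED_EXECUTABLES = [
    "bwa", "STAR", "samtools", "kallisto",
    "vcftools", "tabix", "gawk", "java",
]


def missing_executables(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    for name in missing_executables(ASSUMED_EXECUTABLES):
        print("not found on PATH:", name)
```

An empty result only means the executables were found; it does not verify their versions.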

Required Python packages:

yaml
XGBoost
biopython
scikit-learn==0.19.1
pandas
numpy
Pyomo
tables
pysam
future
multiprocessing
subprocess
math
matplotlib
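
A quick way to see which of these packages are missing is to try importing each one. The sketch below is illustrative only: the import names in `ASSUMED_MODULES` are the usual module names for the packages listed above (for example `Bio` for biopython and `sklearn` for scikit-learn), and the helper `missing_modules` is hypothetical, not a PREP function. It does not check package versions.

```python
import importlib

# Assumed import names for the packages listed above.
ASSUMED_MODULES = [
    "yaml", "xgboost", "Bio", "sklearn", "pandas", "numpy",
    "pyomo", "tables", "pysam", "future", "matplotlib",
]


def missing_modules(names):
    """Return the module names that fail to import in this interpreter."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling `missing_modules(ASSUMED_MODULES)` with the same interpreter that will run PREP.py returns the list of modules still to be installed.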

Installation via Docker

The Docker image of PREP is available at https://hub.docker.com/r/bm2lab/prep/.

Install Docker on your computer and make sure it works.
Run docker pull bm2lab/prep to download the Docker image.

Run the image in interactive mode with your dataset:

 docker run -it -v /your/path/to/dataset/:/home/bioworker/dataset bm2lab/prep /bin/bash

Change directory into /home/bioworker/project/PREP:
```
 cd /home/bioworker/project/PREP
```
Download reference data:
```
 bash data_download.sh
```
Edit config.yaml and fill in the correct paths of the input files.
Run the program with the following command:
```
 python PREP.py RE -i config.yaml
```

Installation from source

Install all the software and Python packages listed above, and make sure each one works on your system.
Install multiprocessing and other packages with the pip command:
```
 pip install -U multiprocessing
 ...
```
Download or clone the PREP repository to your local system:
```
 git clone https://github.com/bm2-lab/PREP.git
```

Reference data, including the genome FASTA and peptide sequences (GRCh38 build), can be downloaded and processed with:

 bash data_download.sh

Some of the reference data is placed in the database folder and processed by custom scripts so that the pipeline can run, including:

 [Fasta] 

 This folder contains the reference FASTA file, its BWA index, and some other files derived from `human.fasta`:
 Homo_sapiens.GRCh38.dna.primary_assembly.fa
 Homo_sapiens.GRCh38.dna.primary_assembly.fa.amb	
 Homo_sapiens.GRCh38.dna.primary_assembly.fa.ann	
 etc...

 [Annotation file] 

 This folder contains the VCF files used to run RNAEditor:
 1000GenomeProject.vcf
 HAPMAP.vcf
 ESP.vcf
 dbSNP.vcf
 Mills_and_1000G_gold_standard.indels.hg38.vcf.gz

 [Protein] 

 This folder contains the human reference cDNA and protein sequences:
 Homo_sapiens.GRCh38.pep.all.fa

Among the required software listed above, GATK, kallisto, Picard, samtools, and trimmomatic-0.36 are provided in the software directory. The other software must be installed by the user because of its complexity; please refer to the software listed above.
Fill in the config.yaml file with your local paths, and make sure you have installed all of the software above and downloaded the reference data. Be aware that the version of the VEP library you use should match the references used (peptide and cDNA). For example, the example above uses release 89 of GRCh38.

Usage

Run PREP in RE mode with:

    python PREP.py RE -i config.yaml

Input Files

PREP accepts paired-end or single-end RNA sequencing data as input, in .fastq.gz or .fastq format. Specify the correct paths to the sequencing files in config.yaml, for example:

#your path to first RNA-seq fastq file
tumor_rna_fastq_1: ~/ncbi/dbGaP-14145/sra/SRR2673065_1.fastq.gz
#your path to second RNA-seq fastq file
tumor_rna_fastq_2: ~/ncbi/dbGaP-14145/sra/SRR2673065_2.fastq.gz
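
Path mistakes in config.yaml are easy to catch before a long run. The following is a minimal sketch, not part of PREP: it assumes the two keys shown above, assumes that an absent or empty entry (for example the second file in a single-end run) should simply be skipped, and uses hypothetical helper names `load_config` and `check_fastq_paths`. It expands `~` because the example paths above use it.

```python
import os

FASTQ_KEYS = ("tumor_rna_fastq_1", "tumor_rna_fastq_2")
FASTQ_SUFFIXES = (".fastq", ".fastq.gz")


def load_config(path):
    """Parse config.yaml into a dict (requires PyYAML)."""
    import yaml  # imported lazily so check_fastq_paths works without PyYAML
    with open(path) as handle:
        return yaml.safe_load(handle)


def check_fastq_paths(config):
    """Return a list of human-readable problems found in the FASTQ entries."""
    problems = []
    for key in FASTQ_KEYS:
        value = config.get(key)
        if not value:
            continue  # assumption: absent or empty entries are skipped
        path = os.path.expanduser(str(value))
        if not path.endswith(FASTQ_SUFFIXES):
            problems.append("%s: unexpected extension: %s" % (key, path))
        if not os.path.isfile(path):
            problems.append("%s: file not found: %s" % (key, path))
    return problems
```

For example, `check_fastq_paths(load_config("config.yaml"))` returns an empty list when every configured FASTQ file exists and has an expected extension.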

Setting parameters

Set all parameters in the configuration file config.yaml. The configuration file contains the following groups of parameters:

Input data parameters, including the paths of the RNA sequencing data, the output folder, and the run name.
Executable paths of the software OptiType, VEP, and NetMHCpan.

Output Files

The output files are the following:

final_neoantigen.tsv

This is a TSV file containing the extracted mutated peptides derived from RNA editing, together with a quantitative score that measures the immunogenicity of each neoepitope.

Column explanation

The prediction output (final_neoantigen.tsv) for each peptide pair consists of the following columns:

| Column Name | Description |
| --- | --- |
| Position | Mutation position in the genome. |
| HLA_type | HLA allele name. |
| Gene | HUGO symbol of the mutated gene. |
| Transcript_name | Ensembl transcript ID. |
| Mutation | Nucleotide change of the mutated gene. |
| AA_change | Amino acid change annotated in the VEP file. |
| WT_pep | The extracted normal peptide. |
| WT_Binding_EL | %Rank of the prediction score for the normal peptide using NetMHCpan 4.0 (default model). |
| WT_Binding_Rank | %Rank of the prediction score for the normal peptide using NetMHCpan 4.0 (-ba model). |
| MT_pep | The extracted mutant peptide. |
| MT_Binding_EL | %Rank of the prediction score for the mutant peptide using NetMHCpan 4.0 (default model). |
| MT_Binding_Rank | %Rank of the prediction score for the mutant peptide using NetMHCpan 4.0 (-ba model). |
| DriverGene_Lable | TRUE if the HUGO symbol is in the COSMIC reference list, FALSE if it is not. |
| MT_Binding_level_des | Binding level description of the mutant peptide. |
| WT_Binding_level_des | Binding level description of the normal peptide. |
| Editing_ratio | RNA editing level of the mutation. |
| Neo_score | Immunogenicity score for the RNA editing neoepitope. |
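
The columns above can be used to shortlist candidates after a run. The sketch below is one possible post-processing step, not part of PREP. It assumes that final_neoantigen.tsv is tab-delimited with a header row using the column names above, that a lower `MT_Binding_EL` %Rank means stronger predicted binding, and that a higher `Neo_score` means a more immunogenic candidate. The 2.0 %Rank cutoff is an illustrative default, and the helper names are hypothetical.

```python
import csv


def read_neoantigens(path):
    """Read final_neoantigen.tsv into a list of dicts keyed by column name."""
    with open(path) as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def rank_candidates(rows, max_el_rank=2.0):
    """Keep rows whose MT_Binding_EL %Rank is at most max_el_rank,
    sorted by Neo_score from highest to lowest."""
    kept = [row for row in rows if float(row["MT_Binding_EL"]) <= max_el_rank]
    return sorted(kept, key=lambda row: float(row["Neo_score"]), reverse=True)
```

For example, `rank_candidates(read_neoantigens("final_neoantigen.tsv"))` returns the retained rows with the highest-scoring peptide first; pass a different `max_el_rank` to tighten or loosen the binding filter.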

Contact

1410782Chiz@tongji.edu.cn or qiliu@tongji.edu.cn

Biological and Medical Big data Mining Lab
Tongji University
