Oncofuse is a framework designed to estimate the oncogenic potential of de-novo discovered gene fusions. It uses several hallmark features and employs a bayesian classifier to provide the probability of a given gene fusion being a driver mutation.
Please cite the following paper if you are using Oncofuse:
Mikhail Shugay, Inigo Ortiz de Mendibil, Jose L. Vizmanos and Francisco J. Novo. Oncofuse: a computational framework for the prediction of the oncogenic potential of gene fusions. Bioinformatics. 16 Aug 2013. doi:10.1093/bioinformatics/btt445.
This repository contains the Oncofuse source code and the latest binary version of the framework in releases section. Please see this page for details on running the pipeline. You may be also interested in checking README.txt, examples folder and running java -jar Oncofuse.jar -h
.
-
Disclaimer: Oncofuse solely performs fusions annotation and oncogenic potential prediction under the assumption that a given fusion exists (i.e. is verifiable by PCR). It is the goal of experimental setup and fusion detection software to filter out those fusions that do not physically exist.
- Oncofuse reports, but does not account for fusion frame when calculating P-values. Is is done intentionally, as the fusion information often could be incomplete, e.g. there are cases when random nucleotides are added in fusion junction restoring the frame. So it is left up to the user to decide should he ignore out-of-frame fusions or not.
-
Note that since v1.0.9 there is an
-a hgXX
option, specifying genome assembly. It defaults tohg19
, yet certain tools, for example FusionCatcher v0.99.3e provide output inhg38
coordinates. -
Also note that support for FusionCatcher versions earlier than 0.99.3 was deprecated
-
Please use the issue tracker to report bugs and suggest new features.
- To obtain and compile the latest version of legacy Oncofuse package execute:
git clone https://github.com/mikessh/oncofuse --branch legacy
cd oncofuse
mvm clean install
- Then run as
cd target/
java -jar oncofuse-v1.X.X.jar [args]
If you need to copy oncofuse to other folder make sure libs and common folders are placed in the same directory, or just use symlink
Oncofuse is a framework designed to estimate the oncogenic potential of de-novo discovered gene fusions. It uses several hallmark features and employs a bayesian classifier to provide the probability of a given gene fusion being a driver mutation.
Oncofuse is described in the following paper: Mikhail Shugay, Inigo Ortiz de Mendibil, Jose L. Vizmanos and Francisco J. Novo. Oncofuse: a computational framework for the prediction of the oncogenic potential of gene fusions. Bioinformatics. 16 Aug 2013. doi:10.1093/bioinformatics/btt445.
See http://www.unav.es/genetica/oncofuse.html for additional details.
-p
option specifies the number of threads Oncofuse will use
-a
option specifies genome assembly version. Allowed values: hg18, hg19 and hg38. Default value: hg19
This tool is designed to predict the oncogenic potential of fusion genes found by Next-Generation Sequencing in cancer cells. It also provides information on hallmarks of driver gene fusions, such as expression gain of resulting fusion gene, retained protein interaction interfaces and resulting protein domain functional profile.
Pre-requisites: Java(TM) SE Runtime Environment (build 1.7.0 and higher)
Running:
$ java -Xmx1G -jar Oncofuse.jar input_file input_type tissue_type output_file
Supported tissue types (tissue of origin for gene fusion): EPI (epithelial), HEM (hematopoietic), MES (mesenchymal), AVG (averaged, when tissue of origin is unknown)
Supported input types:
input_type = "coord" Default format accepted by Oncofuse Tab-delimited file with lines containing 5' and 3' breakpoint positions (first nucleotide lost upon fusion) and tissue of origin:
5' chrom | 5' coord | 3' chrom | 3' coord | tissue_type |
---|
| | | |
For this format tissue of origin is set individually for each entry in input file and tissue_type argument should be set as "-" Note that there are optional additional columns:
- 5' fusion partner gene (FPG) strand
- 3' FPG strand
- Number of spanning reads (reads that include junction bases)
- Number of encompassing reads (reads that encompass junction, but the junction itself is in the insert region)
input_type = "tophat" Default output file of Tophat-fusion and Tophat2 (usually fusions.out file in results folder). Data is pre-filtered based on number of spanning N>=1 and total number of supporting reads M>=2 reads. These parameters could be changed with extended input type argument "tophat-N-M". Tissue type has to be set using tissue_type argument. Tophat-fusion-post is also supported with extended input type argument "tophat-post".
input_type = "fcatcher" Default output file of FusionCatcher software. Tissue type has to be set using tissue_type argument.
input_type = "rnastar" Default output file of RNASTAR software. Data is pre-filtered based on number of spanning N>=1 and total number of supporting reads M>=2 reads. These parameters could be changed with extended input type argument "rnastar-N-M". Tissue type has to be set using tissue_type argument.
A tab-delimited table with the following columns
column name | description |
---|---|
SAMPLE_ID | The ID of sample for tophat-post, input file name otherwise |
FUSION_ID | The original line number in input file |
TISSUE | As specified by library argument or in 'coord' input file |
GENOMIC | Chromosomal coordinates for both breakpoints (as in input file) |
SPANNING_READS | Number of reads that cover fusion junction |
ENCOMPASSING_READS | Number of reads that map discordantly with one mate mapping to 5'FPG (fusion partner gene) and other mapping to 3'FPG |
5_FPG_GENE_NAME | The HGNC symbol of 5' fusion partner gene |
5_IN_CDS? | Indicates whether breakpoint is within the CDS of this gene |
5_SEGMENT_TYPE | Indicates whether breakpoint is located within either exon or intron |
5_SEGMENT_ID | Indicates number of exon or intron where breakpoint is located |
5_COORD_IN_SEGMENT | Indicates coordinates for breakpoint within that exon/intron |
5_FULL_AA | Length of translated 5' fusion partner gene (FPG) in full amino acids |
5_FRAME | Frame of translated 5' FPG |
(Same as 7 lines above for the 3' fusion partner gene) | ... |
FPG_FRAME_DIFFERENCE | The resulting frame of fusion gene, if equals to 0 then the fusion is in-frame |
P_VAL_CORR | he Bayesian probability of fusion being a passenger (class 0), given as Bonferroni-corrected P-value |
DRIVER_PROB | The Bayesian probability of fusion being a driver (class 1) |
EXPRESSION_GAIN | Expression gain of fusion calculated as [(expression of 5' gene)/(expression of 3' gene)]-1 |
5_DOMAINS_RETAINED | List of protein domains retained in 5' fusion partner gene |
3_DOMAINS_RETAINED | List of protein domains retained in 3' fusion partner gene |
5_DOMAINS_BROKEN | List of protein domains that overlap breakpoint in 5' fusion partner gene |
3_DOMAINS_BROKEN | List of protein domains that overlap breakpoint in 3' fusion partner gene |
5_PII_RETAINED | List of protein interaction interfaces retained in 5' fusion partner gene |
3_PII_RETAINED | List of protein interaction interfaces retained in 3' fusion partner gene |
CTF, G, H, K, P and TF | Corresponding functional family association scores (FFAS, see paper for details). Values are log-transformed and scaled to the largest score obtained from classifier training set. |