Home
TrinityFusion leverages chimeric and unmapped reads to assemble fusion transcripts and transcripts of likely foreign origin (microbes and viruses), as a way of facilitating analysis of cancer transcriptomes.
TrinityFusion performs de novo transcriptome assembly from RNA-seq data using Trinity and uses GMAP to identify candidate fusion transcripts. Finally, Bowtie2 is used to capture the reads that support each fusion, and fusion candidates are filtered according to evidence support and characteristics of the fusion gene partners. An overview of the process is illustrated below:
TrinityFusion has three execution modes:
- TrinityFusion-C uses only chimeric reads identified by the STAR aligner for de novo assembly and subsequent fusion detection.
- TrinityFusion-UC uses both the chimeric reads and reads that do not map to the genome as per the STAR aligner for de novo reconstruction followed by fusion detection.
- TrinityFusion-D uses all input reads for de novo assembly followed by fusion detection.
TrinityFusion-UC has been found to be the most generally useful mode, both for fusion detection and for exploring the assembled unmapped reads for potential transcripts of foreign origin, such as tumor viruses and microbes. TrinityFusion-D is included for the sake of completeness, but TrinityFusion-C and TrinityFusion-UC were found to be far more effective, and in most cases one of these two modes should be used.
TrinityFusion can be downloaded from the TrinityFusion Releases site. Simply unpack the code and it's ready to use (no compilation necessary). TrinityFusion does have several software dependencies, however, such as Trinity (see below). It's easiest to use our Docker or Singularity images to hit the ground running.
TrinityFusion is part of the Trinity Cancer Transcriptome Analysis Toolkit (CTAT), and as such, also leverages the CTAT Genome Library (as also used by STAR-Fusion and FusionInspector). We provide several alternative resources for human fusion transcript detection, depending on whether you want to use the GRCh37 or GRCh38 reference human genome and the corresponding Gencode annotation set. Options are available here: https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/. Choose one; below we refer to it as 'CTAT_resource_lib.tar.gz'. The 'plug-n-play' libs are just that: download one and unpack it (tar -zxvf filename.tar.gz). If you need to build a genome lib from the provided source data, see our companion CTAT Genome Lib Builder software and documentation.
If you already have a fully functioning version of STAR-Fusion installed, then you do not need to install any additional data resources. You're almost ready to hit the ground running with TrinityFusion.
Before running TrinityFusion, you must configure your CTAT Genome Lib for use with minimap2. You can index the ref genome and prep resources like so:
% TrinityFusion/CTAT-LR-fusion/ctat-LR-fusion --prep_reference_only -T ladeda --genome_lib_dir /path/to/ctat_genome_lib_build_dir
If you can use our Docker or Singularity images, they come with everything needed. Otherwise, please ensure the following software is installed for use with TrinityFusion:
- see the Dockerfile for the full software stack, including Trinity and STAR, and the versioning info corresponding to the current release.
Before running TrinityFusion, you'll first need to run STAR in order to define the chimeric and unmapped reads:
STAR --genomeDir ${star_index_dir} \
--readFilesIn ${left_fq_filename} ${right_fq_filename} \
--twopassMode Basic \
--outReadsUnmapped None \
--chimSegmentMin 12 \
--chimJunctionOverhangMin 12 \
--alignSJDBoverhangMin 10 \
--alignMatesGapMax 100000 \
--alignIntronMax 100000 \
--chimSegmentReadGapMax 3 \
--alignSJstitchMismatchNmax 5 -1 5 5 \
--runThreadN ${THREAD_COUNT} \
--outSAMstrandField intronMotif \
--outSAMunmapped Within \
--outSAMtype BAM Unsorted \
--outSAMattrRGline ID:GRPundef \
--chimMultimapScoreRange 10 \
--chimMultimapNmax 10 \
--chimNonchimScoreDropMin 10 \
--peOverlapNbasesMin 12 \
--peOverlapMMp 0.1 \
--chimOutJunctionFormat 1 # required as of STAR v2.6.1
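After running STAR, you'll have access to the STAR chimeric junctions file ('Chimeric.out.junction') and the aligned BAM file ('Aligned.out.bam' under STAR's default output naming; shown as 'Aligned.bam' in the example below). These can be used as input to TrinityFusion.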
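Before handing the STAR outputs to TrinityFusion, it can help to confirm that both files exist and are non-empty. The following is a minimal sketch, not part of TrinityFusion itself: `check_star_outputs` is a hypothetical helper name, and the file paths are supplied by the caller because they depend on how STAR was run.

```shell
# Hypothetical pre-flight check (not shipped with TrinityFusion).
# Usage: check_star_outputs <chimeric_junctions_file> <aligned_bam_file>
# Returns 0 if both files exist and are non-empty, 1 otherwise.
check_star_outputs() {
    local junctions="$1" bam="$2" f
    for f in "$junctions" "$bam"; do
        # -s is true only for files that exist and have size > 0
        if [ ! -s "$f" ]; then
            echo "missing or empty STAR output: $f" >&2
            return 1
        fi
    done
    return 0
}
```

For example, `check_star_outputs Chimeric.out.junction /path/to/star.bam || exit 1` would stop a pipeline script early if STAR did not produce usable output.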
TrinityFusion usage is shown below:
% ./TrinityFusion
################################################################
#
# Required:
#
# --left_fq <string> reads_1.fq
#
# --right_fq <string> reads_2.fq
#
# (If just given the reads, runs Trinity de novo assembly first on all reads)
#
# --output_dir STR_OUT_DIR output directory
#
# Alternative TrinityFusion modes, using STAR outputs
#
# --chimeric_junctions <string> STAR Chimeric.out.junction file
#
# (if given the chimeric junctions file, restricts to the chimeric junction reads alone)
#
# --aligned_bam <string> STAR aligned bam file
#
# (if given the aligned_bam & the chimeric junctions), assembles the unmapped and chimeric reads, not all reads).
#
#
# Optional:
#
# --genome_lib_dir <string> directory for CTAT genome lib (or use env var $CTAT_GENOME_LIB
# current setting: (/Users/bhaas/DB/CTAT_GENOME_LIB/GRCh37_v19_CTAT_lib_Feb092018/ctat_genome_lib_build_dir)
# --CPU <int> :number threads (default 4)
#
# --show_full_usage_info flag, shows all options available.
#
# --version show TrinityFusion version info: 0.3.0
#
################################################################
An example TrinityFusion command leveraging both the genome aligned bam file and the chimeric junctions file (TrinityFusion-UC mode of execution) would be:
% TrinityFusion --left_fq reads_1.fq --right_fq reads_2.fq \
--chimeric_junctions Chimeric.out.junction \
--aligned_bam Aligned.bam \
--genome_lib_dir /path/to/ctat_genome_lib_build_dir/
For TrinityFusion-C mode, do not provide the Aligned.bam file. For TrinityFusion-D mode, provide just the fastq files as input.
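The relationship between the three modes and the inputs supplied can be sketched as a small shell helper. This is an illustration only: `trinityfusion_input_args` is a hypothetical function, not part of TrinityFusion, and it uses only the input options shown in the usage message above (`--left_fq`, `--right_fq`, `--chimeric_junctions`, `--aligned_bam`). It prints one argument per line.

```shell
# Hypothetical helper: print the TrinityFusion input options for a mode.
# Usage: trinityfusion_input_args <C|UC|D> <left_fq> <right_fq> [junctions] [bam]
trinityfusion_input_args() {
    local mode="$1" left="$2" right="$3" junctions="${4:-}" bam="${5:-}"
    # Validate that the inputs required by the chosen mode were given.
    case "$mode" in
        D)  ;;  # all reads: fastq files only
        C)  if [ -z "$junctions" ]; then
                echo "mode C needs the chimeric junctions file" >&2; return 1
            fi ;;
        UC) if [ -z "$junctions" ] || [ -z "$bam" ]; then
                echo "mode UC needs the junctions file and the aligned bam" >&2; return 1
            fi ;;
        *)  echo "unknown mode: $mode (expected C, UC, or D)" >&2; return 1 ;;
    esac
    # The fastq files are given in every mode.
    printf '%s\n' --left_fq "$left" --right_fq "$right"
    case "$mode" in
        C|UC) printf '%s\n' --chimeric_junctions "$junctions" ;;
    esac
    if [ "$mode" = UC ]; then
        printf '%s\n' --aligned_bam "$bam"
    fi
}
```

Calling `trinityfusion_input_args C reads_1.fq reads_2.fq Chimeric.out.junction` prints the fastq and junction options but no `--aligned_bam`, which matches the TrinityFusion-C description above. The remaining options, such as `--genome_lib_dir`, are independent of the mode and would be added separately.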
TrinityFusion will generate a tab-delimited output file: TrinityFusion-*.fusion_predictions.tsv formatted like so:
#FusionName num_LR LeftGene LeftLocalBreakpoint LeftBreakpoint RightGene RightLocalBreakpoint RightBreakpoint SpliceType LR_accessions LR_FFPM JunctionReadCount SpanningFragCount est_J est_S LeftGene_SR RightGene_SR LargeAnchorSupport NumCounterFusionLeft NumCounterFusionRight FAR_left FAR_right LeftBreakDinuc LeftBreakEntropy RightBreakDinuc RightBreakEntropy FFPM microh_brkpt_dist num_microh_near_brkpt annots max_LR_FFPM frac_dom_iso above_frac_dom_iso
THRA--AC090627.1 1.0 THRA 11793 chr17:40086853:+ AC090627.1 21580 chr17:48294347:+ ONLY_REF_SPLICE TRINITY_DN45_c0_g1_i1 107.759 92.0 102.0 92.0 98.63 THRA^ENSG00000126351.11 AC090627.1^ENSG00000235300.4 YES 28.0 12.0 6.72 15.0 GT 1.8892 AG 1.9656 8.8952 3112.0 0.0 ["INTRACHROMOSOMAL[chr17:8.20Mb]"] 107.759 1.0 True
ACACA--STAC2 1.0 ACACA 64929 chr17:37122531:- STAC2 79395 chr17:39218173:- ONLY_REF_SPLICE TRINITY_DN2714_c0_g1_i1 107.759 55.0 44.0 55.0 44.0 ACACA^ENSG00000278540.3 STAC2^ENSG00000141750.6 YES 255.0 6.0 0.39 14.29 GT 1.9656 AG 1.9656 4.6195 3457.0 0.0 ["Klijn_CellLines","ChimerSeq","CCLE_StarF2019","INTRACHROMOSOMAL[chr17:1.80Mb]"] 107.759 1.0 True
RPS6KB1--SNF8 1.0 RPS6KB1 1280 chr17:59893325:+ SNF8 26129 chr17:48943975:- ONLY_REF_SPLICE TRINITY_DN378_c0_g3_i1 107.759 37.0 49.0 37.0 47.71 RPS6KB1^ENSG00000108443.12 SNF8^ENSG00000159210.8 YES 115.0 570.0 0.75 0.15 GT 1.3753 AG 1.8323 3.9528 1796.0 0.0 ["Klijn_CellLines","ChimerSeq","CCLE_StarF2019","INTRACHROMOSOMAL[chr17:10.95Mb]"] 107.759 1.0 True
VAPB--IKZF3 1.0 VAPB 1396 chr20:58389517:+ IKZF3 28254 chr17:39777767:- ONLY_REF_SPLICE TRINITY_DN229_c0_g1_i1 107.759 21.0 40.0 21.0 27.56 VAPB^ENSG00000124164.14 IKZF3^ENSG00000161405.15 YES 399.0 12.0 0.15 4.77 GT 1.9656 AG 1.7819 2.2659 1848.0 0.0 ["Klijn_CellLines","DEEPEST2019","ChimerPub","ChimerSeq","CCLE_StarF2019","INTERCHROMOSOMAL[chr20--chr17]"] 107.759 1.0 True
MED1--STXBP4 1.0 MED1 1249 chr17:39451038:- STXBP4 44835 chr17:55141310:+ ONLY_REF_SPLICE TRINITY_DN2790_c0_g1_i1 107.759 13.0 15.0 13.0 15.0 MED1^ENSG00000125686.10 STXBP4^ENSG00000166263.12 YES 249.0 11.0 0.12 2.42 GT 1.3996 AG 1.7968 1.3065 1519.0 0.0 ["Klijn_CellLines","CCLE_StarF2019","INTRACHROMOSOMAL[chr17:15.52Mb]"] 107.759 1.0 True
AHCTF1--NAAA 1.0 AHCTF1 1401 chr1:246931578:- NAAA 50972 chr4:75925811:- ONLY_REF_SPLICE TRINITY_DN4758_c0_g1_i1 107.759 8.0 32.0 8.0 25.58 AHCTF1^ENSG00000153207.13 NAAA^ENSG00000138744.13 YES 27.0 67.0 1.46 0.6 GT 1.7232 AG 1.8062 1.5669 2677.0 0.0 ["CCLE_StarF2019","INTERCHROMOSOMAL[chr1--chr4]"] 107.759 1.0 True
MED1--ACSF2 1.0 MED1 6915 chr17:39439165:- ACSF2 36449 chr17:50471028:+ ONLY_REF_SPLICE TRINITY_DN422_c0_g1_i1 107.759 10.0 11.0 10.0 11.0 MED1^ENSG00000125686.10 ACSF2^ENSG00000167107.11 YES 277.0 250.0 0.08 0.09 GT 1.9656 AG 1.9656 0.9799 2386.0 0.0 ["CCLE_StarF2019","INTRACHROMOSOMAL[chr17:10.97Mb]"] 107.759 1.0 True
STX16--RAE1 1.0 STX16 1835 chr20:58652087:+ RAE1 18185 chr20:57354032:+ ONLY_REF_SPLICE TRINITY_DN89_c0_g1_i1 107.759 7.0 29.0 7.0 14.5 STX16^ENSG00000124222.20 RAE1^ENSG00000101146.11 YES 227.0 506.0 0.16 0.07 GT 1.9899 AG 1.9656 1.0032 2394.0 0.0 ["CCLE_StarF2019","INTRACHROMOSOMAL[chr20:1.27Mb]"] 107.759 1.0 True
STARD3--DOK5 1.0 STARD3 1167 chr17:39637231:+ DOK5 23084 chr20:54643458:+ ONLY_REF_SPLICE TRINITY_DN4629_c0_g1_i1 107.759 7.0 6.0 7.0 6.0 STARD3^ENSG00000131748.14 DOK5^ENSG00000101134.10 YES 547.0 0.0 0.03 14.0 GT 1.8892 AG 1.9656 0.6066 3274.0 0.0 ["CCLE_StarF2019","INTERCHROMOSOMAL[chr17--chr20]"] 107.759 1.0 True
SKA2--MYO19 1.0 SKA2 1139 chr17:59155131:- MYO19 29653 chr17:36507512:- ONLY_REF_SPLICE TRINITY_DN129_c0_g1_i2 107.759 5.0 6.0 5.0 2.14 SKA2^ENSG00000182628.11 MYO19^ENSG00000278259.3 YES 172.0 111.0 0.07 0.11 GT 1.9086 AG 1.9086 0.3332 773.0 0.0 ["Klijn_CellLines","ChimerSeq","CCLE_StarF2019","INTRACHROMOSOMAL[chr17:22.57Mb]"] 107.759 1.0 True
MED13--BCAS3 1.0 MED13 3547 chr17:62052537:- BCAS3 94863 chr17:61391977:+ ONLY_REF_SPLICE TRINITY_DN142_c0_g1_i1 107.759 2.0 3.0 2.0 3.0 MED13^ENSG00000108510.8 BCAS3^ENSG00000141376.19 YES 16.0 69.0 0.35 0.09 GT 1.5546 AG 1.9086 0.2333 1338.0 0.0 ["CCLE_StarF2019","INTRACHROMOSOMAL[chr17:0.55Mb]"] 107.759 1.0 True
TRIM37--MYO19 1.0 TRIM37 7217 chr17:59084002:- MYO19 52575 chr17:36507924:- ONLY_REF_SPLICE TRINITY_DN129_c0_g1_i1 107.759 2.0 3.0 2.0 2.67 TRIM37^ENSG00000108395.12 MYO19^ENSG00000278259.3 NO 107.0 73.0 0.06 0.08 GT 1.7465 AG 1.7819 0.2179 3078.0 0.0 ["Klijn_CellLines","CCLE_StarF2019","INTRACHROMOSOMAL[chr17:22.44Mb]"] 107.759 1.0 True
PIP4K2B--RAD51C 1.0 PIP4K2B 9121 chr17:38777687:- RAD51C 32535 chr17:58732484:+ ONLY_REF_SPLICE TRINITY_DN279_c0_g1_i2 107.759 2.0 2.0 2.0 0.62 PIP4K2B^ENSG00000276293.3 RAD51C^ENSG00000108384.13 YES 451.0 99.0 0.01 0.05 GT 1.7968 AG 1.9329 0.1222 2067.0 0.0 ["CCLE_StarF2019","INTRACHROMOSOMAL[chr17:19.89Mb]"] 107.759 1.0 True
TRPC4AP--MRPL45 1.0 TRPC4AP 2387 chr20:35078046:- MRPL45 30344 chr17:38322126:+ ONLY_REF_SPLICE TRINITY_DN35_c0_g2_i1 107.759 2.0 2.0 2.0 0.57 TRPC4AP^ENSG00000100991.10 MRPL45^ENSG00000278845.3 YES 368.0 726.0 0.01 0.01 GT 1.6895 AG 1.9086 0.1199 5534.0 0.0 ["Klijn_CellLines","CCLE_StarF2019","INTERCHROMOSOMAL[chr20--chr17]"] 107.759 1.0 True
DIDO1--TTI1 1.0 DIDO1 1157 chr20:62937796:- TTI1 34646 chr20:38006397:- ONLY_REF_SPLICE TRINITY_DN4033_c0_g1_i1 107.759 1.0 7.0 1.0 1.4 DIDO1^ENSG00000101191.15 TTI1^ENSG00000101407.11 NO 19.0 28.0 0.45 0.31 GT 1.6402 AG 1.8892 0.112 2916.0 0.0 ["ChimerSeq","CCLE_StarF2019","INTRACHROMOSOMAL[chr20:24.84Mb]"] 107.759 1.0 True
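Because the predictions file is tab-delimited with a header row, it can be post-filtered with standard tools. The sketch below looks columns up by the header names shown above (`JunctionReadCount`, `LargeAnchorSupport`) rather than by position. The function name `filter_fusions` and the default threshold of 2 junction reads are illustrative assumptions, not recommendations from the TrinityFusion authors.

```shell
# Hypothetical post-filter (not part of TrinityFusion).
# Usage: filter_fusions <fusion_predictions.tsv> [min_junction_reads]
# Prints the header plus rows with JunctionReadCount >= threshold
# and LargeAnchorSupport equal to YES.
filter_fusions() {
    local tsv="$1" min_junction="${2:-2}"   # threshold default is an assumption
    awk -F'\t' -v min_j="$min_junction" '
        NR == 1 {
            # Map each header name to its column index.
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("JunctionReadCount" in col) || !("LargeAnchorSupport" in col)) {
                print "expected columns not found in header" > "/dev/stderr"
                exit 2
            }
            print
            next
        }
        # Counts are written as floats (e.g. 92.0), so compare numerically.
        ($(col["JunctionReadCount"]) + 0) >= (min_j + 0) &&
            $(col["LargeAnchorSupport"]) == "YES"
    ' "$tsv"
}
```

Looking columns up by name keeps the filter working if the column order changes between releases, at the cost of failing loudly when a column is renamed.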
A preliminary fusion report will also be included. The final fusions are the subset of the preliminary fusions that match perfectly with reference gene exon annotations at the fusion junction breakpoint.
In addition to the fusion report, you will have access to a Trinity.fasta file containing the de novo assembled transcripts. This can be used for further downstream analyses, such as exploring potential foreign transcripts (e.g., tumor viruses and microbes).
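As one way to start such an analysis, a single assembled contig can be pulled out of Trinity.fasta by name, for example to search it against a viral or microbial database. This is a minimal sketch: `extract_contig` is a hypothetical helper, and it assumes the contig name is the first whitespace-delimited token of the FASTA header line.

```shell
# Hypothetical helper (not part of TrinityFusion or Trinity).
# Usage: extract_contig <Trinity.fasta> <contig_id>
# Prints the header and sequence lines of the record whose ID matches exactly.
extract_contig() {
    local fasta="$1" id="$2"
    awk -v id="$id" '
        /^>/ {
            # Assumption: the ID is the first token after ">".
            split(substr($0, 2), h, /[ \t]/)
            keep = (h[1] == id)
        }
        keep   # print lines while inside the matching record
    ' "$fasta"
}
```

For instance, `extract_contig Trinity.fasta TRINITY_DN45_c0_g1_i1 > contig.fa` would write that record to its own file, provided a contig with that name is present in the assembly.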
Example data are available for exploring the different execution modes of TrinityFusion.
Accuracy assessment of fusion transcript detection via read-mapping and de novo fusion transcript assembly-based methods. Haas, Brian J.; Dobin, Alexander; Li, Bo; Stransky, Nicolas; Pochet, Nathalie; Regev, Aviv. Genome Biology, 2019. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1842-9
Contact us via our google group: https://groups.google.com/forum/#!forum/trinity_ctat_users