Brian Haas edited this page May 21, 2019 · 24 revisions

TrinityFusion - Fusion and foreign transcript detection via RNA-seq de novo assembly

TrinityFusion leverages chimeric and unmapped reads to assemble fusion transcripts and transcripts of likely foreign origin (microbes and viruses), as a way of facilitating analysis of cancer transcriptomes.

TrinityFusion performs de novo transcriptome assembly from RNA-seq data using Trinity, then uses GMAP to identify candidate fusion transcripts. Finally, Bowtie2 is used to capture the reads that support each fusion, and fusion candidates are filtered according to evidence support and characteristics of the fusion gene partners. An overview of the process is illustrated below:

TrinityFusion has three execution modes:

  • TrinityFusion-C uses only chimeric reads identified by the STAR aligner for de novo assembly and subsequent fusion detection.

  • TrinityFusion-UC uses both the chimeric reads and reads that do not map to the genome as per the STAR aligner for de novo reconstruction followed by fusion detection.

  • TrinityFusion-D uses all input reads for de novo assembly followed by fusion detection.

TrinityFusion-UC has been found to be the most generally useful mode, both for fusion detection and for exploring the assembled unmapped reads for potential transcripts of foreign origin, such as tumor viruses and microbes. TrinityFusion-D is included for the sake of completeness, but TrinityFusion-C and TrinityFusion-UC were found to be far more effective, and in most cases one of these two modes should be used instead.

Installing TrinityFusion

TrinityFusion can be downloaded from the TrinityFusion Releases site. Simply unpack the code and it's ready to use (no compilation necessary).

TrinityFusion data dependencies

TrinityFusion is part of the Trinity Cancer Transcriptome Analysis Toolkit (CTAT) and, as such, leverages the CTAT Genome Library (also used by STAR-Fusion and FusionInspector). We provide several alternative resources for human fusion transcript detection, depending on whether you want to use the GRCh37 or GRCh38 human reference genome and the corresponding Gencode annotation set. Options are available here: https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/. Choose one; below we refer to it as 'CTAT_resource_lib.tar.gz'. The 'plug-n-play' libs are just that: download one and unpack it (tar -zxvf filename.tar.gz). If you need to build a genome lib from the provided source data, see our companion CTAT Genome Lib Builder software and documentation.

If you already have a fully functioning version of STAR-Fusion installed, then you do not need to install any additional data resources. You're almost ready to hit the ground running with TrinityFusion.

After ensuring you have the GMAP software installed (see below), you must configure your CTAT Genome Lib for use with GMAP. From within the CTAT Genome Lib ctat_genome_lib_build_dir/, run the following to prep the genome for use with GMAP:

% gmap_build -D . -d ref_genome.fa.gmap -k 13 ref_genome.fa
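Before moving on, it can help to confirm that the index landed where expected. The following is a minimal sketch, not part of TrinityFusion: the function name check_gmap_index is hypothetical, and it assumes that the gmap_build command above writes its index into a ref_genome.fa.gmap/ directory next to ref_genome.fa.

```shell
# Hypothetical helper (not part of TrinityFusion): report whether a CTAT
# genome lib directory contains the genome FASTA and a GMAP index directory.
check_gmap_index() {
    lib_dir=$1
    if [ ! -f "$lib_dir/ref_genome.fa" ]; then
        echo "missing: $lib_dir/ref_genome.fa" >&2
        return 1
    fi
    # Assumption: 'gmap_build -D . -d ref_genome.fa.gmap' stores the index
    # in a directory of that name inside the genome lib.
    if [ ! -d "$lib_dir/ref_genome.fa.gmap" ]; then
        echo "missing: $lib_dir/ref_genome.fa.gmap (run gmap_build first)" >&2
        return 1
    fi
    echo "GMAP index found in $lib_dir"
}
```

For example: check_gmap_index /path/to/ctat_genome_lib_build_dir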

TrinityFusion software dependencies

TrinityFusion has the following software dependencies:

Also, while TrinityFusion doesn't require the STAR aligner for execution, it requires the output of STAR as one of its inputs. So, be sure to have the STAR aligner installed too.
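If you are not using Docker, a quick pre-flight check can show which tools are missing from your PATH. This is only a sketch: missing_tools is a hypothetical helper, and the executable names in the usage line (STAR, Trinity, gmap, gmap_build, bowtie2) are assumed to be the conventional names for the tools mentioned on this page.

```shell
# Hypothetical pre-flight check: print each named executable that is not
# found on PATH, and return non-zero if any of them is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return $status
}
```

For example: missing_tools STAR Trinity gmap gmap_build bowtie2

No output and a zero exit status mean every listed tool was found.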

Preferably, use our TrinityFusion Docker image, which comes with everything it needs fully installed. The Dockerfile is included in the TrinityFusion source distribution, for those curious to see the full installation routine for building the Docker image.

Running TrinityFusion

Actually, before running TrinityFusion, you'll first need to run STAR in order to define the chimeric and unmapped reads:

 STAR --genomeDir ${star_index_dir} \
      --readFilesIn ${left_fq_filename} ${right_fq_filename} \
      --twopassMode Basic \
      --outReadsUnmapped None \
      --chimSegmentMin 12 \
      --chimJunctionOverhangMin 12 \
      --alignSJDBoverhangMin 10 \
      --alignMatesGapMax 100000 \
      --alignIntronMax 100000 \
      --chimSegmentReadGapMax 3 \
      --alignSJstitchMismatchNmax 5 -1 5 5 \
      --runThreadN ${THREAD_COUNT} \
      --outSAMstrandField intronMotif \
      --outSAMunmapped Within \
      --outSAMtype BAM Unsorted \
      --outSAMattrRGline ID:GRPundef \
      --chimMultimapScoreRange 10 \
      --chimMultimapNmax 10 \
      --chimNonchimScoreDropMin 10 \
      --peOverlapNbasesMin 12 \
      --peOverlapMMp 0.1 \
      --chimOutJunctionFormat 1 # required as of STAR v2.6.1

After running STAR, you'll have access to the STAR 'Chimeric.out.junction' and 'Aligned.out.bam' files (with '--outSAMtype BAM Unsorted', STAR names the alignment file Aligned.out.bam). These can be used as input to TrinityFusion.
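As a sanity check on the STAR run, you can count how many distinct reads have chimeric evidence before handing the file to TrinityFusion. The sketch below is hypothetical and makes two assumptions about the junction file layout that you should verify against your STAR version: the read name is in column 10, and any header line starts with 'chr_donorA' while comment lines start with '#'.

```shell
# Hypothetical helper: count distinct read names in a STAR chimeric
# junction file.
# Assumptions: tab-delimited, read name in column 10, optional header line
# beginning with "chr_donorA", optional comment lines beginning with "#".
count_chimeric_reads() {
    awk -F '\t' '
        /^#/ { next }                   # skip comment lines
        $1 == "chr_donorA" { next }     # skip the column header, if present
        NF >= 10 { seen[$10] = 1 }      # remember each read name once
        END { n = 0; for (r in seen) n++; print n }
    ' "$1"
}
```

For example: count_chimeric_reads Chimeric.out.junction

A count of zero suggests that the chimeric options were not in effect, or that the sample has no chimeric reads to assemble.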

TrinityFusion usage is shown below:

   %    ./TrinityFusion


################################################################
#
#  Required:
#
#  --left_fq <string>    reads_1.fq
#
#  --right_fq <string>   reads_2.fq
#
#  (If just given the reads, runs Trinity de novo assembly first on all reads)
#
#  --output_dir STR_OUT_DIR          output directory
#
# Alternative TrinityFusion modes, using STAR outputs
#
#  --chimeric_junctions <string>  STAR Chimeric.out.junction file
#                        
#  (if given the chimeric junctions file, restricts to the chimeric junction reads alone)
#
#  --aligned_bam <string>         STAR aligned bam file
#
#  (if given the aligned_bam & the chimeric junctions), assembles the unmapped and chimeric reads, not all reads).
#                        
#
# Optional:
#
#  --genome_lib_dir <string>  directory for CTAT genome lib  (or use env var $CTAT_GENOME_LIB
#                                      current setting: (/Users/bhaas/DB/CTAT_GENOME_LIB/GRCh37_v19_CTAT_lib_Feb092018/ctat_genome_lib_build_dir)
#  --CPU <int>                     :number threads (default 4)
#
#  --show_full_usage_info     flag, shows all options available.
#
#  --version                   show TrinityFusion version info: 0.3.0
#
################################################################

An example TrinityFusion command leveraging both the genome aligned bam file and the chimeric junctions file (TrinityFusion-UC mode of execution) would be:

%   TrinityFusion --left_fq reads_1.fq --right_fq reads_2.fq \
       --chimeric_junctions Chimeric.out.junction \
       --aligned_bam Aligned.out.bam \
       --genome_lib_dir /path/to/ctat_genome_lib_build_dir/

For TrinityFusion-C mode, omit the --aligned_bam option. For TrinityFusion-D mode, provide only the FASTQ files as input (omit both --chimeric_junctions and --aligned_bam).
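The relationship between the three modes and the input options can be summarized as a small wrapper. This is a sketch only: trinityfusion_inputs is a hypothetical function, not something shipped with TrinityFusion, and it simply prints the input options described above for a given mode.

```shell
# Hypothetical helper: print the TrinityFusion input options for a mode.
# Usage: trinityfusion_inputs MODE LEFT_FQ RIGHT_FQ [JUNCTIONS] [BAM]
trinityfusion_inputs() {
    mode=$1
    left_fq=$2
    right_fq=$3
    junctions=${4:-}
    bam=${5:-}
    inputs="--left_fq $left_fq --right_fq $right_fq"
    case "$mode" in
        D)  : ;;   # all reads: FASTQ files only
        C)  inputs="$inputs --chimeric_junctions $junctions" ;;
        UC) inputs="$inputs --chimeric_junctions $junctions --aligned_bam $bam" ;;
        *)  echo "unknown mode: $mode (expected C, UC, or D)" >&2
            return 1 ;;
    esac
    echo "$inputs"
}
```

For example, trinityfusion_inputs C reads_1.fq reads_2.fq Chimeric.out.junction prints the options for TrinityFusion-C. The output is a plain space-separated string, so this sketch is not suitable for file names that contain spaces.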

TrinityFusion output

TrinityFusion will generate a tab-delimited output file: TrinityFusion-*.fusion_predictions.tsv formatted like so:

#FusionName     JunctionReadCount       SpanningFragCount       trans_acc       trans_brkpt     LeftGene        LeftBreakpoint  RightGene       RightBreakpoint SpliceType      annots
TATDN1--GSDMB   144     3       TRINITY_DN0_c0_g1_i4    61-62   TATDN1  chr8:125551266  GSDMB   chr17:38066177  ONLY_REF_SPLICE ["CCLE","Klijn_CellLines","FA_CancerSupp","ChimerPub","INTERCHROMOSOMAL[chr8--chr17]"]
TATDN1--GSDMB   110     4       TRINITY_DN0_c0_g1_i1    61-62   TATDN1  chr8:125551266  GSDMB   chr17:38066177  ONLY_REF_SPLICE ["CCLE","Klijn_CellLines","FA_CancerSupp","ChimerPub","INTERCHROMOSOMAL[chr8--chr17]"]
ACACA--STAC2    34      24      TRINITY_DN4_c0_g1_i1    230-231 ACACA   chr17:35479453  STAC2   chr17:37374426  ONLY_REF_SPLICE ["ChimerSeq","CCLE","Klijn_CellLines","FA_CancerSupp","INTRACHROMOSOMAL[chr17:1.60Mb]"]
THRA--AC090627.1        34      1       TRINITY_DN1_c0_g1_i3    229-230 THRA    chr17:38243106  AC090627.1      chr17:46371709  ONLY_REF_SPLICE ["CCLE","FA_CancerSupp","INTRACHROMOSOMAL[chr17:8.12Mb]"]
BCAS4--BCAS3    13      77      TRINITY_DN15_c0_g1_i1   253-252 BCAS4   chr20:49411710  BCAS3   chr17:59445688  ONLY_REF_SPLICE ["ChimerPub","ChimerSeq","chimerdb_pubmed","CCLE","FA_CancerSupp","INTERCHROMOSOMAL[chr20--chr17]"]
RPS6KB1--SNF8   21      22      TRINITY_DN7_c0_g1_i1    195-194 RPS6KB1 chr17:57970686  SNF8    chr17:47021337  ONLY_REF_SPLICE ["Klijn_CellLines","FA_CancerSupp","ChimerSeq","CCLE","INTRACHROMOSOMAL[chr17:10.95Mb]"]
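Because the report is plain tab-delimited text, it is easy to post-filter. The sketch below is a hypothetical helper, not a TrinityFusion feature: it keeps predictions whose JunctionReadCount (column 2) and SpanningFragCount (column 3) both meet user-chosen minimums, and prints the fusion name, the two counts, and the supporting Trinity transcript. The thresholds are illustrative, not recommended values.

```shell
# Hypothetical helper: keep fusion predictions with at least MIN_JUNCTION
# junction reads and MIN_SPANNING spanning fragments.
# Usage: filter_fusions MIN_JUNCTION MIN_SPANNING PREDICTIONS_TSV
filter_fusions() {
    min_junction=$1
    min_spanning=$2
    tsv=$3
    awk -F '\t' -v j="$min_junction" -v s="$min_spanning" '
        /^#/ { next }    # skip the "#FusionName ..." header line
        ($2 + 0) >= (j + 0) && ($3 + 0) >= (s + 0) {
            print $1 "\t" $2 "\t" $3 "\t" $4
        }
    ' "$tsv"
}
```

For example, filter_fusions 30 3 on the rows shown above would keep both TATDN1--GSDMB isoforms and ACACA--STAC2. Appending | cut -f1 | sort -u collapses multiple Trinity isoforms of the same fusion into one name.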

A preliminary fusion report will also be included. The final fusions are the subset of the preliminary fusions that match perfectly with reference gene exon annotations at the fusion junction breakpoint.

In addition to the fusion report, you will have access to a Trinity.fasta file containing the de novo assembled transcripts. This can be used for further downstream analyses, such as exploring potential foreign transcripts (e.g., tumor viruses and microbes).
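A common first step when exploring the assembly is to list the transcripts by length, so that longer contigs can be shortlisted for a database search. The sketch below is a hypothetical helper that assumes only standard FASTA formatting (a '>' header line followed by one or more sequence lines).

```shell
# Hypothetical helper: print each sequence ID in a FASTA file together
# with its length, tab-separated, in file order.
fasta_lengths() {
    awk '
        /^>/ {
            if (id != "") print id "\t" len
            id = substr($1, 2)    # ID is the first word after ">"
            len = 0
            next
        }
        { len += length($0) }     # sequences may span several lines
        END { if (id != "") print id "\t" len }
    ' "$1"
}
```

For example, fasta_lengths Trinity.fasta | sort -k2,2nr | head lists the ten longest assembled transcripts; adjust the path to wherever Trinity.fasta sits in your output directory.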
