Home
TrinityFusion leverages chimeric and unmapped reads to assemble fusion transcripts and transcripts of likely foreign origin (microbes and viruses), as a way of facilitating analysis of cancer transcriptomes.
TrinityFusion performs de novo transcriptome assembly from RNA-seq data using Trinity, then uses GMAP to identify candidate fusion transcripts. Finally, Bowtie2 is used to capture the reads that support each fusion, and fusion candidates are filtered according to evidence support and characteristics of the fusion gene partners. An overview of the process is illustrated below:
TrinityFusion has three execution modes:
- TrinityFusion-C uses only chimeric reads identified by the STAR aligner for de novo assembly and subsequent fusion detection.
- TrinityFusion-UC uses both the chimeric reads and reads that do not map to the genome as per the STAR aligner for de novo reconstruction followed by fusion detection.
- TrinityFusion-D uses all input reads for de novo assembly followed by fusion detection.
TrinityFusion-UC has been found to be the most generally useful mode, both for fusion detection and for exploring the assembled unmapped reads for potential transcripts of foreign origin, such as tumor viruses and microbes. TrinityFusion-D is included for the sake of completeness, but TrinityFusion-C and TrinityFusion-UC were found to be far more impactful, and in most cases one of those two modes should be used instead.
TrinityFusion can be downloaded from the TrinityFusion Releases site. Simply unpack the code and it's ready to use (no compilation necessary).
TrinityFusion is part of the Trinity Cancer Transcriptome Analysis Toolkit (CTAT) and, as such, leverages the CTAT Genome Library (as also used by STAR-Fusion and FusionInspector). We provide several alternative resources for human fusion transcript detection, depending on whether you want to use the GRCh37 or GRCh38 human reference genome and the corresponding Gencode annotation set. Options are available here: https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/. Choose one; below we refer to it as 'CTAT_resource_lib.tar.gz'. The 'plug-n-play' libs are just that: download one and unpack it (tar -zxvf filename.tar.gz). If you need to build a genome lib from the provided source data, see our companion CTAT Genome Lib Builder software and documentation.
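As a rough sketch of locating the unpacked library, the helper below searches for the ctat_genome_lib_build_dir directory so that its path can be exported as $CTAT_GENOME_LIB (the environment variable named in the TrinityFusion usage text). The function name `find_ctat_genome_lib` is hypothetical, and the sketch assumes the tarball unpacks into a directory tree containing exactly one ctat_genome_lib_build_dir.

```shell
# Hypothetical helper: print the path of the first ctat_genome_lib_build_dir
# found beneath the given directory.
# Assumption: the unpacked resource lib contains one such directory.
find_ctat_genome_lib() {
    find "$1" -type d -name ctat_genome_lib_build_dir | head -n 1
}

# Usage sketch (the tarball name is the placeholder used in the text):
#   tar -zxvf CTAT_resource_lib.tar.gz
#   export CTAT_GENOME_LIB=$(find_ctat_genome_lib .)
```

Exporting the variable once means --genome_lib_dir does not have to be repeated on every command line.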
If you already have a fully functioning version of STAR-Fusion installed, then you do not need to install any additional data resources. You're almost ready to hit the ground running with TrinityFusion.
After ensuring you have the GMAP software installed (see below), you must configure your CTAT Genome Lib for use with GMAP. From within the CTAT Genome Lib ctat_genome_lib_build_dir/, run the following to prep the genome for use with GMAP:
% gmap_build -D . -d ref_genome.fa.gmap -k 13 ref_genome.fa
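Since the index only needs to be built once, it can help to guard the build step. The sketch below is one way to do that; the function names are hypothetical, and it assumes gmap_build writes the index into a directory named after the -d value (ref_genome.fa.gmap) beneath the -D directory.

```shell
# Hypothetical helper: succeeds if a GMAP index directory is already present
# inside the given CTAT genome lib directory.
# Assumption: "gmap_build -D . -d ref_genome.fa.gmap" creates ./ref_genome.fa.gmap/
gmap_index_exists() {
    [ -d "$1/ref_genome.fa.gmap" ]
}

# Hypothetical wrapper: build the index only when it is missing.
build_gmap_index_if_missing() {
    lib_dir=$1
    if gmap_index_exists "$lib_dir"; then
        echo "GMAP index already present in $lib_dir"
    else
        # Same gmap_build command as above, run from within the genome lib dir.
        (cd "$lib_dir" && gmap_build -D . -d ref_genome.fa.gmap -k 13 ref_genome.fa)
    fi
}
```

For example, `build_gmap_index_if_missing /path/to/ctat_genome_lib_build_dir` would skip the build on a library that has already been prepared.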
TrinityFusion has the following software dependencies:
Also, while TrinityFusion doesn't require the STAR aligner for execution, it requires the output of STAR as one of its inputs. So, be sure to have the STAR aligner installed too.
Preferably, use our TrinityFusion Docker image, which comes with everything it needs fully installed. The Dockerfile is included in the TrinityFusion source distribution, for those curious to see the full installation routine for building the Docker image.
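If you install the tools yourself rather than using Docker, a quick PATH check can catch a missing program before a long run starts. This is a minimal sketch: `report_missing_tools` is a hypothetical helper, and the executable names in the usage comment (Trinity, gmap, bowtie2, STAR) are assumptions based on the programs named on this page, so adjust them to match your installation.

```shell
# Hypothetical helper: print each named tool that is NOT found on PATH.
# Returns non-zero if at least one tool is missing.
report_missing_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage sketch (executable names are assumptions):
#   report_missing_tools Trinity gmap bowtie2 STAR || exit 1
```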
Before running TrinityFusion, you'll first need to run STAR to identify the chimeric and unmapped reads:
STAR --genomeDir ${star_index_dir} \
--readFilesIn ${left_fq_filename} ${right_fq_filename} \
--twopassMode Basic \
--outReadsUnmapped None \
--chimSegmentMin 12 \
--chimJunctionOverhangMin 12 \
--alignSJDBoverhangMin 10 \
--alignMatesGapMax 100000 \
--alignIntronMax 100000 \
--chimSegmentReadGapMax 3 \
--alignSJstitchMismatchNmax 5 -1 5 5 \
--runThreadN ${THREAD_COUNT} \
--outSAMstrandField intronMotif \
--outSAMunmapped Within \
--outSAMtype BAM Unsorted \
--outSAMattrRGline ID:GRPundef \
--chimMultimapScoreRange 10 \
--chimMultimapNmax 10 \
--chimNonchimScoreDropMin 10 \
--peOverlapNbasesMin 12 \
--peOverlapMMp 0.1 \
--chimOutJunctionFormat 1 # required as of STAR v2.6.1
After running STAR, you'll have access to the STAR chimeric junctions file ('Chimeric.out.junction') and the alignments in BAM format ('Aligned.out.bam' with the settings above). These can be used as input to TrinityFusion.
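Before moving on, it can be useful to check how many chimeric reads STAR actually reported, since these are the reads TrinityFusion-C assembles. The sketch below counts distinct read names in the junctions file. `count_chimeric_reads` is a hypothetical helper, and the layout it relies on is an assumption about STAR's output: tab-delimited, read name in column 10, an optional header line starting with "chr_donorA", and comment lines starting with "#".

```shell
# Hypothetical helper: count distinct read names in a STAR Chimeric.out.junction file.
# Assumptions: tab-delimited; read name in column 10; an optional header line
# whose first field is "chr_donorA"; comment lines begin with "#".
count_chimeric_reads() {
    awk -F '\t' '
        /^#/ || $1 == "chr_donorA" { next }   # skip comments and the header
        NF < 10 { next }                      # skip short or blank lines
        !seen[$10]++ { n++ }                  # count each read name once
        END { print n + 0 }
    ' "$1"
}

# Usage sketch:
#   count_chimeric_reads Chimeric.out.junction
```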
TrinityFusion usage is shown below:
% ./TrinityFusion
################################################################
#
# Required:
#
# --left_fq <string> reads_1.fq
#
# --right_fq <string> reads_2.fq
#
# (If just given the reads, runs Trinity de novo assembly first on all reads)
#
# --output_dir STR_OUT_DIR output directory
#
# Alternative TrinityFusion modes, using STAR outputs
#
# --chimeric_junctions <string> STAR Chimeric.out.junction file
#
# (if given the chimeric junctions file, restricts to the chimeric junction reads alone)
#
# --aligned_bam <string> STAR aligned bam file
#
# (if given the aligned_bam & the chimeric junctions), assembles the unmapped and chimeric reads, not all reads).
#
#
# Optional:
#
# --genome_lib_dir <string> directory for CTAT genome lib (or use env var $CTAT_GENOME_LIB
# current setting: (/Users/bhaas/DB/CTAT_GENOME_LIB/GRCh37_v19_CTAT_lib_Feb092018/ctat_genome_lib_build_dir)
# --CPU <int> :number threads (default 4)
#
# --show_full_usage_info flag, shows all options available.
#
# --version show TrinityFusion version info: 0.3.0
#
################################################################
An example TrinityFusion command leveraging both the genome aligned bam file and the chimeric junctions file (TrinityFusion-UC mode of execution) would be:
% TrinityFusion --left_fq reads_1.fq --right_fq reads_2.fq \
--chimeric_junctions Chimeric.out.junction \
--aligned_bam Aligned.bam \
--genome_lib_dir /path/to/ctat_genome_lib_build_dir/
For TrinityFusion-C mode, omit the --aligned_bam parameter. For TrinityFusion-D mode, provide only the FASTQ files as input.
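The relationship between the optional inputs and the three modes can be summarized in a small sketch. `trinityfusion_mode` is a hypothetical helper, not part of TrinityFusion. Supplying --aligned_bam without --chimeric_junctions is not described in the usage text, so this sketch simply treats that combination as an error.

```shell
# Hypothetical helper: name the TrinityFusion mode implied by the optional inputs.
#   $1 = value intended for --chimeric_junctions (empty string if not given)
#   $2 = value intended for --aligned_bam        (empty string if not given)
trinityfusion_mode() {
    if [ -n "$1" ] && [ -n "$2" ]; then
        echo "TrinityFusion-UC"   # chimeric + unmapped reads
    elif [ -n "$1" ]; then
        echo "TrinityFusion-C"    # chimeric reads only
    elif [ -z "$2" ]; then
        echo "TrinityFusion-D"    # all input reads
    else
        # Assumption: this combination is undocumented, so reject it here.
        echo "unsupported: --aligned_bam without --chimeric_junctions" >&2
        return 1
    fi
}

# Usage sketch:
#   trinityfusion_mode Chimeric.out.junction Aligned.bam
```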
TrinityFusion will generate a tab-delimited output file: TrinityFusion-*.fusion_predictions.tsv formatted like so:
#FusionName JunctionReadCount SpanningFragCount trans_acc trans_brkpt LeftGene LeftBreakpoint RightGene RightBreakpoint SpliceType annots
TATDN1--GSDMB 144 3 TRINITY_DN0_c0_g1_i4 61-62 TATDN1 chr8:125551266 GSDMB chr17:38066177 ONLY_REF_SPLICE ["CCLE","Klijn_CellLines","FA_CancerSupp","ChimerPub","INTERCHROMOSOMAL[chr8--chr17]"]
TATDN1--GSDMB 110 4 TRINITY_DN0_c0_g1_i1 61-62 TATDN1 chr8:125551266 GSDMB chr17:38066177 ONLY_REF_SPLICE ["CCLE","Klijn_CellLines","FA_CancerSupp","ChimerPub","INTERCHROMOSOMAL[chr8--chr17]"]
ACACA--STAC2 34 24 TRINITY_DN4_c0_g1_i1 230-231 ACACA chr17:35479453 STAC2 chr17:37374426 ONLY_REF_SPLICE ["ChimerSeq","CCLE","Klijn_CellLines","FA_CancerSupp","INTRACHROMOSOMAL[chr17:1.60Mb]"]
THRA--AC090627.1 34 1 TRINITY_DN1_c0_g1_i3 229-230 THRA chr17:38243106 AC090627.1 chr17:46371709 ONLY_REF_SPLICE ["CCLE","FA_CancerSupp","INTRACHROMOSOMAL[chr17:8.12Mb]"]
BCAS4--BCAS3 13 77 TRINITY_DN15_c0_g1_i1 253-252 BCAS4 chr20:49411710 BCAS3 chr17:59445688 ONLY_REF_SPLICE ["ChimerPub","ChimerSeq","chimerdb_pubmed","CCLE","FA_CancerSupp","INTERCHROMOSOMAL[chr20--chr17]"]
RPS6KB1--SNF8 21 22 TRINITY_DN7_c0_g1_i1 195-194 RPS6KB1 chr17:57970686 SNF8 chr17:47021337 ONLY_REF_SPLICE ["Klijn_CellLines","FA_CancerSupp","ChimerSeq","CCLE","INTRACHROMOSOMAL[chr17:10.95Mb]"]
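Because the predictions file is tab-delimited, it is easy to post-filter with standard tools. The sketch below reports each fusion together with its total supporting evidence (JunctionReadCount + SpanningFragCount) and keeps only rows at or above a chosen minimum. `filter_fusions` is a hypothetical helper and the threshold is purely illustrative, not a recommended cutoff.

```shell
# Hypothetical helper: print "<FusionName>\t<total evidence>" for rows whose
# JunctionReadCount + SpanningFragCount is at least the given minimum.
#   $1 = fusion predictions TSV, $2 = minimum total evidence
# Assumes columns 1-3 are FusionName, JunctionReadCount, SpanningFragCount,
# as in the header line of the predictions file.
filter_fusions() {
    awk -F '\t' -v min="$2" '
        /^#/ { next }                                  # skip the header line
        ($2 + $3) >= min { print $1 "\t" ($2 + $3) }   # keep supported fusions
    ' "$1"
}

# Usage sketch (the file name and threshold are placeholders):
#   filter_fusions predictions.tsv 50
```

Note that the same fusion name can appear on several rows, one per assembled transcript isoform, so the output may list a fusion more than once.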
A preliminary fusion report will also be included. The final fusions are the subset of the preliminary fusions that match perfectly with reference gene exon annotations at the fusion junction breakpoint.
In addition to the fusion report, you will have access to a Trinity.fasta file containing the de novo assembled transcripts. This can be used for further downstream analyses, such as exploring potential foreign transcripts (e.g., tumor viruses and microbes).
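As a starting point for such downstream exploration, the sketch below lists each assembled transcript with its length, which makes it easy to pick out longer contigs for a follow-up search against viral or microbial databases. `fasta_lengths` is a hypothetical helper; it assumes standard FASTA formatting, where header lines start with ">" and the transcript ID is the first whitespace-delimited token.

```shell
# Hypothetical helper: print "<transcript_id>\t<length>" for each FASTA record.
# Assumes header lines start with ">", the ID is the first whitespace-delimited
# token of the header, and sequences may wrap across multiple lines.
fasta_lengths() {
    awk '
        /^>/ {
            if (id != "") print id "\t" len   # emit the previous record
            id = substr($1, 2)                # drop the leading ">"
            len = 0
            next
        }
        { len += length($0) }                 # accumulate sequence length
        END { if (id != "") print id "\t" len }
    ' "$1"
}

# Usage sketch: list the ten longest assembled transcripts.
#   fasta_lengths Trinity.fasta | sort -k2,2nr | head -n 10
```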