Introme is an in silico splice predictor that evaluates a variant's likelihood of altering splicing by combining predictions from multiple splice-scoring tools with additional splicing rules and gene architecture features. Introme can accurately predict the impact of coding and noncoding variants on splicing by checking for the potential damage, creation or strengthening of splice elements, and it outperformed all leading tools that we tested.
Introme source code is provided under the GPLv3 license. Introme combines splicing scores from several tools and third-party packages provided under open source licenses; please see NOTICE for additional details. Introme is free for academic and non-commercial use. All other use requires a commercial license from Children's Cancer Institute, and potentially a commercial SpliceAI license obtained from Illumina, Inc.
- Docker
- vcfanno
- spliceai
- bedtools
- bcftools
- samtools
- htslib
- R
- R packages: ROCR, caret
- python3
- python packages: pysam, csv, Bio.Seq, argparse
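Before going further, it can help to confirm that the command-line tools in the list above are on your PATH. The following is a minimal sketch, not part of Introme itself. The helper name missing_tools is hypothetical, and the executable names in the usage comment are assumptions based on how these packages are commonly installed (htslib checked via bgzip and tabix, R via Rscript).

```shell
#!/bin/sh
# Report which of the given executables are missing from PATH.
# Prints one missing name per line and returns non-zero if any are missing.
# (missing_tools is a hypothetical helper, not shipped with Introme.)
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    return "$status"
}

# Usage (ASSUMPTION: executable names as commonly installed on your system):
# missing_tools docker vcfanno spliceai bedtools bcftools samtools bgzip tabix Rscript python3
```

The R and python packages are not covered by this check and would need to be verified from within R and python3.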
Introme requires the following files to be downloaded and placed in the annotations folder in addition to the files present in this repository.
- CADD v1.3 VCF created using the following instructions
- SPIDEX v1.0
- dbscSNV v1.1
- A vcf file of variants to analyse
- A gtf file, ideally containing only protein coding regions to speed up annotations (we recommend gencode)
- A reference genome fasta file
The WDL script is named Introme.wdl and is located in the wdl_scripts folder. These scripts were set up to run on Terra. All of the annotation files must be in the same folder and specified as inputs so that vcfanno can annotate correctly; these requirements will be documented further in the folder.
- Install the above software requirements and pull Introme.
- Download the required annotation files and file requirements and place them in the annotations folder.
- Update the .conf files with the correct paths (shouldn't be necessary if the same annotation files are used).
- Pull MMSplice and Spliceogen into the Introme folder from the links provided (ensure the original docker files are not deleted).
- Build the docker containers for MMSplice and Spliceogen using the code below. If you tag the containers differently, ensure you update the docker run section in the run_introme.sh script.
cd MMSplice
docker build -t mmsplice .
cd ../Spliceogen
docker build -t spliceogen .
Note: The MMSplice Docker Container requires more memory than the standard settings for Docker. Upgrade the memory to 10GB to ensure it runs.
- Run Introme using the run command
./run_introme.sh -r genome.fa -g annotation.gtf -v variants.vcf.gz -p prefix
A more streamlined install of Introme for running locally is being developed using Docker.
Introme can be run using either a local installation, or Docker.
Furthermore, we have wrapped Introme in Workflow Description Language (WDL) and implemented it on Terra. We are currently implementing Introme on CAVATICA, which uses the Seven Bridges Genomics platform.
- -g: Input GTF file (ideally gencode)
- -p: Output file prefix
- -r: Reference genome
- -v: Input VCF file
- -a: Genome assembly (can be inferred from genome build if in the file name)
- -b: Input BED file (i.e. regions of interest)
- -f: Score all variants ≤ a specified variant allele frequency
- -q: Score all variants regardless of quality scores
- Turn off Introme single score check
Run Introme with base parameters:
./run_introme.sh -r genome.fa -g annotation.gtf -v variants.vcf.gz -p prefix
Run Introme on a specified gene list (BED format) for variants below 0.1% allele frequency:
./run_introme.sh -r genome.fa -g annotation.gtf -v variants.vcf.gz -p prefix -b genelist.bed -f 0.001
The variant-level scores and supporting information are then fed into the Introme decision tree model to classify the likelihood of a variant altering splicing, which produces an Introme score from 0–1. We recommend the use of 0.61 as a threshold, producing a sensitivity of 0.91 and a specificity of 0.91, calculated on the validation dataset. When high specificity is required, a threshold of 0.83 results in a sensitivity of 0.8 and a specificity of 0.975.
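As an illustration of applying these thresholds, the sketch below keeps the header and every row of a tab-separated file whose score is at or above a chosen threshold. It is not part of Introme. The helper name filter_by_score and the column name introme_score are assumptions, so check the header of your own output .tsv, and treating the threshold as inclusive is also an assumption.

```shell
#!/bin/sh
# Keep the header plus all rows whose score column is >= a threshold.
# $1 = TSV file, $2 = threshold (default 0.61), $3 = score column name.
# ASSUMPTION: the column name "introme_score" is hypothetical; pass the real
# name from your output file as the third argument.
filter_by_score() {
    awk -F '\t' -v thr="${2:-0.61}" -v col="${3:-introme_score}" '
        NR == 1 {
            # Locate the score column by name in the header row.
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            print
            next
        }
        # Force numeric comparison on both sides.
        ($idx + 0) >= (thr + 0) { print }
    ' "$1"
}

# Usage (file name is a placeholder):
# filter_by_score prefix.tsv 0.83 > high_specificity.tsv
```

Passing 0.83 as the second argument corresponds to the high-specificity setting described above, while omitting it uses the recommended 0.61.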
We are working on implementing automatic interpretation for the outcome of the splice-altering variant. Until this feature is in place, all of the input scores which make up Introme's final prediction are included in the final .tsv file if further information on the variant prediction is required.
Introme currently supports VCF files aligned to both the GRCh37 and GRCh38 reference genomes. If the genome build is not part of the fasta file name, please specify it using -a with either hg19 or hg38.
The development of Introme has been supported by grants, fellowships and scholarships provided by:
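To make the file-name convention concrete, here is a rough sketch of how a build could be guessed from a fasta file name. The patterns and the helper name guess_assembly are illustrative assumptions and do not reproduce Introme's own matching logic.

```shell
#!/bin/sh
# Guess the genome assembly from a reference fasta file name.
# Prints hg19 or hg38, or returns non-zero if no build is recognisable.
# ASSUMPTION: these patterns are illustrative, not Introme's actual rules.
guess_assembly() {
    name=$(basename "$1" | tr '[:upper:]' '[:lower:]')
    case "$name" in
        *hg19*|*grch37*) echo hg19 ;;
        *hg38*|*grch38*) echo hg38 ;;
        *) return 1 ;;
    esac
}

# Usage: warn when the build cannot be guessed and -a must be given explicitly.
# guess_assembly genome.fa || echo "build not in file name; pass -a" >&2
```

A name such as genome.fa carries no build information, so in that case the assembly has to be supplied on the command line.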
- Luminesce Alliance
- Cancer Australia and My Room
- NHMRC
- NSW Health
- Australian Government Research Training Program
- The Kids Cancer Alliance
- Petre Foundation
- Fulbright Future Scholarship
Introme was initially developed by Dr. Mark Cowley, Dr. Velimir Gayevskiy and Dr. Sarah Beecroft at the Garvan Institute's Kinghorn Centre for Clinical Genomics, and the initial implementation can be found at KCCG's Introme Repository.
Introme has since been adapted and reimplemented by Patricia Sullivan, Dr. Mark Cowley and Dr. Mark Pinese at the Children's Cancer Institute. This version improves on KCCG's Introme in accuracy, adds multiple splice-scoring tools, and uses machine learning.
For additional questions or assistance using Introme, contact [email protected] (Patricia Sullivan).