Background variants

martinghunt edited this page Apr 12, 2022 · 3 revisions

This page describes the use of "background" variants when building your own panels for AMR and/or lineage calling. Depending on the species of interest, background variants can be essential to prevent missed calls.

tl;dr

If you are missing calls from mykrobe after making your own panel, the most likely reason is that the panel was built without background SNPs. Read this page to understand the problem and learn how to fix it.

Overview

Mykrobe works by looking for perfectly matching kmers. Suppose we have a variant of interest C100T (a C to T DNA change at position 100 in the genome). If a sample has a SNP within one kmer length of position 100, say G95T, then the kmers that cover both positions no longer match, resulting in a false negative call from mykrobe. Mykrobe solves this problem by allowing you to supply a catalog of "background variants" that are used when running make-probes. Mykrobe will then generate combinations of probes that do or do not contain the background variants. For example, supplying the G95T variant would result in two probes for C100T: one that has a G at position 95 and the other a T at position 95.
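To make the distance argument concrete, here is a minimal sketch (not part of mykrobe) that checks whether a background SNP can share a kmer with a variant of interest. It assumes a kmer size of 21, matching the -k21 option used later on this page, and the function name shares_kmer is hypothetical. Two positions can fall in the same kmer only if they are fewer than k bases apart.

```shell
# shares_kmer TARGET_POS BACKGROUND_POS [K]
# Hypothetical helper: succeeds (exit status 0) if the two positions can
# lie in the same kmer, i.e. they are fewer than K bases apart.
# K defaults to 21.
shares_kmer() {
  local target="$1"
  local background="$2"
  local k="${3:-21}"
  local dist=$(( target - background ))
  # Take the absolute value of the distance.
  if [ "$dist" -lt 0 ]; then
    dist=$(( 0 - dist ))
  fi
  [ "$dist" -lt "$k" ]
}

# G95T is 5 bases from C100T, so it can disrupt kmers covering position 100.
if shares_kmer 100 95; then
  echo "position 95 can share a kmer with position 100"
fi

# A SNP at position 150 is too far away to matter when k=21.
if ! shares_kmer 100 150; then
  echo "position 150 cannot share a kmer with position 100"
fi
```

This only illustrates why nearby SNPs matter. How mykrobe actually selects and combines background variants when building probes is not described here.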

TB background SNP files

If you are building a panel for TB, then you can skip the part below about making your own VCF files. Instead, get VCF files with background variants from here: https://figshare.com/articles/dataset/Mykrobe_TB_panel_background_variants/19582597.

Preparing background SNP files

You will need MongoDB installed, and your background SNPs in one or more VCF files. The idea is that the SNPs are added to a database, which is then used when running make-probes.
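Before going further, it may help to confirm that both programs are on your PATH. The following is a small sketch of such a check. The function name require_cmds is hypothetical, and it only tests that the commands exist, not that their versions are compatible.

```shell
# require_cmds CMD...
# Hypothetical helper: reports each missing command on stderr and
# returns non-zero if any of them is absent from PATH.
require_cmds() {
  local missing=0
  local cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# mongod and mykrobe are the two programs used on this page.
require_cmds mongod mykrobe || echo "install the missing tools before continuing" >&2
```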

The VCF file must have the GT and GT_CONF fields present in every line. Only records with a non-reference genotype and with GT_CONF > 1 are used (there may be other requirements - to be documented). Here is an example of a record that will be used:

ref_name       42    12      G       T       255     PASS    SVTYPE=SNP      GT:GT_CONF      1/1:100

That VCF record would add the variant G42T to the backgrounds database.
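To preview which records in a VCF file meet the two criteria stated above, something like the following awk sketch could be used. It is not mykrobe's own filter: the function name preview_background_records is hypothetical, it looks only at the first sample column, it assumes a tab-separated VCF, and it treats any allele other than 0 or . as non-reference, which is an assumption. Mykrobe may apply further requirements that are not documented here.

```shell
# preview_background_records VCF
# Hypothetical helper: prints the data lines whose first sample has a
# non-reference GT and GT_CONF > 1. Header lines (starting with '#')
# and records missing either field are skipped.
preview_background_records() {
  awk -F'\t' '
    /^#/ { next }
    {
      # Locate GT and GT_CONF by name in the FORMAT column (column 9).
      n = split($9, keys, ":")
      split($10, vals, ":")
      gt = ""; conf = ""
      for (i = 1; i <= n; i++) {
        if (keys[i] == "GT") gt = vals[i]
        if (keys[i] == "GT_CONF") conf = vals[i]
      }
      if (gt == "" || conf == "") next

      # Assumption: non-reference means at least one allele
      # that is neither 0 nor missing (.).
      m = split(gt, alleles, "[/|]")
      nonref = 0
      for (j = 1; j <= m; j++) {
        if (alleles[j] != "0" && alleles[j] != ".") nonref = 1
      }
      if (nonref && conf + 0 > 1) print
    }
  ' "$1"
}

# Example: preview_background_records variants.vcf
```

A record that this sketch prints is a candidate for the backgrounds database, but only the output of mykrobe itself is authoritative.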

Make probes

You will need MongoDB running in the background. Then run mykrobe variants add once for each VCF file to add its background variants to the database. Here are example commands, where the variants are in variants.vcf:

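# Settings: reference FASTA, database name, and MongoDB data directory
ref_fa=NC_012345.fasta
db_name=my_db
db=$PWD/mongo-db/

# Start the database in the background and give it time to start up
mkdir -p "$db"
mongod --quiet --dbpath "$db" &
sleep 5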

# Add variants. Run this command once for each VCF file.
# Note: you can put anything in the -m option, it is just
# the name of the source of the variants. Here we put 'samtools'
mykrobe variants add -f --db_name $db_name variants.vcf $ref_fa -m samtools

# Run make probes - note the --db_name option
mykrobe variants make-probes \
  --db_name $db_name \
  -k21 \
  -t amino_acid_variants.txt \
  -g NC_012345.gbk \
  $ref_fa > probes.fa

# Stop the database when finished
mongod --shutdown --dbpath $db

The final make-probes command is described in detail in the custom panels help page. Essentially, use it as described there, but add the --db_name option, giving the name of the database that holds your background variants (my_db in the example above).
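If your background variants are spread over several VCF files, the add step can be wrapped in a loop. The sketch below reuses the exact options from the example above. The function name add_all_vcfs and the MYKROBE variable are hypothetical and exist only for this illustration, and the same -m label is applied to every file, which may not suit your data.

```shell
# add_all_vcfs DB_NAME REF_FASTA VCF...
# Hypothetical helper: runs "mykrobe variants add" once per VCF file and
# stops at the first failure. Set MYKROBE=echo for a dry run that only
# prints the arguments each call would receive.
add_all_vcfs() {
  local name="$1"
  local ref="$2"
  shift 2
  local vcf
  for vcf in "$@"; do
    ${MYKROBE:-mykrobe} variants add -f --db_name "$name" "$vcf" "$ref" -m samtools || return 1
  done
}

# Example (mongod must already be running, as in the commands above;
# the vcfs/ directory is a placeholder):
#   add_all_vcfs my_db NC_012345.fasta vcfs/*.vcf
```

Doing a dry run first, with MYKROBE=echo, is a cheap way to check that the file list expands as expected before anything is written to the database.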