This is a web-app and workflow for computationally screening PCR-based assays against the growing number of SARS-CoV-2 sequences deposited for public release in GISAID and GenBank. Each assay is evaluated on free energy and melting temperature to determine whether its oligonucleotides (primers and probe) bind to the target sequence.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
NOTE: The program is designed to retrieve genomes from GISAID and/or NCBI for the users' convenience. At GISAID's request, all GISAID data has been removed from our web-app since September 2020.
We provide a package file (environment.yml) to create a new environment (assay_val) using conda:
$ git clone https://github.com/LANL-Bioinformatics/assay_validation.git
$ cd assay_validation/
$ conda env create -f environment.yml
$ conda activate assay_val
The following software must be installed manually, and the executable binaries must be placed on your system PATH:
-
ThermonucleotideBLAST
ThermonucleotideBLAST is a software program for searching a target database of nucleic acid sequences using an assay-specific query. Detailed installation instructions can be found on the software's GitHub page: https://github.com/jgans/thermonucleotideBLAST.
-
PhyD3-am
This web-app includes a modified version of PhyD3 (GPL3) and visualizations specifically developed for displaying statistics, metadata, the phylogenetic tree, and assay evaluation results. A minimal installation and usage example:
$ cd phyd3-am/
$ npm install
$ node phyd3.js
Detailed information can be found on the original software's GitHub page: https://github.com/vibbits/phyd3.
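Since the binaries above must be reachable through PATH, a quick check before running the workflow can save a failed run. The following is a minimal sketch, not part of this package; the executable names (`tntblast` for ThermonucleotideBLAST, plus `node` and `npm` for PhyD3-am) are assumptions and may differ on your system.

```python
import shutil

# Assumed executable names; adjust them to match your installation.
REQUIRED_TOOLS = ("tntblast", "node", "npm")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on the system PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required executables were found on PATH.")
```

`shutil.which` returns `None` when a name cannot be resolved, so an empty list means every listed tool was found.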
These files must be present in the resource directory (indicated by the -r argument; see below):
-
"assays.txt"
A list of assays with their corresponding oligo sequences, one assay per line. Each line gives the assay name followed by the oligo sequences in this order: forward primer, reverse primer, probe. Fields are separated by a space. Example line:
CDC-2019-nCoV_N1 GACCCCAAAATCAGCGAAAT TCTGGTTACTGCCAGTTGAATCTG ACCCCGCATTACGTTTGGTGGACC
-
"reduced_assays.txt"
A reduced list of assays to be used in the final results. Same format as "assays.txt". May be identical to assays.txt.
-
"del_ct_table.txt"
A tab-separated table of delta Ct values from { Li B, Kadura I, Fu DJ, Watson DE. Genotyping with TaqMAMA. Genomics. 2004 Feb 1;83(2):311-20. }. The table has headers for both rows and columns, and the first header entry is "Row". Example first two lines:
Row CC GC AC TC CG GG AG TG CA GA AA TA CT GT AT TT
CC 0.0 0.3 0.5 -0.3 6.3 17.5 19.2 11.9 5.1 15.7 11.4 12.4 0.4 11.0 10.4 3.7
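The two file formats described above can be read with a few lines of Python. This is a minimal sketch under stated assumptions, not the parser used by the scripts in this package: it assumes every assay line has exactly four whitespace-separated fields (name, forward primer, reverse primer, probe) and that the delta Ct table is tab-separated with "Row" as the first header entry. The names `Assay`, `parse_assay_line`, and `parse_del_ct_table` are hypothetical.

```python
import csv
import io
from typing import Dict, NamedTuple


class Assay(NamedTuple):
    name: str
    forward: str
    reverse: str
    probe: str


def parse_assay_line(line: str) -> Assay:
    """Split one assays.txt line into name, forward primer, reverse primer, probe."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    return Assay(*fields)


def parse_del_ct_table(text: str) -> Dict[str, Dict[str, float]]:
    """Read the tab-separated table into {row_label: {column_label: value}}."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    columns = header[1:]  # skip the leading "Row" entry
    table: Dict[str, Dict[str, float]] = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        table[row[0]] = {col: float(val) for col, val in zip(columns, row[1:])}
    return table


# Usage with the example line shown above.
assay = parse_assay_line(
    "CDC-2019-nCoV_N1 GACCCCAAAATCAGCGAAAT "
    "TCTGGTTACTGCCAGTTGAATCTG ACCCCGCATTACGTTTGGTGGACC"
)
print(assay.name, len(assay.forward), len(assay.reverse), len(assay.probe))
```

Keeping the table as a nested dictionary lets a value be looked up by its row label and column label directly, for example `table["CC"]["GC"]`.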
The workflow consists of 3 major steps. All scripts mentioned below can be found in the scripts/ directory.
-
Download genomes from the two repositories and cross-validate them to remove any duplicate entries.
The am_download.py script downloads and prepares the data needed to run the assay_monitor.py script. It calls other scripts that log into the GISAID database to download SARS-CoV-2 genomes as individual FASTA (.fna) files. GenBank FASTA files are downloaded through NCBI's E-utilities. Metadata for both GISAID and GenBank is downloaded from these sources.
-
Assess assays for binding based on free energy and melting temperature to determine whether the assay oligonucleotides (primers and probe) bind to the target sequence.
The assay_monitor.py script runs the assay monitor functionality for this package. Using the FASTA files downloaded by am_download.py and certain resource files (detailed above), it evaluates the assays (provided in a resource file) against each previously downloaded genome using ThermonucleotideBLAST. Full results are written to Assay_Results.json, a summary to summary_table.json, and cross-referenced results to match_table.csv. Summary statistics are also written to db_stats.json and db_totals.json. These files are used downstream in this package to produce visualizations.
-
Integrate the input phylogenetic tree with the assay evaluation results, then generate essential files for visualization.
The tree visualization is rendered using a custom PhyD3 phylogenetic tree viewer. The primer_validation_vis.py script takes a phylogenetic tree (we recommend PhaME) and the output of the assay_monitor.py script, and writes the data required by the web-app to <PhyD3_path>/dist/data/. The data include a heatmap-associated tree in extended phyloXML format, along with assay evaluation results and statistics in individual JSON files for the user's information. The heatmap displays the predicted mismatches and assay outputs for each SARS-CoV-2 genome.
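The first step above removes duplicate entries across the two repositories. How am_download.py identifies duplicates is not described here, so the following is only an illustrative sketch of one possible approach: treating records with identical sequences (ignoring case) as duplicates and keeping the first one seen. The function names `read_fasta` and `deduplicate` are hypothetical and do not come from this package.

```python
import hashlib
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def deduplicate(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first record for each distinct sequence (case-insensitive).

    Assumption: identical sequence means duplicate entry. The real workflow
    may instead compare metadata such as strain names or accessions.
    """
    seen = set()
    unique = []
    for header, seq in records:
        digest = hashlib.sha256(seq.upper().encode("ascii")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((header, seq))
    return unique


# Usage: two toy records share a sequence, so only the first is kept.
toy = [">gisaid_1", "ACGT", "acgt", ">genbank_1", "ACGTACGT", ">genbank_2", "ACGT"]
for header, seq in deduplicate(read_fasta(toy)):
    print(header, len(seq))
```

Hashing the sequence keeps memory use modest when many genomes are compared, at the cost of missing near-duplicates that differ by even one base.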
We provide a script, am_start.sh, that ties all the scripts together and copies essential files to the web-app directory. Run am_start.sh -h for more details.
usage: am_start.sh [-h] -e [STR] -m [STR] -M [STR] -s [STR] -r [STR] -R [STR] -f [STR] -t [STR] [ -D ]
Note the optional flag -D, which skips downloads. Once sequences have been downloaded to the fasta_directory, you do not need to download them again on every run (except to update with new sequences).
Li, P.E., Myers y Gutiérrez, A., Davenport, K., Flynn, M., Hu, B., Lo, C.C., Jackson, E.P., Shakya, M., Xu, Y., Gans, J. and Chain, P.S., 2020. A Public Website for the Automated Assessment and Validation of SARS-CoV-2 Diagnostic PCR Assays. arXiv preprint arXiv:2006.04566.