The package worked with python version >=3.8 (Tested in python 3.8, 3.9, 3.10 and 3.11). Only tested in linux x64 system.
python package:
- primer3-py>=0.6.1
- biopython>=1.7.8
other program in your $PATH:
- ncbi-blast
example code to install the packages under python3 with pip and conda
pip install primerdiffer # will also install primer3-py and biopython
conda install -c bioconda blast # install ncbi blast, which is not included in pip installation
Design genome-wide specific primers for two species/sub-species/divergent sequences:
- Greedy design primers for a region in genome1 and make a specificity check using genome2
- The dis-similarity between genome1 and genome2 >= 5%.
usage: primerdesign.py [-h] [-d WKDIR] [-g1 GENOME1] [-g2 GENOME2] [-pos POSITION] [--alignlen ALIGNLEN]
[--free3len FREE3LEN] [--productlen PRODUCTLEN] [-h1 HIT1] [-h2 HIT2]
[-i INTERVAL] [-j JUMP] [--prefix PREFIX]
optional arguments:
-h, --help show this help message and exit
-d WKDIR, --wkdir WKDIR
The dir path contain the file, if not given, use the current dir
-g1 GENOME1, --genome1 GENOME1
the fasta file used to design primer
-g2 GENOME2, --genome2 GENOME2
the fasta file used to check false priming
-pos POSITION, --position POSITION
position on genome1 to design primer, use string with IGV format, like
ChrX:1956230-1976220
--alignlen ALIGNLEN the cutoff of primer min align length as a right hit, default is 16
--free3len FREE3LEN the cutoff of primer 3' align length as a right hit, default is 2
--productlen PRODUCTLEN
the cutoff of max product which will be treated as a false priming, default is
2000.
-h1 HIT1, --hit1 HIT1
the cutoff of max number of in-silicon PCR product which can be found in
genome1. default is 1
-h2 HIT2, --hit2 HIT2
the cutoff of max number of in-silicon PCR product which can be found in
genome2, default is 0
-i INTERVAL, --interval INTERVAL
interval is the region begins to pick primers, default is 5000. If 5k is the
unit, will pick one primer each 5k
-j JUMP, --jump JUMP jump is the region to jump inside intervals if the prvious interval can not
generate a valid primer, the smaller, more sites to check. Default is 500.
--prefix PREFIX prefix of output file, default is primers
Use C. nigoni and C.briggsae genomes as example. The two fasta files can be downloaded separately from cb5.fa and cn3_new.fa.
The C. nigoni genome is cn3_new.fa and C. briggsae genome is cb5.fa. To design C. briggsae unique primer, which would not amplify any region in C. nigoni, and amplify only one region in C. briggsae. The targeted region for C. briggsae is ChrX:12881200-15106660 (-pos or --position), one primer is designed for every 4kb interval (-i or --interval).
primerdesign.py -g1 cb5.fa -g2 cn3_new.fa -pos "ChrX:12881200-15106660" --interval 4000
# check the result in file "primers_ChrX_12881200-15106660.txt"
head primers_ChrX_12881200-15106660.txt
#ChrX:12881200-12881700 GATCCAAAACATGAGTGGCC CGAGATCATTGGCTCAAAGT 287
#ChrX:12886200-12886700 GTTTTCTCTTCAAGTGCCCG CTCCCACATCTTGTAGGTCC 416
#ChrX:12891200-12891700 GTAGATGCTGTTGAGGCTCT CGAGTGGGACATTGTCAGTA 299
#ChrX:12896200-12896700 GGCGCATTATACGAAGCTTT TTCCTGCTGCCAGATAGAAG 362
Case example: use getpos_primers.py to annotate the primerdiffer.py output with position and in silicon PCR result
usage: getpos_primers.py [-h] [-d WKDIR] [-f FILE] [-g GENOME] [--alignlen ALIGNLEN] [--free3len FREE3LEN] [--productlen PRODUCTLEN] [-o OUT]
optional arguments:
-h, --help show this help message and exit
-d WKDIR, --wkdir WKDIR
The dir path contain the file, if not given, use the current dir
-f FILE, --file FILE the 4 col table generated by primerdesign.py
-g GENOME, --genome GENOME
the fasta file used to design primer
--alignlen ALIGNLEN the cutoff of primer min align length as a right hit, default is 16
--free3len FREE3LEN the cutoff of primer 3' align length as a right hit, default is 2
--productlen PRODUCTLEN the cutoff of max product which will be treated as a false priming, default is 2000.
-o OUT, --out OUT output file, contains possible amplification regions of this primer pair
# example: first check the input file format, the col4 is from the primerdesign.py output
head primers_ChrX_12881200-15106660.txt
#ChrX:12881200-12881700 GATCCAAAACATGAGTGGCC CGAGATCATTGGCTCAAAGT 287
# run get_posprimers.py
getpos_primers.py -f primers_ChrX_12881200-15106660.txt -g cb5.fa -o primer_pos.txt
# the output is 6 col table
#ChrX:12881200-12881700 GATCCAAAACATGAGTGGCC CGAGATCATTGGCTCAAAGT 287 ChrX:12881301-12881587 ATCCAAAACATGAG....
Use in silico PCR to get the position and the product of one primer
usage: ispcr.py [-h] [-d WKDIR] [-f FORWARD] [-r REVERSE] [-g GENOME] [--alignlen ALIGNLEN]
[--free3len FREE3LEN] [--productlen PRODUCTLEN] [-o OUT]
optional arguments:
-h, --help show this help message and exit
-d WKDIR, --wkdir WKDIR
The dir path contain the file, if not given, use the current dir
-f FORWARD, --forward FORWARD
the forward primer sequence
-r REVERSE, --reverse REVERSE
the reverse primer sequence
-g GENOME, --genome GENOME
the fasta file used to design primer
--alignlen ALIGNLEN the cutoff of primer min align length as a right hit, default is 16
--free3len FREE3LEN the cutoff of primer 3' align length as a right hit, default is 2
--productlen PRODUCTLEN
the cutoff of max product which will be treated as a false priming, default is
2000.
-o OUT, --out OUT output file, contains all possible amplification regions of this primer pair
ispcr.py -f gcactttcatgtccctcaac -r cactctattctcaccccacc -g cb5.fa -o ispcr.fa
# check the PCR product in ispcr.fa
head ispcr.fa
#>ChrI:230699-231076_RC
#GCACTTTCATGTCCCTCAACCAGTCGTTTTTCCTTACCTCTCCCCTTCCTTTTTTCCCCCTCCCAGATGACGTCACCCATCTGTCC
#ACCCTTCTAACGGTCCCCCCCACCATCTGCATGGTGTCCTCGGGGGTGAACAGCTGCACATTTATTGTTCCCTTCTATTCCCCCCT
#CCTCGGTCATCGCGGTTTTATCCCCGCCGTCCATTTGACCATTCTTTGCGTCTCTTCCCTCTCTCTCTCTCTCTCTCTCTCTCTCT
#CTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTTCTTTTTCTTCTAGGCGAAAGAGGTCACATGGAAGAGAAGAGGATG
#ATGATGATGATGATGGTGGGGTGAGAATAGAGT
The default primer design parameter is described in general_setting.py:
primer3_general_settings = {
'PRIMER_OPT_SIZE': 20,
'PRIMER_PICK_LEFT_PRIMER':1,
'PRIMER_PICK_RIGHT_PRIMER':1,
'PRIMER_PICK_INTERNAL_OLIGO': 0,
'PRIMER_MIN_SIZE': 18,
'PRIMER_MAX_SIZE': 23,
'PRIMER_OPT_TM': 57.0,
'PRIMER_MIN_TM': 46.0,
'PRIMER_MAX_TM': 63.0,
'PRIMER_MIN_GC': 20.0,
'PRIMER_MAX_GC': 80.0,
'PRIMER_PRODUCT_SIZE_RANGE': [[250, 650]],
'PRIMER_NUM_RETURN':10,
'PRIMER_MIN_THREE_PRIME_DISTANCE':10,
'PRIMER_LIB_AMBIGUITY_CODES_CONSENSUS':0
}
The new parameter can be supplemnted as a file with the terms which need to be changed, for example, create a file with name of config.txt
# content of config.txt
# only changed the product size, the others are the same the default
{'PRIMER_PRODUCT_SIZE_RANGE': [[100, 200]]}
Run primerdesign.py to design a qPCR primer(product size 100-200bp, use the config.txt for primer3config) for C. briggsae mitochondrial DNA sequence NC_009885.1 in any region (use 0:-1 means from the first nucleotide to the last); check the C. briggsae genome cb5.fa for false priming, make sure the hit is 0.
primerdesign.py -g1 NC_009885.fa -g2 cb5.fa -pos "NC_009885.1:0:-1" \
--interval 1000 --hit1 1 --hit2 0 \
--primer3config config.txt
Please kindly cite our paper in bioinformatics if you find primerdiffer or primervcf useful in your work: Link to publication.
Ren, X., Shao, Y., Zhang, Y., Ni, Y., Bi, Y., & Li, R. (2023). Primerdiffer: a python command line module for large-scale primer design in haplotype genotyping. Bioinformatics, btad188.
- To design the primer using VCF file for closely related haplotypes (i.e., strain/individual level differences). Please check primervcf package.
- add the primerdffer output table description and ispcr function for the output.