Skip to content

Latest commit

 

History

History
197 lines (170 loc) · 9.59 KB

readme.md

File metadata and controls

197 lines (170 loc) · 9.59 KB

primerdiffer: batch design primers for chromosomal level genotyping

PyPI License

Installation:

The package worked with python version >=3.8 (Tested in python 3.8, 3.9, 3.10 and 3.11). Only tested in linux x64 system.

python package:

  • primer3-py>=0.6.1
  • biopython>=1.7.8

other program in your $PATH:

  • ncbi-blast

example code to install the packages under python3 with pip and conda

pip install primerdiffer # will also install primer3-py and biopython  
conda install -c bioconda blast # install ncbi blast, which is not included in pip installation

Case example: used primerdesign.py to design primers for two haplotypes

Design genome-wide specific primers for two species/sub-species/divergent sequences:

  • Greedy design primers for a region in genome1 and make a specificity check using genome2
  • The dis-similarity between genome1 and genome2 >= 5%.
usage: primerdesign.py [-h] [-d WKDIR] [-g1 GENOME1] [-g2 GENOME2] [-pos POSITION] [--alignlen ALIGNLEN]
                       [--free3len FREE3LEN] [--productlen PRODUCTLEN] [-h1 HIT1] [-h2 HIT2]
                       [-i INTERVAL] [-j JUMP] [--prefix PREFIX]

optional arguments:
  -h, --help            show this help message and exit
  -d WKDIR, --wkdir WKDIR
                        The dir path contain the file, if not given, use the current dir
  -g1 GENOME1, --genome1 GENOME1
                        the fasta file used to design primer
  -g2 GENOME2, --genome2 GENOME2
                        the fasta file used to check false priming
  -pos POSITION, --position POSITION
                        position on genome1 to design primer, use string with IGV format, like
                        ChrX:1956230-1976220
  --alignlen ALIGNLEN   the cutoff of primer min align length as a right hit, default is 16
  --free3len FREE3LEN   the cutoff of primer 3' align length as a right hit, default is 2
  --productlen PRODUCTLEN
                        the cutoff of max product which will be treated as a false priming, default is
                        2000.
  -h1 HIT1, --hit1 HIT1
                        the cutoff of max number of in-silicon PCR product which can be found in
                        genome1. default is 1
  -h2 HIT2, --hit2 HIT2
                        the cutoff of max number of in-silicon PCR product which can be found in
                        genome2, default is 0
  -i INTERVAL, --interval INTERVAL
                        interval is the region begins to pick primers, default is 5000. If 5k is the
                        unit, will pick one primer each 5k
  -j JUMP, --jump JUMP  jump is the region to jump inside intervals if the prvious interval can not
                        generate a valid primer, the smaller, more sites to check. Default is 500.
  --prefix PREFIX       prefix of output file, default is primers

Use C. nigoni and C.briggsae genomes as example. The two fasta files can be downloaded separately from cb5.fa and cn3_new.fa.

The C. nigoni genome is cn3_new.fa and C. briggsae genome is cb5.fa. To design C. briggsae unique primer, which would not amplify any region in C. nigoni, and amplify only one region in C. briggsae. The targeted region for C. briggsae is ChrX:12881200-15106660 (-pos or --position), one primer is designed for every 4kb interval (-i or --interval).

primerdesign.py -g1 cb5.fa -g2 cn3_new.fa -pos "ChrX:12881200-15106660" --interval 4000
# check the result in file "primers_ChrX_12881200-15106660.txt"
head primers_ChrX_12881200-15106660.txt
#ChrX:12881200-12881700	GATCCAAAACATGAGTGGCC	CGAGATCATTGGCTCAAAGT	287
#ChrX:12886200-12886700	GTTTTCTCTTCAAGTGCCCG	CTCCCACATCTTGTAGGTCC	416
#ChrX:12891200-12891700	GTAGATGCTGTTGAGGCTCT	CGAGTGGGACATTGTCAGTA	299
#ChrX:12896200-12896700	GGCGCATTATACGAAGCTTT	TTCCTGCTGCCAGATAGAAG	362

Case example: use getpos_primers.py to annotate the primerdiffer.py output with position and in silicon PCR result

usage: getpos_primers.py [-h] [-d WKDIR] [-f FILE] [-g GENOME] [--alignlen ALIGNLEN] [--free3len FREE3LEN] [--productlen PRODUCTLEN] [-o OUT]

optional arguments:
  -h, --help            show this help message and exit
  -d WKDIR, --wkdir WKDIR
                        The dir path contain the file, if not given, use the current dir
  -f FILE, --file FILE  the 4 col table generated by primerdesign.py
  -g GENOME, --genome GENOME
                        the fasta file used to design primer
  --alignlen ALIGNLEN   the cutoff of primer min align length as a right hit, default is 16
  --free3len FREE3LEN   the cutoff of primer 3' align length as a right hit, default is 2
  --productlen PRODUCTLEN  the cutoff of max product which will be treated as a false priming, default is 2000.
  -o OUT, --out OUT     output file, contains possible amplification regions of this primer pair
# example:  first check the input file format, the col4 is from the primerdesign.py output
head primers_ChrX_12881200-15106660.txt 
#ChrX:12881200-12881700	GATCCAAAACATGAGTGGCC	CGAGATCATTGGCTCAAAGT	287

# run get_posprimers.py
getpos_primers.py -f primers_ChrX_12881200-15106660.txt -g cb5.fa -o primer_pos.txt

# the output is 6 col table
#ChrX:12881200-12881700	GATCCAAAACATGAGTGGCC	CGAGATCATTGGCTCAAAGT	287 ChrX:12881301-12881587  ATCCAAAACATGAG....

Case example: in silico PCR for one primer

Use in silico PCR to get the position and the product of one primer

usage: ispcr.py [-h] [-d WKDIR] [-f FORWARD] [-r REVERSE] [-g GENOME] [--alignlen ALIGNLEN]
                [--free3len FREE3LEN] [--productlen PRODUCTLEN] [-o OUT]

optional arguments:
  -h, --help            show this help message and exit
  -d WKDIR, --wkdir WKDIR
                        The dir path contain the file, if not given, use the current dir
  -f FORWARD, --forward FORWARD
                        the forward primer sequence
  -r REVERSE, --reverse REVERSE
                        the reverse primer sequence
  -g GENOME, --genome GENOME
                        the fasta file used to design primer
  --alignlen ALIGNLEN   the cutoff of primer min align length as a right hit, default is 16
  --free3len FREE3LEN   the cutoff of primer 3' align length as a right hit, default is 2
  --productlen PRODUCTLEN
                        the cutoff of max product which will be treated as a false priming, default is
                        2000.
  -o OUT, --out OUT     output file, contains all possible amplification regions of this primer pair
ispcr.py -f gcactttcatgtccctcaac -r cactctattctcaccccacc -g cb5.fa -o ispcr.fa
# check the PCR product in ispcr.fa
head ispcr.fa
#>ChrI:230699-231076_RC
#GCACTTTCATGTCCCTCAACCAGTCGTTTTTCCTTACCTCTCCCCTTCCTTTTTTCCCCCTCCCAGATGACGTCACCCATCTGTCC
#ACCCTTCTAACGGTCCCCCCCACCATCTGCATGGTGTCCTCGGGGGTGAACAGCTGCACATTTATTGTTCCCTTCTATTCCCCCCT
#CCTCGGTCATCGCGGTTTTATCCCCGCCGTCCATTTGACCATTCTTTGCGTCTCTTCCCTCTCTCTCTCTCTCTCTCTCTCTCTCT
#CTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTTCTTTTTCTTCTAGGCGAAAGAGGTCACATGGAAGAGAAGAGGATG
#ATGATGATGATGATGGTGGGGTGAGAATAGAGT

Case example: Design general purpose primers with given parameters

The default primer design parameter is described in general_setting.py:

primer3_general_settings =  {
        'PRIMER_OPT_SIZE': 20,
        'PRIMER_PICK_LEFT_PRIMER':1,
        'PRIMER_PICK_RIGHT_PRIMER':1,
        'PRIMER_PICK_INTERNAL_OLIGO': 0,
        'PRIMER_MIN_SIZE': 18,
        'PRIMER_MAX_SIZE': 23,
        'PRIMER_OPT_TM': 57.0,
        'PRIMER_MIN_TM': 46.0,
        'PRIMER_MAX_TM': 63.0,
        'PRIMER_MIN_GC': 20.0,
        'PRIMER_MAX_GC': 80.0,
        'PRIMER_PRODUCT_SIZE_RANGE': [[250, 650]],
        'PRIMER_NUM_RETURN':10,
        'PRIMER_MIN_THREE_PRIME_DISTANCE':10,
        'PRIMER_LIB_AMBIGUITY_CODES_CONSENSUS':0
}

The new parameter can be supplemnted as a file with the terms which need to be changed, for example, create a file with name of config.txt

# content of config.txt
# only changed the product size, the others are the same the default
{'PRIMER_PRODUCT_SIZE_RANGE': [[100, 200]]}

Run primerdesign.py to design a qPCR primer(product size 100-200bp, use the config.txt for primer3config) for C. briggsae mitochondrial DNA sequence NC_009885.1 in any region (use 0:-1 means from the first nucleotide to the last); check the C. briggsae genome cb5.fa for false priming, make sure the hit is 0.

primerdesign.py -g1 NC_009885.fa -g2 cb5.fa -pos "NC_009885.1:0:-1" \
                --interval 1000 --hit1 1 --hit2 0 \
                --primer3config config.txt

Citation:

Please kindly cite our paper in bioinformatics if you find primerdiffer or primervcf useful in your work: Link to publication.

Ren, X., Shao, Y., Zhang, Y., Ni, Y., Bi, Y., & Li, R. (2023). Primerdiffer: a python command line module for large-scale primer design in haplotype genotyping. Bioinformatics, btad188.

Other functions:

  1. To design the primer using VCF file for closely related haplotypes (i.e., strain/individual level differences). Please check primervcf package.
  2. add the primerdffer output table description and ispcr function for the output.