MapVCF2NewGenome

A pipeline for mapping SNPs in a VCF file to a genome that wasn't used to call the variants. It is expected that this strategy could be used for mapping VCF files to an updated genome assembly or for mapping to another closely related species' genome (to be able to directly compare nucleotide variants). Right now this only works for SNPs, but could be modified for other variants.

Table of Contents

Requirements
Usage
License

Requirements

These scripts have been tested with Python 2.7 and 3 and should work with either. The script requires the following programs and files.

Programs:

   bwa (or equivalent alignment program to produce .bam files)
   samtools
   bcftools
   python
   pysam (python)

Files:

   vcf file that you wish to map to a new genome
   old genome file (fasta format and indexed -- samtools faidx oldGenome.fasta)
   new genome file (fasta format and indexed -- samtools faidx newGenome.fasta, bwa index newGenome.fasta)

Usage

Generate flanking sequences around the variant(s) so that they can be mapped to the new genome:

python VCF_2_Fasta_v1.1.py -vcf old.vcf.gz -fasta oldGenome.fasta -fai oldGenome.fasta.fai -flanking 100 > flankingSeqs.fasta

help (and further explanations): python VCF_2_Fasta_v1.1.py -h

     To run this step in parallel, please use version 1.2 of this script, which allows the user to specify scaffold.
     Here is an example of how this script can be used in parallel (files will need to be concatenated afterwards):

           cat test.fasta.fai | awk '{ print $1 }' | xargs -I @@ -P 2 sh -c \
           "python3 VCF_2_Fasta_v1.2.py -vcf test.vcf -fasta test.fasta -fai test.fasta.fai -chr @@ > @@.out.fasta"

Map fasta sequences to new genome:

bwa mem -M newGenome.fasta flankingSeqs.fasta | samtools sort -o newGenome.v.flankingSeqs.bam
Index alignment file:

samtools index newGenome.v.flankingSeqs.bam
Identify variant positions in new genome:

   python Bam2SNPPosition.v1.1.py -file newGenome.v.flankingSeqs.bam -qual 60 -multiple 1 -minLen 200 -minPid 99 -pos 101 > old2new.positions.txt

   help (and further explanations): python Bam2SNPPosition.v1.1.py -h
   note: requires SNPPosBamv10 in same working directory
Quality control (uses the flanking sequences generated from step 1 to check for exact sequence matching in the new genome, this script can allow the variant to be a mismatch; see help on output):

python CheckMappingVCF.v1.0.py -map old2new.positions.txt -fasta1 flankingSeqs.fasta -fai1 flankingSeqs.fasta.fai -fasta2 newGenome.fasta -fai2 newGenome.fasta.fai -flanking 20 -output most > old2new.positions.filtered.txt

help (and further explanations): python CheckMappingVCF.v1.0.py -h
Generate new vcf file:

python AssignNewSNPPositions.v1.3.py -map old2new.positions.filtered.txt -vcf old.vcf.gz -fai newGenome.fasta.fai -gen newGenome.fasta > new.vcf

help (and further explanations): python AssignNewSNPPositions.v1.3.py -h
Sort vcf file

bcftools sort new.vcf > new.sorted.vcf

License

Distributed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
AssignNewSNPPositions.V1.3.py		AssignNewSNPPositions.V1.3.py
Bam2SNPPosition.v1.1.py		Bam2SNPPosition.v1.1.py
CheckMappingVCF.v1.0.py		CheckMappingVCF.v1.0.py
LICENSE		LICENSE
README.md		README.md
SNPPosBamv10.py		SNPPosBamv10.py
VCF_2_Fasta_v1.1.py		VCF_2_Fasta_v1.1.py
VCF_2_Fasta_v1.2.py		VCF_2_Fasta_v1.2.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MapVCF2NewGenome

Requirements

Usage

License

About

Releases

Packages

Languages

License

KrisChristensen/MapVCF2NewGenome

Folders and files

Latest commit

History

Repository files navigation

MapVCF2NewGenome

Requirements

Usage

License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages