primed-file-conversion

PRIMED file conversion workflows

plink2_bed2vcf

This workflow uses plink2 to convert a file from binary PLINK format (bed/bim/fam) to VCF.

Default behavior is to output SNPs only, omitting any "I/D" codes for indels, as these are not accepted by downstream workflows such as liftover and imputation.

Any pseudoautosomal SNPs ('XY' code) will be merged with the X chromosome using plink2's "merge-x" option. By default, a 'chr' prefix is added to chromosome codes, as this is the standard for hg19 and hg38 and facilitates using UCSC chain files for liftover.

If using a fasta file, run the workflow with reference disks enabled.

Inputs:

input	description
bed_file	plink bed file
bim_file	plink bim file
fam_file	plink fam file
snps_only	(optional, default true) boolean for whether to filter output file to SNPs only
chr_prefix	(optional, default true) boolean for whether to add a 'chr' prefix, e.g. chr1, chr2, chrX vs 1, 2, X
fasta_file	(optional) fasta file. If provided, plink2 attempts to assign ref and alt alleles according to the reference genome.
out_prefix	(optional) prefix for output vcf file. If not provided, taken from the input bed filename.
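
As a rough illustration of how the inputs above could map onto a plink2 command line, here is a minimal Python sketch. It is not taken from the workflow source: the function name `bed2vcf_command` is hypothetical, and the specific flag choices (`--snps-only just-acgt` to drop "I/D" codes, `--output-chr chrM` or `MT` for the prefix, `--fa` with `--ref-from-fa` for the reference) are assumptions about one reasonable way to get the described behavior.

```python
from typing import List, Optional


def bed2vcf_command(
    bed_file: str,
    bim_file: str,
    fam_file: str,
    out_prefix: str,
    snps_only: bool = True,
    chr_prefix: bool = True,
    fasta_file: Optional[str] = None,
) -> List[str]:
    """Build an illustrative plink2 argument list from the workflow inputs."""
    cmd = [
        "plink2",
        "--bed", bed_file,
        "--bim", bim_file,
        "--fam", fam_file,
        "--merge-x",  # merge 'XY' (pseudoautosomal) variants into X
        "--export", "vcf",
        "--out", out_prefix,
    ]
    if snps_only:
        # Assumption: 'just-acgt' is what removes single-character I/D codes.
        cmd += ["--snps-only", "just-acgt"]
    # Assumption: chrM-style codes with the prefix, MT-style codes without it.
    cmd += ["--output-chr", "chrM" if chr_prefix else "MT"]
    if fasta_file is not None:
        # Ask plink2 to set REF alleles from the reference genome.
        cmd += ["--fa", fasta_file, "--ref-from-fa"]
    return cmd


if __name__ == "__main__":
    print(" ".join(bed2vcf_command("study.bed", "study.bim", "study.fam", "study")))
```

The command is built as a list rather than a string so that it can be passed to `subprocess.run` without shell quoting concerns.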

Outputs:

output	description
out_file	VCF file
md5sum	md5 checksum of out_file
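
To check a downloaded output against the reported `md5sum`, something like the following sketch may be useful. The helper name `md5_checksum` is hypothetical, and the workflow itself may compute the checksum differently (for example with a command-line tool); the resulting hex digest should be the same either way.

```python
import hashlib


def md5_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large VCF files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the returned string with the workflow's `md5sum` output confirms that the file was transferred intact.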

plink2_pgen2bed

This workflow uses plink2 to convert a file from PLINK2 format (pgen/pvar/psam) to binary PLINK format (bed/bim/fam).

Inputs:

input	description
pgen	plink2 pgen file
pvar	plink2 pvar file
psam	plink2 psam file
out_prefix	(optional) prefix for output bed/bim/fam files. If not provided, taken from the input pgen filename.

Outputs:

output	description
out_bed	bed file
out_bim	bim file
out_fam	fam file
md5sum	md5 checksums of out_bed, out_bim, out_fam
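
The fallback for `out_prefix` described above (use the input pgen filename when no prefix is given) can be sketched as follows. The helper names are hypothetical and the exact rule used by the workflow is an assumption; this version simply strips the directory and the `.pgen` extension.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def default_out_prefix(input_path: str, extension: str) -> str:
    """Strip the directory and a trailing extension from an input path."""
    name = PurePosixPath(input_path).name
    if extension and name.endswith(extension):
        name = name[: -len(extension)]
    return name


def pgen2bed_outputs(pgen: str, out_prefix: Optional[str] = None) -> Dict[str, str]:
    """Name the bed/bim/fam outputs, falling back to the pgen filename."""
    prefix = out_prefix or default_out_prefix(pgen, ".pgen")
    return {
        "out_bed": prefix + ".bed",
        "out_bim": prefix + ".bim",
        "out_fam": prefix + ".fam",
    }
```

For example, an input of `data/study.pgen` with no `out_prefix` would give `study.bed`, `study.bim`, and `study.fam` under this assumption.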

plink2_pgen2vcf

This workflow uses plink2 to convert a file from PLINK2 format (pgen/pvar/psam) to VCF.

Inputs:

input	description
pgen	plink2 pgen file
pvar	plink2 pvar file
psam	plink2 psam file
out_prefix	(optional) prefix for output VCF file. If not provided, taken from the input pgen filename.

Outputs:

output	description
out_file	VCF file
md5sum	md5 checksum of out_file

plink2_vcf2bed

This workflow uses plink2 to convert a file from Variant Call Format (VCF) to binary PLINK format (bed/bim/fam).

Inputs:

input	description
vcf_file	vcf file
out_prefix	(optional) prefix for output bed/bim/fam files. If not provided, taken from the input vcf filename.

Outputs:

output	description
out_bed	bed file
out_bim	bim file
out_fam	fam file
md5sum	md5 checksums of out_bed, out_bim, out_fam

plink2_vcf2pgen

This workflow uses plink2 to convert a file from Variant Call Format (VCF) to binary PLINK2 format (pgen/pvar/psam).

Inputs:

input	description
vcf_file	vcf file
out_prefix	(optional) prefix for output pgen/pvar/psam files. If not provided, taken from the input vcf filename.

Outputs:

output	description
out_pgen	pgen file
out_pvar	pvar file
out_psam	psam file
md5sum	md5 checksums of out_pgen, out_pvar, out_psam
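
The plink2 conversion workflows above differ mainly in which input flags and which output flag they use. A minimal sketch of that mapping is shown below. The function name and the dictionary keys (`pgen`, `vcf`, and so on) are hypothetical and do not match the workflow input names exactly; the plink2 flags themselves (`--pgen`, `--pvar`, `--psam`, `--bed`, `--bim`, `--fam`, `--vcf`, `--make-bed`, `--make-pgen`, `--export vcf`) are standard, but whether the workflows pass additional options is not shown here.

```python
from typing import Dict, List

# plink2 flags that select the output format.
OUTPUT_FLAGS: Dict[str, List[str]] = {
    "bed": ["--make-bed"],    # bed/bim/fam
    "pgen": ["--make-pgen"],  # pgen/pvar/psam
    "vcf": ["--export", "vcf"],
}

# Input kinds in a fixed order; each maps to the flag of the same name.
INPUT_KINDS = ("pgen", "pvar", "psam", "bed", "bim", "fam", "vcf")


def conversion_command(inputs: Dict[str, str], target: str, out_prefix: str) -> List[str]:
    """Build an illustrative plink2 command for one format conversion."""
    if target not in OUTPUT_FLAGS:
        raise ValueError(f"unknown target format: {target}")
    cmd = ["plink2"]
    for kind in INPUT_KINDS:
        if kind in inputs:
            cmd += [f"--{kind}", inputs[kind]]
    return cmd + OUTPUT_FLAGS[target] + ["--out", out_prefix]


if __name__ == "__main__":
    # Roughly what plink2_vcf2pgen might run under these assumptions.
    print(" ".join(conversion_command({"vcf": "study.vcf.gz"}, "pgen", "study")))
```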

liftover_vcf

This workflow uses GATK Picard to lift over VCF files from one build to another. Run the workflow with reference disks enabled.

After Picard is run, a strand flip (using plink v1.9 --flip) will be run on the rejected SNPs and liftover will be re-tried. Any SNPs successfully lifted over after the strand flip will be merged with the prior lifted file.
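
The retry logic can be illustrated with a small Python sketch. This is a per-variant simplification, not the workflow's implementation: the real workflow operates on whole files with Picard and plink, whereas here `lift` is a hypothetical caller-supplied function that returns a lifted variant or `None` on rejection, and variants are plain `(chrom, pos, ref, alt)` tuples with single-base alleles.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def flip_strand(variant: Variant) -> Variant:
    """Swap each allele for its complement (A<->T, C<->G)."""
    chrom, pos, ref, alt = variant
    return (chrom, pos, ref.translate(_COMPLEMENT), alt.translate(_COMPLEMENT))


def liftover_with_flip_retry(
    variants: Iterable[Variant],
    lift: Callable[[Variant], Optional[Variant]],
) -> Tuple[List[Variant], List[Variant]]:
    """Lift each variant; on rejection, retry once after a strand flip."""
    lifted: List[Variant] = []
    rejected: List[Variant] = []
    for variant in variants:
        result = lift(variant)
        if result is None:
            # Second attempt with complemented alleles.
            result = lift(flip_strand(variant))
        if result is None:
            rejected.append(variant)
        else:
            lifted.append(result)
    return lifted, rejected
```

Variants that succeed on either attempt end up in the same lifted list, which mirrors the merge of the rescued SNPs with the prior lifted file.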

Human genome reference builds

Build 37 vs hg19 explained

Reference fasta files on Google Cloud Storage

Chain files:

hg17 to hg38
hg18 to hg38
hg19 to hg38
b37 to hg38 - use this file if the input contigs omit the 'chr' prefix

Inputs:

input	description
vcf_file	VCF file
chain_url	URL for chain file
target_fasta	fasta file with reference sequence for target build
out_prefix	prefix for output file (.vcf.gz will be appended)
mem_gb	(optional, default 16 GB) RAM required for liftover. If the job fails due to lack of memory, try setting this to a larger value.
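
For orientation, the inputs above correspond roughly to a Picard `LiftoverVcf` invocation like the one built in this sketch. Several details are assumptions rather than facts about the workflow: the jar location (`picard.jar`), the rejects filename, passing the full `mem_gb` as the Java heap size, and the use of a local `chain_file` path (the workflow takes a URL, so the chain file would need to be downloaded first).

```python
from typing import List


def liftover_command(
    vcf_file: str,
    chain_file: str,
    target_fasta: str,
    out_prefix: str,
    mem_gb: int = 16,
    picard_jar: str = "picard.jar",  # hypothetical jar location
) -> List[str]:
    """Build an illustrative Picard LiftoverVcf argument list."""
    return [
        "java", f"-Xmx{mem_gb}g", "-jar", picard_jar,
        "LiftoverVcf",
        f"I={vcf_file}",
        f"O={out_prefix}.vcf.gz",
        f"CHAIN={chain_file}",
        # Hypothetical name for the file of variants that fail liftover.
        f"REJECT={out_prefix}.rejects.vcf.gz",
        f"R={target_fasta}",
    ]


if __name__ == "__main__":
    print(" ".join(liftover_command("in.vcf.gz", "hg19ToHg38.over.chain.gz", "hg38.fa", "lifted")))
```

In practice it is common to leave some headroom between the Java heap size and the memory requested for the job.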

Outputs:

output	description
out_file	VCF file with coordinates in target build
md5sum	md5 checksum of out_file
rejects_file	VCF file with variants that could not be lifted over
num_rejects	number of variants in the rejects file
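
A value like `num_rejects` can be reproduced by counting the non-header records in the rejects file. The sketch below shows one way to do that; the function name is hypothetical, and the workflow may well obtain the count with a different tool.

```python
import gzip


def count_variants(vcf_path: str) -> int:
    """Count data records (non-header, non-blank lines) in a VCF file."""
    # Plain and gzip/bgzip-compressed files are both handled.
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        return sum(1 for line in handle if line.strip() and not line.startswith("#"))
```

Comparing this count with the number of variants in the input gives a quick sense of how much was lost in liftover.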

bcftools_merge

This workflow uses bcftools to merge VCFs into a single VCF. Before merging, it creates an index for each VCF. It can run in parallel for multiple sets of VCFs.

Inputs:

input	description
vcf_files	An array of arrays of VCF files to merge. Each array of VCF files will be merged into a single VCF file.
output_prefixes	Array of output prefixes for the merged VCF files. This should be an array of the same length as vcf_files.
missing_to_ref	Set genotypes at missing sites to the reference allele (0/0). Default is false.
merge_options	(optional) if specified, additional options to pass to `bcftools merge`
mem_gb	(optional, default 16 GB) RAM required for merging. If the job fails due to lack of memory, try setting this to a larger value.
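
The relationship between `vcf_files` and `output_prefixes` can be made concrete with a short sketch that checks the two arrays line up and lists the bcftools commands for each set. The function name and the `.vcf.gz` output naming are assumptions; `bcftools index`, `bcftools merge`, `--missing-to-ref`, `--output-type`, and `--output` are standard bcftools usage, but the workflow's exact command line may differ.

```python
import shlex
from typing import List, Sequence


def merge_commands(
    vcf_files: Sequence[Sequence[str]],
    output_prefixes: Sequence[str],
    missing_to_ref: bool = False,
    merge_options: str = "",
) -> List[List[List[str]]]:
    """Return, for each set of VCFs, the index and merge commands to run."""
    if len(vcf_files) != len(output_prefixes):
        raise ValueError("output_prefixes must have the same length as vcf_files")
    jobs = []
    for files, prefix in zip(vcf_files, output_prefixes):
        out_file = f"{prefix}.vcf.gz"  # assumed output naming
        # Index every input before merging.
        commands = [["bcftools", "index", path] for path in files]
        merge = ["bcftools", "merge", "--output-type", "z", "--output", out_file]
        if missing_to_ref:
            merge.append("--missing-to-ref")
        merge += shlex.split(merge_options)  # extra user-supplied options
        merge += list(files)
        commands.append(merge)
        # Index the merged output as well.
        commands.append(["bcftools", "index", out_file])
        jobs.append(commands)
    return jobs
```

Each inner list of commands is independent of the others, which is what allows the sets to be merged in parallel.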

Outputs:

output	description
out_file	Array of merged VCF files, same length as vcf_files
out_index_file	Array of index files for the merged VCF files, same length as vcf_files
