Error combining VCF files #162

Open
assafgrw opened this issue Sep 7, 2017 · 8 comments

@assafgrw
assafgrw commented Sep 7, 2017

Hello,

I am trying to generate Mendelian error statistics for a trio analysis I am doing. According to the SeqMule guide, I should first combine the VCFs of the mother, father, and son, so I figured it would make the most sense to combine the consensus VCF of each.
When using the command "./seqmule stats --u-vcf father.vcf, mother.vcf, son.vcf -p 123combo -ref hg19.fa"
I got an error saying that the reference does not exist, so I used human_g1k_v37.fasta as the reference:
"./seqmule stats --u-vcf father.vcf, mother.vcf, son.vcf -p 123combo -ref human_g1k_v37.fasta" but got the following error:

"ERROR: not all VCFs have the same set of samples (case-sensitive)!"

Any suggestions?
Kind regards
Assas

@assafgrw
Author

assafgrw commented Sep 7, 2017

I also tried running it on only the FreeBayes VCFs and got the same error.

@yunfeiguo
Collaborator

@assafgrw thanks for reporting the issue, I will address it soon.

yunfeiguo added the bug label Sep 7, 2017
yunfeiguo self-assigned this Sep 7, 2017

@assafgrw
Author

assafgrw commented Sep 7, 2017

thanks

@yusmile0618

I also hit this error:
ERROR: not all VCFs have the same set of samples (case-sensitive)!
Has it been fixed?
After using GATK CombineVariants, I can get the mendel_stat results, but I don't know what threshold to use for the proportion of Mendelian errors. Where can I find this information?
Thanks.

@yunfeiguo
Collaborator

Hi @yusmile0618,

Are you running with --u-vcf or --c-vcf? Which VCFs are you using? These two options are for merging VCFs of the same sample.
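
To see why the three single-sample files trip the check, it can help to list the sample names each VCF declares. The sketch below is only an illustration, not SeqMule's code: it assumes the comparison is made on the sample columns of the `#CHROM` header line (everything after the FORMAT column), and the helper names `vcf_samples` and `same_sample_set` are hypothetical.

```python
import io


def vcf_samples(handle):
    """Return the sample names found on the #CHROM header line of a VCF stream."""
    for line in handle:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # Columns 1-8 are fixed, column 9 is FORMAT, samples start at column 10.
            return fields[9:]
        if not line.startswith("#"):
            break  # reached data lines without seeing a #CHROM header
    return []


def same_sample_set(sample_lists):
    """True if every VCF lists exactly the same (case-sensitive) sample names."""
    return len({frozenset(samples) for samples in sample_lists}) <= 1


# Illustrative in-memory headers; with real files, pass open(path) instead.
HEADER = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t{}\n"
)
trio = [vcf_samples(io.StringIO(HEADER.format(name)))
        for name in ("father", "mother", "son")]
print(trio, same_sample_set(trio))
```

VCFs produced by different callers for one individual would all carry the same sample column and pass such a check, whereas father, mother, and son files each carry a different name and would not.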

Mendelian errors should, in an ideal scenario, be zero, so the lower the better. Thanks.
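
For reference, a minimal sketch of what counts as a Mendelian error at a single site: the child's genotype should be formable from one paternal and one maternal allele. This assumes diploid autosomal genotypes written as VCF GT strings (for example `0/1` or `1|0`); it is not how SeqMule computes mendel_stat, and the function names are hypothetical.

```python
from itertools import product


def parse_gt(gt):
    """Split a diploid GT string such as '0/1' or '1|0'; return None if missing."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return None
    return tuple(alleles)


def is_mendelian_error(father, mother, child):
    """True if the child's genotype cannot come from the parents, None if not evaluable."""
    f, m, c = parse_gt(father), parse_gt(mother), parse_gt(child)
    if None in (f, m, c):
        return None
    # Every unordered pair of one paternal allele and one maternal allele.
    possible = {tuple(sorted(pair)) for pair in product(f, m)}
    return tuple(sorted(c)) not in possible


def mendelian_error_proportion(trios):
    """Fraction of evaluable (father, mother, child) sites that are Mendelian errors."""
    evaluated = [r for r in (is_mendelian_error(*t) for t in trios) if r is not None]
    if not evaluated:
        return None
    return sum(evaluated) / len(evaluated)


sites = [
    ("0/0", "0/0", "0/1"),  # child carries an allele neither parent has
    ("0/1", "0/1", "1/1"),  # consistent
    ("0/0", "1/1", "0/1"),  # consistent
    ("./.", "0/0", "0/0"),  # missing genotype, skipped
]
print(mendelian_error_proportion(sites))
```

Sites with a missing genotype are left out of the denominator here; whether to count them is a design choice, and it changes the reported proportion.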

@yusmile0618

Thanks @yunfeiguo.
seqmule stats --u-vcf son.extract_consensus.vcf,mother.extract_consensus.vcf,father.extract_consensus.vcf -p fam -ref pathtoSeqMule/database/human_g1k_v37.fasta
All of --u-vcf, --c-vcf, -u-vcf, and -c-vcf give the same error: ERROR: not all VCFs have the same set of samples (case-sensitive)!

The VCF files were all generated with the command
seqmule pipeline -a R1.fq.gz -b R2.fq.gz -e -q -t 5 -prefix son -capture pathtoSeqMule/database/hg19agilent/hg19_SureSelect_Human_All_Exon_V6_r2.bed

Q1: How do I merge VCFs from 3 or more different samples?
Q2: In the mendel_stat.txt file there are three Mendelian errors (father and offspring, mother and family). Which error-rate threshold should be used: 0.01 or 0.05?

@yunfeiguo
Collaborator

yunfeiguo commented May 24, 2018

Hi, merging VCFs from different samples is not supported by SeqMule right now. The best way is to use multi-sample variant calling on FASTQs from multiple samples, so that each VCF contains multiple samples, e.g.

seqmule pipeline -a fa_R1.fq.gz,mo_R1.fq.gz,son_R1.fq.gz -b fa_R2.fq.gz,mo_R2.fq.gz,son_R2.fq.gz -ms -e -q -t 36 -prefix father,mother,son -capture default

I am not aware of a widely used threshold for Mendelian error. However, it should be correlated with the sequencing depth and the quality of the variant calls in your analysis. I would expect the Mendelian error rate to be of the same order of magnitude as the variant calling error rate.

@yusmile0618

@yunfeiguo ok, thanks for your multiple samples suggestion.
