HTSlib should fail on trailing INFO garbage #1253

Open
yangyxt opened this issue Mar 9, 2021 · 3 comments

yangyxt commented Mar 9, 2021

I am using version 1.11.
The command I ran is: bcftools view -R <target_region>.tsv -Oz -o <output_path>.vcf.gz <input_path>.vcf.gz

The VCF file is the golden VCF from simulated data. The input VCF file looks like this:
image

The output vcf file looks like this:
image

Note the part circled in red: the end of each row (the trailing slash and the last digit) has been cut off in the output.
Please take a look at this issue and let me know how I can resolve it. Thanks!

Member

pd3 commented Mar 9, 2021

This is partly a problem with your VCF and partly a problem with HTSlib:

  1. The header declares the WP field as Type=Integer with Number=A, that is, one integer per ALT allele. In that case, the values in the body should be comma-separated, not slash-separated. The number of values is also wrong.

  2. However, the library should fail, or at least print a warning, when it encounters the broken INFO record.
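
To make point 1 concrete, here is a minimal Python sketch of the kind of check a Number=A, Type=Integer value has to pass. It is not HTSlib code and does not reproduce HTSlib's parser; the function name `check_number_a_int` and the sample values are hypothetical, since the actual records in this issue are only visible in the screenshots.

```python
import re
from typing import List

# Matches a plain decimal integer with an optional sign.
_INT_RE = re.compile(r"[+-]?[0-9]+")


def check_number_a_int(value: str, alt: str) -> List[str]:
    """Return a list of problems found in a Number=A, Type=Integer INFO value.

    `value` is the text after `TAG=` in the INFO column and `alt` is the
    ALT column of the same record (hypothetical helper, for illustration).
    """
    problems = []
    n_alt = len(alt.split(","))   # Number=A means one value per ALT allele
    parts = value.split(",")      # multiple values are comma-separated
    if len(parts) != n_alt:
        problems.append(f"expected {n_alt} value(s), found {len(parts)}")
    for part in parts:
        if part == ".":
            continue              # "." marks a missing value
        if not _INT_RE.fullmatch(part):
            problems.append(f"not an integer: {part!r}")
    return problems


# Hypothetical values: a slash-separated value is rejected,
# a comma-separated one with the right count is accepted.
print(check_number_a_int("0/1", "T"))
print(check_number_a_int("3,5", "T,G"))
```

A value such as the hypothetical `0/1` is not an integer at all under this definition, which is why an integer parser can end up keeping only the leading digits.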

pd3 transferred this issue from samtools/bcftools Mar 9, 2021
pd3 changed the title from "bcftools view slice the string at the end of each row" to "HTSlib should fail on trailing INFO garbage" Mar 9, 2021
Author

yangyxt commented Mar 10, 2021

Thanks for the response! In that case, how should I modify my VCF file to make it valid?

By the way, bcftools view does not print any warning messages.

Member

pd3 commented Mar 16, 2021

I don't know what the intention is, but it would probably be best to redefine the tag in the header as Type=String. That way the value will be preserved.
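
One possible way to do that, as a rough Python sketch: rewrite only the matching `##INFO` header line and leave everything else untouched. The helper name `retype_info_tag` and the example header line are hypothetical (the real header is only shown in the screenshots), and the sketch assumes `ID=` is the first key in the line and that `Type=` appears before `Description=`.

```python
import re


def retype_info_tag(header_line: str, tag: str, new_type: str = "String") -> str:
    """Change the Type= of one ##INFO header line.

    Lines that do not define `tag` are returned unchanged, so the function
    can be mapped over every line of a VCF header.
    """
    if not header_line.startswith(f"##INFO=<ID={tag},"):
        return header_line
    # Replace only the first Type=... so a Description mentioning
    # "Type=" is left alone (assumes Type= precedes Description=).
    return re.sub(r"Type=[A-Za-z]+", f"Type={new_type}", header_line, count=1)


# Hypothetical header line for the WP tag.
line = '##INFO=<ID=WP,Number=A,Type=Integer,Description="hypothetical">'
print(retype_info_tag(line, "WP"))
```

The edited header could then be written to a file and applied with something like `bcftools reheader -h new_header.txt -o <output_path>.vcf.gz <input_path>.vcf.gz`, which swaps the header without changing the records.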

pd3 closed this as completed Mar 16, 2021
pd3 reopened this Mar 16, 2021