partition_into_regions returns empty region for all contigs in header #1200

Closed
jeromekelleher opened this issue Feb 26, 2024 · 5 comments


@jeromekelleher
Collaborator

This line in partition_into_regions adds an empty region for every contig declared in the VCF's header. However, VCFs commonly declare contigs in the header that have no records in the file at all. For example, recent 1000 Genomes data declares every contig (there are thousands) in every VCF file. This adds a lot of noise to the returned region strings.
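A minimal sketch of the behaviour being proposed, under stated assumptions: it does not use or reproduce the real partition_into_regions code. The helper `nonempty_contig_regions` and its `records_per_contig` argument are hypothetical stand-ins for whatever per-contig record information an index provides. Contigs declared in the header but with no records are skipped rather than turned into empty regions. Each region is written as a bare contig name, which htslib-style region syntax treats as "the whole contig".

```python
from typing import Dict, List, Sequence


def nonempty_contig_regions(
    header_contigs: Sequence[str],
    records_per_contig: Dict[str, int],
) -> List[str]:
    """Return a whole-contig region string for each contig that has records.

    Hypothetical helper: `records_per_contig` is an assumed mapping from
    contig name to record count (e.g. derived from an index). Contigs that
    appear only in the header are dropped instead of yielding empty regions.
    """
    regions = []
    for contig in header_contigs:  # keep the header's contig order
        if records_per_contig.get(contig, 0) > 0:
            regions.append(contig)  # bare name = whole-contig region
    return regions


header = ["chr1", "chr2", "chrUn_KI270302v1", "chrX"]
counts = {"chr1": 1200, "chrX": 300}
print(nonempty_contig_regions(header, counts))  # ['chr1', 'chrX']
```

As the later comments point out, how "has records" is determined depends on the index type, so the counts mapping here is deliberately abstract.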

What was the rationale for returning the empty regions @tomwhite?

@jeromekelleher
Collaborator Author

Disabling it doesn't break any tests, as the line was marked `pragma: no cover`.

@jeromekelleher
Collaborator Author

Possible explanation for #1169?

@tomwhite
Collaborator

> What was the rationale for returning the empty regions @tomwhite?

Not sure. It seems like it can be removed, though.

@jeromekelleher
Collaborator Author

Digging deeper, it's not quite as simple as this. It turns out that CSI-indexed BCF, CSI-indexed VCF and tabix-indexed VCF each need to be treated slightly differently when multiple contigs are defined in the header 🤮

@jeromekelleher
Collaborator Author

jeromekelleher commented Mar 3, 2024

Closing in favour of #1202, which is more precisely specified.
