-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Errors in missing vs pad values in VCF #1190
Errors in missing vs pad values in VCF #1190
Conversation
Basically I think we're returning FILL values when we should be returning missing. Consider INFO/NS in the example:
We're currently returning That's my interpretation anyway - what do you think @tomwhite? I'll fix up the tests if we agree there is a bug, but that's the essence of it. |
Thanks for giving an example @jeromekelleher. I think your interpretation is correct. The NS values are simply missing, not the end of a vector that needs padding/filling. |
@jeromekelleher your interpretation looks correct to me. We should also test for a case where
With
|
c77857c
to
7a0035d
Compare
I think this is ready for a look now. There is a little duplication in the tests I've added for the sample VCF and existing ones which have been fixed up, but I think that's OK. The change is a pretty noisy one I'm afraid, as there's been a few downstream things broken as well (#1195, #1196). I've temporarily skipped those tests to make this more manageable. |
Sigh - skipping those tests pushes the required coverage below 100% so the build still fails. Any suggestions here? Temporarily push coverage down, and create an issue to track setting it back to 100? |
7a0035d
to
f7f8aed
Compare
I'm OK allowing coverage to dip as long as there is a path to get it back to 100%. |
Thanks @tomwhite - I'm happy to go with your preferred option. If we can fix the other problems fairly easily in this PR then that would be simplest. I'm just worried about doing too many changes at once. |
f7f8aed
to
57d84d1
Compare
8d6a760
to
a652a0c
Compare
OK! Looks like it's ready for a final look now @tomwhite. @timothymillar, this is a reasonably big change so would be good to get your eyes on it as well if possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Temporary workaround to get tests to pass before addressing sgkit-dev#1195
Remove distinction between present with missing value, versus missing
a652a0c
to
c58bb62
Compare
Through developing the alternative implementation of vcf-to-zarr conversion in #1185 I think there's some bugs in how we're currently handling missing data. Opening this PR for discussion purposes.
There's some related issues around handling mixed ploidy calls, and string missing values, but let's leave those alone for now.
I realise now that the extra fields I added in to the simple test here duplicates later tests - but they're handy for discussion here for now anyway.