Update ClinVar pipeline for new XML format #1652
f87521a to 7d193e0
Good stuff overall. I did spot a few things, mainly legacy issues that we should take this opportunity to clean up.
Both of our CI steps that involve Python use 3.9, and the existing pyproject.toml file also specifies this version. Pin it in .tool-versions so that it exists as plaintext and so that tools like asdf and mise can pick it up.
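As a rough illustration of what the pin gives tooling to work with, here is a minimal sketch that reads `.tool-versions`-style text and checks that it pins a 3.9 interpreter. The helper name `parse_tool_versions` and the patch release `3.9.18` are hypothetical; the PR only states that the minor version is 3.9.

```python
def parse_tool_versions(text: str) -> dict:
    """Parse .tool-versions style content into {tool: first listed version}."""
    versions = {}
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        tool, *rest = line.split()
        if rest:
            # A tool may list fallback versions; keep only the first.
            versions[tool] = rest[0]
    return versions


# Hypothetical file content; the real patch version is not given in this PR.
EXAMPLE = "python 3.9.18  # matches CI and pyproject.toml\n"

pinned = parse_tool_versions(EXAMPLE)
is_python_39 = pinned.get("python", "").startswith("3.9.")
```

A check like this could run in CI to catch drift between the pinned version and the one the workflows install, though whether that is worth adding is a judgment call for the maintainers.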
Several months ago, supporting the multi-region buckets that supplied the VEP-specific data for Hail's built-in GRCh37 VEP helper became prohibitively expensive. The Hail team moved to supporting only specific regions, so continued use of this feature requires a bump to at least Hail 0.2.128.
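A minimal sketch of how the minimum-version requirement could be checked in code, assuming the version string looks like `0.2.128` with an optional `-<build>` suffix (the suffix handling and the helper names are assumptions, not something this PR adds). It compares versions numerically, because plain string comparison orders `0.2.99` after `0.2.128`.

```python
MIN_HAIL = (0, 2, 128)  # minimum version named in the commit message


def parse_version(version: str) -> tuple:
    """Turn '0.2.128' or '0.2.128-<build>' into a tuple of ints."""
    core = version.split("-", 1)[0]  # assumed optional build suffix
    return tuple(int(part) for part in core.split("."))


def meets_minimum(version: str, minimum: tuple = MIN_HAIL) -> bool:
    """Return True if the version is at least the required minimum."""
    return parse_version(version) >= minimum


ok_new = meets_minimum("0.2.128-abc123")  # hypothetical build suffix
ok_old = meets_minimum("0.2.99")
```

In practice the pin itself belongs in the project's dependency file; a runtime check like this would only serve as a guard with a clearer error message.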
7d193e0 to 1d3d887
Heya @phildarnowsky-broad, thanks for the review. I believe I've addressed all the comments, with one new fixup commit per comment. I've tagged you for re-review, at your convenience.
Good to go, nice work
b1b00fb to d780598
ClinVar updated its XML release format for variants, adding somatic variant classifications
ClinVar has moved away from the term 'clinical significance' for a few reasons; here we move to their new preferred term, 'germline classification'
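To make the format change concrete, here is a minimal sketch of reading a germline classification from a record. The XML fragment is a simplified, hand-written stand-in, and the element names (`Classifications`, `GermlineClassification`, `SomaticClinicalImpact`, `Description`, `ReviewStatus`) are assumptions about the new ClinVar layout rather than a copy of this pipeline's parser. The function name and output keys are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written fragment; not real ClinVar data.
SAMPLE = """
<VariationArchive>
  <Classifications>
    <GermlineClassification>
      <ReviewStatus>criteria provided, single submitter</ReviewStatus>
      <Description>Pathogenic</Description>
    </GermlineClassification>
  </Classifications>
</VariationArchive>
"""

# A record that carries only a somatic classification.
SOMATIC_ONLY = """
<VariationArchive>
  <Classifications>
    <SomaticClinicalImpact>
      <Description>Tier I - Strong</Description>
    </SomaticClinicalImpact>
  </Classifications>
</VariationArchive>
"""


def germline_classification(record):
    """Return the germline classification fields, or None if absent."""
    node = record.find(".//GermlineClassification")
    if node is None:
        return None
    return {
        "germline_classification": node.findtext("Description"),
        "review_status": node.findtext("ReviewStatus"),
    }


result = germline_classification(ET.fromstring(SAMPLE.strip()))
somatic_result = germline_classification(ET.fromstring(SOMATIC_ONLY.strip()))
```

Returning `None` for records without a germline element keeps somatic-only variants distinguishable from germline ones, which matters now that both kinds can appear in the same release.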
d780598 to 1b3feb7
Resolves #1569
Screenshots of a local browser and API querying an index in the production ES cluster that uses data from the output of this PR: