Chromosome X phasing #9

Open
dtaliun opened this issue Nov 16, 2023 · 10 comments

Comments

@dtaliun

dtaliun commented Nov 16, 2023

Thank you very much for creating amazing resources!

Quick question: do you plan to release phased chromosome X any time soon?

Thanks again!

@z-koenig
Collaborator

Yes, this is in the works! Our current timeline has a release planned before the end of December.

@dtaliun
Author

dtaliun commented Apr 17, 2024

Hi,
I wanted to follow up and ask if you have any updates on chromosome X. Related question: I see that there is phased chromosome X data inside gs://gcp-public-data--gnomad/resources/hgdp_1kg/phased_haplotypes/; what is the difference between phased_haplotypes and phased_haplotypes_v2?
Thanks again for the resource!

@LindoNkambule
Collaborator

Hi @dtaliun , we are wrapping up phasing of chromosome X. I'll ping you here once it's available for download.

Regarding the difference, we phased the first release without pedigree information, and this reduced phasing/imputation performance compared to NYGC 1KG. Pedigree information was incorporated in v2, which achieves better phasing/imputation than the NYGC 1KG panel, as we show in the manuscript. We also filtered out singletons in v2, so you'll notice a drop in the number of variants compared to the first release.

@LindoNkambule
Collaborator

Hi @dtaliun , the phased chromosome X files are now available for download and can be found here:
gs://gcp-public-data--gnomad/resources/hgdp_1kg/phased_haplotypes_v2/hgdp1kgp_chrX*

@dtaliun
Copy link
Author

dtaliun commented Jun 4, 2024

Hi @LindoNkambule,

Thank you very much for sharing the phased chromosome X! I appreciate all your hard work.

I started using the data, but I found a few thousand variants in the chrX non-PAR region where male samples were phased as heterozygous. Here are a few examples of heterozygous genotypes in male samples in the phased output:
HG00881 is 1|0 for chrX:2855315:C/T
HG02011, NA20866, and NA20903 are 1|0 for chrX:3443944:A/G
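For anyone wanting to reproduce this kind of check, here is a minimal sketch of the logic: skip sites inside the pseudoautosomal regions and flag any male whose call has two different alleles. It assumes GRCh38 coordinates (PAR1 chrX:10,001-2,781,479, PAR2 chrX:155,701,383-156,030,895) and takes records as hypothetical in-memory `(pos, variant_id, {sample: GT})` tuples rather than reading the BCF directly; the function names are illustrative, not part of any released QC pipeline.

```python
# Sketch only: assumes GRCh38 PAR bounds (1-based, inclusive).
PAR1 = (10_001, 2_781_479)
PAR2 = (155_701_383, 156_030_895)


def in_par(pos):
    """Return True if a 1-based chrX position lies inside PAR1 or PAR2."""
    return any(start <= pos <= end for start, end in (PAR1, PAR2))


def is_het(gt):
    """True for a diploid call with two different, non-missing alleles, e.g. '1|0'."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return False
    return alleles[0] != alleles[1]


def male_nonpar_hets(records, males):
    """Yield (variant_id, sample) for every male heterozygous call outside the PARs.

    `records` is an iterable of (pos, variant_id, {sample: GT}) tuples, a
    hypothetical stand-in for rows read from the phased chrX file.
    """
    for pos, vid, genotypes in records:
        if in_par(pos):
            continue  # males are legitimately diploid inside the PARs
        for sample, gt in genotypes.items():
            if sample in males and is_het(gt):
                yield vid, sample


# The two sites reported above (all other samples at these sites omitted).
records = [
    (2855315, "chrX:2855315:C/T", {"HG00881": "1|0"}),
    (3443944, "chrX:3443944:A/G", {"HG02011": "1|0", "NA20866": "1|0", "NA20903": "1|0"}),
]
males = {"HG00881", "HG02011", "NA20866", "NA20903"}
for vid, sample in male_nonpar_hets(records, males):
    print(vid, sample)
```

Both reported positions sit just past the end of PAR1 under these GRCh38 bounds, which is why they count as non-PAR here.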

I believe this may be related to the bug in SHAPEIT5 where it doesn't correctly set these heterozygous genotypes to missing.

Could you please investigate on your end?

Thanks again!

@LindoNkambule
Collaborator

Hi @dtaliun,

Thank you for raising this issue. I will look into it.

@dtaliun
Author

dtaliun commented Jul 23, 2024

Hi @LindoNkambule,

I just wanted to follow up on the chromosome X phased files. Something is not right about them.

Unlike the autosomal files, they also include monomorphic variants and singletons. Moreover, the hgdp1kgp_chrX_par1.shapeit5_common.bcf and hgdp1kgp_chrX_par2.shapeit5_common.bcf files contain essentially no variants with AC>=2 (only one record passes the filter):

# -H: omit the header; -c2 / -C8180: keep sites with 2 <= non-reference allele count <= 8180
bcftools view -H -c2 -C8180 hgdp1kgp_chrX_par1.shapeit5_common.bcf | wc -l
# outputs 1 entry
bcftools view -H -c2 -C8180 hgdp1kgp_chrX_par2.shapeit5_common.bcf | wc -l
# outputs 1 entry
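To make explicit what the `-c2 -C8180` filter tests, the sketch below recomputes a site's non-reference allele count from genotype strings and applies the same bounds. This is an illustration under assumptions: bcftools counts non-reference alleles by default but may take the count from INFO/AC when that field is present, whereas this sketch always counts from the genotypes. Under this filter a monomorphic site (AC=0) and a singleton (AC=1) are both dropped, so a "common" file that returns a single record is effectively empty.

```python
def nonref_allele_count(genotypes):
    """Count non-reference alleles over GT strings such as '0|1', '1', or './.'."""
    count = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele not in (".", "0"):  # skip missing and reference alleles
                count += 1
    return count


def passes_ac_filter(genotypes, min_ac=2, max_ac=8180):
    """Mimic `-c2 -C8180`: keep a site only if min_ac <= non-ref allele count <= max_ac."""
    return min_ac <= nonref_allele_count(genotypes) <= max_ac


# A singleton (AC=1) and a monomorphic site (AC=0) both fail; AC=2 passes.
print(passes_ac_filter(["0|1", "0|0", "0|0"]))  # False
print(passes_ac_filter(["0|0", "0|0"]))         # False
print(passes_ac_filter(["0|1", "1|0"]))         # True
```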

@Ahhgust

Ahhgust commented Sep 16, 2024

Sorry to jump in, but I'm also looking for the chrX data. Like most folks, I'm interested in the non-PAR regions of chrX, but only rare variants are posted, whereas for the autosomes you get both rare and common variants.
I'm hoping the final chrX dataset just hasn't been posted yet? Or perhaps SHAPEIT5 doesn't play nicely with chrX.
For my purposes, I'm just interested in building a panel for phasing/imputation, in which case there may be enough males that I can get enough "phased" haplotypes from the original VCF to make it work.
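The idea of getting "phased" haplotypes from males rests on males being hemizygous outside the PARs: each male's non-PAR call is already a single haplotype. A minimal sketch of that extraction for one site follows. It assumes male calls appear either as haploid (`'1'`) or homozygous diploid (`'1/1'`, `'0|0'`) strings, treats heterozygous or missing male calls as unusable, and uses hypothetical sample names; it does not handle female samples, which would still need statistical phasing.

```python
def male_haplotypes(site_genotypes, males):
    """Return one allele per male for a non-PAR chrX site.

    Males are hemizygous outside the PARs, so a haploid call ('0'/'1') or a
    homozygous diploid call ('1/1', '0|0') can be read directly as one
    haplotype. Heterozygous or missing male calls map to None.
    """
    haps = {}
    for sample, gt in site_genotypes.items():
        if sample not in males:
            continue  # non-male samples are ignored here
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles or len(set(alleles)) != 1:
            haps[sample] = None  # het or missing: no unambiguous haplotype
        else:
            haps[sample] = int(alleles[0])
    return haps


# Hypothetical site: two usable male calls, one het male, one female.
site = {"male_1": "1", "male_2": "0|0", "male_3": "0|1", "female_1": "0|1"}
print(male_haplotypes(site, {"male_1", "male_2", "male_3"}))
```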

@LindoNkambule
Collaborator

Hi everyone, the issues pointed out have been addressed. I will post an update once the files have been made public.

One caveat (cc @dtaliun): to address the issue pointed out by @dtaliun, we decided to code males as homozygous in the non-PAR region. However, there were still a few variants where males were phased as heterozygous (see issue). Since this was a small number (~7K), we decided to filter them out for now while we wait for the SHAPEIT5 developers to look into the issue.
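The two-step workaround described above can be sketched as follows: before phasing, recode haploid male calls as homozygous diploid; after phasing, drop any site where a male still comes out heterozygous. This is a sketch under assumptions, not the team's actual pipeline: it assumes male non-PAR calls are haploid strings before recoding, that phased output uses `|`, and that the records passed in are non-PAR sites only. Function and sample names are hypothetical.

```python
def recode_male_haploid(gt):
    """Recode a haploid male call as homozygous diploid: '1' -> '1/1', '.' -> './.'.

    Calls that are already diploid are returned unchanged.
    """
    if "/" in gt or "|" in gt:
        return gt
    return f"{gt}/{gt}"


def is_phased_het(gt):
    """True for a phased call with two different, non-missing alleles, e.g. '1|0'."""
    alleles = gt.split("|")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def drop_sites_with_male_hets(records, males):
    """Keep only sites where no male is phased heterozygous.

    `records` holds (variant_id, {sample: GT}) pairs for non-PAR sites only.
    """
    return [
        (vid, gts)
        for vid, gts in records
        if not any(s in males and is_phased_het(gt) for s, gt in gts.items())
    ]


# Before phasing: haploid male calls become homozygous.
print([recode_male_haploid(g) for g in ["0", "1", ".", "0/1"]])

# After phasing: the site where a male came out as 1|0 is removed.
phased = [
    ("site_a", {"male_1": "1|1", "female_1": "0|1"}),
    ("site_b", {"male_1": "1|0", "female_1": "0|0"}),
]
print(drop_sites_with_male_hets(phased, {"male_1"}))
```

Filtering whole sites, rather than setting the offending male calls to missing, matches the "filter them out" choice described in the comment and keeps every retained site fully consistent with male hemizygosity.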

@dtaliun
Author

dtaliun commented Sep 26, 2024

Hi @LindoNkambule,

Thank you very much for fixing it! And thank you again for sharing all this valuable data with us!
