how to interpret my results #2

Open
ma-diroma opened this issue Jun 17, 2020 · 4 comments

@ma-diroma

Hi,

I ran IBSrelate with both the realSFS and the IBS methods, following your guidelines at http://www.popgen.dk/software/index.php/IBSrelate. The only change I made was to add these parameters to the ANGSD command in both cases:
-minMapQ 30 -minQ 30 -trim 3
I obtained many negative values in my results, as you can see in the plots:
realSFS.plot.pdf.pdf
IBSrelate.plot.pdf.pdf
What is the meaning of these values?
Moreover, is there a more direct way to predict relationships? The Rscript reports the R0, R1 and KING estimators, but it gives no indication of which samples are related according to the joint ranges described in your paper.
Thanks for your attention.
Best wishes,
Maria Angela

@rwaples
Owner

rwaples commented Jun 17, 2020

Hello Maria,

Glad you are using our method! I hope I can be of some help.

The negative values I see are the kinship estimates (KING-robust kinship), am I correct? Negative kinship is expected when the individuals come from different populations and so are not related.

That said, to me it does look like there may be some bioinformatic issues going on here. In particular, the R1 values seem elevated. This could be because there is an excess of sites that are heterozygous across all samples. Such an excess can be caused by over-merged regions in the reference genome that attract reads from more than one location. Have you applied mappability or heterozygosity filters?

This is more of an issue for non-human data, but even for the human data in the paper we found that a mappability filter improved the estimates. Here is a paper that describes a program to compute mappability, and it can be applied to basically any reference.

For a simple way to quantify relatives, you can consult table 1 of this paper, as we cite in the manuscript:
Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM (2010) Robust relationship inference in genome-wide association studies. Bioinformatics 26(22):2867-2873
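
As a rough illustration of how that table can be turned into a lookup, here is a minimal Python sketch. The function name `king_relationship` is hypothetical, and the cutoffs are the KING inference criteria as I recall them from Table 1 (powers of 1/2 around 0.354, 0.177, 0.0884 and 0.0442), so please check them against the paper before relying on them. They were also derived for genotype data, so treat the labels as a first pass rather than a final call, especially for low-coverage samples.

```python
def king_relationship(kinship):
    """Map a KING-robust kinship estimate to a coarse relationship label.

    Cutoffs are assumed from Table 1 of Manichaikul et al. (2010);
    verify them against the paper before use.
    """
    # (lower bound, label), checked from the closest relationship down
    bounds = [
        (1 / 2 ** 1.5, "duplicate / monozygotic twin"),  # ~0.354
        (1 / 2 ** 2.5, "1st degree"),                    # ~0.177
        (1 / 2 ** 3.5, "2nd degree"),                    # ~0.0884
        (1 / 2 ** 4.5, "3rd degree"),                    # ~0.0442
    ]
    for lower, label in bounds:
        if kinship > lower:
            return label
    # Everything below the last cutoff, including negative values
    return "unrelated"


if __name__ == "__main__":
    # Made-up kinship values, for illustration only
    for k in (0.40, 0.25, 0.10, 0.05, -0.30):
        print(k, king_relationship(k))
```

Note that negative kinship values simply fall into the "unrelated" class here.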

hope this helps, please let me know if you have more questions

best,
Ryan

@ma-diroma
Author

Hi Ryan,

Thanks for your reply.
Yes, the negative values are of KING-robust kinship. Some R1 values are very high: the highest R1 is 4.96 with the realSFS method and 0.73 with the IBS method. My data derive from ancient human DNA, which may generally suffer from contamination and degradation. I used already published data, so I am quite sure of their good quality. However, thanks for your suggestion; I will discuss it with the authors.

Wishes,
Maria Angela

@rwaples
Owner

rwaples commented Jun 18, 2020

Sure thing, hope I was able to help.

I think my basic suggestion for troubleshooting is to examine the 2D SFS for a pair of individuals and see if it makes sense, and also to look for deviations that are shared across pairs of individuals. See Figure 1 of the paper for an example / explanation of the 2D SFS. These values should be reported in the output.

You can ignore values A and I, as these are not informative about relatedness here. Broadly, the E cell (heterozygous in both samples) is the cell in the 2D SFS that most strongly signals relatedness. This is because it shows that the site is variable, and also that the two individuals share two alleles at this site (the maximum possible identity). In some sense, all of these statistics (R0, R1, and kinship) compare the value of the E cell to other cells. This is why it is important to consider other sources of error in estimating the E cell, such as alignment problems, miscalibrated quality scores, etc.
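
To make the role of the E cell concrete, here is a minimal Python sketch that computes the three statistics from the nine 2D SFS values. It assumes the cells are ordered A to I as in Figure 1 of the paper (C and G being the opposite-homozygote cells), and the formulas are written as I recall them from the paper, so please double-check them against the Rscript. The function name `ibs_stats` and the example counts are made up for illustration.

```python
import math


def ibs_stats(sfs):
    """Compute R0, R1 and KING-robust kinship from a 2D SFS.

    sfs: nine values in the order A, B, C, D, E, F, G, H, I
    (assumed layout; A and I are not used).
    """
    if len(sfs) != 9:
        raise ValueError("expected nine 2D SFS values (A..I)")
    A, B, C, D, E, F, G, H, I = (float(x) for x in sfs)

    opposite_hom = C + G      # individuals share no alleles
    one_het = B + D + F + H   # exactly one individual is heterozygous

    # R0 is undefined when the E cell is empty
    r0 = opposite_hom / E if E > 0 else math.nan
    r1 = E / (one_het + opposite_hom)
    king = (E - 2 * opposite_hom) / (one_het + 2 * E)
    return {"R0": r0, "R1": r1, "KING": king}


if __name__ == "__main__":
    # Made-up counts: few shared heterozygous sites (E) relative to
    # opposite homozygotes (C + G), as for an unrelated pair.
    print(ibs_stats([900, 20, 10, 20, 10, 20, 10, 20, 900]))
    # Made-up counts: E inflated, e.g. by reads from paralogous regions.
    print(ibs_stats([900, 10, 0, 10, 40, 10, 0, 10, 900]))
```

In the first example E is small relative to C + G, so the kinship comes out negative; in the second, inflating E alone pushes R1 and kinship up, which is the kind of artifact that mappability and heterozygosity filters are meant to reduce.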

Popgen theory provides some expectations about how often the E cell should show up, relative to the others, for both related and unrelated pairs. Some of the values in your plots are pretty far away from these expectations, suggesting something is going on. For what it's worth, I'm not sure I would assume any data are perfect just because they are published. This analysis may be more sensitive to certain types of errors than previous analyses. Mappability and heterozygosity filters may be able to help with this. Admixture and inbreeding can complicate things a bit, but I don't expect that to be the only issue here.

best,
ryan

@ma-diroma
Author

Thanks for your detailed explanations! I will follow your suggestions.
Best,
Maria Angela
