how to interpret my results #2
Hello Maria,
Glad you are using our method! I hope I can be of some help.
The negative values I see are for the kinship (KING-robust kinship), am I correct? This is expected if the individuals are from different populations, and therefore not related.
That said, to me it does look like there may be some bioinformatic issues going on here. In particular, the R1 values seem elevated. This could be because there is an excess of sites that are heterozygous across all samples, which can be caused by over-merged regions in the reference genome that attract reads from more than one location. Have you applied mappability or heterozygosity filters? This is more of an issue for non-human data, but even for the human data in the paper we found that a mappability filter improved the estimates. Here is a paper that describes a program to compute mappability, and it can be applied to basically any reference.
For a simple way to quantify relatives, you can consult table 1 of this paper, as we cite in the manuscript:
Hope this helps, please let me know if you have more questions.
Best,
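As a rough illustration of turning the output into a relationship call, here is a minimal Python sketch. It assumes the cited table is the KING degree-of-relationship table (the link is not preserved here, so that is an assumption), with kinship cut-offs at 2^-1.5, 2^-2.5, 2^-3.5 and 2^-4.5. It also assumes the 2D SFS cells are read row-wise as A..I, with rows as individual 1's genotype and E as heterozygous in both. The helper names are hypothetical, and the sketch uses kinship alone; it does not reproduce the joint R0/R1/kinship ranges from the paper.

```python
from typing import Sequence

# Hypothetical helper. `cells` is one pair's 2D SFS flattened row-wise as
# A, B, C, D, E, F, G, H, I (assumed layout: rows = genotype of individual 1,
# columns = genotype of individual 2, E = heterozygous in both).
def king_robust_kinship(cells: Sequence[float]) -> float:
    a, b, c, d, e, f, g, h, i = cells  # A and I are unused (uninformative)
    # Shared heterozygotes (E) push kinship up; opposite homozygotes (C, G) push it down.
    numerator = e - 2 * (c + g)
    # Heterozygous sites in individual 1 (D + E + F) plus individual 2 (B + E + H).
    denominator = b + d + f + h + 2 * e
    return numerator / denominator

# Lower bound of each category, assumed to follow the KING table.
THRESHOLDS = [
    (2 ** -1.5, "duplicate / MZ twin"),  # ~0.354
    (2 ** -2.5, "1st degree"),           # ~0.177
    (2 ** -3.5, "2nd degree"),           # ~0.088
    (2 ** -4.5, "3rd degree"),           # ~0.044
]

def classify(kinship: float) -> str:
    """Return the first category whose lower bound the kinship exceeds."""
    for cutoff, label in THRESHOLDS:
        if kinship > cutoff:
            return label
    return "unrelated (or more distant than 3rd degree)"

if __name__ == "__main__":
    # Made-up cell counts, for illustration only.
    pair = [1000, 50, 0, 50, 200, 50, 0, 50, 500]
    k = king_robust_kinship(pair)
    print(f"kinship = {k:.3f} -> {classify(k)}")
```

With the made-up counts above this prints `kinship = 0.333 -> 1st degree`. Negative kinship values always fall into the last category, which matches the point that negative values are expected for unrelated individuals.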
Hi Ryan,
Thanks for your reply.
Wishes,
Sure thing, hope I was able to help.
My basic suggestion for troubleshooting is to examine the 2D SFS for a pair of individuals and see whether it makes sense, and also to look for deviations that are shared across pairs of individuals. See Figure 1 of the paper for an example and explanation of the 2D SFS. These values should be reported in the output. You can ignore values A and I, as these are not informative about relatedness here.
Broadly, the E cell (heterozygous in both samples) is the cell in the 2D SFS that most strongly signals relatedness. This is because it shows that the site is variable, and also that the two individuals share two alleles at this site (the maximum possible identity). In some sense, all of these statistics (R0, R1, and kinship) compare the value of the E cell to the other cells. This is why it is important to consider other sources of error in estimating the E cell, such as alignment problems, miscalibrated quality scores, etc.
Popgen theory provides some expectations about how often the E cell should show up, relative to the others, for both related and unrelated pairs. Some of the values in your plots are pretty far away from these expectations, suggesting something is going on.
For what it's worth, I would not assume any data are perfect just because they are published. This analysis may be more sensitive to certain types of errors than previous analyses. Mappability and heterozygosity filters may help with this. Admixture and inbreeding can complicate things a bit, but I don't expect that to be the only issue here.
Best,
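To make the "expectations from popgen theory" concrete, here is a rough Python sketch under strong simplifying assumptions: Hardy-Weinberg proportions, two unrelated individuals from one population, and known per-site allele frequencies. It builds the expected 2D SFS, computes R0, R1 and KING-robust kinship using the cell formulas as I understand them from the paper (please check them against Figure 1 and the definitions there), and then adds sites that are heterozygous in both individuals, as an over-merged reference region might produce. All function names and the allele frequencies are hypothetical.

```python
from typing import Dict, List

# Cell labels for the 3x3 2D SFS, row-wise; row = genotype of individual 1,
# column = genotype of individual 2 (0, 1 or 2 copies of the alternate allele).
LABELS = "ABCDEFGHI"

def expected_unrelated_sfs(freqs: List[float]) -> Dict[str, float]:
    """Sum, over sites, the HWE probability of each joint genotype for two
    unrelated individuals (genotypes independent given the allele frequency)."""
    sfs = dict.fromkeys(LABELS, 0.0)
    for p in freqs:
        q = 1.0 - p
        geno = [q * q, 2 * p * q, p * p]  # P(0), P(1), P(2) alternate copies
        for row in range(3):
            for col in range(3):
                sfs[LABELS[3 * row + col]] += geno[row] * geno[col]
    return sfs

def stats(s: Dict[str, float]) -> Dict[str, float]:
    """R0, R1 and KING-robust kinship (assumed definitions) from the cells."""
    r0 = (s["C"] + s["G"]) / s["E"]
    r1 = s["E"] / (s["B"] + s["C"] + s["D"] + s["F"] + s["G"] + s["H"])
    king = (s["E"] - 2 * (s["C"] + s["G"])) / (
        s["B"] + s["D"] + s["F"] + s["H"] + 2 * s["E"]
    )
    return {"R0": r0, "R1": r1, "KING": king}

def add_shared_hets(s: Dict[str, float], n_sites: float) -> Dict[str, float]:
    """Hypothetical artefact: n_sites wrongly called heterozygous in both."""
    out = dict(s)
    out["E"] += n_sites
    return out

if __name__ == "__main__":
    freqs = [0.05 * k for k in range(1, 20)]  # hypothetical frequencies 0.05..0.95
    clean = stats(expected_unrelated_sfs(freqs))
    noisy = stats(add_shared_hets(expected_unrelated_sfs(freqs), 2.0))
    print("unrelated, no artefact:", clean)   # KING ~ 0, R0 ~ 0.5
    print("with shared-het sites: ", noisy)   # R1 and KING up, R0 down
```

Under these assumptions the E cell is exactly twice C + G at every site, so KING-robust kinship is about 0 and R0 is about 0.5 for unrelated pairs, while R1 depends on the allele frequencies. Inflating E alone raises R1 and kinship and lowers R0, which is the kind of shared deviation that mappability or heterozygosity filters are meant to remove.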
Thanks for your detailed explanations! I will follow your suggestions.
Hi,
I used IBSrelate, trying both the realSFS and the IBS methods and following your guidelines here: http://www.popgen.dk/software/index.php/IBSrelate. I only added these parameters to the ANGSD command in both cases:
-minMapQ 30 -minQ 30 -trim 3
I obtained many negative values in my results, as you can see in the plots:
realSFS.plot.pdf.pdf
IBSrelate.plot.pdf.pdf
What is the meaning of these values?
Moreover, is there a more direct way to get predictions of relationships? The R script reports the R0, R1 and KING estimators, but gives no indication of which samples are related based on the joint ranges described in your paper.
Thanks for your attention.
Best wishes,
Maria Angela