Modify thinPoints to handle cases when extreme p-values exist #27
Comments
I like the idea of keeping all variants with p < p_threshold, but I think the threshold would need to be bigger than 5e-8 to avoid the weird "gaps" in the plots. I think we could actually use a value like 1e-4 and be OK. Under the null, the p-values are uniformly distributed on (0,1), so even with 40,000,000 variants we'd only expect about 4,000 to have p < 1e-4. Of course, it will be more if there is actual signal, so maybe we shouldn't go quite that high, but I don't think it should be genome-wide significance level. Another option would just be to increase |
I remember thinking about this for OLGA analyses. It looks like we used p = 1e-2 as a threshold there, which seems large to me. We could also think about a dynamic threshold, something like |
A dynamic approach seems reasonable to me. Based on your formula, if we assume 5e-8 comes from 1,000,000 variants, then the factor is 2000. Or we could just say 100/n_variants: with 1,000,000 variants the threshold would be 1e-4. Does this seem reasonable? Probably? This always corresponds to an expectation of 100 variants below the threshold under the null hypothesis. There will of course be more variants when there are signals, but that makes sense. We should probably test on a couple of examples.
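A minimal sketch of the 100/n_variants rule discussed above, written in Python for illustration only. The function name `dynamic_p_threshold` and the parameter `expected_null_hits` are hypothetical and are not part of the package; the only idea taken from the thread is that the threshold is a fixed expected count divided by the number of variants.

```python
def dynamic_p_threshold(n_variants, expected_null_hits=100):
    """Return a p-value threshold that scales with the number of variants.

    Under the null, p-values are uniform on (0, 1), so the expected number
    of variants with p < t is n_variants * t. Setting
    t = expected_null_hits / n_variants keeps that expectation constant.
    (Hypothetical helper, not the package's API.)
    """
    if n_variants <= 0:
        raise ValueError("n_variants must be positive")
    # Cap at 1 so very small inputs still give a valid p-value threshold.
    return min(1.0, expected_null_hits / n_variants)


# With 1,000,000 variants this gives the 1e-4 mentioned in the thread.
threshold_1m = dynamic_p_threshold(1_000_000)

# With 40,000,000 variants the threshold is stricter, but the expected
# number of null variants below it is still expected_null_hits.
threshold_40m = dynamic_p_threshold(40_000_000)
expected_below_40m = 40_000_000 * threshold_40m
```

Because the expected count is fixed, the number of unthinned points stays bounded under the null regardless of how many variants are tested; real signals add to that count.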
We encountered this thinPoints issue too. Have any steps been taken to resolve it? Instead of a dynamic approach, I'd favor a controllable one, where we can set our own -log10(p) threshold in the config file, above which the most significant results are not thinned at all. That is, I want to be sure that the resulting plot shows all signals above the chosen threshold.
When there are variants with very extreme p-values, variants with more modest but still significant p-values can get thinned. I saw this in an actual analysis, and here's a reproducible example showing what happens:
These variants are important, but they aren't selected because they fall into a bin with a lot of other variants with less significant p-values.
One suggestion for how to fix this is to modify thinPoints to keep all variants with p < p_threshold, and then apply the thinning only to those with p > p_threshold. I'm not sure what value p_threshold should be, though: 5e-8? 5e-9? Bonferroni significance?
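To make the proposal concrete, here is a rough Python sketch of threshold-aware thinning. It is not the package's thinPoints: the binning scheme (equal-width bins on the -log10(p) scale, with a random sample from each bin), the function name `thin_points`, and all parameter names are assumptions made for illustration. It also assumes every p-value is strictly greater than 0.

```python
import math
import random


def thin_points(pvals, n_per_bin=10, n_bins=10, p_threshold=None, seed=0):
    """Return sorted indices of the p-values to keep for plotting.

    Hypothetical simplification of a thinning routine:
    - every p-value below p_threshold is kept unconditionally;
    - the rest are split into n_bins equal-width bins on the -log10(p)
      scale, and at most n_per_bin indices are sampled from each bin.
    """
    rng = random.Random(seed)
    keep, to_thin = [], []
    for i, p in enumerate(pvals):
        if p_threshold is not None and p < p_threshold:
            keep.append(i)
        else:
            to_thin.append(i)
    if to_thin:
        logp = {i: -math.log10(pvals[i]) for i in to_thin}
        lo, hi = min(logp.values()), max(logp.values())
        # Fall back to a width of 1 if all values are identical.
        width = (hi - lo) / n_bins or 1.0
        bins = {}
        for i in to_thin:
            b = min(int((logp[i] - lo) / width), n_bins - 1)
            bins.setdefault(b, []).append(i)
        for members in bins.values():
            if len(members) > n_per_bin:
                members = rng.sample(members, n_per_bin)
            keep.extend(members)
    return sorted(keep)


# Simulated data: null p-values, five modest signals, one extreme signal.
data_rng = random.Random(1)
pvals = [1.0 - data_rng.random() for _ in range(10_000)]  # values in (0, 1]
signal_idx = list(range(len(pvals), len(pvals) + 5))
pvals += [1e-10] * 5
extreme_idx = len(pvals)
pvals.append(1e-300)

# Without a threshold, the extreme value stretches the bins so that the
# modest signals share the lowest bin with all of the null variants.
kept_plain = thin_points(pvals)

# With a threshold, everything below it bypasses the thinning step.
kept_thresholded = thin_points(pvals, p_threshold=1e-4)
```

In the unthresholded call, the single p-value of 1e-300 makes each of the 10 bins 30 units wide on the -log10(p) scale, so the 1e-10 signals compete with 10,000 null variants for the 10 slots in the first bin. In the thresholded call they are kept regardless of which bin they would have fallen into.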