masking when averaging by window-length #24
Comments
Hi Martin, thanks for your patience, finally circling back to working on grenedalf. To make sure that I am understanding your suggestion correctly: Do you want to be able to provide a mask file that determines the denominator of each window? Such as: In a given window of length Would that be a solution to what you are thinking? If not, I am not understanding what exactly you would want to compute there - in that case, could you please clarify with an example? Cheers
Hi Martin @capoony, I just released grenedalf v0.6.0 which implements all of the above features. Let me know if this works for you, or if this does not solve your use case :-) Cheers
Hey Martin, just to also answer your comments in a bit more detail. I think I might have been a bit quick to close this issue - reopening now, so that we can discuss this.
Yes, the missing positions are expected there. When a mask is specified, we take it as ground truth, so in order to apply it, we need to fill in every position that has no input data as an empty (missing) position, to which the mask can then be applied. The alternative would be to keep the input without filling in the missing positions, but then the mask might mark positions as unmasked that are not present in the input, and those would still be ignored, even though the mask says they should not be. Does that make sense? I could also disable filling in the missing positions when a mask is specified, and instead leave that to the user to decide (there is an option
Interesting - you should see missing positions there as well, I think. Can you maybe test with the latest v0.6.0 again, and see if that changed? If not, can you maybe provide a minimal example for me to check? As for the window average policy, I think that the Cheers
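To illustrate the fill-in behaviour described in this comment, here is a minimal Python sketch. It is not grenedalf's actual code; the function name `fill_and_mask`, the status labels, and the toy data are all made up for illustration. It only shows why a SNP-only input combined with a mask yields many "missing" positions: every position in the window gets an entry, and those without input data that are not masked end up as missing.

```python
from collections import Counter

def fill_and_mask(observed, masked, start, end):
    """Classify every position in [start, end] (1-based, inclusive).

    observed: set of positions that have input data (e.g. called SNPs)
    masked:   set of positions flagged by the mask
    Hypothetical helper for illustration only, not part of grenedalf.
    """
    status = {}
    for pos in range(start, end + 1):
        if pos in masked:
            # The mask is treated as ground truth and wins over the input.
            status[pos] = "masked"
        elif pos in observed:
            status[pos] = "valid"
        else:
            # Filled-in position: no input data, but not masked either.
            status[pos] = "missing"
    return status

# Toy window of 10 positions with SNP-only input (assumed data).
observed = {3, 7}
masked = {1, 2, 7}
status = fill_and_mask(observed, masked, 1, 10)
counts = Counter(status.values())
print(counts["masked"], counts["valid"], counts["missing"])
```

With SNP-only input, most unmasked positions have no data, so they show up as missing even though the masked count per window is as expected.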
Hi Lucas,
thanks for the great new features; particularly the averaging options are very useful. However, I am wondering if you could also implement the possibility to mask when averaging by window-length. As far as I understand, this is not possible yet.
In our case, we have already called SNPs and want to average diversity stats in windows (by window-length) based on a SYNC file with SNPs only. In addition, we have BED files, which contain the regions that should be masked in the genomes.
When I am using
--window-average-policy valid-loci
in combination with
--filter-mask-fasta ${PWD}/data/BED/${ID}.mask.gz
I get a lot of missing positions, but (I guess) the correct number of masked sites per window.

When I am using
--window-average-policy window-length
in combination with
--filter-mask-bed ${PWD}/data/BED/${ID}.bed.gz
I get no (or a few??) missing positions, but also no masked sites per window.

Thanks, Martin
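As a rough sketch of the computation requested above (averaging by window length, with masked regions removed from the denominator), something like the following Python could serve as a reference. This is an assumption about the intended arithmetic, not grenedalf's implementation; the function names and the toy numbers are hypothetical. Intervals follow the BED convention of 0-based, half-open coordinates and are assumed not to overlap each other.

```python
def unmasked_length(win_start, win_end, masked_intervals):
    """Window length minus the number of masked bases inside the window.

    The window and all intervals are 0-based, half-open (BED convention).
    masked_intervals is assumed to contain non-overlapping (start, end) pairs.
    """
    masked = 0
    for start, end in masked_intervals:
        # Length of the overlap between this interval and the window.
        overlap = min(end, win_end) - max(start, win_start)
        if overlap > 0:
            masked += overlap
    return (win_end - win_start) - masked

def window_average(per_site_values, win_start, win_end, masked_intervals):
    """Sum of per-site values divided by the unmasked window length."""
    denom = unmasked_length(win_start, win_end, masked_intervals)
    if denom == 0:
        return None  # fully masked window, no meaningful average
    return sum(per_site_values) / denom

# Toy example: a 1000 bp window with two masked regions (assumed data).
mask = [(100, 300), (900, 1200)]
denom = unmasked_length(0, 1000, mask)
avg = window_average([1.5, 2.0], 0, 1000, mask)
print(denom, avg)
```

Here the second masked region extends past the window end, so only its first 100 bases count, and the denominator is the window length minus 300 masked bases.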