G statistic request #35

plunkert · 2024-10-16T17:28:27Z

Hello! I'm wondering if you would consider implementing the G statistic as in Magwene et al. 2011 (attached)? It's widely used on pool-seq data to find loci underlying trait variation in both laboratory crosses and natural accessions (e.g. Gould et al., 2017, attached). It also includes read depth information so as to account for variation in the uncertainty of allele frequency estimates across the genome. I'm not an expert in these statistics and I don't know if the pool-sequencing corrected Fst from grenedalf accomplishes the same thing.
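For reference, here is a minimal sketch of the per-site G statistic as we read it from Magwene et al. 2011: a G-test (log-likelihood ratio test) on a 2x2 table of reference/alternate allele read counts in the two pools or bulks. The function name and argument order are hypothetical. This is not grenedalf's API, and it is not Billie Gould's script. Because G is computed from raw read counts, deeper coverage at the same allele-frequency difference yields a larger G. That is one way read depth enters the statistic.

```python
import math


def g_statistic(ref_high: int, alt_high: int, ref_low: int, alt_low: int) -> float:
    """G-test statistic for a 2x2 table of allele read counts in two pools.

    Hypothetical helper: G = 2 * sum(O * ln(O / E)), where E are the expected
    counts under the null hypothesis that allele frequency is the same in both
    pools (i.e. no association between allele and pool).
    """
    observed = [[ref_high, alt_high], [ref_low, alt_low]]
    row_totals = [sum(row) for row in observed]          # read depth per pool
    col_totals = [ref_high + ref_low, alt_high + alt_low]  # reads per allele
    total = sum(row_totals)
    if total == 0:
        raise ValueError("no reads at this site")

    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o == 0:
                continue  # the term 0 * ln(0) is taken as 0
            e = row_totals[i] * col_totals[j] / total  # expected count, > 0 here
            g += o * math.log(o / e)
    return 2.0 * g
```

For example, `g_statistic(50, 50, 50, 50)` is 0 (identical frequencies), and doubling every count in a table doubles G.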

For my current work, I calculate the G statistic by doing variant calling with SNAPE-pooled, running Billie Gould's script from her paper (https://bitbucket.org/billiegould/genomics_tools/src/master/SNAPEtools/G_calcSNAPE.py), and then filtering by read-depth criteria with R subsetting. SNAPE-pooled isn't being maintained anymore and needs to run on one chromosome at a time, and I'd love to have fewer steps to string together when calculating the G statistic.
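The read-depth filtering step described above can be sketched as a single pass over per-site allele counts. The `Site` record, the field names, and the default thresholds (`min_depth=20`, `max_depth=500`) are hypothetical placeholders. The actual criteria used in the R subsetting step are not given here and would depend on the dataset.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Site:
    """Hypothetical per-site record: allele read counts in the two pools."""
    chrom: str
    pos: int
    ref_high: int
    alt_high: int
    ref_low: int
    alt_low: int


def passes_depth_filter(site: Site, min_depth: int = 20, max_depth: int = 500) -> bool:
    """Keep a site only if both pools have coverage within [min_depth, max_depth]."""
    depth_high = site.ref_high + site.alt_high
    depth_low = site.ref_low + site.alt_low
    return all(min_depth <= d <= max_depth for d in (depth_high, depth_low))


def filter_sites(sites: Iterable[Site], min_depth: int = 20, max_depth: int = 500) -> List[Site]:
    """Return the sites that pass the (hypothetical) depth criteria, in input order."""
    return [s for s in sites if passes_depth_filter(s, min_depth, max_depth)]
```

An upper bound on depth is included because unusually high coverage often points to repeats or copy-number variation; whether that matters here is an assumption.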

Magwene 2011.pdf
Gould - Molecular Ecology - 2016 - Pooled ecotype sequencing reveals candidate genetic mechanisms for adaptive.pdf

Thanks so much for your time! Of course I completely understand if you can't prioritize the G statistic, but I appreciate your consideration. Please let me know if there's any other information I can provide.

Best,
Madison

lczech · 2024-10-21T12:25:10Z

Hi Madison,

Thanks for the suggestion, that indeed seems interesting and relevant! I had a look at the original manuscript you shared, as well as the python source, and it seems that this would fit well into grenedalf. However, we'd need to do a more thorough evaluation of this first. For instance, it is unclear to me where in the program the pool size (or number of individuals in the bulk) is given to the script, which seems to be needed for computing G (or G'; from a first glance, I'm not sure which one you want).

However, I recently changed positions, and am working mainly on other topics these days, so unfortunately, it is a bit hard for me to find the time and justification to work on this. If you or someone in your group (Lowry lab, if I see that correctly?) are willing to collaborate on figuring out the statistics and other questions that might pop up, I could help get the code into grenedalf.

As for SNAPE-pooled: Is that program actually being used in practice? I knew of it, but it seems so dysfunctional that I did not really think it is useful any more. I just had a look at it again and could not even get it to run on its own example file (the program did not terminate after waiting for a while, for a file with 6 positions). If their approach is being used though, it might be another addition that I would consider adding to grenedalf (at some point...).

Cheers and so long
Lucas

plunkert · 2024-10-21T12:56:35Z

Hi Lucas,

Billie implemented G and I've been using that, but it would be worth considering G'. I could ask the authors about it.
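To make the G versus G' distinction concrete: Magwene et al. 2011 describe G' as a version of G smoothed along the genome with a tricube kernel. The sketch below computes a plain tricube-weighted (Nadaraya-Watson) average of per-site G values on one chromosome. The function names and the `window` parameter (kernel half-width in bp) are hypothetical, and any further adjustment described in the paper is not reproduced here, so the paper should be checked before relying on this.

```python
from typing import List, Sequence


def tricube(d: float, window: float) -> float:
    """Tricube kernel weight for a distance d and half-width window (0 outside)."""
    u = abs(d) / window
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def g_prime(positions: Sequence[int], g_values: Sequence[float], window: float) -> List[float]:
    """Tricube-weighted average of G around each site (sites on one chromosome).

    Simple O(n^2) loop for illustration; a real implementation would use a
    sliding window over sorted positions.
    """
    if len(positions) != len(g_values):
        raise ValueError("positions and g_values must have the same length")
    smoothed = []
    for focal in positions:
        weights = [tricube(p - focal, window) for p in positions]
        total = sum(weights)  # at least 1.0, since the focal site has weight 1
        smoothed.append(sum(w * g for w, g in zip(weights, g_values)) / total)
    return smoothed
```

Sites farther apart than `window` do not influence each other, so an isolated site keeps its raw G value; within a window, a single high-G site is spread over its neighbours and damped.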

Billie's Gstat code doesn't take number of individuals in the pool as input; my understanding is that it's using solely the read coverage and the SNP calls from SNAPE-pooled, and the number of individuals in the pool is part of SNAPE-pooled input.

My lab uses SNAPE-pooled and got it to run after some hair-pulling, mainly motivated by the paper below, which is quite recent and found that SNAPE-pooled performed better than other pool-seq variant callers. It does have 34 citations in the last 4 years, even with the challenges of using it.

Molecular Ecology Resources - 2021 - Guirao‐Rico - Benchmarking the performance of Pool‐seq SNP callers using simulated and.pdf

Yes, I'm in the Lowry lab - let me discuss the collaboration idea with David and I'll get back to you over email.

Thanks!
Madison
