
statistics: q.VgB should be np/(np-1) q.VgB #27

Open
jgx65 opened this issue May 13, 2020 · 8 comments
Comments

@jgx65
Owner

jgx65 commented May 13, 2020

The between-patch genetic variance q.VgB, which is needed to estimate QST, is currently underestimated: it should be multiplied by np/(np-1), where np is the number of patches.
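
For reference, the correction can be written out as follows. This is a sketch under the assumption that q.VgB is currently computed by dividing the sum of squared deviations of the patch means by np; the symbols $\bar g_i$ (mean genetic value of patch $i$), $\bar g$ (mean of the patch means) and $\sigma_B^2$ (true between-patch variance) are notation introduced here, not taken from the code.

$$
\hat V_{\text{biased}} = \frac{1}{n_p}\sum_{i=1}^{n_p}\left(\bar g_i - \bar g\right)^2,
\qquad
E\!\left[\hat V_{\text{biased}}\right] = \frac{n_p - 1}{n_p}\,\sigma_B^2
$$

$$
\hat V_{\text{unbiased}} = \frac{n_p}{n_p - 1}\,\hat V_{\text{biased}} = \frac{1}{n_p - 1}\sum_{i=1}^{n_p}\left(\bar g_i - \bar g\right)^2
$$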

@frederic-michaud
Collaborator

Hi @jgx65,

I wanted to have a quick look at this issue before shipping the new version to see if we could include a fix.

It is likely that we could just change this line:

return ARRAY::var(_meanP[2], get_current_nbSamplePatch());

But then the question is how many times we make the same mistake. I guess we have the same problem at least once more, 10 lines above:

return ARRAY::var(_meanG[2], get_current_nbSamplePatch());

The function var is also used in the functions that compute the within-patch statistics:

varG = ARRAY::var(array, size, meanG);

(This happens 7 times for various similar quantities; just search for "var(".)

Finally, there are two cases where we use it on an array of size 2 (I am not sure what we are computing here, but it is probably clearer to you):

Vb = ARRAY::var(array, 2); // var of means

Here, we have a factor 2 in the variance if I'm not mistaken.

In principle, we should check all these statistics to see whether we observe the systematic bias, design tests to detect it, and then update the code and confirm that the bias disappears. I don't really know how easy or difficult this would be, or who would do it.

How do you want to move forward with this?

Cheers,
Frédéric

@jgx65
Owner Author

jgx65 commented Jun 10, 2020

@frederic-michaud I had a look, and I believe the n/(n-1) factor should be included in all cases. And you are right: in the last case it is a factor of 2, since n=2! The only situation where the n/(n-1) multiplying factor should not be used is when the mean is derived from an independent set of data, but I don't think that is the case here. @sneuensc, would you agree?
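
To make the factor concrete, here is a minimal sketch of the two estimators. The helper names (`sum_sq_dev`, `var_population`, `var_sample`) are hypothetical and written only for illustration; they are not the project's `ARRAY::var`, whose exact signature and behavior are not shown in this thread. The sketch assumes the mean is computed from the same data as the variance, which is the case where the n/(n-1) factor applies.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers for illustration only; not the project's ARRAY::var.

// Sum of squared deviations from the mean of the same n values.
static double sum_sq_dev(const double* x, std::size_t n) {
    double mean = 0.0;
    for (std::size_t i = 0; i < n; ++i) mean += x[i];
    mean /= static_cast<double>(n);

    double ss = 0.0;
    for (std::size_t i = 0; i < n; ++i) ss += (x[i] - mean) * (x[i] - mean);
    return ss;
}

// Divides by n: biased low when the mean is estimated from the same data.
double var_population(const double* x, std::size_t n) {
    assert(n >= 1);
    return sum_sq_dev(x, n) / static_cast<double>(n);
}

// Divides by n - 1 (Bessel's correction), i.e. var_population * n / (n - 1).
double var_sample(const double* x, std::size_t n) {
    assert(n >= 2);
    return sum_sq_dev(x, n) / static_cast<double>(n - 1);
}
```

For an array of two means such as {1.0, 3.0}, the first function gives 1.0 and the second gives 2.0, which is the factor of 2 mentioned above for n=2.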

@sneuensc
Collaborator

sneuensc commented Jun 10, 2020 via email

@frederic-michaud
Collaborator

OK, so shall I update the code and push? Will you have time to quickly check some statistics?

@jgx65
Owner Author

jgx65 commented Jun 10, 2020

Thanks @frederic-michaud, I'll try to do some checking.

@jgx65
Owner Author

jgx65 commented Jun 10, 2020

> Hi Jerome, That is fine for me. Personally I don't mind which statistic is computed, but it has to be stated in the manual. Cheers, Sam

I checked the manual: we always say 'variance' without specifying which formula we are using.

@sneuensc
Collaborator

sneuensc commented Jun 10, 2020 via email

@jgx65
Owner Author

jgx65 commented Jun 23, 2020

Thanks @frederic-michaud. It seems to work: from the little checking I have done, the results are in line with what I expect.


3 participants