[FIXME]: correct evaluation of GDF #7

Open
vkhodygo opened this issue Mar 29, 2021 · 0 comments
Labels
enhancement, invalid, performance, question

Comments

@vkhodygo
Owner

General commentary on the theory
As far as I'm aware, there is no consistent way to measure the GDF; at least I couldn't find enough reputable references.
The way I initially understood it was to compute the mean system density first (that is, the number of particles per box).
The next step was to find the deviations per cell and average over cells and time.

Apparently, this is not the case. The way it works in unbounded square domains is the following (see, for instance, this): we pick square sub-volumes of increasing linear size, measure the time average of the number of particles inside, and calculate its rms.
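
A minimal sketch of that procedure, assuming trajectories are available as a `positions` array of shape `(n_frames, n_particles, 2)` in a square box of side `L` (both names are hypothetical, not from the current code):

```python
import numpy as np

def number_fluctuations(positions: np.ndarray, sizes: np.ndarray):
    """For each sub-box linear size l, return (<N>, rms(N)), where N is the
    number of particles inside a square sub-box of side l and the statistics
    are taken over frames (i.e. over time)."""
    mean_n, rms_n = [], []
    for l in sizes:
        # count particles in one sub-box anchored at the origin;
        # averaging over several anchor points would improve statistics
        inside = np.all((positions >= 0.0) & (positions < l), axis=-1)
        counts = inside.sum(axis=1)      # N(t), one value per frame
        mean_n.append(counts.mean())     # time average <N>
        rms_n.append(counts.std())       # rms fluctuation of N
    return np.asarray(mean_n), np.asarray(rms_n)
```

For equilibrium-like systems the rms grows roughly as `<N>**0.5`; giant fluctuations show up as a larger exponent, so one would plot the rms against `<N>` on log axes.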

After some discussions I was convinced that this is the right choice, at least for non-equilibrium systems such as mine.
Different parts of a confined domain have different time-averaged densities.
We can use the same approach, i.e. square sub-domains, but for strongly anisotropic domain shapes that leaves us with a very limited number of points. In the case of narrow channels the sub-domain size can only go up to the half-width (or the full width if we really need those points at the end of the scale).

My idea is to tile the whole domain with regular cells of some suitable shape (squares and rectangles at the moment).
That provides a lot of data points but, again, leaves us with a limited number of cell sizes and shapes.
The question is what to do next. The data itself looks correlated, since particles leaving one cell immediately move into one of the adjacent ones.
Regardless of that, should the first step be to calculate the time average for each cell, and then average over cells? A sketch of that two-stage averaging follows.
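
A hedged sketch of the tiling-then-averaging idea, reusing the hypothetical `positions` array from above in a rectangular domain of size `Lx × Ly` split into `nx × ny` equal cells:

```python
import numpy as np

def cell_fluctuations(positions, Lx, Ly, nx, ny):
    """Stage one: time-average the particle count in every cell.
    Stage two: average the per-cell statistics over all cells."""
    n_frames = positions.shape[0]
    counts = np.empty((n_frames, nx, ny))
    edges_x = np.linspace(0.0, Lx, nx + 1)
    edges_y = np.linspace(0.0, Ly, ny + 1)
    for t in range(n_frames):
        counts[t], _, _ = np.histogram2d(positions[t, :, 0],
                                         positions[t, :, 1],
                                         bins=[edges_x, edges_y])
    per_cell_mean = counts.mean(axis=0)   # <N>_t per cell, shape (nx, ny)
    per_cell_var = counts.var(axis=0)     # Var_t(N) per cell, shape (nx, ny)
    # Stage two: average over cells. Whether to average the variances or
    # their square roots is exactly the open question in the list below.
    return per_cell_mean.mean(), per_cell_var.mean()
```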

  • check the code used in the thesis to verify which method I actually used.
  • what numerical approach should be used here: measure the variance or the standard deviation? Averaging all variances and taking the square root of the result is not the same as taking the square roots of all variances and averaging those (see the short numerical check after this list).
  • what about a system average: is it a good idea to get a single value per cell? And what about different systems with the same parameters: do we (weight-)average at all, on a cell basis or on a system basis?
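
A quick numerical check of the second bullet (the numbers are arbitrary): the two orderings of averaging and square-rooting do not agree.

```python
import numpy as np

variances = np.array([1.0, 4.0, 9.0])
print(np.sqrt(variances.mean()))   # sqrt(14/3) ~ 2.16
print(np.sqrt(variances).mean())   # (1 + 2 + 3) / 3 = 2.0
```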

Methods
This method is based on 2d (3d) histograms, but the numpy and fast-histogram implementations share the same peculiarity: due to round-off errors, some points in the rightmost column (and/or topmost row) can be marked as out-of-domain, which decreases the overall number of particles within the system. That, combined with a particular implementation (which should be the fastest one), leads to square roots of negative numbers.
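
One possible workaround (an assumption, not the project's current code) for both issues: nudge points sitting on the upper boundary back into the last bin before histogramming, and clamp slightly negative variances produced by the one-pass E[N^2] - E[N]^2 formula before taking square roots.

```python
import numpy as np

def safe_counts(x, y, Lx, Ly, nx, ny):
    eps = 1e-12
    # points exactly on (or marginally beyond) the upper boundary are pulled
    # back inside so they are not dropped as out-of-domain
    x = np.clip(x, 0.0, Lx * (1.0 - eps))
    y = np.clip(y, 0.0, Ly * (1.0 - eps))
    counts, _, _ = np.histogram2d(x, y, bins=[nx, ny],
                                  range=[[0.0, Lx], [0.0, Ly]])
    return counts

def safe_std(mean_of_squares, square_of_mean):
    # the one-pass variance formula can dip slightly below zero numerically
    return np.sqrt(np.maximum(mean_of_squares - square_of_mean, 0.0))
```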

Useful links
Check these; there is a chance I'm mixing up standard deviation and standard error.
https://stats.stackexchange.com/questions/222655/average-of-average-quantities-and-their-associated-error
https://stackoverflow.com/questions/19391149/numpy-mean-and-variance-from-single-function
https://ned.ipac.caltech.edu/level5/Leo/Stats4_5.html

vkhodygo added the enhancement, invalid, performance, and question labels on Mar 29, 2021