Statistics abstraction pattern #74

Merged
merged 33 commits into from
Apr 4, 2024

Conversation

FNTwin
Collaborator

@FNTwin FNTwin commented Apr 2, 2024

Checklist:

  • Was this PR discussed in an issue? It is recommended to first discuss a new feature in a GitHub issue before opening a PR.
  • Add tests to cover the fixed bug(s) or the newly introduced feature(s) (if appropriate).
  • Update the API documentation if a new function is added or an existing one is deleted.

Abstract the regression for linear atom energies
Clean up code in base
Add a mixin class to extend the properties of the base class
Abstract energy statistics into a state pattern with interfaces
Automatically save and load the right files for statistics
Add formation energy and per-atom formation energy to the object returned by __getitem__
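
For context on the first and last items, here is a minimal sketch of what a linear atom energy regression and the derived formation energies can look like. It assumes that "linear atom energies" means one fitted reference energy per element, and that formation energy means total energy minus the sum of those references. The function name and the toy data are hypothetical and are not the openqdc API.

```python
import numpy as np


def fit_linear_atom_energies(atomic_numbers_per_mol, energies):
    """Least-squares fit of one reference energy per element (hypothetical helper)."""
    mols = [np.asarray(z) for z in atomic_numbers_per_mol]
    species = np.unique(np.concatenate(mols))
    # Count matrix: one row per molecule, one column per element.
    counts = np.array([[np.sum(z == s) for s in species] for z in mols], dtype=float)
    coeffs, *_ = np.linalg.lstsq(counts, np.asarray(energies, dtype=float), rcond=None)
    return species, counts, coeffs


# Toy data (made up): H2, CH4, C2H6 with energies built from known per-atom values.
mols = [[1, 1], [6, 1, 1, 1, 1], [6, 6, 1, 1, 1, 1, 1, 1]]
energies = np.array([2 * -0.5, -37.8 + 4 * -0.5, 2 * -37.8 + 6 * -0.5])

species, counts, coeffs = fit_linear_atom_energies(mols, energies)

# Formation energy: total energy minus the summed per-element reference energies.
e_form = energies - counts @ coeffs
# Per-atom formation energy: divide by the number of atoms in each molecule.
e_form_per_atom = e_form / counts.sum(axis=1)
```

Because the toy energies are exactly linear in the element counts, the fit recovers the per-atom values and the formation energies vanish; real data would leave a nonzero residual, which is the quantity of interest.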

from openqdc.utils.exceptions import StatisticsNotAvailableError


class DatasetPropertyMixIn:
Collaborator

What is the benefit of this class?

Collaborator Author

Less stuff in base

Collaborator

I think that with the statistics manager and descriptor calculation, the base is already reduced in size. I'm unsure whether we should create a new class just for these 4-5 straightforward methods.

Collaborator Author

I see your point, but in this case we get finer granularity over which types of properties we add, and it keeps things a bit cleaner.
We can also add new properties by updating a file outside of base.py, so future PRs will be easier to resolve, since that file is not a critical Python file.

Collaborator

Okay. I would still be more likely to add methods to the base class than this one. But feel free to close this thread if you think this is valuable.

Contributor

I like the class but agree with @shenoynikhil. It is not needed, but it makes it easier to navigate the codebase.

Collaborator Author

I don't have a strong opinion on keeping the class in. I just used it to remove a bit of clutter from the base class. If @shenoynikhil or @prtos want to have the properties in the main class, I'm fine with removing the mixin and reimplementing them in the base class.
I see it being useful because it lets us further separate the properties of the base class for the potential energy datasets from those of the interaction datasets, and it allows us to implement new properties without touching the base.py file.
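
A minimal sketch of the mixin arrangement discussed in this thread, assuming the mixin only reads attributes that the host class defines. The class names, the property names, and the dictionary-backed `data` attribute are hypothetical illustrations, not the actual DatasetPropertyMixIn.

```python
import numpy as np


class PropertyMixInSketch:
    """Hypothetical property mixin; expects the host class to define self.data."""

    @property
    def n_atoms(self):
        return self.data["n_atoms"]

    @property
    def average_n_atoms(self):
        return float(np.mean(self.data["n_atoms"]))


class BaseDatasetSketch(PropertyMixInSketch):
    """Hypothetical base class; it stays small because the properties live in the mixin."""

    def __init__(self, n_atoms):
        self.data = {"n_atoms": np.asarray(n_atoms)}


ds = BaseDatasetSketch([3, 5, 4])
avg = ds.average_n_atoms
```

The trade-off raised by the reviewers still applies: the mixin keeps the base file short and lets different dataset families pick different property sets, at the cost of one more class to look up.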

Collaborator

@shenoynikhil shenoynikhil left a comment

Can you add docstrings and usage instructions for StatisticsManager, Descriptors, etc.?

In some places the variable names could be better; for example, deps in StatisticsManager is a bit confusing.

Also, tests please.

@FNTwin
Collaborator Author

FNTwin commented Apr 2, 2024

Adding tests and then I'm probably done

@FNTwin FNTwin linked an issue Apr 2, 2024 that may be closed by this pull request
@FNTwin FNTwin changed the base branch from develop to release April 3, 2024 14:09
openqdc/datasets/energies.py
"""

def _post_init(self):
    self._e0_matrixs = [IsolatedAtomEnergyFactory.get_matrix(en_method) for en_method in self.data.energy_methods]
Contributor

needs to be updated now

Collaborator Author

This will be done while solving the merge issues

force_mean = np.nanmean(converted_force_data, axis=0)
force_std = np.nanstd(converted_force_data, axis=0)
force_rms = np.sqrt(np.nanmean(converted_force_data**2, axis=0))
return ForceStatistics(
Contributor

I am confused by this, specifically the component part.

Collaborator Author

For our multitask losses, we need the RMS of the forces in the dataset for each of the x, y, z components of the force vectors.
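
A small sketch of what "component" means here, assuming the forces are stacked in an array whose last axis holds x, y, z and whose missing entries are NaN. The array values are made up, and the real converted_force_data may have additional axes (for example, one per energy method).

```python
import numpy as np

# Hypothetical stacked forces: one row per atom, columns are x, y, z.
# NaN marks a missing value, which the nan-aware reductions skip.
forces = np.array(
    [
        [3.0, 0.0, 1.0],
        [4.0, 0.0, np.nan],
        [0.0, 2.0, 1.0],
    ]
)

# Reducing over axis 0 keeps one statistic per Cartesian component.
component_mean = np.nanmean(forces, axis=0)
component_std = np.nanstd(forces, axis=0)
component_rms = np.sqrt(np.nanmean(forces**2, axis=0))
```

Each result has three entries, one per component, instead of a single scalar over the whole force array.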

@FNTwin FNTwin merged commit 247b0e1 into release Apr 4, 2024
5 checks passed
@FNTwin FNTwin deleted the pattern branch April 4, 2024 14:27
Development

Successfully merging this pull request may close these issues.

Generalized Statistics Calculation
3 participants