-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Statistics abstraction pattern #74
Conversation
from openqdc.utils.exceptions import StatisticsNotAvailableError | ||
|
||
|
||
class DatasetPropertyMixIn: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the benefit of this class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Less stuff in base
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think with statistics manager and descriptor calculation, the base is already reduced in size, I'm unsure if we should create a new class just for these 4-5 straightforward methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point but in this case we can just have a better granularity on what type of property we wish to add and keep it a bit more clean.
We also can just update a file outside of the base.py class to add new properties so further PR will be easier to solve as it is not a really important python file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. I would still be more likely to add methods to the base class than this one. But feel free to close this thread if you think this is valuable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the class but agree with @shenoynikhil . It is not needed but it makes it easier to navigate the codebase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion on keeping the class in . I just used it to remove a bit of clutter from the base class. If @shenoynikhil or @prtos want to have the properties in the main class I'm fine removing the mixin and reimplementing them into the base class.
I see it being useful because we can further separate properties between the baseclass for the potential energy datasets from the interaction datasets and allows us to implement new properties without touching the base.py file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add docstrings, usage instructions for StatisticsManager, Descriptors, etc?
In some instances, you can use better variable names like deps
in StatisticsManager is a bit confusing.
Also, tests please.
Adding tests and then I 'm probably done |
openqdc/datasets/energies.py
Outdated
""" | ||
|
||
def _post_init(self): | ||
self._e0_matrixs = [IsolatedAtomEnergyFactory.get_matrix(en_method) for en_method in self.data.energy_methods] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs to be updated now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be done while solving the merge issues
from openqdc.utils.exceptions import StatisticsNotAvailableError | ||
|
||
|
||
class DatasetPropertyMixIn: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the class but agree with @shenoynikhil . It is not needed but it makes it easier to navigate the codebase
force_mean = np.nanmean(converted_force_data, axis=0) | ||
force_std = np.nanstd(converted_force_data, axis=0) | ||
force_rms = np.sqrt(np.nanmean(converted_force_data**2, axis=0)) | ||
return ForceStatistics( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused with this. the component part
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On our multitask losses we need to have some informations about the rms of the forces in the dataset on the x.y,z components of the force vectors.
Checklist:
Abstract the regression for linear atom energies
Clean code in base
Add mixin class to extend properties in base class
Abstract energy statistics to state pattern + interfaces
Automatic saving loading of the right files for statistics
Add formation energy and per atom formation energy to the getitem object return