-
Notifications
You must be signed in to change notification settings - Fork 875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add pymatgen.io.openff
module
#3729
Conversation
Warning Rate Limit Exceeded@janosh has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 21 minutes and 28 seconds before requesting another review. How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. WalkthroughThe updates in this version enhance the interoperability between Pymatgen and OpenFF, focusing on molecular analysis and integration improvements. These changes streamline molecular conversions, refine force field handling, optimize dependencies, and enhance testing for a smoother user experience in computational chemistry and materials science. Changes
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review Status
Configuration used: CodeRabbit UI
Files selected for processing (2)
- pymatgen/io/openff.py (1 hunks)
- tests/io/test_openff.py (1 hunks)
Additional Context Used
Additional comments not posted (17)
tests/io/test_openff.py (10)
23-35
: Intest_molgraph_from_atom_bonds
, the test case is well-structured and covers the conversion from an OpenFF molecule to aMoleculeGraph
and back. However, it's recommended to add more assertions to verify the properties of the resulting molecules, such as checking the correctness of atom types, formal charges, and partial charges, to ensure a comprehensive test coverage.
38-67
: Intest_molgraph_from_openff_mol_cco
, the test case effectively checks the conversion from an OpenFF molecule to aMoleculeGraph
and validates the isomorphism between the original and converted graphs. Consider adding assertions to verify the correctness of atom types, formal charges, and partial charges in the convertedMoleculeGraph
for a more thorough test.
70-89
: Intest_molgraph_to_openff_pf6
, the assertionassert pf6_openff_1 == pf6_openff_2
might not be sufficient to ensure the equality of two OpenFF molecules, as it relies on the__eq__
method which might not deeply compare all molecule attributes. It's advisable to usetk.Molecule.is_isomorphic_with
with appropriate matching criteria to ensure a more robust comparison.
92-101
: Similar to the previous comment, intest_molgraph_to_openff_cco
, consider usingtk.Molecule.is_isomorphic_with
for comparingcco_openff_1
andcco_openff_2
to ensure a more comprehensive comparison that includes checking atom types, bond orders, and possibly stereochemistry.
104-117
:test_openff_back_and_forth
provides a good test for converting between OpenFF andMoleculeGraph
representations and back. However, it would be beneficial to add more assertions to check the properties of the resulting molecules, such as atom types, formal charges, and partial charges, to ensure the conversion process preserves these properties accurately.
120-135
: Intest_get_atom_map
, the test cases cover different molecules and their expected atom mappings. It's well-structured and seems comprehensive. However, consider adding a negative test case where no isomorphism is expected to ensure the function correctly handles such scenarios.
138-152
:test_infer_openff_mol
effectively tests the inference of OpenFF molecules from different geometries. The assertions on the number of atoms and bonds are appropriate. To enhance the test, consider adding assertions to verify the correctness of atom types and bond orders in the inferred OpenFF molecules.
155-160
: Intest_add_conformer
, the test case checks the addition of a conformer to an OpenFF molecule. It's recommended to add assertions to verify the coordinates of the added conformer match the expected values, ensuring the conformer is correctly added.
163-169
:test_assign_partial_charges
correctly tests the assignment of partial charges to an OpenFF molecule. To further improve the test, consider adding a case where partial charges are assigned using a charge method (e.g., "am1bcc") instead of directly providing the charges, to ensure the functionality works as expected in different scenarios.
172-180
:test_create_openff_mol
tests the creation of an OpenFF molecule from a SMILES string and optional geometry. It's advisable to add more assertions to verify the properties of the created molecule, such as checking if the partial charges are correctly scaled and if the molecule's stereochemistry is correctly inferred whenallow_undefined_stereo=True
.pymatgen/io/openff.py (7)
17-71
: Inmolgraph_to_openff_mol
, the function converts aMoleculeGraph
to an OpenFFMolecule
. It's well-documented and covers the conversion of atom and bond properties. However, consider handling cases where theMoleculeGraph
might contain multiple disconnected molecules, as the current implementation assumes a single molecule. This could be addressed by adding a check for disconnected components in the graph and raising an appropriate error or handling each component separately.
74-114
:molgraph_from_openff_mol
effectively converts an OpenFFMolecule
to aMoleculeGraph
. The function mirrors the structure generated bytk.Molecule.to_networkx
. To enhance this function, consider adding support for handling multiple conformers if present in the OpenFF molecule, as the current implementation only uses the first conformer. This could involve iterating over all conformers and creating a separateMoleculeGraph
for each.
117-156
:get_atom_map
computes an atom mapping between two OpenFFMolecules
. The function attempts to find an isomorphism with varying levels of restrictions. It's well-implemented, but consider adding logging or warnings when the function needs to relax restrictions (e.g., stereochemistry, bond order) to find an isomorphism. This would provide users with more insight into the matching process and potential discrepancies between the molecules.
159-180
: Ininfer_openff_mol
, the function infers an OpenFFMolecule
from apymatgen.core.Molecule
. The use ofOpenBabelNN
andmetal_edge_extender
for constructing theMoleculeGraph
is a good approach. However, consider adding error handling for cases where the inference process fails due to unsupported elements or configurations in the input molecule, ensuring the function fails gracefully with informative error messages.
183-221
:add_conformer
adds conformers to an OpenFFMolecule
based on provided geometry. The function includes a TODO comment about testing, which indicates that it might not have been thoroughly tested. It's crucial to add unit tests for this function to ensure its correctness, especially for edge cases such as non-isomorphic geometries or when no geometry is provided.
224-264
:assign_partial_charges
assigns partial charges to an OpenFFMolecule
. The function handles different scenarios, including provided partial charges and using a charge method. To improve this function, consider adding validation for thecharge_method
parameter to ensure it's supported by OpenFF toolkit, and provide clear error messages if an unsupported method is specified.
267-318
:create_openff_mol
creates an OpenFFMolecule
from a SMILES string and optional geometry. The function is comprehensive and handles various scenarios, including charge scaling and partial charge assignment. However, the error handling for mismatched lengths betweenpartial_charges
andgeometry
could be improved by specifying which molecule or atom caused the mismatch, making it easier for users to debug issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, excited to see this! 👍
is the issue here that openff
is only installable with conda
/mamba
?
we could try installing from source in CI
Sounds good! I'll try adding that to the CI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Review Status
Configuration used: CodeRabbit UI
Files selected for processing (8)
- .github/workflows/test.yml (1 hunks)
- pymatgen/io/openff.py (1 hunks)
- tests/files/classical_md_mols/CCO.xyz (1 hunks)
- tests/files/classical_md_mols/FEC-r.xyz (1 hunks)
- tests/files/classical_md_mols/FEC-s.xyz (1 hunks)
- tests/files/classical_md_mols/Li.xyz (1 hunks)
- tests/files/classical_md_mols/PF6.xyz (1 hunks)
- tests/io/test_openff.py (1 hunks)
Files skipped from review due to trivial changes (1)
- tests/files/classical_md_mols/Li.xyz
Files skipped from review as they are similar to previous changes (1)
- tests/io/test_openff.py
Additional comments not posted (4)
tests/files/classical_md_mols/PF6.xyz (1)
1-9
: The XYZ format is correctly followed, and the file appears relevant to the PR's objectives.tests/files/classical_md_mols/CCO.xyz (1)
1-11
: The XYZ format is correctly followed, and the file appears relevant to the PR's objectives.tests/files/classical_md_mols/FEC-r.xyz (1)
1-12
: The XYZ format is correctly followed, and the file appears relevant to the PR's objectives.tests/files/classical_md_mols/FEC-s.xyz (1)
1-12
: The XYZ format is correctly followed, and the file appears relevant to the PR's objectives.
.github/workflows/test.yml
Outdated
@@ -111,6 +111,8 @@ jobs: | |||
# TODO remove next line installing ase from main branch when FrechetCellFilter is released | |||
uv pip install --upgrade 'git+https://gitlab.com/ase/ase' --system | |||
|
|||
uv pip install git+https://github.com/openforcefield/openff-toolkit.git |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider pinning the openff-toolkit
dependency to a specific commit or release tag to ensure stability in your CI/CD pipeline.
- uv pip install git+https://github.com/openforcefield/openff-toolkit.git
+ uv pip install git+https://github.com/openforcefield/openff-toolkit.git@<commit_or_tag>
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
uv pip install git+https://github.com/openforcefield/openff-toolkit.git | |
uv pip install git+https://github.com/openforcefield/openff-toolkit.git@<commit_or_tag> |
if geometry is None: | ||
raise ValueError("geometries must be set if partial_charges is set") | ||
if len(partial_charges) != len(geometry): | ||
raise ValueError("partial charges must have same length & order as geometry") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validate the length of partial_charges
against the number of atoms in geometry
instead of the length of geometry
.
- if len(partial_charges) != len(geometry):
+ if len(partial_charges) != geometry.num_sites:
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
if geometry is None: | |
raise ValueError("geometries must be set if partial_charges is set") | |
if len(partial_charges) != len(geometry): | |
raise ValueError("partial charges must have same length & order as geometry") | |
if geometry is None: | |
raise ValueError("geometries must be set if partial_charges is set") | |
if len(partial_charges) != geometry.num_sites: | |
raise ValueError("partial charges must have same length & order as geometry") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
Review Status
Configuration used: CodeRabbit UI
Files selected for processing (2)
- .github/workflows/test.yml (1 hunks)
- pymatgen/io/openff.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- .github/workflows/test.yml
if partial_charges is not None: | ||
if geometry is None: | ||
raise ValueError("geometries must be set if partial_charges is set") | ||
if len(partial_charges) != len(geometry): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validate the length of partial_charges
against the number of atoms in geometry
instead of the length of geometry
.
- if len(partial_charges) != len(geometry):
+ if len(partial_charges) != geometry.num_sites:
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
if len(partial_charges) != len(geometry): | |
if len(partial_charges) != geometry.num_sites: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review Status
Configuration used: CodeRabbit UI
Files selected for processing (3)
- .github/workflows/test.yml (1 hunks)
- pymatgen/io/openff.py (1 hunks)
- requirements-optional.txt (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- .github/workflows/test.yml
Additional comments not posted (5)
requirements-optional.txt (1)
15-15
: LGTM! The addition ofopenbabel>=3.1.1
aligns with the PR's objectives and enhances integration capabilities.pymatgen/io/openff.py (4)
40-40
: The TODO comment about asserting the presence of only one molecule is addressed. Ensure the assertion is implemented as suggested in the previous review.
204-204
: The TODO comment about testing theadd_conformer
function is addressed. Ensure unit tests are added to cover this functionality.
247-247
: The TODO comment about testing theassign_partial_charges
function is addressed. Ensure unit tests are added to cover this functionality.
296-296
: The suggestion to validate the length ofpartial_charges
against the number of atoms ingeometry
instead of the length ofgeometry
is addressed. Ensure this change is implemented as suggested.
I switched over the test suite to use micromamba and the speed is ~20-30% faster on ubuntu and comparable on windows. The test file is also significantly shorter because I switched By adding openbabel I discovered several failing tests that were flying under the radar and that are now causing CI to fail.
@janosh, what do you recommend as the path forward here? Should we: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Review Status
Configuration used: CodeRabbit UI
Files selected for processing (2)
- .github/workflows/test.yml (1 hunks)
- pymatgen/io/babel.py (3 hunks)
Files skipped from review as they are similar to previous changes (1)
- .github/workflows/test.yml
Additional comments not posted (2)
pymatgen/io/babel.py (2)
185-192
: Consider enhancing the warning message to include the list of supported forcefields for user convenience.
350-350
: The addition of thefrom_str
method enhances the module's functionality by providing a convenient way to create molecules from string data.
ff = openbabel.OBForceField.FindType(forcefield) | ||
if ff == 0: | ||
print(f"Could not find {forcefield=} in openbabel, the forcefield will be reset as default 'mmff94'") | ||
ff = openbabel.OBForceField_FindType("mmff94") | ||
ff = openbabel.OBForceField.FindType("mmff94") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace the print statement with a warning to maintain consistency and provide users with better control over error messages.
- print(f"Could not find {forcefield=} in openbabel, the forcefield will be reset as default 'mmff94'")
+ warnings.warn(f"Could not find {forcefield=} in openbabel, the forcefield will be reset as default 'mmff94'")
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
ff = openbabel.OBForceField.FindType(forcefield) | |
if ff == 0: | |
print(f"Could not find {forcefield=} in openbabel, the forcefield will be reset as default 'mmff94'") | |
ff = openbabel.OBForceField_FindType("mmff94") | |
ff = openbabel.OBForceField.FindType("mmff94") | |
ff = openbabel.OBForceField.FindType(forcefield) | |
if ff == 0: | |
warnings.warn(f"Could not find {forcefield=} in openbabel, the forcefield will be reset as default 'mmff94'") | |
ff = openbabel.OBForceField.FindType("mmff94") |
…ng inappropriate strategy for molecules
… fixes TestQCOutput.test_all
@orionarcher luckily fixing the 3 failing tests was fairly easy. fixing the third surfaced a 4 issue so there's still 1 failing test
i think we can skip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
|
||
# set atom properties | ||
partial_charges = [] | ||
# TODO: should assert that there is only one molecule |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider implementing the suggested assertion to ensure molgraph_to_openff_mol
handles only one molecule.
+ assert len(molgraph.molecule) == 1, "The MoleculeGraph should contain exactly one molecule."
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
# TODO: should assert that there is only one molecule | |
# TODO: should assert that there is only one molecule | |
assert len(molgraph.molecule) == 1, "The MoleculeGraph should contain exactly one molecule." |
Molecule with added conformers and a dictionary representing the atom | ||
mapping. | ||
""" | ||
# TODO: test this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure unit tests for add_conformer
are added to verify its functionality and handle edge cases.
Would you like me to help by writing some unit tests for this function?
Returns: | ||
tk.Molecule: The OpenFF Molecule with assigned partial charges. | ||
""" | ||
# TODO: test this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's important to add unit tests for assign_partial_charges
to ensure its correctness and robustness.
Would you like assistance in creating unit tests for this function?
if len(partial_charges) != len(geometry): | ||
raise ValueError("partial charges must have same length & order as geometry") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correctly validate the length of partial_charges
against the number of atoms in geometry
.
- if len(partial_charges) != len(geometry):
+ if len(partial_charges) != geometry.num_sites:
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
if len(partial_charges) != len(geometry): | |
raise ValueError("partial charges must have same length & order as geometry") | |
if len(partial_charges) != geometry.num_sites: | |
raise ValueError("partial charges must have same length & order as geometry") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
tar xvzf bader_lnx_64.tar.gz | ||
sudo mv bader /usr/local/bin/ | ||
continue-on-error: true # This is not critical to succeed. | ||
micromamba install -n pmg -c conda-forge enumlib packmol bader openbabel openff-toolkit --yes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider pinning the versions of enumlib
, packmol
, bader
, openbabel
, and openff-toolkit
to ensure reproducibility and stability in your CI/CD pipeline.
- micromamba install -n pmg -c conda-forge enumlib packmol bader openbabel openff-toolkit --yes
+ micromamba install -n pmg -c conda-forge enumlib=<version> packmol=<version> bader=<version> openbabel=<version> openff-toolkit=<version> --yes
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
micromamba install -n pmg -c conda-forge enumlib packmol bader openbabel openff-toolkit --yes | |
micromamba install -n pmg -c conda-forge enumlib=<version> packmol=<version> bader=<version> openbabel=<version> openff-toolkit=<version> --yes |
…cls=MontyEncoder)
remove unused self.maxDiff = None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @orionarcher. excited about the new OpenFF integration!
and really appreciate that you reinstated the openbabel
tests!
Summary
This PR adds basic integration with the
openff.toolkit
Molecule representation. It is intended to be use in theclassical_md
workflows inatomate2
andemmet
.Currently, this functionality is downstream in a PR to atomate2. As basic IO functionality, I would like to migrate it upstream to
pymatgen
. Especially because some functions will also be used in theclassical_md
emmet
Builder
.Major changes:
pymatgen.Molecule
/pymatgen.MoleculeGraph
<->openff.toolkit.Molecule
Checklist
ruff
.mypy
.duecredit
@due.dcite
decorators to reference relevant papers by DOI (example)Tip: Install
pre-commit
hooks to auto-check types and linting before every commit:Summary by CodeRabbit
openbabel>=3.1.1
to optional requirements, broadening the toolkit's compatibility.