Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pymatgen.io.openff module #3729

Merged
merged 55 commits into from
Apr 10, 2024
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
Show all changes
55 commits
Select commit Hold shift + click to select a range
f52a8df
Add functionality and tests for pymatgen.io.openff.
orionarcher Mar 31, 2024
d360549
Add test files.
orionarcher Apr 4, 2024
d1a9ace
Update testing fixture with correct test file paths.
orionarcher Apr 4, 2024
7328391
Update docstrings to google format.
orionarcher Apr 4, 2024
b9b36b9
Add openff install from source to test.yml
orionarcher Apr 4, 2024
c173c71
Lint with black.
orionarcher Apr 4, 2024
8f5090f
Small formatting change.
orionarcher Apr 4, 2024
2902ea4
Try changing test.yml
orionarcher Apr 4, 2024
d1b7d43
Add manual install of units.
orionarcher Apr 4, 2024
1a2afe4
Add import warning.
orionarcher Apr 4, 2024
8660c1b
Try adding install for openff-utilities.
orionarcher Apr 4, 2024
8eb6b4c
Add openbabel install.
orionarcher Apr 4, 2024
e08161f
Add openbabel dependency to setup.py
orionarcher Apr 4, 2024
09f1e34
Change openbabel version
orionarcher Apr 4, 2024
de88ef8
fix babel dep.
orionarcher Apr 4, 2024
8a7d9c0
Try building babel from source.
orionarcher Apr 5, 2024
8b5528b
Remove openbabel from setup.py and requirements-optional.txt.
orionarcher Apr 5, 2024
f95e2d5
Attempt new testing build with micromamba.
orionarcher Apr 8, 2024
305945c
Remove bader and enumlib
orionarcher Apr 8, 2024
8cf2cc4
Update testing file
orionarcher Apr 8, 2024
1a5ebf2
Try chaning testing to install pytest.
orionarcher Apr 8, 2024
25f44e9
Run testing with micromamba.
orionarcher Apr 8, 2024
d937bca
Fix minor typo.
orionarcher Apr 8, 2024
dc06e5d
Try switching to editable install.
orionarcher Apr 8, 2024
186e365
Try changing TEST_FILES_DIR
orionarcher Apr 8, 2024
b243e2a
Undo previous change
orionarcher Apr 8, 2024
3b88415
Add manual builds of packmol, enumlib, and bader.
orionarcher Apr 8, 2024
0b7b6eb
Switch to openff-toolkit-base
orionarcher Apr 8, 2024
b200a78
Convert manual builds of enumlib, packmol, bader, and openbabel to co…
orionarcher Apr 8, 2024
49bb35f
Try limiting windows openbabel install.
orionarcher Apr 8, 2024
859e83d
Skip openff tests if openff-toolkit not installed
orionarcher Apr 8, 2024
fe1a85a
Try adding uv and skipping tests.
orionarcher Apr 8, 2024
45810d8
Fix typo
orionarcher Apr 8, 2024
25dbd7e
Add non-failing optional import.
orionarcher Apr 8, 2024
040e701
Add importorskip call.
orionarcher Apr 8, 2024
a9bad6b
Fix import or skip and reimport bson.
orionarcher Apr 8, 2024
cf2df32
Reinstall pymongo in CI to fix bson error. Remove commented out manua…
orionarcher Apr 8, 2024
930c801
Fix errors in babel.py
orionarcher Apr 8, 2024
4802525
Remove unneeded step.
orionarcher Apr 8, 2024
ba19c29
fix TestMoleculeGraph.test_construction expected error message on usi…
janosh Apr 9, 2024
7c26ddf
swap assert_array_equal for assert_allclose in test_outputs.py, maybe…
janosh Apr 9, 2024
e07eb56
cast coords to float in BabelMolAdaptor ob_atom.SetVector
janosh Apr 9, 2024
39fe79e
Refactor assert statements in test_outputs.py to use approx() for dicts
janosh Apr 9, 2024
347008e
Coerce formal charge to int in molgraph_to_openff_mol
orionarcher Apr 9, 2024
220af2f
rename micromamba to pmg
janosh Apr 10, 2024
005f610
micromamba activate pmg
janosh Apr 10, 2024
93406b8
remove pymongo install, should not be needed?
janosh Apr 10, 2024
c837d78
micromamba activate pmg in pytest step
janosh Apr 10, 2024
7389f30
set shell: bash -l {0} for mamba env activate
janosh Apr 10, 2024
db4e324
replace hard-to-install bson.BSON (93406b8) with pymatgen-native asse…
janosh Apr 10, 2024
30be59a
replace isinstance with issubclass in assert_msonable
janosh Apr 10, 2024
e023eec
assert_msonable replace obj.to_json() with json.dumps(obj.as_dict(), …
janosh Apr 10, 2024
1410a2e
skip failing TestQCOutput.test_all
janosh Apr 10, 2024
a3e9c17
rename mol(''->_)graph_to_openff_mol
janosh Apr 10, 2024
780394d
missed some
janosh Apr 10, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .github/workflows/test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -111,6 +111,8 @@ jobs:
# TODO remove next line installing ase from main branch when FrechetCellFilter is released
uv pip install --upgrade 'git+https://gitlab.com/ase/ase' --system

uv pip install git+https://github.com/openforcefield/openff-toolkit.git
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider pinning the openff-toolkit dependency to a specific commit or release tag to ensure stability in your CI/CD pipeline.

- uv pip install git+https://github.com/openforcefield/openff-toolkit.git
+ uv pip install git+https://github.com/openforcefield/openff-toolkit.git@<commit_or_tag>

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
uv pip install git+https://github.com/openforcefield/openff-toolkit.git
uv pip install git+https://github.com/openforcefield/openff-toolkit.git@<commit_or_tag>


- name: pytest split ${{ matrix.split }}
run: |
pytest --splits 10 --group ${{ matrix.split }} --durations-path tests/files/.pytest-split-durations tests
294 changes: 294 additions & 0 deletions pymatgen/io/openff.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,294 @@
"""Utility functions for classical md subpackage."""

from __future__ import annotations

from pathlib import Path

import numpy as np
import openff.toolkit as tk
from openff.units import Quantity, unit

import pymatgen
from pymatgen.analysis.graphs import MoleculeGraph
from pymatgen.analysis.local_env import OpenBabelNN, metal_edge_extender
from pymatgen.core import Element, Molecule


def molgraph_to_openff_mol(molgraph: MoleculeGraph) -> tk.Molecule:
"""Convert a Pymatgen MoleculeGraph to an OpenFF Molecule.

Args:
molgraph (MoleculeGraph): The Pymatgen MoleculeGraph to be converted.

Returns:
tk.Molecule: The converted OpenFF Molecule.
"""
# create empty openff_mol and prepare a periodic table
p_table = {str(el): el.Z for el in Element}
openff_mol = tk.Molecule()

# set atom properties
partial_charges = []
# TODO: should assert that there is only one molecule
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider implementing the suggested assertion to ensure molgraph_to_openff_mol handles only one molecule.

+ assert len(molgraph.molecule) == 1, "The MoleculeGraph should contain exactly one molecule."

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
# TODO: should assert that there is only one molecule
# TODO: should assert that there is only one molecule
assert len(molgraph.molecule) == 1, "The MoleculeGraph should contain exactly one molecule."

for i_node in range(len(molgraph.graph.nodes)):
node = molgraph.graph.nodes[i_node]
atomic_number = node.get("atomic_number") or p_table[molgraph.molecule[i_node].species_string]

# put formal charge on first atom if there is none present
formal_charge = node.get("formal_charge")
if formal_charge is None:
formal_charge = (i_node == 0) * molgraph.molecule.charge * unit.elementary_charge

# assume not aromatic if no info present
is_aromatic = node.get("is_aromatic") or False

openff_mol.add_atom(atomic_number, formal_charge, is_aromatic=is_aromatic)

# add to partial charge array
partial_charge = node.get("partial_charge")
if isinstance(partial_charge, Quantity):
partial_charge = partial_charge.magnitude
partial_charges.append(partial_charge)

charge_array = np.array(partial_charges)
if np.not_equal(charge_array, None).all():
openff_mol.partial_charges = charge_array * unit.elementary_charge

# set edge properties, default to single bond and assume not aromatic
for i_node, j, bond_data in molgraph.graph.edges(data=True):
bond_order = bond_data.get("bond_order") or 1
is_aromatic = bond_data.get("is_aromatic") or False
openff_mol.add_bond(i_node, j, bond_order, is_aromatic=is_aromatic)

openff_mol.add_conformer(molgraph.molecule.cart_coords * unit.angstrom)
return openff_mol


def molgraph_from_openff_mol(molecule: tk.Molecule):
"""
This is designed to closely mirror the graph structure generated by tk.Molecule.to_networkx

Args:
molecule (tk.Molecule): The OpenFF Molecule to convert.

Returns:
MoleculeGraph: The converted MoleculeGraph.
"""
molgraph = MoleculeGraph.with_empty_graph(
Molecule([], []),
name="none",
)
p_table = {el.Z: str(el) for el in Element}
total_charge = 0
cum_atoms = 0

coords = molecule.conformers[0].magnitude if molecule.conformers is not None else np.zeros((molecule.n_atoms, 3))
for j, atom in enumerate(molecule.atoms):
molgraph.insert_node(
cum_atoms + j,
p_table[atom.atomic_number],
coords[j, :],
)
molgraph.graph.nodes[cum_atoms + j]["atomic_number"] = atom.atomic_number
molgraph.graph.nodes[cum_atoms + j]["is_aromatic"] = atom.is_aromatic
molgraph.graph.nodes[cum_atoms + j]["stereochemistry"] = atom.stereochemistry
# set partial charge as a pure float
partial_charge = None if atom.partial_charge is None else atom.partial_charge.magnitude
molgraph.graph.nodes[cum_atoms + j]["partial_charge"] = partial_charge
# set formal charge as a pure float
formal_charge = atom.formal_charge.magnitude # type: ignore
molgraph.graph.nodes[cum_atoms + j]["formal_charge"] = formal_charge
total_charge += formal_charge
for bond in molecule.bonds:
molgraph.graph.add_edge(
cum_atoms + bond.atom1_index,
cum_atoms + bond.atom2_index,
bond_order=bond.bond_order,
is_aromatic=bond.is_aromatic,
stereochemistry=bond.stereochemistry,
)
# formal_charge += molecule.total_charge
cum_atoms += molecule.n_atoms
molgraph.molecule.set_charge_and_spin(charge=total_charge)
return molgraph


def get_atom_map(inferred_mol: tk.Molecule, openff_mol: tk.Molecule) -> tuple[bool, dict[int, int]]:
"""Compute an atom mapping between two OpenFF Molecules.

Attempts to find an isomorphism between the molecules, considering various matching
criteria such as formal charges, stereochemistry, and bond orders. Returns the atom
mapping if an isomorphism is found, otherwise returns an empty mapping.

Args:
inferred_mol (tk.Molecule): The first OpenFF Molecule.
openff_mol (tk.Molecule): The second OpenFF Molecule.

Returns:
Tuple[bool, Dict[int, int]]: A tuple containing a boolean indicating if an
isomorphism was found and a dictionary representing the atom mapping.
"""
# do not apply formal charge restrictions
kwargs = dict(
return_atom_map=True,
formal_charge_matching=False,
)
isomorphic, atom_map = tk.topology.Molecule.are_isomorphic(openff_mol, inferred_mol, **kwargs)
if isomorphic:
return True, atom_map
# relax stereochemistry restrictions
kwargs["atom_stereochemistry_matching"] = False
kwargs["bond_stereochemistry_matching"] = False
isomorphic, atom_map = tk.topology.Molecule.are_isomorphic(openff_mol, inferred_mol, **kwargs)
if isomorphic:
return True, atom_map
# relax bond order restrictions
kwargs["bond_order_matching"] = False
isomorphic, atom_map = tk.topology.Molecule.are_isomorphic(openff_mol, inferred_mol, **kwargs)
if isomorphic:
return True, atom_map
return False, {}


def infer_openff_mol(
mol_geometry: pymatgen.core.Molecule,
) -> tk.Molecule:
"""Infer an OpenFF Molecule from a Pymatgen Molecule.

Constructs a MoleculeGraph from the Pymatgen Molecule using the OpenBabelNN local
environment strategy and extends metal edges. Converts the resulting MoleculeGraph
to an OpenFF Molecule using molgraph_to_openff_mol.

Args:
mol_geometry (pymatgen.core.Molecule): The Pymatgen Molecule to infer from.

Returns:
tk.Molecule: The inferred OpenFF Molecule.
"""
molgraph = MoleculeGraph.with_local_env_strategy(mol_geometry, OpenBabelNN())
molgraph = metal_edge_extender(molgraph)
return molgraph_to_openff_mol(molgraph)


def add_conformer(
openff_mol: tk.Molecule, geometry: pymatgen.core.Molecule | None
) -> tuple[tk.Molecule, dict[int, int]]:
"""Add conformers to an OpenFF Molecule based on the provided geometry.

If a geometry is provided, infers an OpenFF Molecule from it,
finds an atom mapping between the inferred molecule and the
input molecule, and adds the conformer coordinates to the input
molecule. If no geometry is provided, generates a single conformer.

Args:
openff_mol (tk.Molecule): The OpenFF Molecule to add conformers to.
geometry (Union[pymatgen.core.Molecule, None]): The geometry to use for adding
conformers.

Returns:
Tuple[tk.Molecule, Dict[int, int]]: A tuple containing the updated OpenFF
Molecule with added conformers and a dictionary representing the atom
mapping.
"""
# TODO: test this
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensure unit tests for add_conformer are added to verify its functionality and handle edge cases.

Would you like me to help by writing some unit tests for this function?

if geometry:
# for geometry in geometries:
inferred_mol = infer_openff_mol(geometry)
is_isomorphic, atom_map = get_atom_map(inferred_mol, openff_mol)
if not is_isomorphic:
raise ValueError(
f"An isomorphism cannot be found between smile {openff_mol.to_smiles()}"
f"and the provided molecule {geometry}."
)
new_mol = pymatgen.core.Molecule.from_sites([geometry.sites[i] for i in atom_map.values()])
openff_mol.add_conformer(new_mol.cart_coords * unit.angstrom)
else:
atom_map = {i: i for i in range(openff_mol.n_atoms)}
openff_mol.generate_conformers(n_conformers=1)
return openff_mol, atom_map


def assign_partial_charges(
openff_mol: tk.Molecule,
atom_map: dict[int, int],
charge_method: str,
partial_charges: None | list[float],
) -> tk.Molecule:
"""Assign partial charges to an OpenFF Molecule.

If partial charges are provided, assigns them to the molecule
based on the atom mapping. If the molecule has only one atom,
assigns the total charge as the partial charge. Otherwise,
assigns partial charges using the specified charge method.

Args:
openff_mol (tk.Molecule): The OpenFF Molecule to assign partial charges to.
atom_map (Dict[int, int]): A dictionary representing the atom mapping.
charge_method (str): The charge method to use if partial charges are
not provided.
partial_charges (Union[None, List[float]]): A list of partial charges to
assign or None to use the charge method.

Returns:
tk.Molecule: The OpenFF Molecule with assigned partial charges.
"""
# TODO: test this
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's important to add unit tests for assign_partial_charges to ensure its correctness and robustness.

Would you like assistance in creating unit tests for this function?

# assign partial charges
if partial_charges is not None:
partial_charges = np.array(partial_charges)
chargs = partial_charges[list(atom_map.values())] # type: ignore[index, call-overload]
openff_mol.partial_charges = chargs * unit.elementary_charge
elif openff_mol.n_atoms == 1:
openff_mol.partial_charges = np.array([openff_mol.total_charge.magnitude]) * unit.elementary_charge
else:
openff_mol.assign_partial_charges(charge_method)
return openff_mol


def create_openff_mol(
smile: str,
geometry: pymatgen.core.Molecule | str | Path | None = None,
charge_scaling: float = 1,
partial_charges: list[float] | None = None,
backup_charge_method: str = "am1bcc",
) -> tk.Molecule:
"""Create an OpenFF Molecule from a SMILES string and optional geometry.

Constructs an OpenFF Molecule from the provided SMILES
string, adds conformers based on the provided geometry (if
any), assigns partial charges using the specified method
or provided partial charges, and applies charge scaling.

Args:
smile (str): The SMILES string of the molecule.
geometry (Union[pymatgen.core.Molecule, str, Path, None], optional): The
geometry to use for adding conformers. Can be a Pymatgen Molecule,
file path, or None.
charge_scaling (float, optional): The scaling factor for partial charges.
Default is 1.
partial_charges (Union[List[float], None], optional): A list of partial
charges to assign, or None to use the charge method.
backup_charge_method (str, optional): The backup charge method to use if
partial charges are not provided. Default is "am1bcc".

Returns:
tk.Molecule: The created OpenFF Molecule.
"""
if isinstance(geometry, (str, Path)):
geometry = pymatgen.core.Molecule.from_file(str(geometry))

if partial_charges is not None:
if geometry is None:
raise ValueError("geometries must be set if partial_charges is set")
if len(partial_charges) != len(geometry):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validate the length of partial_charges against the number of atoms in geometry instead of the length of geometry.

- if len(partial_charges) != len(geometry):
+ if len(partial_charges) != geometry.num_sites:

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
if len(partial_charges) != len(geometry):
if len(partial_charges) != geometry.num_sites:

raise ValueError("partial charges must have same length & order as geometry")
Comment on lines +297 to +300
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validate the length of partial_charges against the number of atoms in geometry instead of the length of geometry.

- if len(partial_charges) != len(geometry):
+ if len(partial_charges) != geometry.num_sites:

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
if geometry is None:
raise ValueError("geometries must be set if partial_charges is set")
if len(partial_charges) != len(geometry):
raise ValueError("partial charges must have same length & order as geometry")
if geometry is None:
raise ValueError("geometries must be set if partial_charges is set")
if len(partial_charges) != geometry.num_sites:
raise ValueError("partial charges must have same length & order as geometry")

Comment on lines +299 to +300
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correctly validate the length of partial_charges against the number of atoms in geometry.

- if len(partial_charges) != len(geometry):
+ if len(partial_charges) != geometry.num_sites:

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
if len(partial_charges) != len(geometry):
raise ValueError("partial charges must have same length & order as geometry")
if len(partial_charges) != geometry.num_sites:
raise ValueError("partial charges must have same length & order as geometry")


openff_mol = tk.Molecule.from_smiles(smile, allow_undefined_stereo=True)

# add conformer
openff_mol, atom_map = add_conformer(openff_mol, geometry)
# assign partial charges
openff_mol = assign_partial_charges(openff_mol, atom_map, backup_charge_method, partial_charges)
openff_mol.partial_charges *= charge_scaling

return openff_mol
Binary file added tests/files/classical_md_mols/CCO.npy
Binary file not shown.
11 changes: 11 additions & 0 deletions tests/files/classical_md_mols/CCO.xyz
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
9

C 1.000000 1.000000 0.000000
C -0.515000 1.000000 0.000000
O -0.999000 1.000000 1.335000
H 1.390000 1.001000 -1.022000
H 1.386000 0.119000 0.523000
H 1.385000 1.880000 0.526000
H -0.907000 0.118000 -0.516000
H -0.897000 1.894000 -0.501000
H -0.661000 0.198000 1.768000
12 changes: 12 additions & 0 deletions tests/files/classical_md_mols/FEC-r.xyz
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
10

O 1.000000 1.000000 0.000000
C -0.219000 1.000000 0.000000
O -0.984000 1.000000 1.133000
C -2.322000 0.780000 0.720000
C -2.300000 1.205000 -0.711000
H -3.034000 0.686000 -1.332000
F -2.507000 2.542000 -0.809000
O -0.983000 0.948000 -1.128000
H -3.008000 1.375000 1.328000
H -2.544000 -0.285000 0.838000
12 changes: 12 additions & 0 deletions tests/files/classical_md_mols/FEC-s.xyz
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
10

O 1.000000 1.000000 0.000000
C -0.219000 1.000000 0.000000
O -0.981000 1.000000 1.133000
C -2.323000 0.828000 0.723000
C -2.305000 1.254000 -0.707000
H -2.567000 2.305000 -0.862000
F -3.125000 0.469000 -1.445000
O -0.983000 1.001000 -1.127000
H -2.991000 1.447000 1.328000
H -2.610000 -0.222000 0.848000
Binary file added tests/files/classical_md_mols/FEC.npy
Binary file not shown.
Empty file.
Binary file added tests/files/classical_md_mols/Li.npy
Binary file not shown.
3 changes: 3 additions & 0 deletions tests/files/classical_md_mols/Li.xyz
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
1

Li 0.0 0.0 0.0
Binary file added tests/files/classical_md_mols/PF6.npy
Binary file not shown.
9 changes: 9 additions & 0 deletions tests/files/classical_md_mols/PF6.xyz
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
7

P 0.0 0.0 0.0
F 1.6 0.0 0.0
F -1.6 0.0 0.0
F 0.0 1.6 0.0
F 0.0 -1.6 0.0
F 0.0 0.0 1.6
F 0.0 0.0 -1.6
Loading
Loading