Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve speed of OFF molecule/network molecule comparisons and isomorphisms #353

Open
j-wags opened this issue Jun 13, 2019 · 28 comments
Open

Comments

@j-wags
Copy link
Member

j-wags commented Jun 13, 2019

Description

System is a box of 1000 hexadecanes.

from openforcefield.typing.engines.smirnoff import ForceField
from openforcefield.topology import Molecule, Topology
from simtk.openmm import app
import time

hexadecane = Molecule.from_smiles('CCCCCCCCCCCCCCCC')
pdbfile = app.PDBFile('output.pdb')
start_time = time.time()
top = Topology.from_openmm(pdbfile.topology, unique_molecules=[hexadecane])
print("--- Topology creation finished in %s seconds ---" % (time.time() - start_time))
ff = ForceField('test_forcefields/smirnoff99Frosst.offxml')

--- Topology creation finished in 1585.9495000839233 seconds ---

start_time = time.time()

system = ff.create_openmm_system(top)
print("--- Ssytem creation finished in %s seconds ---" % (time.time() - start_time))

--- Ssytem creation finished in 57.06095290184021 seconds ---

output.pdb.gz

Thoughts

I suspect that this slowdown occurs in the graph matching stage of topology creation. Hexadecane has a huge number of self-symmetries. While the function only takes the first molecular topology match that's found, it may be generating all of them unnecessarily.

Relevant code is here: https://github.com/openforcefield/openforcefield/blob/master/openforcefield/topology/topology.py#L1635-L1664

We should poke around NetworkX to see if there's a faster algorithm that this code could use.

@jaimergp
Copy link
Contributor

I had to do something similar (identify copies of small molecules in a system) and found that "serializing" all connected molecules to canonical SMILES (as provided by rdkit) and comparing those was enough for identification purposes. Would that be enough in this case or am I missing something?

@jchodera
Copy link
Member

It looks like you also do the expensive graph matching for all molecules, then try to correct for having different mappings afterwards. We really need to do some fast hashing / caching to avoid this expense.

What if we used an algorithm like this:

  • For each Topology molecule, construct a hashable immutable object (like a tuple) that encodes the sequence of elements and bond indices within the molecule:
( (element_1, element_2, ..., element_N), ((bond1_atomindex1, bond1_atomindex2), ..., (bondN_atomindex1, bond1_atomindex2))
  • Check a dict to see if we have already cached a matched isomorphism to a unique molecule. If not, perform the graph isomorphism match and cache it in the dict.
  • Use the cached mapping to the unique molecule

If the Topology object consists of lots of repeated instances of the same molecule, this procedure would run the expensive graph isomorphism code exactly once, which should speed things up by ~1000-fold here.

@jchodera
Copy link
Member

In the case where many candidate molecules are provided (which is not this case, but could be a future use case), we would also want to speed up the determination of which unique molecule the Topology molecule maps to. You should also be able to use a fast hashing there based on, for example, the Hill formula (e.g. C16H34), which should narrow down the list of possible graph isomorphisms to evaluate.

From there, imperfect matches can also be rapidly rejected by first calling could_be_isomorphic() before attempting to compute complete graph isomorphisms.

Later on, we'll have to think about optimizing this for the case where one of the molecules in the system is a biological macromolecule.

@SimonBoothroyd
Copy link
Contributor

In the past when solving this problem I've always done very similar to what @jchodera suggests and find that in general it works well.

The only slight difference is that before even considering the unique_molecules, I would suggest going through all of the molecules in the system, building a hash for them, and then construct a dict of molecule # to a list of molecule indices which share the hash. The end result should in principle should be a dictionary with very few keys which should correspond to the number of components in the system.

You would then do the graph isomorphism matching on the first molecule in each of the lists in the dictionary map. In doing this you replace a lot of graph matching with simple hashing, and can safely only consider one molecule of each type when doing the matching (provided the hash is sensible).

The added benefit is that you can do some nice sanity checks and provide some feedback to the user. Namely, if you find that two of your hashes map to the same unique molecule something most likely is going wrong (e.g think of a pure system where for some reason one of the molecules has a slightly different atom ordering than the others in the system), and additionally if a unique molecule has not been matched to anything in the system. This could also be used to easily check / enforce that all molecules of the same type have the same atom ordering and are ordered contiguously which in turn could yield performance improvements and code simplification when parameterising the system.

This could also require (for openmm systems I think?) code to turn lists of atoms and bonds into individual molecules, but this isn't too bad to do and is pretty quick to run (you can use a flood-fill like algorithm).

The pseudocode would be something like:

all_molecules = molecules_from_openmm_topology(topology)


def hash_molecule(molecule):

    atom_tuples = (atom.element for atom in molecule.atoms)

    bond_list = [(min(bond.index1, bond.index2),
                  max(bond.index1, bond.index2), bond.order) for bond in molecule.bonds]

    bond_list.sort()

    bond_tuples = tuple(bond_list)

    molecule_tuple = (atom_tuples, bond_tuples)
    return hash(molecule_tuple)


distinguishable_molecules = {}

for index, molecule in enumerate(all_molecules):

    molecule_hash = hash_molecule(molecule)

    if molecule_hash not in distinguishable_molecules:
        distinguishable_molecules[molecule_hash] = []

    distinguishable_molecules[molecule_hash].append(index)

hash_to_unique_molecule = {}
unique_molecule_to_hash = {}

for molecule_hash in distinguishable_molecules:

    molecule = distinguishable_molecules[molecule_hash][0]

    for index, unique_molecule in enumerate(unique_molecules):

        is_match = check_isomorphism(molecule, unique_molecule)
        
        if not is_match:
            continue

        if molecule_hash in hash_to_unique_molecule:
            raise Exception('Molecule in system matches multiple unique mols.')

        hash_to_unique_molecule[molecule_hash] = unique_molecule
        
        if index in unique_molecule_to_hash:
            
            logging.warning('Multiple molecules of the same type have a different '
                            'atom / bond ordering - this is most likely unexpected '
                            'behaviour...')
            
            continue
            
        unique_molecule_to_hash[index] = molecule_hash

@j-wags
Copy link
Member Author

j-wags commented Jun 14, 2019

I had to do something similar (identify copies of small molecules in a system) and found that "serializing" all connected molecules to canonical SMILES (as provided by rdkit) and comparing those was enough for identification purposes. Would that be enough in this case or am I missing something?

@jaimergp This would be sufficient for identification, but unfortunately we need to get an atom index mapping from this process.

@j-wags
Copy link
Member Author

j-wags commented Jun 14, 2019

@jaimergp @jchodera @SimonBoothroyd Thanks for the really thoughtful feedback here. I agree that these ideas should net us a close to 1000x speedup in this case. This is super helpful :-)

@j-wags
Copy link
Member Author

j-wags commented Jun 14, 2019

@SimonBoothroyd

This could also require (for openmm systems I think?) code to turn lists of atoms and bonds into individual molecules, but this isn't too bad to do and is pretty quick to run (you can use a flood-fill like algorithm).

Alas! A critical facet of our implementation is that we don't require detailed molecule info (bond orders and stereochemistry) from OpenMM. If we had a reliable source of these, we wouldn't need unique_molecules at all. The pseudocode is still really helpful though.

@SimonBoothroyd
Copy link
Contributor

SimonBoothroyd commented Jun 14, 2019

So just to clarify - when I say into individual molecules, I mean each atom needs to be assigned a parent molecule index (i.e atoms [0-19] belong are assigned a molecule index of 0, atoms [20-39] are assigned a molecule index of 1) as opposed to we need to create actual Molecule objects. i.e I really mean we just need to know which groups of atoms are part of the same molecule for the hashing approach to work. As a very rough sketch, something more like:

namedtuple('OpenMMMolecule', ['atom_indices', 'bond_indices'])

If the bond order is there, awesome, if not, a bond order of None still works in the hash!

@j-wags
Copy link
Member Author

j-wags commented Dec 16, 2019

@jthorton and I have been discussing consolidating/deduplicating molecule identity operations. I think that making two highly-performant methods can cover all of our needs:

  • staticmethod FrozenMolecule.is_isomorphic, which returns bool
  • staticmethod FrozenMolecule.get_atom_map, which returns dict{int: int}

The arguments for both methods can be (mol1, mol2, atomic_number=True, atom_index=True, atom_is_aromatic=True, atom_stereo=True, bond_order=True, bond_is_aromatic=True, bond_stereo=True)

mol1 and mol2 can be FrozenMolecule, TopologyMolecule, or nx.Graph (any others?)

If the user specifies bond_order=True, but the nx.Graph that they provide doesn't have bond orders, we should raise a ValueError. We might consider having several subclasses of ValueError for the different deficiencies we could find.

I'm in favor of keeping these two functions separate, because I don't like when one function can return different types (bool vs. dict). This tends to confuse static code analysis and users.

The question remains of "can we do this in a way to efficiently handle batch operations?". Caching data on the Molecule objects might speed this up a bit. I'll need to think more about whether these functions will be compatible with implementing something like @SimonBoothroyd's algorithm above.

@SimonBoothroyd
Copy link
Contributor

SimonBoothroyd commented Dec 16, 2019

I'm in favor of keeping these two functions separate, because I don't like when one function can return different types (bool vs. dict). This tends to confuse static code analysis and users.

Or you could just have a staticmethod FrozenMolecule.are_isomorphic with a signature like:

are_isomorphic, atom_map = FrozenMolecule.are_isomorphic(mol1, mol2, ...)

where are_isomorphic: bool, atom_map: Optional[Dict[int, int]], which would probably make sense given that get_atom_map would likely need to call is_isomorphic anyway, and both methods would likely have a large code overlap so the additional cost to are_isomorphic would be minimal.

I'll need to think more about whether these functions will be compatible with implementing something like @SimonBoothroyd's algorithm above.

They would - they would just replace the check_isomorphism call.

Caching data on the Molecule objects might speed this up a bit

I would advise against caching unless essential - the required bookkeeping associated with caching is never fun, and avoiding caching would ensure that the methods to work well with Molecule without modification.

@j-wags
Copy link
Member Author

j-wags commented Dec 17, 2019

Ahh, yes. I agree -- We should make a single function that returns are_isomorphic: bool and atom_map: dict{int:int} or None. The kwarg return_atom_map would determine whether atom_map is returned as a dict, otherwise it'll be None. So the signature would look like:

are_isomorphic, atom_map = FrozenMolecule.are_isomorphic(mol1, mol2, return_atom_map=True, atomic_number=True, atom_index=True, atom_is_aromatic=True, atom_stereo=True, bond_order=True, bond_is_aromatic=True, bond_stereo=True)

(Reading through our DM conversation, I now see this may be what @jthorton was suggesting all along)

I agree that caching can be a pain to manage. On the other hand, the most efficient isomorphism checks in some cases overlap with other OFFTK functionality -- eg. SMILES are effectively molecule hashes for certain comparisons. Specifically, are_isomorphic(mol1, mol2, atom_map=False, atom_index=False) is equivalent to mol1.to_smiles()==mol2.to_smiles(), and we already cache SMILES. Further, the same query with atom_index=True could be performed by comparing the Canonical, isomeric, explicit hydrogen, mapped CMILES, and as we more tightly integrate the OFF ecosystem, users may find it helpful to have CMILES cached on OFFMols. That said, not all hashes will be useful, and maybe it's not useful to store anything past those two.

One of the other concerns is that the proposed function will only do 1 vs. 1 comparisons, meaning that matching a dataset of N molecules to M unique mols is an O(N*M) operation. However, if we could extract the hashes, then we could make a hash table for either M or N, and complete the matching in O(M+N) time using John/Simon's algorithm. Since we already have the algorithm in mind, and it is a significant "big-O-notation" speed-up for dataset matching, maybe we should have FrozenMolecule.are_isomorphic be able to accept either mol1 or mol2 as iterables (eg. lists, tuples) of molecules, and return the M x N array of bool matches and another MxN array of dicts/Nones

@j-wags j-wags changed the title Topology creation is very slow for large system Improve speed of OFF molecule/network molecule comparisons and isomorphisms Dec 17, 2019
@jthorton
Copy link
Collaborator

jthorton commented Dec 17, 2019

Yes, this was what I had in mind I think due to the number of optional arguments we have it makes sense to just have it as one function, with the option to get the mapping as well. Also, are there any arguments listed that we would always want to search by? I would imagine we would always compare the atomic_number so it would make sense to remove this from being optional?

@jchodera
Copy link
Member

jchodera commented Dec 17, 2019

There's one quick change that can be made to greatly accelerate FrozenMolecule.is_isomorphic with a few lines of code: Inexpensively check whether the atomic composition is the same before computing the expensive full graph isomorphism.

If we add a simple method to return the Hill formula, this could be checked for equality before proceeding further. It could even be used as a hash if you want to get super fancy with a couple of additional lines of code.

This presumes, of course, that you want to check atomic numbers are the same (atomic_number=True), but you could use this quick filter only if that option is True.

Edit: Just realized I suggested this earlier in this thread. But it's still an incredibly simple and inexpensive way of achieving a big speedup for now.

@jthorton
Copy link
Collaborator

jthorton commented Dec 17, 2019

This could easily be added to the Molecule class if useful and I noticed there is an implementation already in the Topology class called _networkx_to_hill_formula. Would it make sense to lift this to the Molecule class (to save code duplication?) as a static method that can take an OFFMol or networkx graph which is currently used?
We could also have something like mol.hill_form which calls the static method under the hood as well.

@jchodera
Copy link
Member

This could easily be added to the Molecule class if useful and I noticed there is an implementation already in the Topology class called _networkx_to_hill_formula. Would it make sense to lift this to the Molecule class (to save code duplication?) as a static method that can take an OFFMol or networkx graph which is currently used?

Weird, it seems like we only use that for forming error messages right now!

If Topology.from_openmm() needs to be sped up right now, a simple near-term fix would be to simply insert a hash to only consider potential unique molecule matches that match their Hill formula.

It would be good to consider the whole Topology.from_openmm() as a candidate for refactoring. It seems certain that you'd want to start pushing methods to the Molecule level when they operate on molecules in a generically useful manner, and allow molecules to check whether they are isomorphic or not (and what the optimal atom mapping is for isomorphic matches). This should lead to simplification fo this code.

If I recall correctly, there is one other issue regarding the desire to pick out the same mapping each time if multiple mappings are possible---for example in matching waters. By re-using the same mapping each time (when possible), this allows for a great deal of compression when exporting to tools like gromacs. I'm not sure if the current code does this or not---looks like it may not.

We could also have something like mol.hill_form which calls the static method under the hood as well.

+1 for useful methods like this, though I'd make it a property called Molecule.hill_formula.

@j-wags
Copy link
Member Author

j-wags commented Dec 17, 2019

Linking to #461 on this point

If we add a simple method to return the Hill formula, this could be checked for equality before proceeding further.
Weird, it seems like we only use that for forming error messages right now!

Yes, this was a "there-are-20-things-on-my-to-do list, either-this-gets-committed-now-or-never" feature. Since it was not heavily tested, I kept it as a private method only for formatting error messages. But I think it's behaved well enough (and there are enough eyes on it now) that it should be promoted into the public API.

@jthorton and I spoke this morning about the API for hill formulas. It seems best to make a single staticmethod FrozenMolecule.to_hill_formula(obj) function, where obj can be a FrozenMolecule, TopologyMolecule, or nx.Graph. We'll also add @property FrozenMolecule.hill_formula and @property TopologyMolecule.hill_formula which will call the staticmethod.

Also, Hill formula comparisons are isomorphism checks if we don't care about stereochemistry, aromaticity, bond orders, or bond atom indices. I know that bond_atom_indices aren't a kwarg yet for FrozenMolecule.are_isomorphic, but if they were, we'd have yet another area of overlap for helpful-to-cache properties and molecule comparison.

@j-wags
Copy link
Member Author

j-wags commented Dec 17, 2019

@jchodera And I fully agree about using heuristic checks to speed up comparisons. Hill formula comparison is a fast way to return False without needing to perform a full isomorphism check. Further, there may be combinations of kwargs/settings where SMILES or CMILES are faster than isomorphism checks as well. I think all of these fields are strong candidates for caching and hashing. We can have a lot of fun tuning this function for comparisons.

@jchodera
Copy link
Member

jchodera commented Dec 18, 2019

It seems best to make a single staticmethod FrozenMolecule.to_hill_formula(obj) function, where obj can be a FrozenMolecule, TopologyMolecule, or nx.Graph. We'll also add @Property FrozenMolecule.hill_formula and @Property TopologyMolecule.hill_formula which will call the staticmethod.

One quick note here: Currently, Molecule has a lot of to_X() methods that are not static methods and do not take arguments, and are used to retrieve a copy of the molecule in some transformed form.

If you want to make a static method called to_X(obj), you had better rework the entire public API to ensure consistency among all Molecule.to_X methods or you're going to confuse the hell out of users.

I would suggest keeping the static method private and instead using it as appropriate in methods that are consistently named with the rest of the API. Or perhaps establishing a new standard for static methods, like compute_X or convert_X.

@dotsdl
Copy link
Member

dotsdl commented Feb 11, 2020

After discussion with @j-wags today, I'll be taking a crack at a PR on this issue. Just performed a skim of the existing discussion and will reach out if I hit points of confusion. Thanks!

@j-wags
Copy link
Member Author

j-wags commented Feb 11, 2020

Awesome. Thanks, @dotsdl -- Please do reach out if you have any questions :-)

@dotsdl
Copy link
Member

dotsdl commented Feb 27, 2020

This may have been solved in 0ea045d by @jthorton. I'm seeing topology creation take between 65 and 130 seconds on my machine, whereas a run from e.g. e3a8856 (before @jthorton's changes) goes for far longer (would probably be close to @j-wags' original ~1600 seconds if I was willing to wait that long).

As for creating the system with

start_time = time.time()

system = ff.create_openmm_system(top)
print("--- Ssytem creation finished in %s seconds ---" % (time.time() - start_time))

I currently get the following traceback from master:

Warning: In AmberToolsToolkitwrapper.compute_partial_charges_am1bcc: Molecule '' has more than one conformer, but this function will only generate charges for the first one.

---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
<ipython-input-13-d7b70aedb3a7> in <module>
      1 start_time = time.time()
      2 
----> 3 system = ff.create_openmm_system(top)
      4 print("--- Ssytem creation finished in %s seconds ---" % (time.time() - start_time))

~/Library/openforcefield/openforcefield/typing/engines/smirnoff/forcefield.py in create_openmm_system(self, topology, **kwargs)
   1136         # Add forces and parameters to the System
   1137         for parameter_handler in parameter_handlers:
-> 1138             parameter_handler.create_force(system, topology, **kwargs)
   1139 
   1140         # Let force Handlers do postprocessing

~/Library/openforcefield/openforcefield/typing/engines/smirnoff/parameters.py in create_force(self, system, topology, **kwargs)
   2925             toolkit_registry = kwargs.get('toolkit_registry', GLOBAL_TOOLKIT_REGISTRY)
   2926             temp_mol.generate_conformers(n_conformers=10, toolkit_registry=toolkit_registry)
-> 2927             temp_mol.compute_partial_charges_am1bcc(toolkit_registry=toolkit_registry)
   2928 
   2929             # Assign charges to relevant atoms

~/Library/openforcefield/openforcefield/topology/molecule.py in compute_partial_charges_am1bcc(self, toolkit_registry)
   2177             charges = toolkit_registry.call(
   2178                       'compute_partial_charges_am1bcc',
-> 2179                       self
   2180             )
   2181         elif isinstance(toolkit_registry, ToolkitWrapper):

~/Library/openforcefield/openforcefield/utils/toolkits.py in call(self, method_name, *args, **kwargs)
   3425                 method = getattr(toolkit, method_name)
   3426                 try:
-> 3427                     return method(*args, **kwargs)
   3428                 except NotImplementedError:
   3429                     pass

~/Library/openforcefield/openforcefield/utils/toolkits.py in compute_partial_charges_am1bcc(self, molecule)
   2946                 subprocess.check_output([
   2947                     "antechamber", "-i", "molecule.sdf", "-fi", "sdf", "-o", "charged.mol2", "-fo", "mol2", "-pf", "yes", "-c", "bcc",
-> 2948                     "-nc", str(net_charge)
   2949                 ])
   2950                 #os.system('cat charged.mol2')

~/.conda/envs/openff/lib/python3.7/subprocess.py in check_output(timeout, *popenargs, **kwargs)
    409 
    410     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
--> 411                **kwargs).stdout
    412 
    413 

~/.conda/envs/openff/lib/python3.7/subprocess.py in run(input, capture_output, timeout, check, *popenargs, **kwargs)
    510         if check and retcode:
    511             raise CalledProcessError(retcode, process.args,
--> 512                                      output=stdout, stderr=stderr)
    513     return CompletedProcess(process.args, retcode, stdout, stderr)
    514 

CalledProcessError: Command '['antechamber', '-i', 'molecule.sdf', '-fi', 'sdf', '-o', 'charged.mol2', '-fo', 'mol2', '-pf', 'yes', '-c', 'bcc', '-nc', '0']' returned non-zero exit status 1.

It may not necessarily be related to this issue, but any thoughts on what's going on with the antechamber call here? Is this expected at the moment?

@jthorton
Copy link
Collaborator

jthorton commented Feb 27, 2020

I have been testing Jeffs example system above of a box of 1000 hexadecanes, and with the current master I get the following times:

--- Topology creation finished in 35.20249533653259 seconds ---
--- System creation finished in 35.99267387390137 seconds ---

However, I think I can get this a bit faster by being more efficient, as I think on cases where I do check full isomorphism I do it twice so with a small code change this becomes:

--- Topology creation finished in 19.916192293167114 seconds ---
--- System creation finished in 38.71189212799072 seconds ---

I'll check if the tests still all pass before making any other changes.

Not too sure about the error above, however.

@dotsdl
Copy link
Member

dotsdl commented Feb 27, 2020

Awesome, this is great @jthorton! I imagine this is sufficient performance to close the issue once it's in?

@dotsdl
Copy link
Member

dotsdl commented Feb 27, 2020

Also, looking at this with fresh eyes this morning, it helps to turn off the profiler magic (%%prun) in Jupyter. Getting 35.85 seconds for topology creation.

Also, realized I had ambertools, but not ambermini installed in my conda development environment. I get 124.76 seconds for system creation.

@j-wags
Copy link
Member Author

j-wags commented Feb 27, 2020

This may have been solved in 0ea045d by @jthorton. I'm seeing topology creation take between 65 and 130 seconds on my machine, whereas a run from e.g. e3a8856 (before @jthorton's changes) goes for far longer (would probably be close to @j-wags' original ~1600 seconds if I was willing to wait that long).

Great! Thanks for checking into this. I agree that this is a huge performance improvement, and a minute or two is much better than an hour.

That said, we might reasonably need to scale from 1000 hexadecanes to 100,000 waters at some point (or nastier combos for mixed-solvent/membrane sims). For M molecules in the PDB files and U unique molecules, our current algorithm scales as O(U*M). Simon's proposal above would make it O(U^2), which is a large improvement in the vast majority of cases, and no worse in the worst-case scenario. So I'd like to leave this issue open until Simon's algorithm is implemented.

I could be wrong about the big-O notation of the current and proposed solutions -- Please correct me if so!

@j-wags
Copy link
Member Author

j-wags commented Feb 27, 2020

It may not necessarily be related to this issue, but any thoughts on what's going on with the antechamber call here? Is this expected at the moment?
Also, realized I had ambertools, but not ambermini installed in my conda development environment.

I've seen the error above a few times in my chargeincrementmodel branch, and I think it's either due to the switch to AmberTools19 or the switch to subprocess.check_output. I'm trying to pin down how I solved it -- will update here if I can find the reason.

We will be changing the OFFTK recipe to use AmberTools19 instead of AmberMini for the next release, so AT19 compatibility would be the highest-priority factor here. Of course, it's be better if they both worked (not all users will upgrade). But if it's strictly one-or-the-other, AmberTools is the more important one.

@dotsdl Can you confirm that it's a AmberMini vs. AmberTools issue? If so, we may need to find a command line syntax that's shared between both versions of antechamber, or have OFFTK detect which AmberX version it's talking to and change its behavior accordingly.

@j-wags
Copy link
Member Author

j-wags commented Feb 28, 2020

This doesn't seem to be the only AmberTools19-migration-related problem. We've opened a discussion about AT19-related issues here. #532

@mattwthompson
Copy link
Member

There's still a slowdown(s) somewhere

In [1]: import openmm.app

In [2]: from openff.toolkit import Molecule, ForceField, Topology
/Users/mattthompson/micromamba/envs/openff-toolkit-test/lib/python3.10/site-packages/openff/nagl_models/openff_nagl_models.py:6: UserWarning: Module openff was already imported from None, but /Users/mattthompson/software/openff-interchange is being added to sys.path
  from pkg_resources import resource_filename

In [3]: sage = ForceField("openff-2.0.0.offxml")

In [4]: hexadecane = Molecule.from_smiles('CCCCCCCCCCCCCCCC')
   ...: openmm_topology = openmm.app.PDBFile('output.pdb').topology

In [5]: topology = Topology.from_openmm(openmm_topology, unique_molecules=[hexadecane])

In [6]: %%timeit
   ...: Topology.from_openmm(openmm_topology, unique_molecules=[hexadecane])
   ...:
   ...:
4.84 s ± 44.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [7]: %%timeit
   ...: sage.create_openmm_system(topology)
   ...:
   ...:

1min 3s ± 1.15 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [8]: %%timeit
   ...: topology.molecule(0).assign_partial_charges(partial_charge_method="am1bccelf10")
   ...:
   ...:
7.44 s ± 48.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants