Mol fix and xyz #529

jthorton · 2020-02-26T13:12:12Z

Here I aim to add support for writing multiframe and single frame XYZ files and fixing the MOL file type writer and adding more tests for each of the file output types. We also have another speed improvement for #353.

Fix Writing to MOL error #523
Add tests
Update docstrings/documentation, if applicable
Update changelog

This is a new branch for the fix as the old one was not from master.

jthorton · 2020-02-26T13:37:31Z

As I did the branch wrong I have copied the comments across from the old PR

Could you specify a few things about the intended behavior of OFFTK with respect to XYZ?
Namely:

Will we only ever write conformers of the same molecule to a single XYZ file, or could multiple distinct molecules be written to a single XYZ?

My intention was that only one molecule would be written to an XYZ file but multipule conformers could be, however there is a way around this for users by doing something like this:

from openforcefield.topology import Molecule
methane = Molecule.from_smiles('C')
ethane = Molecule.from_smiles('CC')

xyz = open('molecules.xyz', 'w')
methane.to_file(xyz, 'xyz')
xyz = open('molecules.xyz', 'a')
ethane.to_file(xyz, 'xyz')

would give :

5
CH4
C 0.0000000000 0.0000000000 0.0000000000
H 0.0000000000 0.0000000000 0.0000000000
H 0.0000000000 0.0000000000 0.0000000000
H 0.0000000000 0.0000000000 0.0000000000
H 0.0000000000 0.0000000000 0.0000000000
8
C2H6
C 0.0000000000 0.0000000000 0.0000000000
C 0.0000000000 0.0000000000 0.0000000000
H 0.0000000000 0.0000000000 0.0000000000
H 0.0000000000 0.0000000000 0.0000000000
H 0.0000000000 0.0000000000 0.0000000000
H 0.0000000000 0.0000000000 0.0000000000
H 0.0000000000 0.0000000000 0.0000000000
H 0.0000000000 0.0000000000 0.0000000000

As we will not support the reading of this file type this is not an issue for us but could be for users when interfacing with other programs so am not sure if we should stop this behaviour or not.

Is atom indexing guaranteed to be preserved when writing XYZ?

Yes the file is generated based on the current molecule atom ordering, as this is done by the toolkit directly not some back end toolkit the order should never change.

What is the behaviour if offmol.conformers is None?

If there are no conformers then we set all of the positions of the atoms to [0,0,0] and write out the file copying the behaviour of writing a PDB with no conformers.

Do we expect XYZ to be loadable back into OFFTK in any form? If so...

No there is even less information in here than in a PDB file so we should never load from a XYZ file.

* Do we expect a molecule's connectivity and/or formal charges to survive a round trip to XYZ?
* Do we expect a molecule's `name` to survive a round trip to XYZ?
* Do we expect a molecule's `partial_charges` to survive a round trip to XYZ?
* Do we expect a molecule's `properties` dict to survive a round trip to XYZ?

jthorton · 2020-02-26T14:16:01Z

Tests are failing due to the Mol writer fix I have put in, due to the list of file writers being a class level object and we require the import of RDKit to link the file sufix to the file writer in this part:

class RDKitToolkitWrapper(ToolkitWrapper):
    """
    RDKit toolkit wrapper

    .. warning :: This API is experimental and subject to change.
    """

    from rdkit import Chem

    _toolkit_name = 'The RDKit'
    _toolkit_installation_instructions = 'A conda-installable version of the free and open source RDKit cheminformatics ' \
                                         'toolkit can be found at: https://anaconda.org/rdkit/rdkit'
    _toolkit_file_read_formats = ['SDF', 'MOL', 'SMI']  #TODO: Add TDT support
    _toolkit_file_write_formats = {'SDF': Chem.SDWriter, 'MOL': Chem.SDWriter,
                                   'SMI': Chem.SmilesWriter, 'PDB': Chem.PDBWriter, 'TDT': Chem.TDTWriter}

any environment that does not have RDKit installed will fail at this point as it is always run. This could be fixed by possibly having this dict be filled during an __init__ call which could import rdkit something like:

class RDKitToolkitWrapper(ToolkitWrapper):

    _toolkit_file_write_formats = {'SDF': None, 'MOL': None,
                                   'SMI': None, 'PDB': None, 'TDT': None}

    def __init__(self):
        try:
            from rdkit import Chem
            self._toolkit_file_write_formats = {'SDF': Chem.SDWriter, 'MOL': Chem.SDWriter,
                                                'SMI': Chem.SmilesWriter, 'PDB': Chem.PDBWriter, 'TDT': Chem.TDTWriter}
        except ImportError:
            pass

However, we still have the same problem of having to update the dictionary in two locations which is what caused the issue in the first place.

One other solution I have is to just have strings of the names of the class/functions that will do the file writing be stored in the dictionary that can then be executed using eval.

codecov-io · 2020-02-26T16:01:18Z

Codecov Report

❗ No coverage uploaded for pull request base (master@53e7e95). Click here to learn what that means.
The diff coverage is 93.33%.

openforcefield/utils/toolkits.py

j-wags · 2020-02-26T19:01:55Z

Excellent.

so am not sure if we should stop this behaviour or not.

I don't think we need to police the one-molecule-per-XYZ behavior if a user opens a file in append mode. If that messes up their workflow, it's their own fault :-)

I agree with all of your answers on the behavior and think this will be a great addition. Thanks, Josh!

j-wags · 2020-02-26T19:08:59Z

Tests are failing due to the Mol writer fix I have put in, due to the list of file writers being a class level object

I wonder if these should be instance-level? I've (occasionally) debugged problems by making two RDKitToolkitWrappers, and overridden the behavior of one of them by redefining its attributes. But it gets annoying when that attribute is class-level, since then it affects the other one as well.

Is there a good reason to keep the read and write formats at the class level? If not, I'd be fine with defining them on an instance-level (in __init__) and avoiding all this eval business.

jthorton · 2020-02-27T09:50:55Z

Yeah, that should be fine as we can do the import in the init, would we want to change this behaviour for both of the toolkit wrappers to make them consistent? Also, what should happen if RDKit is not installed and someone tried to create the wrapper should we fail loud with an import error?

…nit.

…ield into mol_fix_and_xyz

jthorton · 2020-03-02T10:11:56Z

Just need one final check that I have RDKit file writers set up correctly so that Mol and SDF write the same file format. Also, do we need more tests for each of the file writing formats similar to the XYZ testing I have put in?

j-wags

Looks good, Josh. Especially nice tests. I commented on a few spelling things, and requested a reference spec for our XYZ implementation, but otherwise this is excellent work.

One note (kinda to myself) -- MOL is technically different from SDF, in that it doesn't contain property information. https://en.wikipedia.org/wiki/Chemical_table_file#SDF. I don't think there's anything to do here for it (since we don't tell RDKit to write any properties anyway), however I'll probably need to override this behavior in #281

docs/releasehistory.rst

openforcefield/tests/test_toolkits.py

j-wags · 2020-03-03T04:39:06Z

openforcefield/tests/test_toolkits.py

@@ -38,6 +38,7 @@ def get_mini_drug_bank(toolkit):
    # This is a work around a weird error where even though the test is skipted due to a missing toolkit
    #  we still try and read the file with the toolkit
    if toolkit.is_available():
+        toolkit = toolkit()


This confused me for a minute -- Could you change it to toolkit = toolkit_class(), and have the input parameter above be toolkit_class as well?

Yeah no problem!

j-wags · 2020-03-03T04:40:18Z

openforcefield/topology/molecule.py

+        isomorphic = GM.is_isomorphic()

        if isomorphic and return_atom_map:
-            # now generate the sorted mapping between the molecules
-            GM = GraphMatcher(
-                mol1_netx,
-                mol2_netx,
-                node_match=node_match_func,
-                edge_match=edge_match_func)
-            for mapping in GM.isomorphisms_iter():
-                topology_atom_map = mapping
-                break
+            topology_atom_map = GM.mapping


Slick. I like it.

j-wags · 2020-03-03T04:49:27Z

openforcefield/utils/toolkits.py

+    @property
+    def toolkit_file_write_formats(self):
+        """
+        List of file formats that this toolkit can write.
+        """
+        return list(self._toolkit_file_write_formats.keys())


Thoughtful!

j-wags · 2020-03-03T04:49:51Z

openforcefield/utils/toolkits.py

@@ -1841,7 +1870,6 @@ def from_file_obj(self,
            a list of Molecule objects is returned.

        """
-        from openforcefield.topology import Molecule


openforcefield/utils/toolkits.py

j-wags · 2020-03-03T04:52:11Z

openforcefield/utils/toolkits.py

-            writer = rdkit_writers[file_format](file_obj)
-            writer.write(rdmol)
-            writer.close()
+            self.to_file_obj(molecule=molecule, file_obj=file_obj, file_format=file_format)


docs/releasehistory.rst

jthorton · 2020-03-03T10:24:27Z

Thanks for the feedback Jeff I have made the changes you suggested and will merge once tests pass.

joshhorton added 4 commits February 25, 2020 16:41

add xyz file support.

6d77554

add xyz file support.

eadfaa8

adding tests for xyz support and add Fix for #523

5d3c1cf

finished test for xyz support.

1696949

add import workaround.

b27b45b

jthorton commented Feb 26, 2020

View reviewed changes

openforcefield/utils/toolkits.py Outdated Show resolved Hide resolved

jthorton commented Feb 26, 2020

View reviewed changes

openforcefield/utils/toolkits.py Outdated Show resolved Hide resolved

joshhorton added 8 commits February 27, 2020 10:32

change to class variables to instance variables and set them in the i…

1a93a08

…nit.

testing fix.

14c30fe

Isomorphic checking speed improvments.

53e7e95

Merge branch 'master' of https://github.com/openforcefield/openforcef…

32ef7f3

…ield into mol_fix_and_xyz

moved to use instance attrs instead of class

b6b277b

moved to use instance attrs instead of class

e2e0567

moved to use instance attrs instead of class

2260533

Fix tests to use the new toolkit init.

0db4b7e

jthorton requested a review from j-wags March 2, 2020 10:07

joshhorton added 2 commits March 2, 2020 13:41

updated the release log.

d3b5e0c

updated the release log.

39ffbc3

j-wags approved these changes Mar 3, 2020

View reviewed changes

suggested changes added.

617d00f

jthorton merged commit 0e5ff41 into master Mar 3, 2020

jthorton deleted the mol_fix_and_xyz branch March 3, 2020 11:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Mol fix and xyz #529

Mol fix and xyz #529

jthorton commented Feb 26, 2020 •

edited

Loading

jthorton commented Feb 26, 2020

jthorton commented Feb 26, 2020 •

edited

Loading

codecov-io commented Feb 26, 2020 •

edited

Loading

j-wags commented Feb 26, 2020

j-wags commented Feb 26, 2020

jthorton commented Feb 27, 2020 •

edited

Loading

jthorton commented Mar 2, 2020

j-wags left a comment

j-wags Mar 3, 2020

jthorton Mar 3, 2020

j-wags Mar 3, 2020

j-wags Mar 3, 2020

j-wags Mar 3, 2020

j-wags Mar 3, 2020

jthorton commented Mar 3, 2020

Mol fix and xyz #529

Mol fix and xyz #529

Conversation

jthorton commented Feb 26, 2020 • edited Loading

jthorton commented Feb 26, 2020

jthorton commented Feb 26, 2020 • edited Loading

codecov-io commented Feb 26, 2020 • edited Loading

Codecov Report

j-wags commented Feb 26, 2020

j-wags commented Feb 26, 2020

jthorton commented Feb 27, 2020 • edited Loading

jthorton commented Mar 2, 2020

j-wags left a comment

Choose a reason for hiding this comment

j-wags Mar 3, 2020

Choose a reason for hiding this comment

jthorton Mar 3, 2020

Choose a reason for hiding this comment

j-wags Mar 3, 2020

Choose a reason for hiding this comment

j-wags Mar 3, 2020

Choose a reason for hiding this comment

j-wags Mar 3, 2020

Choose a reason for hiding this comment

j-wags Mar 3, 2020

Choose a reason for hiding this comment

jthorton commented Mar 3, 2020

jthorton commented Feb 26, 2020 •

edited

Loading

jthorton commented Feb 26, 2020 •

edited

Loading

codecov-io commented Feb 26, 2020 •

edited

Loading

jthorton commented Feb 27, 2020 •

edited

Loading