Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes to general sorting algorithms for hoomd and lammps #762

Closed

Conversation

CalCraven
Copy link
Contributor

@CalCraven CalCraven commented Sep 11, 2023

HOOMD functionality right now leads to bugs in writing out the forcefields.

There are two functions that need to be in sync
gmso/external/convert_hoomd.py to_hoomd_snapshot
gmso/external/convert_hoomd.py to_hoomd_forcefield

These are not compatible, as discussed in #758.

Additionally, the sorting algorithms in gmso.utils.sorting.py have some bugs with dihedrals, which need to be flipped if the two central atoms are equivalent, but the first and fourth atoms should be in reversed order.

This PR will restructure some of the sorting algorithms and migrate some of the ones that were used in formats/lammpsdata.py from gmso/core/views.py to gmso/utils/sorting.py

TODO List

  • unit tests
  • Move all sorting functions into
  • Fixes to sort_connection_members
  • Fix HOOMD to sort by site.atom_type.atomclasses
  • Work with wildcards
  • issue raised and addressed

@codecov
Copy link

codecov bot commented Sep 11, 2023

Codecov Report

Patch coverage: 77.86% and project coverage change: -0.18% ⚠️

Comparison is base (24a6101) 91.89% compared to head (06e9ddd) 91.71%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #762      +/-   ##
==========================================
- Coverage   91.89%   91.71%   -0.18%     
==========================================
  Files          66       66              
  Lines        6733     6780      +47     
==========================================
+ Hits         6187     6218      +31     
- Misses        546      562      +16     
Files Changed Coverage Δ
gmso/external/convert_hoomd.py 91.44% <76.36%> (-0.79%) ⬇️
gmso/utils/sorting.py 80.89% <77.77%> (-5.38%) ⬇️
gmso/core/views.py 88.46% <100.00%> (-0.79%) ⬇️
gmso/formats/lammpsdata.py 95.89% <100.00%> (+<0.01%) ⬆️

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

[site.atom_type.atomclass for site in dihedral.connection_members]
)
unique_dihedrals[unique_members] = dihedral
unique_dtypes = [

Check notice

Code scanning / CodeQL

Unused local variable Note library

Variable unique_dtypes is not used.
# TODO: The range of ks is mismatched (GMSO go from k0 to k5)
# May need to do a check that k0 == k5 == 0 or raise a warning
container.params["-".join(dtype.member_types)] = {
member_classes = sort_by_classes(dtype)

Check notice

Code scanning / CodeQL

Unused local variable Note library

Variable member_classes is not used.
Copy link
Member

@daico007 daico007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments about the code. One high-level comment I have (probably need a broader discussion later too), so it seems right now, we are going with atom class for everything (storing bond and bond params in the snapshot and forcefield), which would work for most of our current use case, i.e., when connection types are defined between atom classes. This, however, may throw an issue if there is a connection that is defined between atom types. The latter case is rarer but, IMO, it might happen and should be supported. In other word, I think we should go with atom types for everything instead of atomclass. This will make thing a bit bulkier, but it will allow us to cover those edge case.

else:
raise TypeError("Provided connection_type not supported.")


def sort_connection_members(connection, sort_by="name"):
"""Sort connection_members of connection."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to add a blurp here about what options sort_by can take

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

elif isinstance(potential, ImproperType):
return (
potential.member_classes[0],
*potential.member_classes[1:],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
*potential.member_classes[1:],
*[sorted(potential.member_classes[1:]),

elif isinstance(potential, ImproperType):
return (
potential.member_types[0],
*potential.member_types[1:],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
*potential.member_types[1:],
*sorted(potential.member_types[1:]),

elif len(namesList) == 4 and improperBool:
return tuple(
namesList[0],
sorted(*namesList[1:]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sorted(*namesList[1:]),
*sorted(namesList[1:]),

elif isinstance(connection, gmso.Improper):
site1, site2, site3, site4 = connection.connection_members
site2, site3, site4 = sorted([site2, site3, site4], key=sorting_key)
return [site1, site2, site3, site4]
else:
raise TypeError("Provided connection not supported.")


def sort_by_classes(potential):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might consider using a more descriptive name for this method, since sort_by_classes is kinda vague (regarding the desired input). Maybe something like sorted_potential_member_classes (sounds a bit clumsy but I will keep brainstorming some other option).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this method is similar for sort_by_types, so probably we need something that would distinct these two (sort_by_types and sort_by_classes) with the sort_connection_strings.

other names I can think sort_potential_member_types and sort_potential_member_classes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by adding more detailed doc strings. I don't think every function needs to be that detailed. potential is the only argument, so we don't need it to also be part of the function name. I added details about what potential is and how it's sorted.

Comment on lines +91 to +97
if ( # TODO: why are these skipped?
isinstance(mb_force, hoomd.md.long_range.pppm.Coulomb)
or isinstance(mb_force, hoomd.md.pair.pair.LJ)
or isinstance(mb_force, hoomd.md.special_pair.LJ)
or isinstance(mb_force, hoomd.md.pair.pair.Ewald)
or isinstance(mb_force, hoomd.md.special_pair.Coulomb)
):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if ( # TODO: why are these skipped?
isinstance(mb_force, hoomd.md.long_range.pppm.Coulomb)
or isinstance(mb_force, hoomd.md.pair.pair.LJ)
or isinstance(mb_force, hoomd.md.special_pair.LJ)
or isinstance(mb_force, hoomd.md.pair.pair.Ewald)
or isinstance(mb_force, hoomd.md.special_pair.Coulomb)
):
if ( # TODO: why are these skipped?
isinstance(mb_force, [
hoomd.md.long_range.pppm.Coulomb,
hoomd.md.pair.pair.LJ,
hoomd.md.special_pair.LJ,
hoomd.md.pair.pair.Ewald,
hoomd.md.special_pair.Coulomb,
]
):

continue
keys = mb_force.params.param_dict.keys()
gmso_keys = gmso_force.params.param_dict.keys()
print("\n\n", keys, gmso_keys, gmso_force, "\n\n")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
print("\n\n", keys, gmso_keys, gmso_force, "\n\n")

@@ -1413,3 +1430,26 @@ def _convert_params_units(
potential.parameters = converted_params
converted_potentials.append(potential)
return converted_potentials


def _convert_single_param_units(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this belong to utils/units.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. It should get moved into using the new UnitsClasses, but maybe we can do that in a later PR.

@CalCraven
Copy link
Contributor Author

One high-level comment I have (probably need a broader discussion later too), so it seems right now, we are going with atom class for everything (storing bond and bond params in the snapshot and forcefield), which would work for most of our current use case, i.e., when connection types are defined between atom classes. This, however, may throw an issue if there is a connection that is defined between atom types. The latter case is rarer but, IMO, it might happen and should be supported. In other word, I think we should go with atom types for everything instead of atomclass.

So I think it's a necessity to use atomclass instead of atomtype. This is related to #765, which is to try to get hoomd to support GAFF. In fact, that PR is updated with the changes to this PR, because both are necessary for GAFF, so in the end I'll probably address these comments/fixes directly in there and close this PR. But I'm getting away from the point. Here's the situation that arises.

A topology has 5 atomtypes, each with unique names, bonded in a line to form 2 unique dihedrals. When writing out the snapshot and the forcefield, the important attribute is the string that is used as the key for these lookup tables. If we use atomtypes, then it'll look like:

Molecule representation -> atom1 - atom2 - atom3 - atom4 - atom5
In snapshots
Dihedral1 = "atype1-atype2-atype3-atype4"
Dihedral2 = "atype2-atype3-atype4-atype5"

in forces
Dihedral_type1 = "atype1-atype2-atype3-atype4"
Dihedral_type2 = "atype2-atype3-atype4-atype5"

But if the classes of dihedrals are all the same, then we're duplicating dihedrals.
Dihedral_type1 = Dihedral_type2. If we used classes, we would reduce the number of dihedrals written. And 90% or more of our forcefields are defined by atomclasses, and with big forcefields we're at risk of writing huge numbers of dihedrals out with all of the duplication.

Now the situation you're describing, where the dihedrals are defined by their atomtypes, not atomclasses. In this scenario, the atomtypes = atomclasses.
atomtype1 = "OPLS_135" atomclass1 = "OPLS_135". In that case, we should still be okay to use the classes as they are equivalent in using their strings.

The strange situation would be:
atomtype1 = "OPLS_135" atomclass1 = "CT"
atomtype2="OPLS_136 atomclass1="CT"
but two different dihedrals are defined as
"OPLS_135-OPLS_136-X-X"
"CT-CT-X-X"
This would write these two dihedrals the same way. But I don't thing writing the atomtypes would be a useful solution here as well. Because then what are the atomtypes for the "CT-CT-X-X" dihedral? In theory it could be any combination of atomtypes that correlate to that class. So I'm not sure we could handle that situation either way, unless we know the original formulation of the dihedral.

for dihedral in top.dihedrals:
        unique_members = tuple(
            [site.atom_type.atomclass for site in dihedral.connection_members]
        )
        unique_dihedrals[unique_members] = dihedral

@daico007
Copy link
Member

I am not too worry about the duplication issue (since that will be more or less constrained by how many unique molecules in a systems and how many unique dihedrals in those molecules). There will be duplications, but I don't think that will be to that much of a degree. Consider that against the trade-off of supporting the 10% non-common case, that might be worth it.

What I mean about dihedral define between atomtypes, I mean it's look like:
type1="aa" type2="bb" type3="cc" type4="dd". Making atomclass to take the value of atomtype is just a work-around.

This topic has been brought up a while ago, but in the strange case that you brought up, the dihedral define by type will take precedent to that defined by atomclass.

@CalCraven
Copy link
Contributor Author

I think the duplication issue will get out of hand. For instance, I have simulations with up to 20 different atomtypes in them. But maybe only 10 or so atomclasses. I have the feeling that will result in a large number of dihedrals between the two cases. There's a reason something large like the OPLS forcefield defines the dihedrals by classes, not by types. I will put together an example of this and we can count the number of dihedrals, but run time will be affected by the number of dihedrals in hoomd no doubt about it. I've never actually seen a forcefield with hybrid type and class definitions. It seems a bit silly: if you have a unique site that needs it's own unique dihedral, then give it an atomclass that's the same as its atomtype. We certainly do not have any tested examples of such a forcefield if it is something we would want to support anyways.

@CalCraven
Copy link
Contributor Author

I also am totally spitballing with that 10% uncommon case. I've never actually seen that done. So who knows exactly what that number is.

@CalCraven
Copy link
Contributor Author

if the dihedral is type1="aa" and is given no class1, then it's class should also be "aa".

@daico007
Copy link
Member

It seems a bit silly: if you have a unique site that needs it's own unique dihedral, then give it an atomclass that's the same as its atomtype.

I am thinking more about the case where the bond/angle is defined on atomclass and the dihedral on atomtype (for some reason, like you try to develop a forcefield I try to have it very specific to the atom type level). In that case, the atom type still need its original atomclass info.

@CalCraven
Copy link
Contributor Author

Having just checked this, it resulted in 23 unique dihedrals by types and 14 by classes.

@CalCraven
Copy link
Contributor Author

test_dihedrals.txt

@CalCraven
Copy link
Contributor Author

I am thinking more about the case where the bond/angle is defined on atomclass and the dihedral on atomtype (for some reason, like you try to develop a forcefield I try to have it very specific to the atom type level). In that case, the atom type still need its original atomclass info.

I can see this. We need tests for it though, which we don't currently have. I think we could have a check at the beginning of the writer that checks to see if all the classes capture all of the unique dihedral expressions. If not, then we populate the snapshot and forces by types, but that shouldn't be the default case.

I think this PR should get merged tomorrow for CECAM, and I don't have the time to add that feature in. So potentially we can just raise this as a bug in the issues, and address it when we have the time/ a reasonable test case.

@daico007
Copy link
Member

included in #765

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants