Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sulfonamide N(1-) charge wrong but only with MOZYME #239

Open
aizvorski opened this issue Dec 20, 2024 · 7 comments
Open

Sulfonamide N(1-) charge wrong but only with MOZYME #239

aizvorski opened this issue Dec 20, 2024 · 7 comments

Comments

@aizvorski
Copy link

aizvorski commented Dec 20, 2024

Describe the bug

This is apparently different behavior for ionized sulfonamides depending on whether MOZYME is used or not. This is any groups with smiles string S(=O)(=O)[NH-].

With MOZYME, if the correct -1 charge is given as an input, there is an error; without MOZYME it works.

To Reproduce

Input:

MOZYME XYZ PM6-D3H4X EPS=78.5  1SCF  CHARGE=-1 THREADS=64


C -5.005000 1 3.851000 1 15.504000 1
N -5.426000 1 1.038000 1 15.914000 1
O -4.902000 1 2.759000 1 17.839001 1
C -3.857000 1 3.699000 1 14.733000 1
H -3.319528 1 2.750854 1 14.748594 1
O -7.069000 1 2.673000 1 16.502001 1
C -3.383000 1 4.751000 1 13.936000 1
H -2.488679 1 4.609419 1 13.329168 1
C -4.049000 1 5.980000 1 13.914000 1
H -3.661553 1 6.813045 1 13.327466 1
C -5.231000 1 6.118000 1 14.666000 1
H -5.786553 1 7.055192 1 14.632366 1
C -5.698000 1 5.064000 1 15.455000 1
H -6.610806 1 5.188556 1 16.037554 1
S -5.599000 1 2.556000 1 16.565001 1
H -4.490000 1 0.633000 1 16.278999 1

Result:

COMPUTED CHARGE ON SYSTEM: +1, THIS IS NOT THE SAME AS THE CHARGE DEFINED IN THE DATA-SET: "CHARGE=-1"
CHARGE SPECIFIED IS INCORRECT. CORRECT THE ERROR BEFORE CONTINUING

Expected behavior

Without MOZYME, the calculation works

JOB ENDED NORMALLY

However it may not be quite correct. Still without MOZYME but adding CHARGES, with the first line being "XYZ PM6-D3H4X EPS=78.5 1SCF CHARGE=-1 THREADS=64 CHARGES", the result is:

   Ion Atom No.  Type    Charge
    1       2      N       +1
CHARGE SPECIFIED IN DATA SET:  -1 IS INCORRECT.

Operating system

MOPAC v23.0.2 from conda-forge, Ubuntu 22.04.4 LTS

Additional context

Happens every time; appars to happen whenever the system contains an ionized sulfonamide, it can be part of a larger molecule.

@godotalgorithm
Copy link
Collaborator

This is part of the default behavior of the MOZYME solver - it assigns a total charge and stops the calculation if it deviates from the user-assigned charge. The charge assignment occurs when MOZYME assigns a Lewis dot structure to the input molecule and a formal charge to each input atom, from which it constructs an initial set of localized molecular orbitals (LMOs). This chemical analysis is a complicated, ad-hoc process that attempts to provide reasonable assignments for common organic molecules and chemical groups, but it really can't guarantee anything.

When a chemical assignment deviates from user expectations, there are diagnostic tools and advanced features that can be used to make adjustments and force the desired behavior. A good starting point is the LEWIS keyword which prints information about the assigned Lewis structure and formal charges and then stops. With the sulfonamide group, the basic problem is that the N bonded to the S is being assigned a formal +1 charge rather than a -1 (i.e. the lone pair is missing). The simplest remedy available is to force a charge assignment using a chemical label. In this case, the calculation proceeds if the unlabeled N atom in the input file is replaced by N(-) as noted in the documentation for atom labels.

In your example, I am seeing a 2 kcal/mol difference in energy between the MOPAC and MOZYME results, which seems too large and probably signifies a bug. The PM7 results match to within 0.005 kcal/mol, so this is probably an issue with the PM6-D3H4X model specifically (perhaps some classical correction term that is accidentally being omitted during the MOZYME calculation). I'm not going to be able to hunt this bug down until the new year.

I will keep this Issue open until MOPAC's behavior for the sulfonamide group has been considered more carefully. The essential issue is the formal charge assignment for N with an X-N-H connectivity. Both +1 and -1 are reasonable closed-shell assignments, and I need to consider their relative prominence in a biomolecular context.

@aizvorski
Copy link
Author

aizvorski commented Dec 20, 2024

@godotalgorithm Thank you! I tried N(-) and that works; I didn't know about that feature.

The Lewis dot and formal charge assignment doesn't use the provided total charge to disambiguate?

About the charge state of N in X-N-Y, I can think of a few sulfonamides which are N- from real-world pharma examples, eg sulfamethoxazole (pKa 5.6) and sulfadiazine (pKa 6.5). Could have N- in X-N-H too, mostly next to a metal ion; although it may be less common to have that if not next to an ion. What molecules do you have in mind which would be N+ and have two instead of four covalent bonds?

@godotalgorithm
Copy link
Collaborator

There are a patchwork of overlapping features for adjusting the Lewis dot structure, with the METAL (to force formal charges), CVB (to force single bonds), and SETPI (to force double bonds) keywords being the more intended way to do it.

A total charge is a global constraint, but the algorithm for Lewis dot assignment is entirely based on identifying and satisfying local constraints (e.g. distance-based assignment of bonds, bond-counting arguments for assigning charges to close shells). Personally, I have been curious about the possibility of using local constraints to define some sort of graph-theoretic problem that might also incorporate global constraints such as total charge. Often, such problems are combinatorial in nature (e.g. SAT problems) and can only be solved heuristically, but sometimes they can have enough structure to enable efficient algorithms (e.g. minimum-weight perfect matching). Unfortunately, there doesn't seem to be much interest in or support for that sort of research. Another missing element in this assignment process is any notion of a chemical potential - the most natural way to enforce a global charge constraint would be through the adjustment of an electron chemical potential, which is absent from MOZYME's current approach.

The only example of a formal N+ that I can think of off the top of my head is the central N of an azide anion (N3-). I agree that N- seems more typical than N+ in the context of biologically relevant X-N-Y. I have asked the original MOPAC developer for his opinion on the matter and his rationale for this design choice, and I'll consider his response in deciding on a course of action.

@godotalgorithm
Copy link
Collaborator

On further consideration, the Lewis dot structure is intended to identify a filled valence shell, not just a closed valence shell, for atoms in covalent bonding situations, which would specifically correspond to the -1 formal charge for N in a X-N-Y topology. The formal N+ in the azide anion is the result of two double bonds with neighboring N atoms to produce a filled valence shell. I do then consider this a bug and will try to fix it.

I inquired with the original developer about the MOPAC/MOZYME heat discrepancy in your PM6-D3H4X example, and the reason for this numerical difference is discussed in a GitHub Issue.

@aizvorski
Copy link
Author

@godotalgorithm Thanks!

There are a patchwork of overlapping features for adjusting the Lewis dot structure

I wonder if it would make sense to add an option to provide the complete Lewis dot structure and skip any auto-assignment. I've recently been running MOPAC on a wide variety of molecules/structures and the auto-assignment fails on a substantial number of them, around 20%. Using the (+)/(-) atom labels and METAL keyword, it works in more cases, but there's still around 5% failures which I'm not sure how to fix. These are structures which make sense chemically, eg OpenEye can assign them formal charges and bond orders with no trouble.

@aizvorski
Copy link
Author

aizvorski commented Dec 26, 2024

Another question about atom labels, is there a way to use labels to set the atom charge to be either zero (rather than not set, as when omitting a label), or some value other than +1/-1?

If not, perhaps the current behavior can be extended, so that if the label can be parsed as a signed integer (eg "[+-]?\d+" regex) then that value becomes the exact charge. This would make things like "N(0)" and "Fe(+2)" work, and more generally would allow labelling all atoms.

@godotalgorithm
Copy link
Collaborator

At the moment, I don't think so. I considered doing exactly what you are suggesting when I first reviewed the code for assigning Lewis dot structures a few years ago in response to another Issue, but it wouldn't have been a simple fix, so I refrained. As I make revisions to fix this bug, I will reconsider making this adjustment, too. Ideally, atom labels should be equivalent to assigning formal charges using the METAL keyword, but I have to review all the different usages of that keyword. The documentation of the METAL keyword is a bit vague, and I'm not sure it is even possible to set a specific formal charge and select specific atoms (rather than elements) for these assignments simultaneously. Syncing the behavior of atom labels and the METAL keyword probably won't be too complicated, but expanding the functionality of the METAL keyword (in the likely event that it is necessary) may be a bit challenging because its implementation is a bit "labyrinthine".

The METAL keyword will remain the preferred method for adjusting formal charges because MOZYME calculations are very commonly initiated from PDB files, which already have PDB-specific labels for all atoms that would conflict with using atom labels for charge assignments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants