Physics based priors #26
Comments
Feel free to open a PR if you want to have a go at this, otherwise I can also make this change. |
Sounds good, I'll add that. |
Sorry for reopening an old issue - this never actually got implemented! I'm about to start on it now. I'll add the Physics based potentials will often require extra information. For example, one prior I want to add is the ZBL stopping potential for short range repulsion. It requires knowing the atomic number of every atom. A more complicated example is Coulomb, which depends on the charge of each atom. That could be a fixed number (formal charge or precomputed partial charge), but in other cases it will itself be calculated as an output of the model. What's the best way of providing extra information like this that's needed to compute a physics based potential? |
Another question: currently it only lets you specify a single prior model, but often you may want more than one. A ZBL potential for short range repulsion, plus Coulomb for long range interactions, plus D4 for dispersion. (SpookyNet includes all of those.) Any thoughts on the best way of handling this? |
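And another question: how do we want to handle units? Nothing in the current code worries about them, except sometimes for a particular dataset. The inputs and outputs to a model are just numbers that might have any units. But physical potentials involve hardcoded constants like eps_0 whose values depend on units. How should we handle this? Some possibilities include
- Standardizing on a particular set of units for all models.
- Standardizing on a particular set of units for a particular potential function, and requiring the user to provide scaling factors for converting positions, energies, etc. to the required units.
- Implementing it at a higher level, where a model specifies what units it uses.
- We could also make each Dataset specify its units, and conversions between the dataset and model units would happen automatically.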
|
can you use that single model as a wrapper for all the models that you
would like to use? It's just the entry point.
…On Mon, Oct 3, 2022 at 4:45 PM Peter Eastman ***@***.***> wrote:
Another question: currently it only lets you specify a single prior model,
but often you may want more than one. A ZBL potential for short range
repulsion, plus Coulomb for long range interactions, plus D4 for
dispersion. (SpookyNet includes all of those.) Any thoughts on the best way
of handling this?
|
The code is unit-neutral, whatever you use as input it spits out as output
…On Mon, Oct 3, 2022 at 6:48 PM Peter Eastman ***@***.***> wrote:
And another question: how do we want to handle units? Nothing in the
current code worries about them, except sometimes for a particular dataset.
The inputs and outputs to a model are just numbers that might have any
units. But physical potentials involve hardcoded constants like eps_0
whose values depend on units. How should we handle this? Some possibilities
include
- Standardizing on a particular set of units for all models.
- Standardize on a particular set of units for a particular potential
function, and require the user to provide scaling factors for converting
positions, energies, etc. to the required units.
- Implement it at a higher level, where a model specifies what units
it uses.
- We could also make each Dataset specify its units, and
conversions between the dataset and model units would happen automatically.
|
I know. And that's incompatible with physics based potentials. They involve internal parameters with units. It's impossible to compute them without knowing what units the inputs are in, and what units the outputs are expected to be in. |
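To make the unit dependence concrete, here is a minimal sketch that derives the Coulomb prefactor 1/(4 pi eps_0) in two unit systems from CODATA constants. The helper name `coulomb_constant` and the unit labels are illustrative assumptions, not part of TorchMD-Net.

```python
import math

# CODATA 2018 values in SI units.
ELEMENTARY_CHARGE = 1.602176634e-19     # C (exact)
VACUUM_PERMITTIVITY = 8.8541878128e-12  # F/m
AVOGADRO = 6.02214076e23                # 1/mol (exact)


def coulomb_constant(energy_unit, length_unit):
    """Return e^2 / (4 pi eps_0) for charges expressed in units of e.

    Hypothetical helper for illustration only.
    """
    # Prefactor in J*m for a pair of elementary charges.
    k_si = ELEMENTARY_CHARGE ** 2 / (4.0 * math.pi * VACUUM_PERMITTIVITY)
    energy_per_joule = {
        "eV": 1.0 / ELEMENTARY_CHARGE,
        "kJ/mol": AVOGADRO / 1000.0,
    }[energy_unit]
    length_per_meter = {"A": 1e10, "nm": 1e9}[length_unit]
    return k_si * energy_per_joule * length_per_meter


k_ev_angstrom = coulomb_constant("eV", "A")    # roughly 14.4
k_kjmol_nm = coulomb_constant("kJ/mol", "nm")  # roughly 138.9
```

The same physical constant differs by about a factor of ten between these two conventions, which is why a Coulomb term cannot be evaluated from unit-free numbers alone.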
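Currently there's a single option in the configuration file, prior_model. It specifies the class of the single prior model to create, whose constructor has to take no arguments except the dataset. We need to be able to specify much more complicated arrangements. For example, "Include both a ZBL potential, using atomic numbers specified in the dataset, and a Coulomb potential, using charges that are generated as an output of the model." |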
Whoever builds the potential knows the units, so the problem arises when we distribute it.
Maybe we can add some optional unit indication somehow?
…On Tue, Oct 4, 2022 at 11:56 AM Peter Eastman ***@***.***> wrote:
The code is unit-neutral, whatever you use as input it spits out as output
I know. And that's incompatible with physics based potentials. They
involve internal parameters with units. It's impossible to compute them
without knowing what units the inputs are in, and what units the outputs
are expected to be in.
|
We could extend that as we did for the dataset arguments with a prior_args
dictionary
…On Tue, Oct 4, 2022 at 12:05 PM Peter Eastman ***@***.***> wrote:
can you use that single model as a wrapper for all the models that you
would like to use? It's just the entry point.
Currently there's a single option in the configuration file, prior_model.
It specifies the class of the single prior model to create, whose
constructor has to take no arguments except the dataset. We need to be able
to specify much more complicated arrangements. For example, "Include both a
ZBL potential, using atomic numbers specified in the dataset, and a Coulomb
potential, using charges that are generated as an output of the model."
|
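Perhaps we're thinking about this differently. Currently the file priors.py includes only a single prior model: Atomref, which subtracts a reference energy that is defined in the dataset for each atom type. My understanding is that we want to add more choices that implement common physical models. Someone should be able to specify prior_model: Coulomb in their configuration file, and it will add a Coulomb interaction to whatever model they're trying to train. Is that different from your understanding? |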
yes, that is fine. Maybe I don't understand the problem, you can specify
whatever class.
Maybe just do what you need on a PR so we can discuss it. Especially if
it's backward compatible, we can merge it.
g
…On Tue, Oct 4, 2022 at 12:15 PM Peter Eastman ***@***.***> wrote:
Perhaps we're thinking about this differently. Currently the file
priors.py includes only a single prior model: Atomref, which subtracts a
reference energy that is defined in the dataset for each atom type. My
understanding is that we want to add more choices that implement common
physical models. Someone should be able to specify prior_model: Coulomb
in their configuration file, and it will add a Coulomb interaction to
whatever model they're trying to train. Is that different from your
understanding?
|
Let's consider the problem of adding a Coulomb interaction. To begin with, it can be implemented in several ways.
- The dataset provides a precomputed charge for every atom. You compute the interaction based on them.
- The model predicts a charge for every atom. They get used to compute the interaction.
- The model predicts an electronegativity for every atom. You use them to solve for the charges, which then get used to compute the interaction. This also requires knowing the total charge on every molecule, or even better the formal charge on every atom.
First problem is that I don't think TorchMD-Net really supports multitask models yet? Its models produce a single number for each atom, which is interpreted as energy. We need them to produce multiple values for each atom: both energy and charge, or energy and electronegativity. That might also involve extra terms in the loss function: train the model to reproduce charges specified in the dataset.
Next we need to define a mechanism for datasets to provide the extra information required: partial charges, formal charges, or both.
Then there's the problem of units. The Coulomb code receives positions in some units and it needs to produce an energy in some units. The value of eps_0 depends on the units. It's impossible to calculate the result without knowing the units.
Finally we need to design the user interface for all of this. What options will the user add to their configuration file to request a particular combination of physical terms, calculated in a particular way, trained with a particular loss function? You need to be able to specify things like this: "My dataset reports atomic numbers, formal charges, and partial charges for every atom. I want the model to predict electronegativities, which will be used to compute partial charges. Include a loss term based on how well the predicted charges match the ones in the dataset. Once the charges are computed, add ZBL, Coulomb, and dispersion terms to the final energy." |
Would it be better to just delta the forces outside torchmd-net? For
example in openMM and just use torchmd-net for the NNP part?
Let's have a chat about it. I'll contact you directly
…On Tue, Oct 4, 2022 at 1:08 PM Peter Eastman ***@***.***> wrote:
Let's consider the problem of adding a Coulomb interaction. To begin with,
it can be implemented in several ways.
- The dataset provides a precomputed charge for every atom. You
compute the interaction based on them.
- The model predicts a charge for every atom. They get used to compute
the interaction.
- The model predicts an electronegativity for every atom. You use them
to solve for the charges, which then get used to compute the interaction.
This also requires knowing the total charge on every molecule, or even
better the formal charge on every atom.
First problem is that I don't think TorchMD-Net really supports multitask
models yet? Its models produce a single number for each atom, which is
interpreted as energy. We need them to produce multiple values for each
atom: both energy and charge, or energy and electronegativity. That might
also involve extra terms in the loss function: train the model to reproduce
charges specified in the dataset.
Next we need to define a mechanism for datasets to provide the extra
information required: partial charges, formal charges, or both.
Then there's the problem of units. The Coulomb code receives positions in
some units and it needs to produce an energy in some units. The value of
eps_0 depends on the units. It's impossible to calculate the result
without knowing the units.
Finally we need to design the user interface for all of this. What options
will the user add to their configuration file to request a particular
combination of physical terms, calculated in a particular way, trained with
a particular loss function? You need to be able to specify things like
this: "My dataset reports atomic numbers, formal charges, and partial
charges for every atom. I want the model to predict electronegativities,
which will be used to compute partial charges. Include a loss term based on
how well the predicted charges match the ones in the dataset. Once the
charges are computed, add ZBL, Coulomb, and dispersion terms to the final
energy."
|
Multi-head implementation
Yes, currently models can only have a single output head. I think adding multi-head support could be useful; however, it requires changes to the current training interface and poses some important design questions. For example:
If you then want charges to come either from the model or from the dataset it gets even more complicated, since you have to indicate which one should be used. You would also need to order the computation of the output heads, as you are suggesting that the output of one head could depend on the prediction of another. This relationship would also have to be defined in the config file.
Prior args
This already exists and currently is a critical piece in order to reconstruct prior models from model checkpoints. If the model contains a prior model, the arguments required for building this model will be stored in the hparams.yaml file as they are set here torchmd-net/torchmdnet/scripts/train.py Line 129 in a80e378
The load_model function expects prior_args to be set. The prior model itself expects either the dataset object or the prior args to be present to correctly instantiate the object. For example the Atomref prior depends on max_z , which is stored in prior_args but falls back to retrieving this from the dataset object in case max_z is not set: torchmd-net/torchmdnet/priors.py Lines 50 to 57 in a80e378
This is the section of code that reconstructs a prior model from a checkpoint file but that is ignored if it's the first time constructing this model: torchmd-net/torchmdnet/models/model.py Lines 67 to 77 in a80e378
In the first instantiation of a model, the prior_model variable will not be None in this context as it is directly passed to the create_model function: torchmd-net/torchmdnet/module.py Line 24 in a80e378
Defining units
I think the freedom the current unit handling (i.e. none) gives is very powerful. If you want to predict some property you don't have to rely on us supporting units for this property but instead just plug in a dataset with the appropriate label. For self- or unsupervised pretraining we also don't necessarily have a unit that we predict, if for example we mask the atom type of one atom and predict this. To pass unit information to prior models I think it does make sense that datasets have some standardized way of defining their units (if it makes sense to define a unit for a certain dataset). Similarly to how Atomref currently works, it relies on the dataset implementing the torchmd-net/torchmdnet/priors.py Line 57 in a80e378
you could in the same way access some kind of get_units function and throw an error if a given dataset doesn't implement that.
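The suggestion above could look roughly like the following sketch. The method name `get_units` comes from the comment; the return format (a plain dict), the exception class, and the toy dataset classes are assumptions made only for illustration.

```python
class UnitsNotDefinedError(Exception):
    """Raised when a prior needs units that the dataset does not declare."""


def require_units(dataset):
    """Return the dataset's units, or raise if it does not define any.

    `get_units` is a hypothetical method, not an existing TorchMD-Net API.
    """
    getter = getattr(dataset, "get_units", None)
    if not callable(getter):
        raise UnitsNotDefinedError(
            f"{type(dataset).__name__} does not implement get_units(); "
            "a physics based prior cannot be used with it."
        )
    return getter()


class DatasetWithUnits:
    def get_units(self):
        # Assumed return format: unit names keyed by quantity.
        return {"length": "A", "energy": "eV"}


class DatasetWithoutUnits:
    pass


units = require_units(DatasetWithUnits())
try:
    require_units(DatasetWithoutUnits())
    missing_units_rejected = False
except UnitsNotDefinedError:
    missing_units_rejected = True
```

Datasets that have no meaningful unit (for example masked atom-type pretraining) simply omit the method and keep working with models that do not use a physics based prior.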
|
And yes, through the config file and model creation/loading you are currently limited to just one prior model, however, technically it would be possible to wrap the model in multiple prior models by just nesting them. |
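One way to picture nesting is a prior that holds another prior and applies it first. The sketch below is a toy under stated assumptions: only the `post_reduce(self, y, z, pos, batch)` signature is taken from this thread, while the `ConstantShift` class and its `inner` argument are hypothetical.

```python
class ConstantShift:
    """Toy prior that adds a fixed offset to the reduced output y."""

    def __init__(self, shift, inner=None):
        self.shift = shift
        self.inner = inner  # optional nested prior, applied before this one

    def post_reduce(self, y, z, pos, batch):
        if self.inner is not None:
            y = self.inner.post_reduce(y, z, pos, batch)
        return y + self.shift


# Two priors chained by nesting: the inner one runs first.
nested = ConstantShift(0.5, inner=ConstantShift(-2.0))
energy = nested.post_reduce(10.0, z=[1, 8, 1], pos=None, batch=[0, 0, 0])
```

Here the inner prior lowers the value by 2.0 and the outer one then raises it by 0.5.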
That would be great if all three of us can meet to discuss it.
I think that's more general than what we need. For Coulomb the two outputs (per-atom energy and per-atom charge) can be predicted independently. Each one gets fed into a calculation that produces a per-sample energy, and the two get added together. They're only linked through the loss function. |
In this case it's not strictly either/or. Some attributes will be retrieved from the dataset (for example, the atomic numbers for atom types). Others always need to be specified in the parameter file (for example, the cutoff distance). The existing code assumes everything can be retrieved from the dataset: torchmd-net/torchmdnet/scripts/train.py Line 128 in c1c4fcf
Any suggestions on the best way to handle this, so we can pass configuration options to the prior even when first creating it? |
You could simply pass |
With ZBL implemented in #134 and D2 in progress in #143, the next question is how to combine multiple priors. What if I want a model to include both ZBL and D2? The internal changes can be very simple. We can just make
One option is to make
prior_model:
  - ZBL
  - D2
prior_args:
  - {"cutoff_distance": 4.0, "max_num_neighbors": 50}
  - {"cutoff_distance": 10.0, "max_num_neighbors": 100}
A possibly cleaner option is to combine them:
prior_model:
  - ZBL:
      cutoff_distance: 4.0
      max_num_neighbors: 50
  - D2:
      cutoff_distance: 10.0
      max_num_neighbors: 100 |
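As a rough sketch of how the combined format could be consumed, the function below normalizes a `prior_model` entry into (name, arguments) pairs while still accepting a single bare class name. The function name `parse_prior_config` is hypothetical, and the dict literal stands in for what a YAML loader would return.

```python
def parse_prior_config(prior_model):
    """Normalize a `prior_model` entry into a list of (name, kwargs) pairs.

    Accepts a bare name, a list of names, or a list of single-key mappings.
    """
    if prior_model is None:
        return []
    if isinstance(prior_model, (str, dict)):
        prior_model = [prior_model]
    priors = []
    for entry in prior_model:
        if isinstance(entry, str):
            priors.append((entry, {}))
        elif isinstance(entry, dict) and len(entry) == 1:
            (name, kwargs), = entry.items()
            priors.append((name, dict(kwargs or {})))
        else:
            raise ValueError(f"Unrecognized prior specification: {entry!r}")
    return priors


# Parsed form of the combined YAML layout shown above.
config = {
    "prior_model": [
        {"ZBL": {"cutoff_distance": 4.0, "max_num_neighbors": 50}},
        {"D2": {"cutoff_distance": 10.0, "max_num_neighbors": 100}},
    ]
}
priors = parse_prior_config(config["prior_model"])
legacy = parse_prior_config("Atomref")
```

Accepting the bare-string form keeps existing configuration files with a single prior valid.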
Sounds good! I think version two is way more readable, which I think is what we should be going for in the config files. I reckon both versions will be hard to implement in the CLI? |
We would have to come up with a syntax for specifying it on the command line and write our own code for parsing it. |
we could also drop the CLI and assume we initialize via a yaml file.
…On Wed, Nov 2, 2022 at 12:03 AM Peter Eastman ***@***.***> wrote:
We would have to come up with a syntax for specifying it on the command
line and write our own code for parsing it.
|
Being able to override settings from the command line is still useful, at least for some settings. For example, to train several copies of the same model with different random number seeds. But the prior model is probably one of the less important ones to override from the command line. |
I'm in favour of the second version. We are already using more structured inputs for the data loaders. Regarding the CLI, we would maintain the ability to override simple options like seed. |
I'm starting work on a Coulomb prior. For the moment I'm just trying to implement fixed, precomputed partial charges. Later I'll get to charges that are dynamically predicted by the model. This requires some way of passing in charges. Currently the prior gets invoked as
def post_reduce(self, y, z, pos, batch):
For the current ones that's sufficient because Any suggestions on the best way of structuring it? |
All of them. They need to incorporate the extra arguments into the calculation. |
Just do what you need, test it, and break it in the branch.
This is how we normally work: we have a prototype branch, and once we know
it's valuable, we rewrite it to incorporate it.
g
…On Tue, Feb 13, 2024 at 6:57 PM Peter Eastman ***@***.***> wrote:
All of them. They need to incorporate the extra arguments into the
calculation.
|
I don't know what the best way of incorporating it into the model is. That's why I asked the question:
What would be the best way of incorporating that extra information into the calculation?
I'm hoping someone with experience in the internals of the models will provide useful suggestions. |
Not sure if I got your case correctly Peter, but TensorNet currently
supports total charge because we decided to default to this behavior.
However, if you look into the model, the total charge is assigned to every
node and used in every node’s interaction product. We indeed did
experiments by setting precomputed partial charges to each atom instead of
the total charge. This just would require modifying the line in the
embedding part where we do q[batch]. Our experiments worked very well, but
we needed to use fixed partial charges (Gasteiger charges, for example). If
partial charges changed across conformations, instabilities arose,
but you could revisit that. Again, I don’t know if this is exactly your
issue with charges, but in case it is, a very small fix to TensorNet would
allow you to do this.
…On Tue, 13 Feb 2024 at 18:57, Peter Eastman ***@***.***> wrote:
All of them. They need to incorporate the extra arguments into the
calculation.
|
Also, you can use the same approach with molecular spins, instead of
charges. It is just a trick in the model to make conformations
non-degenerate when there are other electronic degrees of freedom.
…On Tue, 13 Feb 2024 at 19:06, Peter Eastman ***@***.***> wrote:
I don't know what the best way of incorporating it into the model is.
That's why I asked the question:
What would be the best way of incorporating that extra information into
the calculation?
I'm hoping someone with experience in the internals of the models will
provide useful suggestions.
|
So all I need to do is change torchmd-net/torchmdnet/models/tensornet.py Lines 240 to 243 in 6d8e315
to
if q is None:
    q = torch.zeros_like(z, device=z.device, dtype=z.dtype)
elif q.shape != z.shape:
    q = q[batch]
Then it should work correctly with any of the following cases?
- q is None
- q has one value per sample
- q has one value per atom
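The three cases can be checked with a plain-Python analogue of that logic, in which lists stand in for tensors so the sketch runs without PyTorch. The function name `expand_charges` and the example molecule are assumptions for illustration.

```python
def expand_charges(q, z, batch):
    """Return one charge per atom.

    q: None, one value per sample, or one value per atom.
    z: atomic numbers, one per atom.
    batch: index of the sample each atom belongs to.
    """
    if q is None:
        return [0.0] * len(z)
    if len(q) != len(z):
        # Per-sample values: look up the sample of each atom.
        return [q[b] for b in batch]
    return list(q)


z = [8, 1, 1, 11]     # a water molecule and a sodium ion
batch = [0, 0, 0, 1]  # atoms 0-2 are in sample 0, atom 3 is in sample 1

no_charge = expand_charges(None, z, batch)
per_sample = expand_charges([0.0, 1.0], z, batch)
per_atom = expand_charges([-0.8, 0.4, 0.4, 1.0], z, batch)
```

The length check is ambiguous only when the number of samples equals the number of atoms; if every sample then holds exactly one atom and `batch` is sorted, both readings give the same result.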
|
That’s it! Notice that in front of qs in the interaction layers (both in
the interaction product and the squared residual update, the latter one I
forgot to mention before) there is a factor of 0.1. This is arbitrary, but
it worked fine in my experiments. However, notice that in the case partial
charge (or let’s say ‘hypercharge’, since it can be whatever) is -10, you
get 0 and cancel the interaction. Nevertheless, I don’t know if partial or
total charges of -10 are realistic in our setting (I assume not). What I
want to say is that from that point on, once you have your property stored
in q, you can do whatever with it. For example, a multilayer perceptron
with input size 1 and output size hidden channels, and you element-wise
multiply the interaction products and the residual update. This is just one
arbitrary case. For anything else you want to explore or try which is not
customary, I can help you with it.
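A small numeric sketch of the scaling described here, assuming the factor has the form 1 + 0.1 * q. That form is inferred from the remark that q = -10 cancels the interaction and is not copied from the TensorNet source; the function name `charge_scale` is hypothetical.

```python
def charge_scale(q, strength=0.1):
    """Multiplicative factor applied to a feature for charge q.

    Assumed form: 1 + strength * q (an inference, not the library code).
    """
    return 1.0 + strength * q


neutral = charge_scale(0.0)      # features pass through unchanged
typical = charge_scale(-0.8)     # a partial charge of ordinary magnitude
cancelled = charge_scale(-10.0)  # the factor vanishes
```

Under this assumption, charges of ordinary magnitude change the factor by only a few percent, while a value of -10 would switch the term off entirely.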
…On Tue, 13 Feb 2024 at 22:44, Peter Eastman ***@***.***> wrote:
So all I need to do is change
https://github.com/torchmd/torchmd-net/blob/6d8e3159cfb8bb971ecf7a2abd589735d79a7e53/torchmdnet/models/tensornet.py#L240-L243
to
if q is None:
q = torch.zeros_like(z, device=z.device, dtype=z.dtype)
elif q.shape != z.shape:
q = q[batch]
Then it should work correctly with any of the following cases?
- q is None
- q has one value per sample
- q had one value per atom
|
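For partial charges that should be a safe assumption. They're usually between -1 and 1. They can be a bit larger for some ions, but they shouldn't be anywhere close to 10. Total charge could reach 10 in some cases.
I notice that q is only passed to the interaction layers, not to the embedding, meaning if you set num_layers to 0 the charge is ignored. Should it also be passed to the embedding? |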
I am afraid you cannot deal with charges (not in this way) for num layers
equal to 0. It would require thinking of another way of doing it. Perhaps
it would be worth adding a warning when q is not None and num layers is 0.
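Such a warning might be sketched as follows, using the standard `warnings` module. The helper name `check_charge_usage` is hypothetical; the argument names mirror the discussion.

```python
import warnings


def check_charge_usage(q, num_layers):
    """Warn if charges are supplied but no interaction layer can use them.

    Returns True when the charges will be used (or none were given).
    """
    if q is not None and num_layers == 0:
        warnings.warn(
            "q was provided but num_layers is 0, so the charges are ignored.",
            UserWarning,
            stacklevel=2,
        )
        return False
    return True


# Capture the warning so the example can inspect it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    used = check_charge_usage([0.0, 1.0], num_layers=0)
```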
…On Wed, 14 Feb 2024 at 05:48, Peter Eastman ***@***.***> wrote:
For partial charges that should be a safe assumption. They're usually
between -1 and 1. They can be a bit larger for some ions, but they
shouldn't be anywhere close to 10. Total charge could reach 10 in some
cases.
I notice that q is only passed to the interaction layers, not to the
embedding, meaning if you set num_layers to 0 the charge is ignored.
Should it also be passed to the embedding?
|
Suppose we want to add in multiple values, for example both charge and spin. Can this method be generalized to handle that case? Or do we need to figure out a different way of incorporating them into the calculation? |
We are currently looking into that. From what I understand, for spin
you either have singlet or triplet, which would be a binary variable? In
any case, I see nothing preventing us from creating a rescaling coefficient
for the interaction products and the residual update which is a suitable
linear combination (perhaps even with learnable coefficients, though it
might not be necessary) of a spin and a charge term, as the simplest
case. If you explore something in this direction, let us know, please.
…On Wed, 14 Feb 2024 at 22:49, Peter Eastman ***@***.***> wrote:
Suppose we want to add in multiple values, for example both charge and
spin. Can this method be generalized to handle that case? Or do we need to
figure out a different way of incorporating them into the calculation?
|
Do you think it could work to just append the extra values to the embedding vector at torchmd-net/torchmdnet/models/tensornet.py Line 330 in c0edfed
Or is that too simplistic? |
It could work. However, I chose the option of just multiplying because
including charges came after having the original TensorNet model, and I
wanted to make sure the model with q=0 identically defaulted to the
original one (the one for which I had run all benchmarks), plus I find it
elegant and you don’t need to change any tensor or learnable layer shapes,
or even introduce something extra learnable.
…On Wed, 14 Feb 2024 at 23:15, Peter Eastman ***@***.***> wrote:
Do you think it could work to just append the extra values to the
embedding vector at
https://github.com/torchmd/torchmd-net/blob/c0edfedfcdb841f0b52d571f3f788339ef5ff486/torchmdnet/models/tensornet.py#L330
Or is that too simplistic?
|
However, thinking about it more: you would need to change the output energy for
different qs and spins with just two new neurons at the very start
of the model, so perhaps the signal gets too ‘diluted’ and is difficult for the
network to use. For me it makes more sense changing intermediate things, and
things that you know have a large effect on predictions. Changing the
interaction product strength can have a large effect (at least in my mind).
On Wed, 14 Feb 2024 at 23:21, Guillem Simeon ***@***.***>
wrote:
… It could work. However, I chose the option of just multiplying because
including charges came after having the original TensorNet model, and I
wanted to make sure the model with q=0 identically defaulted to the
original one (the one for which I had run all benchmarks), plus I find it
elegant and you don’t need to change any tensor or learnable layer shapes,
or even introduce something extra learnable.
On Wed, 14 Feb 2024 at 23:15, Peter Eastman ***@***.***>
wrote:
> Do you think it could work to just append the extra values to the
> embedding vector at
>
>
> https://github.com/torchmd/torchmd-net/blob/c0edfedfcdb841f0b52d571f3f788339ef5ff486/torchmdnet/models/tensornet.py#L330
>
> Or is that too simplistic?
>
|
I'll try both approaches and see if one works better. The embedding approach appeals to me because you can easily incorporate an arbitrary set of global or per-atom properties. I also have to admit that I don't really understand the logic behind the current approach. You increase the strength of all interactions for positive atoms/molecules and decrease the strength of all interactions for negative atoms/molecules. Why? One could also take the embedding vector, append the extra values, and then pass it through a linear layer that mixes everything together and reduces it back down to the original length. Or you could do something similar to Cormorant. Instead of learning an embedding vector it learns a matrix that mixes together whatever input values you want for each atom and produces the embedding vector. |
There is no logic in terms of physics, tbh. The only logic is that the
magnitudes of the scalar, vector and tensor features that are fed to
subsequent layers are different, and all subsequent layers learn to map
these differences to different energies. It worked quite well in our
experiments. If you ask me, the most simple approach for your case is:
- count the number of extra args, d
- initialize a linear layer from d to hidden channels, Lin
- do Lin(extra_args) and elementwise multiply the output with the products
and the residual update, similar to what is being done now (currently, all
channels are weighted by the same number; in the setting just described, you
modify strengths channelwise)
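The three steps can be sketched in plain Python, with nested lists standing in for tensors and a tiny hand-written dense layer standing in for a real linear module. The `1 +` offset and the zero bias are assumptions added here so that all-zero extra arguments leave the features untouched; all names are hypothetical.

```python
import random


def make_linear(in_features, out_features, seed=0):
    """Return (weight, bias) for a small dense layer, as nested lists."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_features)]
              for _ in range(out_features)]
    bias = [0.0] * out_features  # zero bias, see the assumption above
    return weight, bias


def linear(layer, x):
    """Apply the dense layer to a single input vector."""
    weight, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def scale_channels(features, extra_args, layer):
    """Multiply each hidden channel by 1 + Lin(extra_args)[channel]."""
    gains = linear(layer, extra_args)
    return [f * (1.0 + g) for f, g in zip(features, gains)]


hidden_channels = 4
extra = [1.0, 0.0]  # for example a charge and a spin flag, so d = 2
lin = make_linear(len(extra), hidden_channels)
features = [1.0, 2.0, 3.0, 4.0]

scaled = scale_channels(features, extra, lin)
unchanged = scale_channels(features, [0.0, 0.0], lin)
```

Each channel now gets its own gain, in contrast to a single scalar factor shared by all channels.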
But since this is deep learning, anything could be done. I think the right
way to proceed is in terms of what is the minimal amount of modifications
you need to make, and to make sure (at least in my opinion) that no
extra_args defaults to original TensorNet. Otherwise I do not have
performance guarantees for you in terms of current benchmark datasets,
since we would be talking about a different model.
I hope it helps.
…On Thu, 15 Feb 2024 at 00:01, Peter Eastman ***@***.***> wrote:
I'll try both approaches and see if one works better. The embedding
approach appeals to me because you can easily incorporate an arbitrary set
of global or per-atom properties. I also have to admit that I don't really
understand the logic behind the current approach. You increase the strength
of all interactions for positive atoms/molecules and decrease the strength
of all interactions for negative atoms/molecules. Why?
One could also take the embedding vector, append the extra values, and
then pass it through a linear layer that mixes everything together and
reduces it back down to the original length. Or you could do something
similar to Cormorant. Instead of learning an embedding vector it learns a
matrix that mixes together whatever input values you want for each atom and
produces the embedding vector.
|
I tried using the current method to inject partial charges into the model. The result was not good: it roughly doubled the error. Then I tried the method I described above: append the partial charge to the embedding vector, and use a linear layer to mix it back down to the original length. That worked nicely and gave a good result. This has the advantage that it easily generalizes to arbitrary numbers of global and per-atom scalar parameters. If there are no extra arguments, there's nothing to append and it skips the linear layer, so the model is unchanged. This same approach could be used for all the models if we want, not just TensorNet. Would you be open to a PR implementing it? |
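A minimal sketch of the append-and-mix idea, written with plain lists so it runs without PyTorch. It is not the code used in the experiment; `make_mixer`, `embed_with_extras`, and the toy sizes are assumptions for illustration.

```python
import random


def make_mixer(hidden, n_extra, seed=0):
    """Weights of a linear layer mapping hidden + n_extra values to hidden."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(hidden + n_extra)]
            for _ in range(hidden)]


def embed_with_extras(embedding, extras, mixer=None):
    """Append per-atom extras to the embedding and mix back to its length.

    With no extras the embedding is returned unchanged and no layer is used.
    """
    if not extras:
        return list(embedding)
    if mixer is None:
        raise ValueError("a mixer is required when extras are supplied")
    combined = list(embedding) + list(extras)
    return [sum(w * x for w, x in zip(row, combined)) for row in mixer]


embedding = [0.2, -0.1, 0.7]  # toy 3-channel atom embedding
mixer = make_mixer(hidden=3, n_extra=1)

plain = embed_with_extras(embedding, [])
charged = embed_with_extras(embedding, [-0.4], mixer)  # one partial charge
```

Because the extras are simply concatenated, adding a second property such as spin only changes `n_extra`; the output length stays equal to the original embedding length.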
Hi Peter,
we have results that show that partial charges work better.
We are about to finalize a paper and will make available the final version
within 30 days.
Regarding your changes, I am not sure that they work in terms of
maintaining the properties of TN. We can talk about this after we finalize
the current work.
g
|
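> we have results that show that partial charges work better.
Work better than what?
> Regarding your changes, I am not sure that they work in terms of maintaining the properties of TN.
They exactly maintain it. If you don't add charges, they don't modify the model in any way. |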
Than the model without charges
|
Yes, that's what I'm finding as long as I inject the charges with the method I described. If I do it with the current code, it breaks the model. |
We are not using the method you described.
|
What method are you using? If it's what's in the code, it doesn't work. If it's something else, please provide the code so I can try it. |
Hi Peter, the thing is that trying partial charges in our case (which were Gasteiger ones, btw; are yours fixed?) was something we tried out of curiosity after learning that total charges worked. We believe total charge is the way to proceed, and in fact, to the best of my knowledge, it is currently being used as it is in the code now, and it helps the model work well when there are simultaneously charged and neutral molecules in the dataset (which can sometimes be confused for degenerate inputs). What I think is that your use case is different from ours, meaning that we have never tried partial charges + Coulomb. We didn't want to rely on partial charges because of their 'arbitrary' nature, as opposed to the real physical observable that total charge is. Guillem |
I also forgot to mention that, on top of what I said, relying on partial charges did not seem optimal to us because you always need to use exactly the same method to compute the partial charges; otherwise any small change in their value would mean a different energy output. In contrast, total charge is an integer. |
Well, there are situations where partial charges are good and that's why we
used them. As soon as we finish the document, we will share it.
|
I'm using Gasteiger charges too. I'll make a PR and you can try it out.
> We believe total charge is the way to proceed
I don't see how total charge can possibly work. It's a global property, but the model is based entirely on local computations. As it's doing computations for local regions of the system, knowing the total charge of the system provides no useful information. It has no idea how much of that charge is in the region it's looking at and how much is elsewhere, so there's nothing it can do with it. Instead it needs information about the local charge distribution, which partial charges give it. Perhaps it could work for tiny molecules where every atom is within the cutoff distance of every other atom, but not for anything larger than that. For comparison, take a look at SpookyNet. It supplements the local computation with some global computation, which allows it to meaningfully use global information. |
Yes, I also think that total charge is going to be hard, and partial
charges are useful, but for what we need, total charge has the advantage
that it is a physical quantity; it does not depend on the version of RDKit.
|
Gasteiger charges don't depend on the version of RDKit. The algorithm was published in 1978 and hasn't changed since. |
I want to implement some physics based priors. An example would be a Coulomb interaction based on pre-computed partial charges that are stored in the dataset.
BasePrior.forward() is supposed to return a list of per-atom energy contributions, but physics based interactions usually do not decompose in that way. It would be much easier if it could just return a total energy for each sample.
What do you recommend as the cleanest way of implementing this?
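To make the per-sample point concrete, here is a minimal plain-Python sketch of a Coulomb energy that returns one total per sample. It is an illustration only, not the TorchMD-Net prior API: the function name, its arguments, and the `batch` index list are hypothetical, and the constant assumes positions in nm, charges in elementary charge units, and energies in kJ/mol.

```python
import math
from collections import defaultdict

# Coulomb constant in kJ/mol * nm / e^2. This is an assumption about units:
# positions in nm, charges in units of the elementary charge.
COULOMB_CONSTANT = 138.935458


def coulomb_energy_per_sample(positions, charges, batch, num_samples,
                              k=COULOMB_CONSTANT):
    """Return one total Coulomb energy per sample.

    positions: list of (x, y, z) tuples, one per atom
    charges:   list of partial charges, one per atom
    batch:     list giving the sample index of each atom
    """
    # Group atom indices by the sample they belong to.
    atoms_by_sample = defaultdict(list)
    for atom, sample in enumerate(batch):
        atoms_by_sample[sample].append(atom)

    energies = [0.0] * num_samples
    for sample, atoms in atoms_by_sample.items():
        # Each pair within a sample is counted once. The energy belongs to
        # the pair, so there is no unique way to assign it to a single atom.
        for a in range(len(atoms)):
            for b in range(a + 1, len(atoms)):
                i, j = atoms[a], atoms[b]
                r = math.dist(positions[i], positions[j])
                energies[sample] += k * charges[i] * charges[j] / r
    return energies


# Sample 0: two opposite unit charges 1 nm apart. Sample 1: a single atom.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
charges = [1.0, -1.0, 0.5]
batch = [0, 0, 1]
energies = coulomb_energy_per_sample(positions, charges, batch, num_samples=2)
print(energies)
```

The pair loop is quadratic in the number of atoms per sample and ignores cutoffs and periodic boundaries; it is meant only to show why the natural output is a total per sample rather than a list of per-atom contributions.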