Cluster force #675

Open
wants to merge 7 commits into
base: refactor_data

Conversation

LarsSchaaf
Collaborator

Fitting to cluster forces for better intermolecular interactions.

Description

Atoms are assumed to be separated into clusters, e.g. molecules. Each cluster has a unique id (which is passed to cluster_key). The cluster force is the sum of the forces on all atoms belonging to a cluster. The additional loss term fits the predicted cluster force to the ground-truth cluster force.
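
To make the description concrete, here is a minimal sketch of such a loss term in PyTorch. It is not the code from this PR: the function name `cluster_force_loss`, its arguments, and the plain mean-squared-error normalisation are assumptions for illustration, and `cluster_index` is assumed to hold contiguous integer ids in `[0, num_clusters)`.

```python
import torch


def cluster_force_loss(pred_forces, ref_forces, cluster_index, num_clusters):
    """Mean squared error between summed per-cluster forces (hypothetical helper).

    pred_forces, ref_forces: (n_atoms, 3) tensors.
    cluster_index: (n_atoms,) long tensor with values in [0, num_clusters).
    """

    def sum_per_cluster(forces):
        out = torch.zeros(num_clusters, 3, dtype=forces.dtype, device=forces.device)
        # Accumulate each atom's force into the row of its cluster.
        return out.index_add_(0, cluster_index, forces)

    diff = sum_per_cluster(pred_forces) - sum_per_cluster(ref_forces)
    return torch.mean(diff**2)


# Two clusters: atoms 0 and 1 form cluster 0, atom 2 forms cluster 1.
ref = torch.tensor([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
pred = torch.tensor([[0.5, 0.0, 0.0], [-0.5, 0.0, 0.0], [0.0, 1.0, 0.0]])
cluster_index = torch.tensor([0, 0, 1])
loss = cluster_force_loss(pred, ref, cluster_index, num_clusters=2)
```

In this toy example the per-atom errors inside cluster 0 cancel, so only cluster 1 contributes to the loss. This is the sense in which the term targets the net force on each molecule rather than the individual atomic forces.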

Possible extensions:

  • Fit to torque

@gabor1
Collaborator

gabor1 commented Nov 6, 2024

Is there a script to process an xyz to find clusters and annotate the file?

@LarsSchaaf
Collaborator Author

Will add annotation script from Ioan.

Another improvement to current implementation:

  • At the moment the cluster ids have to be unique across the entire training/valid/test set.
  • Could use a double batch ID, like Will does for kpoints.
  • Short-term solution: check that they are unique and throw an error if not?

@gabor1
Collaborator

gabor1 commented Nov 6, 2024

Make sure the script you add generates unique IDs, and then when reading the file you can check it. Why does the ID need to be unique across the data set rather than just within the config? You could generate a unique ID by concatenating the config ID and the cluster ID. The result would be unique regardless of whether the cluster IDs themselves are unique across configs.
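
The concatenation idea can be sketched at the batch level as follows. This is an illustration under assumptions, not the PR's implementation: `batch` is assumed to be the per-atom config index within a batch, `local_cluster` the per-atom cluster id that is only unique within its config, and `global_cluster_index` is a hypothetical helper name.

```python
import torch


def global_cluster_index(batch, local_cluster):
    """Map (config index, local cluster id) pairs to contiguous global ids."""
    pairs = torch.stack([batch, local_cluster], dim=1)
    # Each distinct row is one cluster; `inverse` gives that cluster's id per atom.
    unique_pairs, inverse = torch.unique(pairs, dim=0, return_inverse=True)
    return inverse, unique_pairs.shape[0]


# Two configs that both reuse the local cluster ids 0 and 1.
batch = torch.tensor([0, 0, 0, 1, 1])
local_cluster = torch.tensor([0, 0, 1, 0, 1])
global_ids, num_clusters = global_cluster_index(batch, local_cluster)
```

With this kind of mapping the ids stored in the file only need to be unique within each config, and the resulting ids can be used directly as scatter indices.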

@LarsSchaaf
Collaborator Author

Yes exactly, that's the to-do ("double batched"). I'll edit it tomorrow.

@ilyes319
Contributor

ilyes319 commented Nov 6, 2024

Looks good. Could you add a small test for the loss + one in the run_train test?

@LarsSchaaf
Collaborator Author

The clustering scripts are taken from Ioan Magdau's repo: https://github.com/imagdau/aseMolec/blob/main/aseMolec/anaAtoms.py
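
For readers who want a feel for what such an annotation step does, here is a deliberately simplified sketch in plain Python. It is not the aseMolec method: it uses a single distance cutoff, ignores periodic boundary conditions and element-dependent bond lengths, and the function name `find_clusters` is hypothetical. Atoms connected through a chain of short distances end up with the same cluster id.

```python
from itertools import combinations
from math import dist


def find_clusters(positions, cutoff):
    """Label atoms so that atoms linked by distances below `cutoff` share an id."""
    parent = list(range(len(positions)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(positions)), 2):
        if dist(positions[i], positions[j]) < cutoff:
            parent[find(i)] = find(j)  # merge the two clusters

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(positions))]


positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0),
             (6.0, 0.0, 0.0), (1.9, 0.0, 0.0)]
cluster_ids = find_clusters(positions, cutoff=1.2)
```

The resulting per-atom labels are the kind of array that would be stored in the xyz file under the key passed to cluster_key.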

@ilyes319
Contributor

ilyes319 commented Dec 5, 2024

@LarsSchaaf I think it would be nice to add some config_cluster_weight inside the loss, as you already added them to the AtomicData.

    configs_weight = torch.repeat_interleave(
        ref.cluster_weight, ref.ptr[1:] - ref.ptr[:-1]
    ).unsqueeze(-1)

Then use scatter operations to build a per-cluster mask and mask out the clusters that should not be weighted.
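
One way this weighting could look is sketched below. All names are assumptions for illustration: `config_weight` holds one weight per config, `ptr` holds the atom offsets of each config in the batch, and `cluster_index` holds contiguous global cluster ids. The normalisation (weighted mean of the squared cluster-force error) is one possible choice, not necessarily the one the PR should use.

```python
import torch


def weighted_cluster_loss(pred_forces, ref_forces, cluster_index, num_clusters,
                          config_weight, ptr):
    """Cluster-force loss where zero-weight configs are masked out (a sketch)."""
    # Expand one weight per config to one weight per atom.
    atom_weight = torch.repeat_interleave(config_weight, ptr[1:] - ptr[:-1])
    # All atoms of a cluster sit in the same config, so they share one weight.
    cluster_weight = torch.zeros(num_clusters, dtype=atom_weight.dtype)
    cluster_weight[cluster_index] = atom_weight

    # Summed force error per cluster, then squared norm over x, y, z.
    diff = torch.zeros(num_clusters, 3, dtype=pred_forces.dtype)
    diff.index_add_(0, cluster_index, pred_forces - ref_forces)
    sq_err = (diff**2).sum(dim=-1)

    # Zero-weight clusters drop out; normalise by the remaining total weight.
    return (cluster_weight * sq_err).sum() / cluster_weight.sum().clamp(min=1e-12)


# Two configs: the first (atoms 0-1) has weight 1, the second (atom 2) weight 0.
ptr = torch.tensor([0, 2, 3])
config_weight = torch.tensor([1.0, 0.0])
cluster_index = torch.tensor([0, 0, 1])
pred = torch.tensor([[0.5, 0.0, 0.0], [0.5, 0.0, 0.0], [0.0, 3.0, 0.0]])
ref = torch.zeros(3, 3)
loss = weighted_cluster_loss(pred, ref, cluster_index, 2, config_weight, ptr)
```

Here the large error on the second config's cluster does not affect the loss, because its weight is zero.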

@gabor1
Collaborator

gabor1 commented Dec 5, 2024

I don't know if it's currently possible or not, but it would be nice if not all atoms needed to be part of a cluster, just some.
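
A possible way to support this, sketched under the assumption that atoms outside any cluster are labelled with the id -1 (a convention invented here, not taken from the PR), is to filter those atoms out before summing:

```python
import torch


def sum_cluster_forces(forces, cluster_index, num_clusters):
    """Sum forces per cluster, skipping atoms labelled -1 (no cluster)."""
    member = cluster_index >= 0
    out = torch.zeros(num_clusters, 3, dtype=forces.dtype)
    # Only atoms that belong to a cluster are accumulated.
    return out.index_add_(0, cluster_index[member], forces[member])


# Atoms 0 and 1 form cluster 0; atom 2 belongs to no cluster.
forces = torch.tensor([[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
cluster_index = torch.tensor([0, 0, -1])
summed = sum_cluster_forces(forces, cluster_index, num_clusters=1)
```

Unassigned atoms would then still be covered by the ordinary per-atom force loss, while contributing nothing to the cluster term.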


3 participants