[feature request] OpenMP support for LAMMPS ML-PACE #57

Open
bernstei opened this issue Sep 23, 2023 · 5 comments

bernstei commented Sep 23, 2023

It would be very useful for me if we could run PACE potentials with LAMMPS OpenMP support, either directly or via Kokkos (in general for small systems where domain decomposition is limited, or in my case mainly because I'm parallelizing over a set of small configurations with MPI, and having LAMMPS's MPI domain decomposition coexist with that would be hard).

I have no idea what that would entail, but I do have some experience coding OpenMP-parallelized interatomic potentials (although in Fortran, not C++, so fewer pointers). If it seems vaguely feasible to anyone who knows the code well, I'd be happy to discuss it and try to put something together. The basic idea we've used before is to have each thread loop over a subset of the atoms and then accumulate the energy and force contributions. I looked briefly at pair_pace.cpp, but since I'm not sure about the internals of aceimpl and its ace attribute, I'm not sure how one would go about sharing them across threads in the correct way (read-only bits shared, thread-specific bits private or threadprivate).
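
For concreteness, a rough sketch of that pattern for a generic neighbour-list pair potential; the kernel type, argument names, and data layout here are illustrative stand-ins, not the actual ML-PACE interfaces:

```cpp
#include <functional>
#include <vector>

// Hypothetical per-atom kernel: returns the energy of atom i and adds its
// force contributions (on i and on its neighbours) into the supplied buffer.
using PerAtomKernel =
    std::function<double(int i, int jnum, const int *jlist, double *f_buf)>;

// nlocal = owned atoms, nall = owned + ghost atoms (forces may land on either)
void compute_openmp(int nlocal, int nall, const int *numneigh, int **firstneigh,
                    double **f, double &energy, const PerAtomKernel &kernel) {
  double e_total = 0.0;

  #pragma omp parallel
  {
    // thread-private force buffer, so threads never race on the shared f[][]
    std::vector<double> f_local(3 * nall, 0.0);
    double e_local = 0.0;

    #pragma omp for schedule(dynamic)
    for (int i = 0; i < nlocal; ++i)
      e_local += kernel(i, numneigh[i], firstneigh[i], f_local.data());

    // fold the per-thread buffers back into the shared arrays
    #pragma omp critical
    {
      e_total += e_local;
      for (int i = 0; i < nall; ++i)
        for (int d = 0; d < 3; ++d) f[i][d] += f_local[3 * i + d];
    }
  }
  energy += e_total;
}
```

The per-thread buffer plus final reduction is what keeps the force accumulation race-free when atom i also contributes forces to its neighbours.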

[edited] I just realized that this may be the wrong repo for this feature request, and if so, I apologize. Feel free to tell me, and I'll move it wherever it best belongs.

yury-lysogorskiy (Member) commented

Dear @bernstei, I'm afraid it is not so trivial. The core method is compute_atom (here, here, or here). It uses these global arrays, which are defined at the class level and initialized once. So putting an OpenMP parallel-for over different atoms i in pair_pace.cpp::compute will result in a mess.

I could imagine two strategies:

  1. Push parallel-for down to compute_atom's inner loops over basis functions, neighbours, etc.
  2. Have per-thread "global" arrays (sketched below).

What would be your thoughts about it?
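
For illustration, a minimal sketch of option 2, using a hypothetical Evaluator class in place of the real ACE evaluator types: each thread gets its own copy of the object (and hence of its work arrays), so compute_atom never touches shared scratch space, and only the per-atom results are written back.

```cpp
#include <vector>

// Hypothetical stand-in for an ACE-style evaluator whose compute_atom()
// fills class-level work arrays.
struct Evaluator {
  std::vector<double> work;   // stand-in for the class-level "global" arrays
  double e_atom = 0.0;
  void compute_atom(int /*i*/) { /* fills work[], sets e_atom, ... */ }
};

void compute_with_thread_copies(int nlocal, const Evaluator &proto,
                                std::vector<double> &e_per_atom) {
  e_per_atom.assign(nlocal, 0.0);

  #pragma omp parallel
  {
    Evaluator eval = proto;        // thread-private copy, work arrays included

    #pragma omp for schedule(dynamic)
    for (int i = 0; i < nlocal; ++i) {
      eval.compute_atom(i);        // touches only this thread's arrays
      e_per_atom[i] = eval.e_atom; // each slot i is written by exactly one thread
    }
  }
}
```

The obvious cost is that the work arrays are replicated once per thread, so the memory footprint grows with the thread count.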


bernstei commented Oct 5, 2023

Per-thread arrays sound simpler, as long as they're not so big that their memory usage becomes an issue. I guess it'd also depend on exactly how those global arrays are used: are they just filled in once at the beginning and then read for each atom? Or are there contributions from each atom that have to be gathered somehow - easy if they're written to different array locations, harder if contributions from different atoms have to be summed? I can look at how those global arrays are used and see if I have any specific thoughts.
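
To make that distinction concrete, a tiny illustrative sketch (all names hypothetical) of the two gathering cases:

```cpp
#include <vector>

void gather_patterns(int nlocal,
                     const std::vector<double> &per_atom_contrib,
                     const std::vector<int> &neigh_of,   // one neighbour per atom, for brevity
                     std::vector<double> &own_slot,      // one slot per atom i
                     std::vector<double> &shared_slot) { // slots shared between atoms
  // Easy case: atom i writes only to its own slot, so no synchronization is needed.
  #pragma omp parallel for
  for (int i = 0; i < nlocal; ++i)
    own_slot[i] = per_atom_contrib[i];

  // Harder case: several atoms may add into the same shared slot (think forces
  // on a common neighbour), so the update needs an atomic or a later reduction.
  #pragma omp parallel for
  for (int i = 0; i < nlocal; ++i) {
    int j = neigh_of[i];
    #pragma omp atomic
    shared_slot[j] += per_atom_contrib[i];
  }
}
```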

Of those three versions of compute_atom, which do you think is the simplest (if there's a difference) for me to start with?


yury-lysogorskiy commented Oct 5, 2023

These arrays are reused for every central atom. The logic behind all three evaluators is to work on one atom at a time, so there is no need to collect anything over atoms inside compute_atom. Only the pair forces are accumulated across atoms, in pair_pace::compute.

In LAMMPS the default evaluator is the recursive c-tilde, but it can be too complicated. The B-evaluator is used for extrapolation-grade calculations. The product c-tilde is simpler than the recursive c-tilde.

I would suggest the product c-tilde as a starting point. Stan Moore also used it as the basis for the GPU Kokkos implementation, since it is the simpler one (and more GPU friendly).


bernstei commented Oct 5, 2023

Thanks. I'll take a look, probably next week.


bernstei commented Oct 5, 2023

I'm also wondering about adding OpenMP support to the existing Kokkos version. I'll investigate that too.

[added] I emailed Stan Moore (cc'ing you) for clarification on the existing Kokkos implementation and why it's GPU-only.
