[feature request] OpenMP support for LAMMPS ML-PACE #57
Dear @bernstei, I'm afraid it is not so trivial. The core method is […]. I could imagine two strategies: […]

What would be your thoughts about it?
Per-thread arrays sound simpler, as long as they're not so big that their memory usage becomes an issue. I guess it'd also depend on exactly how those global arrays are used: are they just filled in once at the beginning and then used for each atom? Are there contributions from each atom that have to be gathered somehow? That's easy if they involve writing to different array locations, but harder if contributions from different atoms have to be summed. I can look at how those global arrays are used and see if I have any specific thoughts. Of those 3 versions of […]
These arrays are used for every given current central atom. The logic behind all three evaluators is to work on an individual atom at any given moment, so there is no need to collect anything over atoms in […]. The default evaluator in LAMMPS is the recursive c-tilde, but it can be too complicated. The B-evaluator is used for extrapolation grade calculations. The product c-tilde is simpler than the recursive c-tilde. I would suggest […]
Thanks. I'll take a look, probably next week.
I'm also wondering about adding OpenMP support to the existing Kokkos version. I'll investigate that too. [added] I emailed Stan Moore (cc'ing you) for clarification on the existing Kokkos implementation and why it's GPU-only.
It would be very useful for me if we could run PACE potentials with LAMMPS OpenMP support, either directly or via Kokkos (in general for small systems where domain decomposition is limited, or in my case mainly because I'm parallelizing over a set of small configurations with MPI, and having LAMMPS's MPI domain decomposition coexist with that would be hard).
I have no idea what that would entail, but I do have some experience coding OpenMP-parallelized interatomic potentials (although in Fortran, not C++, so fewer pointers). If it seems vaguely feasible to anyone who knows the code well, I'd be happy to discuss it and try to put something together. The basic idea we've used before is to have each thread loop over a subset of the atoms, then accumulate the energy and force contributions. I looked briefly at `pair_pace.cpp`, but since I'm not sure about the internals of `aceimpl` and its `ace` attribute, I'm not sure how one would go about sharing them across threads in the correct way (read-only bits shared, thread-specific bits private or threadprivate).

[edited] I just realized that this may be the wrong repo for this feature request, and if so, I apologize. Feel free to tell me, and I'll move it wherever it belongs best.