
Writing the coordination with ArrayFire #1049

Open
Iximiel opened this issue Mar 19, 2024 · 0 comments

I'm not opening a PR yet, because I don't think the work is done, but I think it is worth showing how ArrayFire can be used to write the coordination.
You can see the code here.

This is heavily based on what was already done in SAXS and on what I did for CudaCoordination.
There is some code repetition taken from CudaCoordination, which I used to ease the transfer of data.

The main advantage over plain CUDA is that PLUMED already "knows" about ArrayFire in its ./configure, so it is easier to start working with AF.

The main difference from plain CUDA is that with CUDA you "do not have" tools: you craft your own, and those tools are optimized for your specific problem.
With ArrayFire you have a small toolset optimized for tensor (up to 4D) calculations, so you have to recast your problem in terms of "tensors" (and you have to fully embrace the philosophy "if the only tool you have is a hammer, you tend to see every problem as a nail" [cit.]).
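The two styles can be sketched side by side. This is a NumPy illustration only (the actual code uses ArrayFire's C++ API, and the function names here are hypothetical): the "craft your own tools" version is an explicit pair loop, as one would write in a hand-rolled CUDA kernel, while the "tensor" version expresses the same coordination number as a broadcast, an elementwise switching function, and a reduction, which is the shape ArrayFire's toolset pushes you toward.

```python
import numpy as np


def coordination_loop(pos, r0):
    """Explicit double loop over pairs, kernel-style.
    Uses the rational switching function (1-x^6)/(1-x^12), a common
    PLUMED default; the exact switch in the issue's code is not shown here."""
    n = len(pos)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            x = np.linalg.norm(pos[i] - pos[j]) / r0
            total += (1.0 - x**6) / (1.0 - x**12)
    return total


def coordination_tensor(pos, r0):
    """The same quantity as tensor ops: broadcast all pairwise
    displacements, map the switch elementwise, then reduce."""
    diff = pos[:, None, :] - pos[None, :, :]     # (N, N, 3) displacement tensor
    d = np.sqrt((diff**2).sum(axis=-1))          # (N, N) distance matrix
    x = d / r0
    s = (1.0 - x**6) / (1.0 - x**12)             # elementwise switching function
    np.fill_diagonal(s, 0.0)                     # drop self-pairs (x=0 gives s=1)
    return s.sum()
```

Both functions return the same value; only the second maps naturally onto a fixed toolset of broadcast/map/reduce primitives like ArrayFire's.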

I measured with the current plumed benchmark, calling the actions with cv: *** GROUPA=@mdatoms GROUPB=@mdatoms R_0=1, using the single-precision version to run on my local workstation GPU and using the CUDA backend of ArrayFire. Benchmarks were run with plumed benchmark --plumed="plumed.dat:cudasingleplumed.dat:firesingleplumed.dat" --natoms=${natoms} --nsteps=1000

This is the raw time of the "4 Calculating (forward loop)" timer only:
[benchmark plot]

And this is the raw time against the base COORDINATION:
[benchmark plot]

I did not manage to get the same performance boost as with the plain CUDA implementation, but I think sharing this may be useful as a starting point.
