-
Notifications
You must be signed in to change notification settings - Fork 296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize wholemolecules #1045
base: master
Are you sure you want to change the base?
Optimize wholemolecules #1045
Conversation
speedup 3%, but code is a mess
src/core/ActionWithValue.cpp
Outdated
// for(unsigned j=0; j<forcesForApply.size(); ++j) { | ||
// forcesForApply[j]+=omp_f[j]; | ||
// } |
Check notice
Code scanning / CodeQL
Commented-out code Note
src/core/ActionWithValue.cpp
Outdated
// if( nt>1 ) for(unsigned j=0; j<nder; ++j) omp_f[j] += ff*thisderiv[1+j]; | ||
//else for(unsigned j=0; j<nder; ++j) forcesForApply[j] += ff*thisderiv[1+j]; |
Check notice
Code scanning / CodeQL
Commented-out code Note
src/core/ActionAtomistic.cpp
Outdated
// for(const auto & a : atom_value_ind) { | ||
// plumed_dbg_massert( ind<forcesToApply.size(), "problem setting forces in " + getLabel() ); | ||
// std::size_t nn = a.first, kk = a.second; | ||
// xpos[nn]->inputForce[kk] += forcesToApply[ind]; ind++; | ||
// ypos[nn]->inputForce[kk] += forcesToApply[ind]; ind++; | ||
// zpos[nn]->inputForce[kk] += forcesToApply[ind]; ind++; | ||
// } |
Check notice
Code scanning / CodeQL
Commented-out code Note
src/core/ActionAtomistic.cpp
Outdated
// for(const auto & a : atom_value_ind) { | ||
// std::size_t nn = a.first, kk = a.second; | ||
// positions[j][0] = xpos[nn]->data[kk]; | ||
// positions[j][1] = ypos[nn]->data[kk]; | ||
// positions[j][2] = zpos[nn]->data[kk]; | ||
// charges[j] = chargev[nn]->data[kk]; | ||
// masses[j] = masv[nn]->data[kk]; | ||
// j++; | ||
// } |
Check notice
Code scanning / CodeQL
Commented-out code Note
src/core/ActionWithVirtualAtom.cpp
Outdated
// for(const auto & a : atom_value_ind) { | ||
// std::size_t nn = a.first, kk = a.second; | ||
// xpos[nn]->inputForce[kk] += xf*xval->data[1+k] + yf*yval->data[1+k] + zf*zval->data[1+k]; k++; | ||
// ypos[nn]->inputForce[kk] += xf*xval->data[1+k] + yf*yval->data[1+k] + zf*zval->data[1+k]; k++; | ||
// zpos[nn]->inputForce[kk] += xf*xval->data[1+k] + yf*yval->data[1+k] + zf*zval->data[1+k]; k++; | ||
// } |
Check notice
Code scanning / CodeQL
Commented-out code Note
src/core/ActionAtomistic.cpp
Outdated
atom_value_ind_grouped.back().second.push_back(kk); | ||
auto prev_nn=nn; | ||
for(unsigned i=1; i<atom_value_ind.size(); i++) { | ||
auto nn = atom_value_ind[i].first; |
Check notice
Code scanning / CodeQL
Declaration hides variable Note
line 122
src/core/ActionAtomistic.cpp
Outdated
auto prev_nn=nn; | ||
for(unsigned i=1; i<atom_value_ind.size(); i++) { | ||
auto nn = atom_value_ind[i].first; | ||
auto kk = atom_value_ind[i].second; |
Check notice
Code scanning / CodeQL
Declaration hides variable Note
line 123
// } else { | ||
// for(unsigned j=1; j<p_groups[i].size(); ++j) { | ||
// Vector first=getGlobalPosition(p_roots[i][j-1]); | ||
// Vector second=getGlobalPosition(p_groups[i][j]); | ||
// second=first+pbcDistance(first,second); | ||
// setGlobalPosition(p_groups[i][j], second ); | ||
// } | ||
// } |
Check notice
Code scanning / CodeQL
Commented-out code Note
// if(addref) { | ||
// first = refs[i]+pbcDistance(refs[i],first); | ||
// setGlobalPosition( p_groups[i][0], first ); | ||
// } | ||
// if(!doemst) { |
Check notice
Code scanning / CodeQL
Commented-out code Note
And here results for the intel compiler on my workstation. Reference is current master (just pulled), then I time master + this optimization of wholemolecules. My input: 28% -> 26% overhead Carlo's input: no measurable overhead in both cases (<1%)
|
Hello @GiovanniBussi What you have done here seems sensible. I don't think I can do it better. |
@gtribello I really don't like the way it's done, because it's intrusive in what's supposed to be "user code" (wholemolecules), with modifications that are difficult to understand. I also want to repeat the timings, because there is an interplay between all the optimizations we are doing. This one might be not so relevant, so I would keep it on hold |
@gtribello here I tried a trick similar to #1044 to optimize wholemolecules. Notice that I didn't tough the EMST stuff, so tests are expected to fail. Anyway, tests with simple wholemolecules should work.
On my usual input I get some further speedup, with overhead going from 13% to 10%.
On @carlocamilloni 's input see here it's even better. Before this commit, I get same performance as v2.9. After this commit I gain something like 5%.
Maybe @gtribello you want to have a look at this code and think if there's some other reasonable (and simpler) solution. Notice that these PR are all stacked onto each other, so you should only look at the last commit (but the performance is measured including the previous PRs)
With my input:
With Carlo's input: