RFC: Further optimize the optimal rotation code path #560
Comments
Thanks, those seem valid to me.
I agree on both points.
Thanks @jhenin and @giacomofiorin! I will try the optimizations if I have time.
@giacomofiorin
Answering your last point (before the issue gets closed when #585 is merged): parallelization over atoms is desirable, but will take some refactoring. Remember that we are dealing with a broad range of use cases, from (1) a single expensive rotation (e.g. a large protein) to (2) many rotations/RMSDs with respect to multiple points (e.g. path CV). Also, when going to large numbers of threads, good SMP parallelization requires a clever data structure layout (ask those with more experience than either of us). This means yet more refactoring, because the current layout is nowhere near clever! :-) Continuous testing has helped tremendously, especially considering that we cannot plan around the release cycles of major packages. I feel that we're getting closer to making that kind of refactoring manageable, but it remains more than one or two PRs away. A practical suggestion: would you be open to taking some of the tests that you already wrote in the
I will try to make some of the tests MD-engine agnostic. Regarding the threading, is it possible to use a thread pool? I think it might be possible to pull a thread from the pool to do either (1) or (2) that you mentioned, but I do not know whether there are other performance overheads.
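To make the thread pool question concrete, here is a minimal sketch in standard C++11, not Colvars code. `ThreadPool`, `submit` and `sum_of_squares` are hypothetical names chosen for illustration, and `sum_of_squares` only stands in for one independent unit of work, such as one of the many small RMSD evaluations in case (2).

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: workers are created once and reused for every task.
class ThreadPool {
public:
  explicit ThreadPool(std::size_t n_threads) {
    assert(n_threads > 0);
    for (std::size_t i = 0; i < n_threads; ++i) {
      workers_.emplace_back([this] { worker_loop(); });
    }
  }

  ThreadPool(const ThreadPool &) = delete;
  ThreadPool &operator=(const ThreadPool &) = delete;

  ~ThreadPool() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_all();
    for (auto &w : workers_) w.join();
  }

  // Queue a task and return a future holding its result.
  std::future<double> submit(std::function<double()> task) {
    auto packaged =
        std::make_shared<std::packaged_task<double()>>(std::move(task));
    std::future<double> result = packaged->get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push([packaged] { (*packaged)(); });
    }
    cv_.notify_one();
    return result;
  }

private:
  void worker_loop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stopping and no work left
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();  // run outside the lock so other workers can dequeue
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  std::vector<std::thread> workers_;
  bool stopping_ = false;
};

// Placeholder for one independent unit of work (an assumption, not a real RMSD).
inline double sum_of_squares(const std::vector<double> &x) {
  double s = 0.0;
  for (double v : x) s += v * v;
  return s;
}
```

In this sketch every task pays for a mutex lock, a `std::function` and a shared `std::packaged_task`, so it suits a moderate number of coarse tasks better than many tiny ones. Whether that overhead matters for cases (1) and (2) would have to be measured.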
I think the optimal rotation code path can still be optimized further as follows:
1. `S`, `S_eigvec` and `S_backup`: `cvm::matrix2d` allocates the memory dynamically, but we already know that the matrices are all 4x4 at compile time;

Any ideas?
Update: for 1, it should be 0.5~2 times faster.
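As a rough illustration of the point about `cvm::matrix2d`, the sketch below contrasts a heap-backed matrix with a fixed-size one whose dimensions are known at compile time. `DynamicMatrix`, `FixedMatrix`, `Matrix4x4` and `trace` are hypothetical names for illustration only; they are not the Colvars classes and do not reproduce the `cvm::matrix2d` interface.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dynamically sized matrix:
// every construction or copy allocates storage on the heap.
class DynamicMatrix {
public:
  DynamicMatrix(std::size_t rows, std::size_t cols)
      : cols_(cols), data_(rows * cols, 0.0) {}
  double &operator()(std::size_t i, std::size_t j) {
    return data_[i * cols_ + j];
  }
  double operator()(std::size_t i, std::size_t j) const {
    return data_[i * cols_ + j];
  }

private:
  std::size_t cols_;
  std::vector<double> data_;
};

// Fixed-size alternative: N is a compile-time constant, so the elements
// are stored inside the object itself and no heap allocation is needed.
template <std::size_t N>
class FixedMatrix {
public:
  FixedMatrix() : data_{} {}  // zero-initialize all N*N elements
  double &operator()(std::size_t i, std::size_t j) {
    assert(i < N && j < N);
    return data_[i * N + j];
  }
  double operator()(std::size_t i, std::size_t j) const {
    assert(i < N && j < N);
    return data_[i * N + j];
  }

private:
  std::array<double, N * N> data_;
};

using Matrix4x4 = FixedMatrix<4>;

// Sum of the diagonal; the loop bound is a constant the compiler can unroll.
inline double trace(const Matrix4x4 &m) {
  double t = 0.0;
  for (std::size_t i = 0; i < 4; ++i) t += m(i, i);
  return t;
}
```

With this layout, copying one matrix into another (for example a backup copy) is a plain copy of 16 doubles with no allocation. The actual speedup depends on how often the matrices are created and copied in the real code path.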