pp00aa dynamic OpenMP scheduling improves load imbalance #207
Since Poincaré tracing is done using an adaptive integration routine, execution time differs from point to point.
Chaotic regions in particular take longer to integrate and are typically not evenly distributed along the `nptrj` range, which results in a significant load imbalance between threads. The default static scheduling divides the work into large, equal blocks, one per thread. Dynamic scheduling creates a more fine-grained work distribution (one loop iteration at a time) and hands each iteration to whichever thread is available at the moment.
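To make the imbalance argument concrete, here is a toy simulation rather than the actual code: it compares the finish time of a contiguous block distribution (in the spirit of OpenMP's `schedule(static)`) with a "next free thread takes the next iteration" distribution (in the spirit of `schedule(dynamic)` with chunk size 1). The cost values and function names are made up for illustration, and scheduling overhead is not modeled.

```python
import heapq

def static_makespan(costs, n_threads):
    """Finish time when iterations are split into contiguous, near-equal blocks."""
    base, extra = divmod(len(costs), n_threads)
    finish_times = []
    start = 0
    for t in range(n_threads):
        size = base + (1 if t < extra else 0)  # first `extra` threads get one more
        finish_times.append(sum(costs[start:start + size]))
        start += size
    return max(finish_times)

def dynamic_makespan(costs, n_threads):
    """Finish time when each iteration goes to whichever thread is free first."""
    free_at = [0.0] * n_threads  # min-heap of times at which threads become idle
    heapq.heapify(free_at)
    for cost in costs:
        t = heapq.heappop(free_at)         # earliest available thread
        heapq.heappush(free_at, t + cost)  # it is busy until t + cost
    return max(free_at)

# Hypothetical workload: 16 trajectories, the last 4 are "chaotic" and cost 10x more.
costs = [1.0] * 12 + [10.0] * 4

print(static_makespan(costs, 4))   # 40.0: one thread gets all four expensive points
print(dynamic_makespan(costs, 4))  # 13.0: the expensive points are spread out
```

In this made-up case the static split leaves three threads idle while one works through the whole chaotic block, whereas the dynamic split reaches the ideal finish time of total work divided by thread count.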
Dynamic scheduling might cause a minor performance overhead in the edge case of very small `nppts`, with low and large `nptrj`, but this should be outweighed by the improved load balance. It improved wall clock time and CPU utilization in all my tested examples.

E.g. for a simple rotating ellipse, this reduced the wall clock time by 34s (from 1m25s to 51s) on 4 threads, and a speedup was measurable even in the "worst case" of `nppts = 1`.