Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add section about false sharing problems to documentation #1819

Merged
merged 5 commits into from
Feb 5, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions docs/src/performance.md
Original file line number Diff line number Diff line change
Expand Up @@ -267,3 +267,14 @@ requires. It can thus be seen as a proxy for "energy used" and, as an extension,
timing result, you need to set the analysis interval such that the
`AnalysisCallback` is invoked at least once during the course of the simulation and
discard the first PID value.

## Performance issues with multi-threaded reductions
[False sharing](https://en.wikipedia.org/wiki/False_sharing) is a known performance issue
for systems with distributed caches. It also occurred for the implementation of a thread
parallel bounds checking routine for the subcell IDP limiting
in [PR #1736](https://github.com/trixi-framework/Trixi.jl/pull/1736).
After some [testing and discussion](https://github.com/trixi-framework/Trixi.jl/pull/1736#discussion_r1423881895),
it turned out that initializing a vector of length `n * Threads.nthreads()` and only using every
n-th entry instead of a vector of length `Threads.nthreads()` fixes the problem.
Since there are no processors with caches over 128B, we use `n = 128B / size(uEltype)`.
Now, the bounds checking routine of the IDP limiting scales as hoped.
Loading