diff --git a/docs/src/performance.md b/docs/src/performance.md
index df66f451b79..82d7f501f63 100644
--- a/docs/src/performance.md
+++ b/docs/src/performance.md
@@ -267,3 +267,14 @@ requires. It can thus be seen as a proxy for "energy used" and, as an extension,
     timing result, you need to set the analysis interval such that the
     `AnalysisCallback` is invoked at least once during the course of the simulation and
     discard the first PID value.
+
+## Performance issues with multi-threaded reductions
+[False sharing](https://en.wikipedia.org/wiki/False_sharing) is a known performance issue
+for systems with distributed caches: threads writing to distinct entries on the same cache
+line invalidate each other's cached copies. It occurred in the thread-parallel bounds
+checking routine for the subcell IDP limiting in [PR #1736](https://github.com/trixi-framework/Trixi.jl/pull/1736).
+After some [testing and discussion](https://github.com/trixi-framework/Trixi.jl/pull/1736#discussion_r1423881895),
+it turned out that allocating a vector of length `n * Threads.nthreads()` and using only every
+`n`-th entry, instead of a vector of length `Threads.nthreads()`, fixes the problem.
+Since cache lines of common processors are at most 128 bytes long, we use `n = 128 B / sizeof(uEltype)`.
+With this padding, the bounds checking routine of the IDP limiting scales as hoped.
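+
+The padding idea can be illustrated with a minimal sketch. The function name
+`padded_threaded_maximum` and the chunking scheme are assumptions made for this example
+and are not the implementation used in Trixi.jl. Each chunk of the input writes only to
+its own entry of the accumulator vector, and consecutive used entries are 128 bytes apart.
+
+```julia
+# Hypothetical helper (not part of Trixi.jl): thread-parallel maximum of `data`
+# using padded per-chunk accumulators to avoid false sharing.
+function padded_threaded_maximum(data::AbstractVector{T}) where {T <: AbstractFloat}
+    nchunks = Threads.nthreads()
+    # Number of entries of type `T` spanning 128 bytes
+    # (assumed upper bound for the cache line size)
+    n = max(1, div(128, sizeof(T)))
+    # Chunk `c` only ever writes to entry `(c - 1) * n + 1`
+    partial = fill(typemin(T), n * nchunks)
+    len = length(data)
+    Threads.@threads for c in 1:nchunks
+        idx = (c - 1) * n + 1
+        # Contiguous index range of `data` handled by chunk `c`
+        first_i = div((c - 1) * len, nchunks) + 1
+        last_i = div(c * len, nchunks)
+        for i in first_i:last_i
+            partial[idx] = max(partial[idx], data[i])
+        end
+    end
+    # Serial reduction over the entries that are actually used
+    return maximum(@view partial[1:n:end])
+end
+```
+
+Replacing `n` by `1` in this sketch gives the unpadded variant with a vector of length
+`Threads.nthreads()`, in which neighboring accumulators can share a cache line.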