Commit

minor
carstenbauer committed Feb 6, 2024
1 parent dbc14b0 commit 894d800
Showing 2 changed files with 15 additions and 9 deletions.
6 changes: 4 additions & 2 deletions docs/src/examples/tls/tls.jl
@@ -155,11 +155,13 @@ res ≈ res_manual

using BenchmarkTools

+@show nthreads()
+
@btime matmulsums($As, $Bs);
@btime matmulsums_naive($As, $Bs);
@btime matmulsums_tls($As, $Bs);
@btime matmulsums_manual($As, $Bs);

# As we see, the recommended version `matmulsums_tls` is both convenient and
-# efficient: It allocates much less memory than `matmulsums_naive` and only slightly
-# more than the manual implementation.
+# efficient: It allocates much less memory than `matmulsums_naive` (5 vs 64 times 8 MiB)
+# and is very much comparable to the manual implementation.
18 changes: 11 additions & 7 deletions docs/src/examples/tls/tls.md
@@ -51,7 +51,8 @@ res = matmulsums(As, Bs);

The key idea for creating a parallel version of `matmulsums` is to replace the `map` by
OhMyThreads' parallel [`tmap`](@ref) function. However, because we re-use `C`, this isn't
-entirely trivial. Someone new to parallel computing might be tempted to parallelize `matmulsums` like so:
+entirely trivial. Someone new to parallel computing might be tempted to parallelize
+`matmulsums` like so:

````julia
using OhMyThreads: tmap
@@ -208,23 +209,26 @@ compare the performance of the variants discussed above.
````julia
using BenchmarkTools

+@show nthreads()
+
@btime matmulsums($As, $Bs);
@btime matmulsums_naive($As, $Bs);
@btime matmulsums_tls($As, $Bs);
@btime matmulsums_manual($As, $Bs);
````

````
-  2.916 s (3 allocations: 8.00 MiB)
-  597.915 ms (174 allocations: 512.01 MiB)
-  575.507 ms (67 allocations: 40.01 MiB)
-  572.501 ms (49 allocations: 40.00 MiB)
+nthreads() = 5
+  2.903 s (3 allocations: 8.00 MiB)
+  582.991 ms (174 allocations: 512.01 MiB)
+  576.002 ms (67 allocations: 40.01 MiB)
+  575.374 ms (49 allocations: 40.00 MiB)
````

As we see, the recommended version `matmulsums_tls` is both convenient and
-efficient: It allocates much less memory than `matmulsums_naive` and only slightly
-more than the manual implementation.
+efficient: It allocates much less memory than `matmulsums_naive` (5 vs 64 times 8 MiB)
+and is very much comparable to the manual implementation.

---
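
The "5 vs 64 times 8 MiB" remark can be checked against the benchmark output with a little arithmetic. The sketch below is an illustration and not part of the commit. It assumes that each temporary buffer `C` is a 1024×1024 `Float64` matrix (which would account for the 8.00 MiB of the serial run) and that 64 matrix pairs are processed; the variable names are hypothetical.

````julia
# Assumption: each buffer `C` is a 1024×1024 Float64 matrix, i.e. 8 MiB.
buffer_bytes = 1024 * 1024 * sizeof(Float64)
mib = 2^20

n_pairs   = 64  # assumed number of (A, B) pairs; the naive version allocates one buffer per pair
n_threads = 5   # taken from the `nthreads() = 5` line in the benchmark output

naive_mib = n_pairs * buffer_bytes / mib    # buffers allocated by `matmulsums_naive`
tls_mib   = n_threads * buffer_bytes / mib  # buffers allocated by `matmulsums_tls` (one per task)

println("naive: ", naive_mib, " MiB, task-local: ", tls_mib, " MiB")
````

Under these assumptions the two totals come out at 512 MiB and 40 MiB, matching the allocation figures reported by `@btime` up to small bookkeeping overhead.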
