add bumper to tls doc page
carstenbauer committed Feb 29, 2024
1 parent f860665 commit d9fa194
Showing 3 changed files with 77 additions and 0 deletions.
2 changes: 2 additions & 0 deletions docs/src/literate/tls/Project.toml
@@ -1,4 +1,6 @@
[deps]
BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
Bumper = "8ce10254-0962-460f-a3d8-1f77fea1446e"
OhMyThreads = "67456a42-1dca-4109-a031-0a68de7e3ad5"
StrideArrays = "d1fa6d79-ef01-42a6-86c9-f7c551f8593b"
ThreadPinning = "811555cd-349b-4f26-b7bc-1f208b848042"
34 changes: 34 additions & 0 deletions docs/src/literate/tls/tls.jl
@@ -394,3 +394,37 @@ sort(res_nu) ≈ sort(res_channel_flipped)
@btime matmulsums_perthread_channel_flipped($As_nu, $Bs_nu);
@btime matmulsums_perthread_channel_flipped($As_nu, $Bs_nu; ntasks = 2 * nthreads());
@btime matmulsums_perthread_channel_flipped($As_nu, $Bs_nu; ntasks = 10 * nthreads());

# ## Bumper.jl (only for the brave)
#
# If you are bold and want to cut down temporary allocations even further, you can
# give [Bumper.jl](https://github.com/MasonProtter/Bumper.jl) a try. Essentially, it
# allows you to *bring your own stacks*, that is, task-local bump allocators from which
# you can dynamically allocate memory and which are reset at the end of a code block,
# just like Julia's stack.
# Be warned, though, that Bumper.jl is (1) a rather young package with (likely) some bugs
# and (2) can easily lead to segfaults when used incorrectly. It can still make sense to
# use it if you can live with the risk and really can't avoid allocating many (many) times
# on each parallel task. For our example, this isn't the case, but let's nonetheless see
# how one would use Bumper.jl here.

using Bumper
using StrideArrays # makes things a little bit faster

function matmulsums_bumper(As, Bs)
    N = size(first(As), 1)
    tmap(As, Bs) do A, B
        @no_escape begin # promising that no memory will escape
            C = @alloc(Float64, N, N) # from bump allocator (fake "stack")
            mul!(C, A, B)
            sum(C)
        end
    end
end

res_bumper = matmulsums_bumper(As, Bs);
res ≈ res_bumper

@btime matmulsums_bumper($As, $Bs);

# Compare this, especially the total allocated memory, to the variants above.
41 changes: 41 additions & 0 deletions docs/src/literate/tls/tls.md
@@ -514,6 +514,47 @@ Quick benchmark:
````

## Bumper.jl (only for the brave)

If you are bold and want to cut down temporary allocations even further, you can
give [Bumper.jl](https://github.com/MasonProtter/Bumper.jl) a try. Essentially, it
allows you to *bring your own stacks*, that is, task-local bump allocators from which
you can dynamically allocate memory and which are reset at the end of a code block,
just like Julia's stack.
Be warned, though, that Bumper.jl is (1) a rather young package with (likely) some bugs
and (2) can easily lead to segfaults when used incorrectly. It can still make sense to
use it if you can live with the risk and really can't avoid allocating many (many) times
on each parallel task. For our example, this isn't the case, but let's nonetheless see
how one would use Bumper.jl here.

````julia
using Bumper
using StrideArrays # makes things a little bit faster

function matmulsums_bumper(As, Bs)
    N = size(first(As), 1)
    tmap(As, Bs) do A, B
        @no_escape begin # promising that no memory will escape
            C = @alloc(Float64, N, N) # from bump allocator (fake "stack")
            mul!(C, A, B)
            sum(C)
        end
    end
end

res_bumper = matmulsums_bumper(As, Bs);
res ≈ res_bumper

@btime matmulsums_bumper($As, $Bs);
````

````
786.991 ms (275 allocations: 34.50 KiB)
````

Compare this, especially the total allocated memory, to the variants above.
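
For intuition about what "bump allocation with a reset at the end of a block" means, here is a
minimal toy sketch in plain Julia. It is *not* how Bumper.jl is implemented and does not use its
API: the names `ToyBump`, `toy_alloc!`, and `with_reset` are hypothetical and exist only for
illustration. The sketch hands out views into one preallocated buffer and rewinds an offset when
the block ends, which is the general idea that `@no_escape` and `@alloc` build on.

````julia
# Toy bump allocator (hypothetical names, illustration only; not Bumper.jl's API).
mutable struct ToyBump
    buf::Vector{Float64}  # preallocated backing memory
    offset::Int           # number of elements currently handed out
end
ToyBump(n::Integer) = ToyBump(Vector{Float64}(undef, n), 0)

# Hand out the next `n` elements of the buffer as a view and bump the offset.
function toy_alloc!(b::ToyBump, n::Integer)
    b.offset + n <= length(b.buf) || error("toy buffer exhausted")
    v = view(b.buf, (b.offset + 1):(b.offset + n))
    b.offset += n
    return v
end

# Run `f` and rewind the offset afterwards, so the memory can be reused.
function with_reset(f, b::ToyBump)
    checkpoint = b.offset
    try
        return f()
    finally
        b.offset = checkpoint
    end
end

b = ToyBump(16)
total = with_reset(b) do
    x = toy_alloc!(b, 4)
    x .= 1.0
    sum(x)  # return a plain number, never the view itself
end
````

After the block, `b.offset` is back at its starting value, so the next block reuses the same
memory. Returning `x` itself would hand out a view into memory that is about to be recycled,
which is the toy analogue of the "no memory will escape" promise above.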

---

*This page was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*