Add option to not provide return type to map/collect #6

Conversation
Here are some numbers to give an idea of the performance differences:

```julia
julia> for N ∈ (100, 1000, 5000)
           @show N
           A = (1:N) * (1:N)'
           print("  map(sin, A)          "); @btime map(sin, $A)
           print(" tmap(sin, A)          "); @btime tmap(sin, $A, nchunks=Threads.nthreads())
           print(" tmap(sin, Float64, A) "); @btime tmap(sin, Float64, $A, nchunks=Threads.nthreads())
           println("------------------------------------------")
       end
N = 100
  map(sin, A)            51.217 μs (2 allocations: 78.17 KiB)
 tmap(sin, A)            30.988 μs (65 allocations: 383.64 KiB)
 tmap(sin, Float64, A)   11.021 μs (45 allocations: 81.86 KiB)
------------------------------------------
N = 1000
  map(sin, A)            7.982 ms (2 allocations: 7.63 MiB)
 tmap(sin, A)            2.409 ms (73 allocations: 15.83 MiB)
 tmap(sin, Float64, A)   1.432 ms (45 allocations: 7.63 MiB)
------------------------------------------
N = 5000
  map(sin, A)            392.308 ms (2 allocations: 190.73 MiB)
 tmap(sin, A)            140.670 ms (74 allocations: 425.98 MiB)
 tmap(sin, Float64, A)   79.505 ms (45 allocations: 190.74 MiB)
```

Generally speaking, the difference is most pronounced when the mapping function is cheap; with a more expensive function (here `f = exp∘sin∘cos∘sin∘cos`) the gap between the typed and untyped versions narrows:

```julia
julia> for N ∈ (100, 1000, 5000)
           @show N
           A = (1:N) * (1:N)'
           f = exp∘sin∘cos∘sin∘cos
           print("  map(f, A)          "); @btime map($f, $A)
           print(" tmap(f, A)          "); @btime tmap($f, $A, nchunks=Threads.nthreads())
           print(" tmap(f, Float64, A) "); @btime tmap($f, Float64, $A, nchunks=Threads.nthreads())
           println("------------------------------------------")
       end
N = 100
  map(f, A)            278.828 μs (2 allocations: 78.17 KiB)
 tmap(f, A)            58.230 μs (65 allocations: 383.64 KiB)
 tmap(f, Float64, A)   48.793 μs (45 allocations: 81.86 KiB)
------------------------------------------
N = 1000
  map(f, A)            29.836 ms (2 allocations: 7.63 MiB)
 tmap(f, A)            8.271 ms (73 allocations: 15.83 MiB)
 tmap(f, Float64, A)   6.957 ms (45 allocations: 7.63 MiB)
------------------------------------------
N = 5000
  map(f, A)            1.099 s (2 allocations: 190.73 MiB)
 tmap(f, A)            271.959 ms (74 allocations: 425.98 MiB)
 tmap(f, Float64, A)   204.507 ms (45 allocations: 190.74 MiB)
```
This is actually quite a good example of a use case where we would want to do a tree-based reduction, since the reducer […]
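To illustrate the idea behind that comment, here is a minimal, hypothetical sketch of a tree-based (pairwise) combine of per-chunk results — this is not the package's actual code, and `tree_combine` is a made-up name. With an associative combiner such as `vcat`, a pairwise tree gives the same result as a left-to-right fold, but the chain of combine steps is only logarithmically deep:

```julia
# Hypothetical sketch (not the package's implementation): combine a vector of
# per-chunk results pairwise, round by round, until one result remains.
function tree_combine(combine, parts::Vector)
    while length(parts) > 1
        next = similar(parts, 0)
        for i in 1:2:length(parts)
            if i == length(parts)
                push!(next, parts[i])               # odd leftover passes through
            else
                push!(next, combine(parts[i], parts[i + 1]))
            end
        end
        parts = next
    end
    return parts[1]
end

tree_combine(vcat, [collect(1:3), collect(4:6), collect(7:9)])  # == collect(1:9)
```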
I'm going to merge this for now, and we can revisit later how exactly the reduction is performed.
This is an alternative to #2.

The basic idea is that if the user doesn't want to specify the output type of `tmap`, we just do a sequential `map` on each task and then join those results together using `BangBang.append!!`. This is type stable and inference friendly, though it does result in more allocations and is generally a bit slower.
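As a rough sketch of that approach (not the PR's actual implementation — `tmap_sketch` is a made-up name, and `reduce(vcat, ...)` stands in for `BangBang.append!!`): split the indices into chunks, run a plain `map` on each chunk in its own task, and concatenate the per-chunk results, letting the element type be inferred from the per-chunk `map` calls rather than supplied by the caller:

```julia
# Hypothetical sketch: untyped threaded map via per-chunk sequential map.
function tmap_sketch(f, A; nchunks = Threads.nthreads())
    # Partition the index range into roughly equal chunks.
    chunks = Iterators.partition(eachindex(A), cld(length(A), nchunks))
    # Run a plain `map` over each chunk on its own task; each task's result
    # gets a concrete element type from ordinary `map` inference.
    tasks = [Threads.@spawn map(i -> f(A[i]), chunk) for chunk in chunks]
    # Join the per-chunk results (the PR uses BangBang.append!! here).
    return reduce(vcat, fetch.(tasks))
end

tmap_sketch(x -> x^2, collect(1:10); nchunks = 4)  # == [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```

Because `vcat` (like `append!!`) can widen the accumulated element type as results come in, the caller never has to commit to an output type up front, at the cost of the extra intermediate allocations seen in the benchmarks above.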