Add option to not provide return type to map/collect #6

Merged
MasonProtter merged 6 commits into master from remap
Jan 29, 2024


MasonProtter
Member

This is an alternative to #2

The basic idea: if the user doesn't want to specify the output type of tmap, we do a sequential map on each task and then join those results together using BangBang.append!!.

This is type stable and inference friendly, though it does result in more allocations and is generally a bit slower.
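
To make the approach concrete, here is a minimal sketch using only Base Julia. It is not the code from this PR: `chunked_tmap` is a hypothetical name, it only handles vectors, and `vcat` stands in for `BangBang.append!!` as the joining step. Because `fetch` on a `Task` is not inferable, this sketch also does not reproduce the type stability described above.

```julia
# Hypothetical helper (not the package's tmap): threaded map without a declared output type.
function chunked_tmap(f, xs::AbstractVector; nchunks::Int = Threads.nthreads())
    isempty(xs) && return map(f, xs)
    chunksize = cld(length(xs), nchunks)
    # Each task runs an ordinary sequential `map` over its own chunk,
    # so the element type of each partial result is worked out by `map` itself.
    tasks = map(Iterators.partition(xs, chunksize)) do chunk
        Threads.@spawn map(f, chunk)
    end
    # Join the per-chunk results in order. `vcat` is an assumption here,
    # standing in for the append!!-style reducer.
    return mapreduce(fetch, vcat, tasks)
end

chunked_tmap(sin, collect(1.0:10.0); nchunks = 3)
```

The extra allocations come from the joining step: every chunk allocates its own result array, and the final array is allocated on top of those.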

@MasonProtter
Member Author

Here are some numbers to give an idea of the performance differences:

julia> for N ∈ (100, 1000, 5000)
           @show N
           A = (1:N) * (1:N)'
           print(" map(sin, A)         "); @btime map(sin, $A)
           print("tmap(sin, A)         "); @btime tmap(sin, $A, nchunks=Threads.nthreads())
           print("tmap(sin, Float64, A)"); @btime tmap(sin, Float64, $A, nchunks=Threads.nthreads())
           println("------------------------------------------")
       end
N = 100
 map(sin, A)           51.217 μs (2 allocations: 78.17 KiB)
tmap(sin, A)           30.988 μs (65 allocations: 383.64 KiB)
tmap(sin, Float64, A)  11.021 μs (45 allocations: 81.86 KiB)
------------------------------------------
N = 1000
 map(sin, A)           7.982 ms (2 allocations: 7.63 MiB)
tmap(sin, A)           2.409 ms (73 allocations: 15.83 MiB)
tmap(sin, Float64, A)  1.432 ms (45 allocations: 7.63 MiB)
------------------------------------------
N = 5000
 map(sin, A)           392.308 ms (2 allocations: 190.73 MiB)
tmap(sin, A)           140.670 ms (74 allocations: 425.98 MiB)
tmap(sin, Float64, A)  79.505 ms (45 allocations: 190.74 MiB)

Generally speaking, the difference is most pronounced when the mapping function (here, sin) is fast. With a slower function, the results are closer:

julia> for N ∈ (100, 1000, 5000)
           @show N
           A = (1:N) * (1:N)'
           f = exp ∘ sin ∘ cos ∘ sin ∘ cos
           print(" map(f, A)         "); @btime  map($f, $A)
           print("tmap(f, A)         "); @btime tmap($f, $A, nchunks=Threads.nthreads())
           print("tmap(f, Float64, A)"); @btime tmap($f, Float64, $A, nchunks=Threads.nthreads())
           println("------------------------------------------")
       end
N = 100
 map(f, A)           278.828 μs (2 allocations: 78.17 KiB)
tmap(f, A)           58.230 μs (65 allocations: 383.64 KiB)
tmap(f, Float64, A)  48.793 μs (45 allocations: 81.86 KiB)
------------------------------------------
N = 1000
 map(f, A)           29.836 ms (2 allocations: 7.63 MiB)
tmap(f, A)           8.271 ms (73 allocations: 15.83 MiB)
tmap(f, Float64, A)  6.957 ms (45 allocations: 7.63 MiB)
------------------------------------------
N = 5000
 map(f, A)           1.099 s (2 allocations: 190.73 MiB)
tmap(f, A)           271.959 ms (74 allocations: 425.98 MiB)
tmap(f, Float64, A)  204.507 ms (45 allocations: 190.74 MiB)

@MasonProtter
Member Author

This is actually quite a good example of a use case where we would want a tree-based reduction, since the reducer here, append!!, takes a long time.
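
For illustration, a tree-based reduction could look roughly like the sketch below. This is an assumption about the general technique, not code from this PR: `tree_reduce` is a hypothetical name, and `vcat` again stands in for `append!!`. The two halves are combined independently, so the expensive joins run in parallel instead of one after another.

```julia
# Hypothetical sketch: combine per-chunk results pairwise, recursing on halves.
function tree_reduce(op, parts::AbstractVector)
    isempty(parts) && throw(ArgumentError("need at least one chunk"))
    length(parts) == 1 && return first(parts)
    mid = length(parts) ÷ 2
    # Reduce the left half on another task while this task handles the right half.
    left = Threads.@spawn tree_reduce(op, @view parts[begin:begin+mid-1])
    right = tree_reduce(op, @view parts[begin+mid:end])
    # Left goes first, so ordering is preserved for non-commutative ops such as vcat.
    return op(fetch(left), right)
end

tree_reduce(vcat, [[1, 2], [3], [4, 5, 6], [7]])
```

With n chunks, the depth of the reduction is about log2(n) joins rather than n - 1 sequential ones.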

@MasonProtter
Member Author

I'm going to merge this for now, and we can revisit later how exactly the reduction is performed.

@MasonProtter MasonProtter merged commit 7aa05e8 into master Jan 29, 2024
8 checks passed
@MasonProtter MasonProtter deleted the remap branch January 29, 2024 13:02