-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add WithTaskLocals
to make the handling of task local values more efficient.
#63
Conversation
One thing we can (and should eventually do) is to apply the optimizations we do for ScopedValue to TaskLocalValue. The compiler folds repeated accesses. |
Would that be possible in a package without compiler changes? |
This looks good to me. Assuming that we can't easily get this as an optimization in the near future, I'm fine with moving on with this. To be sure, this doesn't negatively affect the performance if no TLVs are used (because To your questions (in the OP):
I'd say no. We don't even export
Maybe. I don't have a strong opinion here and would leave this decision to Valentin.
I think the API is fine. I also think the name is fine but, since we are pretty verbose here anyways, we could also make it |
yeah, that's right they're both identity ops if no TLVs are used OhMyThreads.jl/src/implementation.jl Lines 201 to 219 in 717799f
I actually originally named it that, but it was so annoying to type that I shortened it. |
WithTaskLocalValues
to make the handling of task local values more efficient.WithTaskLocals
to make the handling of task local values more efficient.
Okay, this should be ready to go so long as you approve of the changes I made to the task local storage docs @carstenbauer |
I'll take a look later today |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the minor formatting bug, LGTM.
What?
This gives a way to make it so that you don't have to access the
TaskLocalValue
every look iteration. Rather, you only access theTaskLocalValue
after we@spawn
a task, i.e. basically it should match the way that you would manually usetask_local_storage
.For things where each iteration takes a long time, this is pretty insignificant, since accessing task local storage only takes around 10ns, but for fast functions this can be really important.
E.g. suppose we have a thread local accumulator we use in a sum:
Now, one can fairly say "Well, you shouldn't have re-accessed
acc
so many times in the body ofsum_accumulator
!" Which is true, but even then,sum_accumulator_macro
is significantly fasterAnd here's what these would look like if we do a
map
instead:What about a less cherry picked example?
This gain becomes more and more mild, the longer the function call takes. E.g. if I re-run the benchmarks in https://juliafolds2.github.io/OhMyThreads.jl/v0.4/literate/tls/tls/#Benchmark, I find
so a very small difference, though it puts things more in line with the manual case, as expected, though what I find unexpected is that there's more allocations in this case, I'm not sure what exactly that's about, but I think it has to do with the
map
rewarpping step I had to do.Observe that turning these into a reduction, we get:
which has fewer allocations.
How does this work?
The key here is a new closure type called
WithTaskLocals
(bikeshedding welcome). Essentially, if you writethen that creates a closure object capturing the
TaskLocalValues
which is equivalent tohowever, the main difference is that you can call
promise_task_local
on aWithTaskLocals
closure in order to turn it into something equivalent towhich doesn't have the overhead of accessing the
task_local_storage
each time the closure is called. This of course will lose the safety advantages ofTaskLocalValue
, so you should never callf_local = promise_task_local(f)
and then passf_local
to some unknown function, because if that unknown function callsf_local
on a new thread, you'll hit a race condition.However, we can take advantage of the structure of
mapreduce
calls to build upWithTaskLocals
objects from the@tasks
macro and then pass them totmapreduce
andtmap
and then basically do@spawn promise_task_local(f)(args...)
because we know at that point thatf
is actually being called so it's safe to make the promise.Can we get these performance advantages without using the
@tasks
macro?You betcha, but it's kinda ugly:
Todo
WithTaskLocals
be exported?WithTaskLocals
be upstreamed to TaskLocalValues.jl?