Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shorthand for setting number of tasks #76

Closed
mkoculak opened this issue Mar 11, 2024 · 7 comments · Fixed by #81
Closed

Shorthand for setting number of tasks #76

mkoculak opened this issue Mar 11, 2024 · 7 comments · Fixed by #81
Labels
enhancement New feature or request idea

Comments

@mkoculak
Copy link

Hi,
I really like the macro addition and was wondering if it would make sense for You to include a shorthand for number of spawned tasks?
So having e.g. @set chunks=8 instead of @set scheduler = DynamicScheduler(; nchunks=8)?
It feels like an option that users would frequently want to change and this would make the code more clear and readable.

On a separate note, not sure how useful it would be in general, but I found myself frequently testing longer pipelines with many functions that have multithreading and having a global setting that changes them all at once makes the work much easier.
I have implemented an "config" struct in my packages that gets queried about parameters to run the code with, but maybe it would be useful to have something similar already here?

@carstenbauer
Copy link
Member

carstenbauer commented Mar 11, 2024

So having e.g. @set chunks=8 instead of @set scheduler = DynamicScheduler(; nchunks=8)?

Would you want a shortcut for setting the number of tasks or the number of chunks? Note that for DynamicScheduler, that's the same thing but for, say, GreedyScheduler it isn't. (In fact, as of now, GreedyScheduler doesn't even have nchunks but only ntasks.)

Assuming you want a shortcut for the number of tasks, let me first say that I've also had this thought before. Given that we now call the loop macro @tasks it doesn't seem unnatural to have a ntasks option. However, at the time, I couldn't make up my mind whether and how we should add this.

What's your reason for requesting it? Are you trying to set ntasks<nthread() to, effectively, control the number of threads used? Or are you trying to set ntasks>>nthreads() to tune for load balancing?
(Note that with #60, I proposed a (kind of) alternative for the latter.)

As for your other idea, at least for now I don't plan to add a "global config" feature. A workaround is to just to instantiate a "global" Scheduler and use it in all @tasks blocks.

@carstenbauer carstenbauer added enhancement New feature or request idea labels Mar 11, 2024
@mkoculak
Copy link
Author

Right, I had ntasks in mind.
In my particular use case I rather go for ntasks < thread() - my workers almost always do identical work, so no need for load balancing. So I usually want control over the number of threads - mostly to test if multithreading works as expected or to find the optimum number above which adding more threads gives no benefit.

But yeah, as you point out, there might not be a single nthreads-like abstraction that could be used with all schedulers that you offer.

@carstenbauer
Copy link
Member

carstenbauer commented Mar 12, 2024

Assuming that we make the following work

tmapreduce(sin, +, 1:10; ntasks=2) # meaning DynamicScheduler(; nchunks=2)

what should happen if both ntasks and scheduler are provided?

What would we expect to happen in the following cases?

tmapreduce(sin, +, 1:10; ntasks=2, scheduler=StaticScheduler())

Should this amount to StaticScheduler(; nchunks=2)? Or DynamicScheduler(; nchunks=2) (ignoring scheduler entirely)? Or throw an error?

tmapreduce(sin, +, 1:10; ntasks=2, scheduler=StaticScheduler(; nchunks=5))

Should this amount to StaticScheduler(; nchunks=2)? Or DynamicScheduler(; nchunks=2) (ignoring scheduler entirely)? Or throw an error?

tmapreduce(sin, +, 1:10; ntasks=2, scheduler=StaticScheduler(; nchunks=2))

Should this amount to StaticScheduler(; nchunks=2)? Or DynamicScheduler(; nchunks=2) (ignoring scheduler entirely)? Or throw an error?

(The macro API should behave in the same way of course.)

@carstenbauer
Copy link
Member

carstenbauer commented Mar 12, 2024

My first naive thought was to respect the kind of scheduling (static, dynamic, ...) when scheduler is provided but simply ignore all the scheduler options. I.e.

julia> tmapreduce(sin, +, 1:10; ntasks=2) # meaning DynamicScheduler(; nchunks=2)
_scheduler = DynamicScheduler{OhMyThreads.Schedulers.FixedCount}(:default, 2, -1, :batch)
1.4111883712180107

julia> tmapreduce(sin, +, 1:10; ntasks=2, scheduler=StaticScheduler())
_scheduler = StaticScheduler{OhMyThreads.Schedulers.FixedCount}(2, -1, :batch)
1.4111883712180107

julia> tmapreduce(sin, +, 1:10; ntasks=2, scheduler=StaticScheduler(; nchunks=5))
_scheduler = StaticScheduler{OhMyThreads.Schedulers.FixedCount}(2, -1, :batch)
1.4111883712180107

julia> tmapreduce(sin, +, 1:10; ntasks=2, scheduler=StaticScheduler(; nchunks=2))
_scheduler = StaticScheduler{OhMyThreads.Schedulers.FixedCount}(2, -1, :batch)
1.4111883712180107

The reasoning was to allow switching between dynamic and, say, static scheduling and avoid adding Symbols to the mix, e.g. tmapreduce(sin, +, 1:10; ntasks=2, scheduler=:static. But maybe that's a cleaner API (error when we get ntasks + scheduler isa Scheduler and work if we get scheduler isa (valid) Symbol)?

@MasonProtter
Copy link
Member

My first naive thought was to respect the kind of scheduling (static, dynamic, ...) when scheduler is provided but simply ignore all the scheduler options. I.e.

I think I like this option, but it's hard to say. I could also imagine making it go the other way such that scheduler overrides the kwargs.

But maybe that's a cleaner API (error when we get ntasks + scheduler isa Scheduler and work if we get scheduler isa (valid) Symbol)?

I think this one is maybe best, that way there's no guessing about which one takes priority.

@carstenbauer
Copy link
Member

carstenbauer commented Mar 12, 2024

FYI: I made a (rather big) PR (#81) where I implement the no guessing strategy from above. In the PR, I don't special case ntasks but forward (almost) all keyword arguments to the scheduler constructor.

@mkoculak
Copy link
Author

PR looks great, will definitely make it work for my use case, and I agree that no-guessing strategy won't make anyone upset but will prevent mistakes.
I think it's safe to close this issue now, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request idea
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants