add_compute_plan - which batch size to use #282
Comments
@AurelienGasser you said that:

> Rule of thumb - optimal batch size

If the max value is always the same, then, from the error shown in the description, in this formula: …

So, is it correct to say that …?
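The formula itself is not shown here, so as an illustration only: a rule of thumb consistent with the surrounding discussion (a fixed server-side maximum, divided by the number of data samples per task) might look like the sketch below. Both the limit and the exact relation are assumptions, not something confirmed in this thread.

```python
def rule_of_thumb_batch_size(server_max: int, samples_per_task: int) -> int:
    """Illustrative sketch only: assumes the backend rejects a batch once
    batch_size * samples_per_task exceeds a fixed server-side maximum."""
    return max(1, server_max // samples_per_task)
```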
There is no "one size fits all" batch size. It depends on both the number of tasks and the number of inputs. Would it be feasible to catch this error at the SDK level and lower the batch size before retrying?
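For what that could look like, here is a minimal sketch of SDK-level retry logic that halves the batch size when the backend rejects a submission; the exception type, the halving strategy, and the parameter names are assumptions rather than the project's actual design:

```python
class BackendError(Exception):
    """Stand-in for the real exception the SDK raises when a batch is too large."""


def submit_with_adaptive_batch(client, spec, batch_size=500, min_batch_size=1):
    """Sketch: retry the submission with a smaller batch size on failure."""
    while batch_size >= min_batch_size:
        try:
            return client.add_compute_plan(spec, autobatching=True, batch_size=batch_size)
        except BackendError:
            batch_size //= 2  # lower the batch size before retrying
    raise RuntimeError("submission failed even at the minimum batch size")
```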
But if we can have a rule of thumb for which batch size works, we can indicate to the user what value to use.
Sure, we can try that. I would do that on top of documenting the "optimal batch size". We should also expose the batch size in substrafl; today only the …
Do we have data on how much slower it is to have a small batch size vs a big batch size?
@tanguy-marchand from what you said, 15 rounds, 136 tuples, with 1257 data samples took 5 min to submit? (1257 samples per task or in total? The number we are interested in is the number of samples per task.)
It's always the same. We could change it but have chosen not to so far. The limit serves the purpose of limiting the load on the server and avoiding resource starvation.
A CP using 2 centers (with 1257 and 999 samples respectively) and 30 rounds (512 tuples overall) takes 17 minutes to submit.
OK, so as a first fix, what we can do is: …

Then discuss a better solution: …
I think the best would be if we were able to calculate it, but I am worried it would slow down the execution, and we'll want to keep being able to override this in case the calculation is wrong for any reason.
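A sketch of the shape that could take: compute the batch size automatically, but keep an explicit argument that overrides the calculation. Both function names below are hypothetical:

```python
def estimate_batch_size(spec):
    """Hypothetical automatic calculation based on the spec's contents
    (e.g. number of tasks and data samples per task); the real heuristic
    is exactly what this issue is trying to pin down."""
    return 500  # fall back to the current default for this sketch


def resolve_batch_size(spec, batch_size=None):
    """An explicit batch_size always wins over the calculation, so users
    keep an escape hatch if the automatic estimate is wrong."""
    if batch_size is not None:
        return batch_size
    return estimate_batch_size(spec)
```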
Summary

When we add a compute plan with N tasks, we can set `autobatching` to True and set the batch size. This submits the tasks to the backend in batches of size `batch_size`. The fastest option is to increase the batch size as much as possible without getting backend errors. The default batch size is 500; the question here is: how do we find the maximal batch size we can use?
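For concreteness, a minimal sketch of such a submission, assuming a substra `Client` whose `add_compute_plan` accepts the `autobatching` and `batch_size` arguments referenced above (the connection details and the spec are placeholders):

```python
import substra

client = substra.Client(url="https://example.org", token="...")  # placeholder credentials

# compute_plan_spec is assumed to describe the N tasks to submit.
compute_plan = client.add_compute_plan(
    compute_plan_spec,
    autobatching=True,  # submit tasks in batches rather than all at once
    batch_size=500,     # the default discussed in this issue
)
```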
What happens when the batch size is too big
When the batch size is too big (451 tasks × 400 data samples per task, i.e. 180,400 sample references in a single request), we get the following error: …
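As an illustration of the failing configuration, a hedged sketch of the submission (the spec construction is hypothetical; only the 451-tasks-by-400-samples shape comes from this issue):

```python
# Hypothetical reconstruction of the failing submission: with the default
# batch size of 500, all 451 tasks (each referencing 400 data samples,
# 180,400 references in total) go to the backend in a single batch.
client.add_compute_plan(
    compute_plan_spec,  # assumed: 451 tasks x 400 data samples per task
    autobatching=True,
    batch_size=500,     # default; large enough to put the whole plan in one batch
)
```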