-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
other than notarization poller, scriptworkers don't respect max run time #720
Comments
I think the SIGTERM after 20-ish minutes is more likely to be from going over the termination grace period in k8s, FWIW. |
It looks like However, I don't see anything in them nor in scriptworker itself that even reads it, let alone tries to respect it... |
Damn... I remember adding support for the notarization_poller in bug 1716793... I guess this didn't come up in other scriptworkers because they tend to be fairly quick anyway and eventually get killed by k8s so nobody ever noticed? |
In fact, no other schemas even mention it! Notarization poller does seem to support it though, even though it's not in its schema: https://github.com/mozilla-releng/scriptworker-scripts/blob/master/notarization_poller/src/notarization_poller/task.py#L65 I'm going to move this issue to scriptworker-scripts, since that seems to be where this sort of thing is implemented. |
scriptworker has a global task_max_timeout setting, and will kill the task if it exceeds that duration (currently 1h). One thing we could do is remove all mentions of maxRunTime and just set task_max_timeout appropriately per pool instead of it being always the same? |
The main case I can think of where we'd want to be able to specify max run time per task is in beetmover (where most beetmover jobs are usually very quick, but push to releases and maybe some maven stuff can be much lengthier). I think most other scriptworker types probably have fairly consistent expected run times across all tasks though. With that said, I'd also be okay just removing |
We've found a few examples of tasks going well over their max run time recently. For example, this task had a max run time of 600 seconds (10 minutes). The first attempt ran for 21 minutes, after which it was killed with signal -15. The second run ran for 16 minutes and completed successfully.
It's possible the first one was killed for going over the max run time -- but even if that's the case, that's far too late.
The text was updated successfully, but these errors were encountered: