Celery unfortunately starts its worker subprocesses as 'daemons', which prevents the worker process of 'optimizer-worker' from creating any further subprocesses. Therefore, we acknowledge this process is a daemon and turn off the protection that prevents new subprocesses from being created. This does introduce the issue that if the task is cancelled/revoked, the subprocess created by the worker subprocess will continue as a zombie process until it completes.
This is necessary because Casadi does not release the GIL, which starves all other threads. We need to isolate mesido/Casadi in its own process.
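A minimal sketch of this workaround, assuming only the standard library's multiprocessing module (the `run_solver` function is a hypothetical stand-in for the mesido/Casadi call, not our actual worker code): clearing the internal daemon flag lifts the "daemonic processes are not allowed to have children" guard, after which the solver can be moved into its own process.

```python
import multiprocessing


def run_solver(values):
    # Hypothetical stand-in for the GIL-holding mesido/Casadi call.
    return sum(values)


def solve_isolated(values):
    # Celery marks its worker processes as daemons, and multiprocessing
    # refuses to let a daemonic process spawn children. Clearing the
    # internal flag acknowledges this process is a daemon and lifts
    # that protection. (Note: _config is a CPython implementation
    # detail, not a public API.)
    multiprocessing.current_process()._config["daemon"] = False

    # Run the solver in its own process so a GIL-holding native call
    # cannot starve the worker's other threads.
    with multiprocessing.Pool(processes=1) as pool:
        return pool.apply(run_solver, (values,))
```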
Many alternatives were tried:
Use threading as the task pool instead of subprocesses in the Celery worker. Threading does not support 'terminate_job', so we cannot cancel a task. Other task pool types also do not support 'terminate_job', nor do they provide the required isolation.
Use subprocess.run(). While this does not throw an error when run from a daemon process, it still causes the same issue. Also, exceptions are no longer propagated, so we need to ensure an exit_code != 0 raises an error. This is quite convoluted.
Use python -O. This turns off asserts (including the ones in Mesido!) and therefore suppresses the error. However, the underlying issue remains, so this is not a real solution.
Investigated alternatives to Celery. Dramatiq does not support cancellation of tasks. Huey does not support AMQP.
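For reference, the subprocess.run() variant looks roughly like this (a sketch, not our actual worker code): because exceptions in the child are not propagated back to the parent, a non-zero exit code has to be turned into an error explicitly, e.g. with check=True.

```python
import subprocess
import sys

# Sketch of the subprocess.run() alternative. Exceptions raised in the
# child do not propagate to the parent; all we get back is an exit code.
# check=True makes a non-zero exit code raise CalledProcessError, which
# is the "exit_code != 0 throws an error" plumbing described above.
result = subprocess.run(
    [sys.executable, "-c", "import sys; sys.exit(0)"],
    check=True,
)
```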
Remaining alternatives:

Extend the Celery ThreadingTaskPool to support terminate_job by asking the underlying thread nicely to stop as soon as it can. We would also switch over to 'threading' as the task pool. This will require some work, as Celery is quite complex.
Find an alternative for Celery or build our own. --> This will not resolve the issue of allowing subprocesses in subprocesses, but it may give us the freedom to design the worker in such a way that only a single subprocess (to isolate Casadi) is needed.
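The cooperative-stop idea behind extending the task pool could be sketched like this (a hypothetical illustration, not Celery code): a thread cannot be killed from the outside, so terminate_job would set an event that the running task checks at safe points.

```python
import threading


class CooperativeTask:
    """Hypothetical sketch: threads cannot be forcibly terminated, so
    'terminate_job' sets an Event that the running task polls and
    honours at the next safe point."""

    def __init__(self):
        self._stop = threading.Event()

    def terminate_job(self):
        # Ask the underlying thread nicely to stop as soon as it can.
        self._stop.set()

    def run(self, steps):
        completed = 0
        for _ in range(steps):
            if self._stop.is_set():
                break  # safe point: abandon the remaining work
            completed += 1
        return completed
```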
We landed on hooking into the way Celery cancels the forked worker process: it raises a SystemExit with code -241. By listening for the SystemExit, we can terminate the multiprocessing.Pool running the Mesido worker before the Celery worker waits for the forked worker process to terminate. This is still a workaround, as Casadi still does not release the GIL. This is now captured in: #57
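A simplified sketch of that hook (the `solve` function is a hypothetical stand-in for the Mesido call; the -241 code is the revocation signal described above):

```python
import multiprocessing


def solve(x):
    # Hypothetical stand-in for the Mesido/Casadi optimization.
    return x * x


def run_task(x):
    # On revocation, Celery raises SystemExit (code -241) inside the
    # forked worker process. Catching it lets us terminate the inner
    # multiprocessing.Pool before Celery waits for the worker process
    # to exit, instead of leaving the solver behind as a zombie.
    pool = multiprocessing.Pool(processes=1)
    try:
        return pool.apply(solve, (x,))
    except SystemExit:
        pool.terminate()  # kill the inner solver process immediately
        raise
    finally:
        pool.close()  # no-op if terminate() already ran
        pool.join()
```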