Supervisor: Make ZeroMQ socket timeout configurable, and/or increase default timeout #5620
Labels
area:kallichore
Issues related to the new kernel supervisor
area: workbench
Issues related to Workbench category.
Currently, the kernel supervisor waits up to 20 seconds for ZeroMQ socket connections. If it is not able to connect to the kernel after 20 seconds, it shows an error like this one:
20 seconds is a long time to wait for startup, but some systems are pretty slow. In our own environments (which aren't necessarily the slowest we would support), we've observed start times of more than 15 seconds even when everything is working correctly. See e.g. the traces in #5340.
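To make the request concrete, here is a rough sketch of what a configurable connection timeout could look like. It is not the supervisor's actual code: the environment variable name `SUPERVISOR_CONNECT_TIMEOUT_S`, the helper names, and the `try_connect` callback are all hypothetical, and the real setting might live in user preferences or a command-line flag instead. The only value taken from this issue is the current 20-second default.

```python
import os
import time
from typing import Callable, Mapping

# The fixed wait described in this issue; used as the fallback.
DEFAULT_CONNECT_TIMEOUT_S = 20.0


def resolve_connect_timeout(
    env: Mapping[str, str] = os.environ,
    name: str = "SUPERVISOR_CONNECT_TIMEOUT_S",  # hypothetical setting name
) -> float:
    """Return the configured timeout in seconds, or the default if unset/invalid."""
    raw = env.get(name)
    if raw is None:
        return DEFAULT_CONNECT_TIMEOUT_S
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_CONNECT_TIMEOUT_S
    # Reject zero, negative, and NaN values rather than failing instantly.
    return value if value > 0 else DEFAULT_CONNECT_TIMEOUT_S


def wait_for_connection(
    try_connect: Callable[[], bool],  # hypothetical: returns True once sockets are up
    timeout_s: float,
    poll_interval_s: float = 0.1,
) -> bool:
    """Retry try_connect until it succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if try_connect():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(poll_interval_s, remaining))
```

With something like this, a slow system could raise the limit (for example to 60 seconds) without changing the default for everyone else.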
It would be great if there were some way for us to know whether the kernel was working correctly (and slowly) or legitimately hung when we are waiting for a socket connection. However, since the sockets are how we talk to the kernel in the first place, we would need to establish some sort of side channel or heuristic to figure this out.
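One possible heuristic, sketched below purely as an illustration, is to watch the kernel process itself while waiting for the sockets. This assumes the supervisor holds a handle to the kernel process (modeled here as a `subprocess.Popen`), and `try_connect` is again a hypothetical callback. It only separates "process already exited" from "process still running", so it cannot tell a slow kernel from one that is alive but hung.

```python
import subprocess
import time
from typing import Callable


def wait_or_detect_exit(
    proc: subprocess.Popen,           # assumed handle to the kernel process
    try_connect: Callable[[], bool],  # hypothetical: True once sockets connect
    timeout_s: float,
    poll_interval_s: float = 0.1,
) -> str:
    """Return 'connected', 'exited', or 'timed_out'."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if try_connect():
            return "connected"
        # poll() returns the exit code once the process has terminated,
        # and None while it is still running.
        if proc.poll() is not None:
            return "exited"
        time.sleep(poll_interval_s)
    return "timed_out"
```

An "exited" result lets the caller fail fast with the exit code instead of waiting out the full timeout; "timed_out" still leaves the slow-versus-hung question open.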
We should, at a minimum: