User Request: Dead man's switch #377
nikos912000 started this conversation in Ideas, user requests and proposals
Is your feature request related to a problem? Please describe.
At the moment, the "big red button" for single and multiple experiments relies on the operator having access to the cluster.
For example, the operator can run
kubectl delete Disruption <name>
for one or more disruptions.
It would be great if the controller had a dead man's switch: if the connection to the cluster is lost, the controller would automatically stop all running experiments.
Describe the solution you'd like
I think the implementation of a dead man's switch could use a heartbeat and a watchdog timer for remediation.
I'm still not sure what the heartbeat would look like. Can we check whether the controller is still up and running and whether the connection to the cluster has been lost?
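To make the heartbeat and watchdog idea concrete, here is a minimal sketch in Python. It is only an illustration, not the controller's actual code: the names `Watchdog`, `tick`, `probe` and `on_expire` are hypothetical, and the sketch assumes the check runs somewhere that can still stop the experiments once the heartbeat goes quiet. What the probe actually tests (controller liveness, API server reachability, or both) is left open, as in the question above.

```python
import time
from typing import Callable


class Watchdog:
    """Dead man's switch: runs `on_expire` once if no heartbeat arrives in time."""

    def __init__(
        self,
        timeout: float,
        on_expire: Callable[[], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.timeout = timeout      # seconds of silence tolerated
        self.on_expire = on_expire  # remediation, e.g. stop all experiments
        self.clock = clock          # injectable so the logic can be tested
        self.last_beat = clock()
        self.tripped = False

    def heartbeat(self) -> None:
        """Record a successful liveness signal and re-arm the switch."""
        self.last_beat = self.clock()
        self.tripped = False

    def check(self) -> bool:
        """Run remediation once when the deadline has passed; return tripped state."""
        if not self.tripped and self.clock() - self.last_beat > self.timeout:
            self.tripped = True
            self.on_expire()
        return self.tripped


def tick(watchdog: Watchdog, probe: Callable[[], bool]) -> bool:
    """One loop iteration: probe the connection, then evaluate the watchdog.

    `probe` is a hypothetical callable that returns True when the heartbeat
    succeeds; a raised exception is treated as a missed heartbeat.
    """
    try:
        if probe():
            watchdog.heartbeat()
    except Exception:
        pass  # a failed probe simply does not reset the timer
    return watchdog.check()
```

In a real loop, `tick` would be called at a fixed interval that is shorter than `timeout`, so that a single slow or failed probe does not trigger remediation on its own.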
Describe alternatives you've considered
Introducing support for a duration with a default expiry period is a good first step toward mitigating the risks, but it is not enough on its own.