Problem
Since the database might get desynchronized from the live system (Kubernetes), we might run into problems when workflows get stuck in running status (as has happened in the past, see reanahub/reana#478). These workflows count toward the maximum number of concurrent running workflows, potentially blocking the system and leaving all new workflows queued forever (as happened in reanahub/reana-commons#250).
Status-report oriented solution
Improve the status report summary to take into account situations in which workflows are stuck in a running state.
Better and more specific detection of stuck workflows: the current naive implementation does not capture the specific nature of stuck workflows:
A workflow flagged as stuck might simply be a very long-running user workflow, i.e. a false positive
Include quick actions/commands to fix issues: when the specific nature of a stuck workflow is detected, the summary could include quick actions that admins can run to clean up the cluster (note that this could later be used for automatic garbage collection of workflows)
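One way to separate truly stuck workflows from long-running ones is to compare the database view with the set of workflows that still have a live job in Kubernetes. The sketch below only illustrates that idea. `WorkflowRecord`, `find_stuck_workflows`, `live_workflow_ids` and the ten-minute `grace_period` are hypothetical names and values, not part of the REANA code base, and the live set is assumed to have been collected separately from the cluster.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Set


@dataclass
class WorkflowRecord:
    """Hypothetical, simplified view of a workflow row in the database."""

    workflow_id: str
    status: str
    started_at: datetime


def find_stuck_workflows(
    db_workflows: Iterable[WorkflowRecord],
    live_workflow_ids: Set[str],
    now: datetime,
    grace_period: timedelta = timedelta(minutes=10),  # assumed value
) -> List[WorkflowRecord]:
    """Return workflows that are 'running' in the database but have no live job.

    A workflow that still has a live job is treated as long-running, not stuck,
    which avoids the false positive of flagging by duration alone.
    """
    stuck = []
    for wf in db_workflows:
        if wf.status != "running":
            continue
        if wf.workflow_id in live_workflow_ids:
            # Still backed by a live job: possibly long, but not stuck.
            continue
        if now - wf.started_at < grace_period:
            # Recently started: its job may not be visible in the cluster yet.
            continue
        stuck.append(wf)
    return stuck


def summary_lines(stuck: List[WorkflowRecord]) -> List[str]:
    """Format one status-report line per stuck workflow (wording is illustrative)."""
    return [
        f"workflow {wf.workflow_id}: running in database since "
        f"{wf.started_at.isoformat()}, no live job found; candidate for cleanup"
        for wf in stuck
    ]
```

The list returned by `find_stuck_workflows` could feed both the status report summary and, later, an automatic garbage collection step. The concrete cleanup commands are deliberately left out here, since they depend on the admin tooling available.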
Background
As of #341, we no longer use the Kubernetes API to determine whether the maximum number of concurrent workflows has been reached; we use the database value instead.
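To make the consequence of a database-based check concrete, here is a minimal sketch of such a check. `can_start_new_workflow`, `db_statuses` and `max_concurrent` are hypothetical names used for illustration only; the actual implementation introduced in #341 may look different.

```python
from typing import Iterable


def can_start_new_workflow(db_statuses: Iterable[str], max_concurrent: int) -> bool:
    """Decide from database statuses alone whether another workflow may start.

    Every row whose status is 'running' counts toward the limit, whether or
    not a corresponding job still exists in the cluster.
    """
    running = sum(1 for status in db_statuses if status == "running")
    return running < max_concurrent
```

With a limit of 2, two stale rows left in running status are enough for this function to return `False` on every call, so new workflows stay queued until the database is corrected.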