It looks like job-controller keeps running after hitting the error (it shows "Running" status), while workflow-engine has already finished, which leaves the pod in a "Completed" or "NotReady" state.
My assumption is that workflow-engine sends a message to the job-status queue at the end, but job-status-consumer fails to process it. We have examples where the quota update fails (related: reanahub/reana-commons#303), which might be the cause of some of the "NotReady" batch pods.
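If that assumption holds, one way to harden the consumer is to persist the workflow status before doing any quota bookkeeping, so a quota failure cannot leave the workflow "running" in the DB. The sketch below only illustrates that ordering: `on_job_status_message`, `update_quota`, `QuotaUpdateError` and the dict standing in for the DB are hypothetical names, not the actual reana-workflow-controller or reana-commons API.

```python
import logging

logger = logging.getLogger(__name__)


class QuotaUpdateError(Exception):
    """Hypothetical error raised when the quota update step fails."""


def update_quota(workflow_id: str) -> None:
    """Hypothetical stand-in for a quota update that fails (cf. reana-commons#303)."""
    raise QuotaUpdateError(f"quota update failed for {workflow_id}")


def on_job_status_message(message: dict, db: dict) -> None:
    """Handle one job-status message (hypothetical sketch, not REANA's code).

    The workflow status is stored first, so a later failure in the quota
    update cannot leave the workflow stuck in "running".
    """
    workflow_id = message["workflow_id"]
    db[workflow_id] = message["status"]
    try:
        update_quota(workflow_id)
    except QuotaUpdateError:
        # Log the failure instead of letting it abort message processing.
        logger.exception("Quota update failed for workflow %s", workflow_id)


db = {"wf-1": "running"}
on_job_status_message({"workflow_id": "wf-1", "status": "finished"}, db)
print(db["wf-1"])  # finished
```

The quota error is still logged, so it stays visible, but it no longer blocks the status transition that the scheduler depends on.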
Seen with REANA 0.8.
A workflow finished fine at time T1, meaning that the workspace was properly populated, and the reana-run-batch workflow-engine logs say:
However, the job-controller logs say there was an error:
Due to this, the reana-run-batch pod stays in NotReady state and the DB still sees the workflow as "running". Such a "stuck" workflow leads to scheduling troubles.
We should get the job monitor ready to handle this exception.
Note also the huge time delay between workflow termination (20:36:01) and pod deletion event (21:28:26).