workflow-run-manager: create new pending status #363

Closed

Comments

@diegodelemos
Member

diegodelemos commented Feb 23, 2021

Issue

A general problem in REANA with workflows getting stuck in the running state was fixed in reanahub/reana#478. However, one corner case remains: if the workflow-engine pod never starts, the workflow stays in running indefinitely. This is also a UX issue, because the user sees the confusing status running when in reality the workflow is still waiting to be scheduled.

Cause

The root cause is that we set the status to running in RWC before the workflow actually starts. The workflow is not guaranteed to start any time soon, or at all, since that depends on an external system (Kubernetes). If the workflow never starts, no further status changes occur.

Solutions

  • A possible solution would be to introduce a new status (see current statuses), e.g. pending (TBD better name). The flow would then look as follows:

    1. User runs workflow -> queued (set by REANA-Server just before actually queuing it)
    2. REANA-Workflow-Controller receives a request to start the workflow -> sets its status to pending and requests Kubernetes to start the workflow engine
    3. Kubernetes starts the workflow-engine pod -> the workflow engine itself sets the status to running when it starts its execution (adding it to the factory create_workflow_engine_command so all workflow engines behave the same)
      Edit: this has to be done per engine now, since the factory is not yet merged to the latest master. Dedicated issue.

    This way, if a workflow gets stuck in pending status, we could allow the user to delete it. Any orphaned workflow-engine pods that result could be garbage collected (e.g. for all deleted workflows, check whether a workflow engine exists and clean it up).

    Other considerations related to creating a new workflow status should be taken into account, for example:

    • Consider workflows in pending state for a long time to also be stuck.
    • Consider updating REANA-UI to reflect the new status and present it to the user.
    • ...
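
The proposed flow can be sketched as a small state machine. This is only an illustration of the idea, not REANA's actual code: the `RunStatus` enum, the `ALLOWED` table, the `transition` and `is_stuck` helpers, and the 30-minute timeout are all hypothetical names and values chosen for this example, and only the statuses discussed in this issue are modelled.

```python
from datetime import datetime, timedelta
from enum import Enum


class RunStatus(Enum):
    """Hypothetical subset of statuses; `pending` is the proposed new one."""

    QUEUED = "queued"
    PENDING = "pending"
    RUNNING = "running"
    DELETED = "deleted"


# Transitions allowed in the proposed flow, keyed by the current status.
# Transitions out of `running` (finished, failed, ...) are left out on purpose.
ALLOWED = {
    RunStatus.QUEUED: {RunStatus.PENDING},  # controller received the start request
    RunStatus.PENDING: {RunStatus.RUNNING, RunStatus.DELETED},  # engine started, or user gave up
    RunStatus.RUNNING: set(),
    RunStatus.DELETED: set(),
}


def transition(current, new):
    """Return the new status, or raise if the flow does not allow the change."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {new.value}")
    return new


def is_stuck(status, since, now, timeout=timedelta(minutes=30)):
    """Treat a workflow as stuck if it has been pending longer than `timeout`.

    The 30-minute default is an arbitrary assumption for this sketch.
    """
    return status is RunStatus.PENDING and now - since > timeout
```

With this shape, a workflow can only reach running through pending, and a pending workflow can be deleted by the user or flagged as stuck by a periodic check.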
@diegodelemos changed the title from "workflow-run-manager: workflow-engine doesn't start and workflow gets stuck in running state" to "workflow-run-manager: create new pending status" Apr 20, 2021
@audrium audrium self-assigned this Apr 20, 2021
audrium added a commit to audrium/reana-db that referenced this issue Apr 21, 2021
audrium added a commit to audrium/reana-server that referenced this issue Apr 21, 2021
audrium added a commit to audrium/reana-workflow-controller that referenced this issue Apr 21, 2021
audrium added a commit to audrium/reana-db that referenced this issue Apr 21, 2021
audrium added a commit to audrium/reana-workflow-controller that referenced this issue Apr 21, 2021
audrium added a commit to audrium/reana-workflow-controller that referenced this issue Apr 22, 2021
audrium added a commit to audrium/reana-server that referenced this issue Apr 22, 2021
audrium added a commit to audrium/reana-workflow-controller that referenced this issue Apr 22, 2021
audrium added a commit to audrium/reana-db that referenced this issue Apr 22, 2021
audrium added a commit to audrium/reana-db that referenced this issue Apr 23, 2021
audrium added a commit to audrium/reana-server that referenced this issue Apr 23, 2021
audrium added a commit to audrium/reana-server that referenced this issue Apr 26, 2021
audrium added a commit to audrium/reana-server that referenced this issue Apr 26, 2021
audrium added a commit to audrium/reana-server that referenced this issue Apr 26, 2021
audrium added a commit to audrium/reana-workflow-controller that referenced this issue Apr 27, 2021
audrium added a commit to audrium/reana-workflow-controller that referenced this issue Apr 27, 2021
audrium added a commit to audrium/reana-ui that referenced this issue Apr 27, 2021
audrium added a commit to audrium/reana-workflow-controller that referenced this issue Apr 28, 2021