Problem: transfers are abandoned if the workflow is interrupted #36
Comments
Thanks @scollazo. It'd be nice to fix this. I think I know what's happening. Part of our workflow runs within a session (download, bundle, transfer, poll transfer, poll ingest and hari/prod activities). Cadence detects that the worker running the session dies and terminates the session automatically. I think that we have to do a couple of things:
I've also filed #37 to describe the lack of the ability to deploy workers separately. I'm mentioning this here because once you have workers, you can stop some processes without affecting others.
While workers seem like a nice thing to have, I don't see how they'll solve the problem. Are you thinking about running a worker for each pipeline and stopping it when needed, without stopping enduro itself?
What I had in mind when I filed the issue was that enduro should be able to handle the SIGHUP signal and stop itself once all "IN PROGRESS" transfers finish. There can be corner cases where stuck tasks prevent enduro from finishing, and the process might need to be sent the SIGKILL signal, but that could be handled by systemd.
For the record, this cadence issue might be of interest: cadence-workflow/cadence-go-client#775
Discussed offline. Tentative fix in 486be45 (v0.21.0). Enduro already had signal handling, which results in cancelation of activities. In v0.21.0, the processing session is retried after cancelation. Activities within the session are executed only when needed, e.g. transfer-activity won't be executed if we already have a TransferID, and so on. For example, if Archivematica was busy at Transfer when the worker died, then poll-transfer-activity will eventually run (unless we have a SIPID) and will go back to waiting as expected. So the idea is that transfers won't be abandoned.
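The "executed only when needed" idea can be sketched as follows. This is an illustration in plain Go, not the code in 486be45 and not the Cadence API: the `state` and `steps` types, `runSession` and the identifier values are hypothetical, and the only assumption taken from the comment above is that a recorded TransferID or SIPID lets a retried session skip the corresponding step.

```go
package main

import "fmt"

// state holds identifiers recorded by earlier attempts (hypothetical type).
type state struct {
	TransferID string
	SIPID      string
}

// steps are hypothetical stand-ins for the activities run inside the session.
type steps struct {
	transfer     func() string
	pollTransfer func(transferID string) string
	pollIngest   func(sipID string)
}

// runSession runs only the steps whose results are not already recorded in s
// and returns the names of the steps it actually executed.
func runSession(s *state, a steps) []string {
	var ran []string
	if s.TransferID == "" {
		s.TransferID = a.transfer()
		ran = append(ran, "transfer")
	}
	if s.SIPID == "" {
		s.SIPID = a.pollTransfer(s.TransferID)
		ran = append(ran, "poll-transfer")
	}
	a.pollIngest(s.SIPID)
	ran = append(ran, "poll-ingest")
	return ran
}

func main() {
	a := steps{
		transfer:     func() string { return "t-1" },
		pollTransfer: func(string) string { return "s-1" },
		pollIngest:   func(string) {},
	}
	// The worker died after the transfer was started, so TransferID is known.
	resumed := &state{TransferID: "t-1"}
	fmt.Println(runSession(resumed, a)) // prints [poll-transfer poll-ingest]
}
```

Because each step is guarded by the state it produces, retrying the whole session is safe: a retry resumes at the first step whose result is still missing instead of starting a second transfer.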
Reminder: the semaphore that protects pipelines is still local and it's flushed when you kill Enduro, so expect more transfers to enter the critical section after a restart. That's not something that we can control well until we solve #43.
There is no way to stop enduro without leaving transfers in an unknown state.
There should be a way to stop enduro gracefully, allowing all in progress tasks to finish without starting new ones.