Aborting a job doesn't kill children process #248

Open
dridk opened this issue Jan 2, 2020 · 10 comments
@dridk

dridk commented Jan 2, 2020

Summary

I created a plugin which runs the following script:

    /mydata/plugins/run_bcl2fastq.sh .

This script runs a multithreaded command (10 threads here):

#!/bin/bash 
bcl2fastq -p 10 $INPUT_FOLDER

When I run it from Cronicle, I can see my job from the command line using htop, as follows:

├── Cronicle Server
│   └── /bin/bash /mydata/plugins/run_bcl2fastq.sh
│       └── bcl2fastq -p 10 /mydata/input
│           ├── bcl2fastq -p 10 /mydata/input
│           ├── bcl2fastq -p 10 /mydata/input
│           ├── bcl2fastq -p 10 /mydata/input
│           └── bcl2fastq -p 10 /mydata/input

When I abort the job from the Cronicle UI, bcl2fastq is still running on the server; only run_bcl2fastq.sh has been killed:

├── Cronicle Server
│   └── ~~bcl2fastq -p 10 /mydata/input~~
│       └── bcl2fastq -p 10 /mydata/input
│           ├── bcl2fastq -p 10 /mydata/input
│           ├── bcl2fastq -p 10 /mydata/input
│           ├── bcl2fastq -p 10 /mydata/input
│           └── bcl2fastq -p 10 /mydata/input

Your Setup

Operating system and version?

Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty

Node.js version?

v12.14.0

Cronicle software version?

0.8.38?

Are you using a multi-server setup, or just a single server?

Single server

Are you using the filesystem as back-end storage, or S3/Couchbase?

No

@dridk
Author

dridk commented Jan 2, 2020

Adding the "trap" command at the beginning of my script resolves the problem:

#!/bin/bash
trap 'kill $(jobs -p)' EXIT
bcl2fastq -p 10 $INPUT_FOLDER
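One caveat: jobs -p only lists background jobs, and bash defers trap handling while a foreground child is running, so depending on the shell this trap may fire late or have nothing to kill. A more defensive variant (a sketch, assuming Cronicle delivers SIGTERM to the script on abort) traps TERM explicitly and backgrounds the worker:

#!/bin/bash
# Sketch: forward the abort signal to every background job.
cleanup() { kill $(jobs -p) 2>/dev/null; }
trap cleanup EXIT TERM INT
bcl2fastq -p 10 "$INPUT_FOLDER" &   # backgrounded so jobs -p can list it
wait                                # interruptible, so the trap fires promptly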

@dridk changed the title from "Abording a job doesn't kill children process" to "Aborting a job doesn't kill children process" on Jan 2, 2020
@mplattner

I have a problem that might be related to this issue.

I use the shell script plugin with the following code.
If I abort the job manually, or Cronicle aborts it after the configured 3-minute timeout, the stress process and its 6 sub-processes keep running. It seems that only /bin/sh /tmp/cronicle-script-temp-xxxxxx.sh is killed, and stress gets adopted by init.

#!/bin/sh
# Enter your shell script code here
stress --cpu 6
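For reference, the reparenting is easy to confirm after an abort; a sketch (ps -C matches by command name on Linux procps):

# Orphaned workers show PPID 1 (adopted by init) after the abort.
ps -o pid,ppid,cmd -C stress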

@jhuckaby jhuckaby self-assigned this Jan 26, 2020
@jhuckaby
Owner

I'm sorry about this. It was a design choice in Cronicle, because in my use case I never want any child subprocesses to be directly killed. But I now clearly see the need, so I will add a checkbox to enable this behavior as an option.

For now, it looks like @dridk found a nice workaround using the trap command.

#!/bin/bash
trap 'kill $(jobs -p)' EXIT
# Your shell commands here
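Until that option exists, another workaround (a sketch, not built-in Cronicle behavior; the command name below is a placeholder) is to start the payload in its own process group and signal the whole group on abort:

#!/bin/bash
# Sketch: run the worker in its own process group via setsid, then kill the
# whole group (negative PID) on abort. util-linux setsid execs in place here
# (it only forks when the caller already leads a group), so $! is both the
# worker's PID and its new process-group ID.
setsid your_long_running_command &   # placeholder for the real workload
pgid=$!
trap 'kill -TERM -- "-$pgid" 2>/dev/null' EXIT TERM INT
wait "$pgid"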

@mplattner

mplattner commented Jan 26, 2020

No problem at all and thanks a lot @jhuckaby.

It isn't really my use case either; I was trying to understand how Cronicle measures CPU usage and found this issue while playing with the stress tool.
A checkbox or another solution is a good idea, as the current behaviour is unexpected (and arguably a bug: Cronicle reported that the job had been aborted).

Again, thanks. Cronicle is very useful!

@oyearunpal

@mplattner does adding trap to your script kill all the child processes when you run stress? I am not able to kill them.

@mplattner

I haven't tried it; I killed the processes manually. Sorry.

@nashok1226

I have a similar issue.

Below is the process for my script, triggered from Cronicle:

vag 27873 27872 7 17:07 ? 00:00:00 python3 /opt/kite/pykite/ab/testWS3.py
After aborting from the GUI, the PPID changes to 1, meaning the process has been reparented to init, and the script continues to run (Cronicle reports the abort as completed):

vag 27873 1 0 17:07 ? 00:00:00 python3 /opt/kite/pykite/ab/testWS3.py
I tried the trap solution suggested above, but it doesn't work.

Below is how I invoke the job.

#!/bin/sh

trap 'kill $(jobs -p)' EXIT

python3 /opt/kite/pykite/ab/testWS3.py
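A plausible reason the trap has no effect here: jobs -p only lists background jobs, and python3 runs in the foreground, so the trap has nothing to kill; plain /bin/sh may also skip an EXIT trap on an untrapped SIGTERM. A sketch that backgrounds the worker and traps TERM explicitly:

#!/bin/sh
# Sketch: background the worker, remember its PID, and trap TERM explicitly.
python3 /opt/kite/pykite/ab/testWS3.py &
pid=$!
trap 'kill "$pid" 2>/dev/null' EXIT TERM INT
wait "$pid"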

@jamesgibbard

+1 for this

@FrancescoPezzulli

> I'm sorry about this. It was a design choice in Cronicle, because in my use case I never want any child subprocesses to be directly killed. But I now clearly see the need, so I will add a checkbox to enable this behavior as an option.
>
> For now, it looks like @dridk found a nice workaround using the trap command.
>
> #!/bin/bash
> trap 'kill $(jobs -p)' EXIT
> # Your shell commands here

This solution didn't work for me; the child process is still running after the abort.

I tried a different approach: handing the parent's PID over to its child with the exec command, so the process Cronicle kills on abort is the worker itself. In theory this works, but the chain breaks when third-party libraries launch scripts without invoking them this way.
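In its simplest form the exec approach looks like this (a sketch, reusing the script from earlier in the thread):

#!/bin/sh
# Sketch: replace the shell with the worker, so the PID Cronicle signals on
# abort is the worker itself and no intermediate shell is left behind.
exec python3 /opt/kite/pykite/ab/testWS3.py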

@moravcik94

Hello, we are using Cronicle to run and schedule Docker containers.
We're using the shell plugin to run commands like:

#!/bin/sh

docker run -e TZ=Europe/Bratislava --rm --name containername docker-registry.com:5000/registry/example-registry/image/to/run:1.0.0 --args=arg1 

We worked around containers surviving an aborted job by copying shell-plugin.js into a new custom docker-plugin.js and overriding the SIGTERM handler to stop the container.

[screenshot: the overridden SIGTERM handler in docker-plugin.js]

Then we added the new custom plugin and created a containername parameter:

[screenshot: the custom plugin's containername parameter in the Cronicle UI]

and Docker containers are now stopped after hitting abort.
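The same idea should also work inside the stock shell plugin (a sketch; assumes the abort arrives as SIGTERM and reuses the container name from above):

#!/bin/sh
# Sketch: stop the named container when the script is aborted.
trap 'docker stop containername 2>/dev/null' EXIT TERM INT
docker run -e TZ=Europe/Bratislava --rm --name containername \
    docker-registry.com:5000/registry/example-registry/image/to/run:1.0.0 --args=arg1 &
wait $!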
