Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Optimize] graceful shutdown issues in some special cases. #47

Open
EdmondFrank opened this issue Nov 12, 2024 · 5 comments
Open

[Optimize] graceful shutdown issues in some special cases. #47

EdmondFrank opened this issue Nov 12, 2024 · 5 comments

Comments

@EdmondFrank
Copy link

Hi akash-aky, First of all, thank you for creating Exile, it's a very amazing library! I recently ran into some problems using it to execute parallel this application, here are my debug result:

~$ parallel --tagstring "{}|" --line-buffer tail -F {} ::: compass-web-service/log/{slow_queries,bullet}.log
service/log/bullet.log|	Call stack
service/log/bullet.log|	  /home/git/service/app/models/lab_model.rb:61:in `default_version'
service/log/bullet.log|	
service/log/bullet.log|	2024-10-23 14:12:21[WARN] user: git
service/log/bullet.log|	POST /api/graphql
service/log/bullet.log|	AVOID eager loading detected
service/log/bullet.log|	  LabModelVersion => [:lab_dataset]
service/log/bullet.log|	  Remove from your query: .includes([:lab_dataset])
service/log/bullet.log|	Call stack
service/log/bullet.log|	


parallel: SIGTERM received. No new jobs will be started.
parallel: Waiting for these 2 jobs to finish. Send SIGTERM again to stop now.
parallel: tail -F service/log/bullet.log
parallel: tail -F service/log/slow_queries.log

As you can see above, while running parallel with Exile.stream!, I closed my application, but the external processes opened by Exile were not cleaned up correctly.

I think the reason may be that these particular processes need to send the SIGTERM signal multiple times for a graceful shutdown.

I tried to look up the code and found that Exile uses SIGKILL to force the process to terminate directly after sending a SIGTERM timeout, resulting in the tail process opened by parallel not being properly shut down even though the parallel process is force-terminated.

So I tried to make a simple and rough patch to deal with this scenario: trying to send SIGTERM 3 times at first, if it can't be closed gracefully then send SIGKILL to force kill this external process.

I'll make a pull request if you'd like to merge in this patch.

@akash-akya
Copy link
Owner

Hey @EdmondFrank, thanks for reporting and the patch.

So you are spawning parallel using Exile correct? and you are trying to shutdown parallel (and linked child processes) gracefully? From a cursory glace, the patch seems to be one simple way of solving this, but I have look on how such scenarios are usually handled elsewhere.

If possible can you share the Elixir code for the same? if you have it at hand :)

@EdmondFrank
Copy link
Author

EdmondFrank commented Nov 14, 2024

Hey @EdmondFrank, thanks for reporting and the patch.

So you are spawning parallel using Exile correct? and you are trying to shutdown parallel (and linked child processes) gracefully? From a cursory glace, the patch seems to be one simple way of solving this, but I have look on how such scenarios are usually handled elsewhere.

If possible can you share the Elixir code for the same? if you have it at hand :)

Sure, I created a minimal test repository for you based on my use case: https://github.com/EdmondFrank/exile_log_collector

@akash-akya
Copy link
Owner

akash-akya commented Dec 16, 2024

@EdmondFrank sorry for the late response, I got caught up with my day job.

Thanks for sharing the repo. I understand the issue in general but in this specific case it seems GNU Parallel behaviour is changed? https://git.savannah.gnu.org/cgit/parallel.git/tree/NEWS#n1036, see: https://superuser.com/questions/1660301/gnu-parallel-is-behaving-differently-on-sigterm-in-two-cygwin-installations

It no longer requires multiple SIGTERM signals, so I can not reproduce the issue locally. My local version:

$ parallel --version
GNU parallel 20240922
Copyright (C) 2007-2024 Ole Tange, http://ole.tange.dk and Free Software
Foundation, Inc.

Can share which version you are using?

@EdmondFrank
Copy link
Author

@EdmondFrank sorry for the late response, I got caught up with my day job.

Thanks for sharing the repo. I understand the issue in general but in this specific case it seems GNU Parallel behaviour is changed? https://git.savannah.gnu.org/cgit/parallel.git/tree/NEWS#n1036, see: https://superuser.com/questions/1660301/gnu-parallel-is-behaving-differently-on-sigterm-in-two-cygwin-installations

It no longer requires multiple SIGTERM signals, so I can not reproduce the issue locally. My local version:

$ parallel --version
GNU parallel 20240922
Copyright (C) 2007-2024 Ole Tange, http://ole.tange.dk and Free Software
Foundation, Inc.

Can share which version you are using?

Yes, it seems that the version I used on my deployment machine is too old. GNU Parallel behavior has changed in newer versions. When I upgraded it to the latest version, the problem was fixed.

$ parallel --version
GNU parallel 20190122
Copyright (C) 2007-2019 Ole Tange and Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
GNU parallel comes with no warranty.

Web site: http://www.gnu.org/software/parallel

When using programs that use GNU Parallel to process data for publication
please cite as described in 'parallel --citation'.

Thank you for your very much focus on this issue. If you still have an interest in optimizing this special case, maybe you can build an older version from source. I could reproduce the issue on this version.

image

@akash-akya
Copy link
Owner

Hey @EdmondFrank, thanks for the confirmation.

Agree about still handling the special case. I'll push changes based on your suggestion soon. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants