Support issue with Kubernetes #101
What was your write_timeout value? Pods do not die instantly but are configured to stay for a grace period. I'd like you to show how to reproduce this on Kubernetes, not with an arbitrary docker kill command, as that's not necessarily the correct workflow and behaviour.
/set title: Support issue with Kubernetes
@alexellis Thank you so much for your comment. The reason is that I set the write_timeout of the function too high. After setting write_timeout correctly, those pods close as expected. Sorry for misunderstanding the meaning of write_timeout. Here is my testing code: https://github.com/conan-o-chang/openfaas-long-running-task
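For anyone else hitting this: the grace period the watchdog waits after SIGTERM is tied to write_timeout, which is why a very large value keeps pods in Terminating for so long. Below is a minimal Go sketch of that relationship, not the actual of-watchdog source; the variable names and fallback value are assumptions for illustration only.
package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	// The watchdog reads its timeouts from the environment, e.g. write_timeout=10m.
	writeTimeout, err := time.ParseDuration(os.Getenv("write_timeout"))
	if err != nil {
		writeTimeout = 10 * time.Second // fallback used only in this sketch
	}

	// Wait for Kubernetes to send SIGTERM when the pod is deleted or scaled down.
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGTERM)
	<-sig

	// The process keeps running for the whole window so in-flight requests can
	// finish, even if the function has already gone idle.
	log.Printf("SIGTERM received.. shutting down server in %s", writeTimeout)
	time.Sleep(writeTimeout)
	log.Printf("No new connections allowed. Exiting in: %s", writeTimeout)
	time.Sleep(writeTimeout)
}
With write_timeout set close to the longest request you actually expect, the Terminating window shrinks to match.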
Hi @alexellis, sorry for bothering you again.
Steps to Reproduce (for bugs)
# We get 5 pods of the function.
Every 2.0s: kubectl get po -n openfaas-fn Mon Jul 6 02:05:20 2020
NAME READY STATUS RESTARTS AGE
python3-flask-long-running-5b6684b647-dv92b 1/1 Running 0 16s
python3-flask-long-running-5b6684b647-f445m 1/1 Running 0 11s
python3-flask-long-running-5b6684b647-jtvtl 1/1 Running 0 11s
python3-flask-long-running-5b6684b647-nprh5 1/1 Running 0 11s
python3-flask-long-running-5b6684b647-tk2mt 1/1 Running 0 11s
# 4 pods are terminating
Every 2.0s: kubectl get po -n openfaas-fn Mon Jul 6 02:05:35 2020
NAME READY STATUS RESTARTS AGE
python3-flask-long-running-5b6684b647-dv92b 1/1 Running 0 31s
python3-flask-long-running-5b6684b647-f445m 0/1 Terminating 0 26s
python3-flask-long-running-5b6684b647-jtvtl 0/1 Terminating 0 26s
python3-flask-long-running-5b6684b647-nprh5 1/1 Terminating 0 26s
python3-flask-long-running-5b6684b647-tk2mt 1/1 Terminating 0 26s
# both fwatchdog and gunicorn (fprocess) are running
~ $ ps
PID USER TIME COMMAND
1 app 0:00 fwatchdog
11 app 0:00 {gunicorn} /usr/local/bin/python /home/app/.local/bin/gunicorn index
15 app 0:00 {gunicorn} /usr/local/bin/python /home/app/.local/bin/gunicorn index
20 app 0:00 sh
26 app 0:00 ps
# fwatchdog keeps running even after gunicorn has finished.
~ $ ps
PID USER TIME COMMAND
1 app 0:00 fwatchdog
20 app 0:00 sh
25 app 0:00 ps
Every 2.0s: kubectl get po -n openfaas-fn Mon Jul 6 02:15:26 2020
NAME READY STATUS RESTARTS AGE
python3-flask-long-running-5b6684b647-dv92b 1/1 Running 0 10m
python3-flask-long-running-5b6684b647-f445m 0/1 Terminating 0 10m
python3-flask-long-running-5b6684b647-jtvtl 0/1 Terminating 0 10m
python3-flask-long-running-5b6684b647-nprh5 0/1 Terminating 0 10m
python3-flask-long-running-5b6684b647-tk2mt 0/1 Terminating 0 10m
Every 2.0s: kubectl get po -n openfaas-fn Mon Jul 6 02:15:50 2020
NAME READY STATUS RESTARTS AGE
python3-flask-long-running-5b6684b647-dv92b 1/1 Running 0 10m
Forking - gunicorn [index:app -b 127.0.0.1:5000 --log-level debug --timeout=600 --graceful-timeout=600]
2020/07/05 18:05:05 Started logging stderr from function.
2020/07/05 18:05:05 Started logging stdout from function.
2020/07/05 18:05:05 OperationalMode: http
2020/07/05 18:05:05 Timeouts: read: 10m0s, write: 10m0s hard: 10m0s.
2020/07/05 18:05:05 Listening on port: 8080
2020/07/05 18:05:05 Writing lock-file to: /tmp/.lock
2020/07/05 18:05:05 Metrics listening on port: 8081
...
2020/07/05 18:05:05 stderr: [2020-07-05 18:05:05 +0000] [11] [INFO] Starting gunicorn 20.0.4
2020/07/05 18:05:05 stderr: [2020-07-05 18:05:05 +0000] [11] [DEBUG] Arbiter booted
2020/07/05 18:05:05 stderr: [2020-07-05 18:05:05 +0000] [11] [INFO] Listening at: http://127.0.0.1:5000 (11)
2020/07/05 18:05:05 stderr: [2020-07-05 18:05:05 +0000] [11] [INFO] Using worker: sync
2020/07/05 18:05:05 stderr: [2020-07-05 18:05:05 +0000] [15] [INFO] Booting worker with pid: 15
2020/07/05 18:05:05 stderr: [2020-07-05 18:05:05 +0000] [11] [DEBUG] 1 workers
2020/07/05 18:35:12 SIGTERM received.. shutting down server in 10m0s
2020/07/05 18:35:12 Removing lock-file : /tmp/.lock
2020/07/05 18:35:12 stderr: [2020-07-05 18:35:12 +0000] [11] [INFO] Handling signal: term
2020/07/05 18:35:12 stderr: [2020-07-05 18:35:12 +0000] [15] [INFO] Worker exiting (pid: 15)
2020/07/05 18:35:12 stderr: [2020-07-05 18:35:12 +0000] [11] [INFO] Shutting down: Master
2020/07/05 18:45:12 No new connections allowed. Exiting in: 10m0s
rpc error: code = Unknown desc = Error: No such container: ba335b88f0f7862682be186c20d78c0ac46500a3973b2186e93fd76cbba8a80d
Expected Behaviour
When we send a SIGTERM to fwatchdog, it should be closed after fprocess has been gracefully shut down.
Current Behaviour
When we send a SIGTERM to fwatchdog, it is still running after fprocess has been gracefully shut down (the exit code of fprocess is 0). In the other case, where fprocess exits with an error (exit code is not 0), the watchdog shows the log
Forked function has terminated: ...
and kills itself.
Possible Solution
Exit fwatchdog itself when fprocess has exited. It should relate to https://github.com/openfaas-incubator/of-watchdog/blob/c796e1b714d703c90bf6e3f392471746e0aeab2d/executor/http_runner.go#L70-L76
One idea is to exit fwatchdog whether or not fprocess died with an error, but I am not sure whether there is any side effect from this change.
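To make that concrete, here is a rough Go sketch of the proposed behaviour: fork fprocess, forward SIGTERM to it, and exit the watchdog as soon as the child terminates, whether or not it returned an error. This is illustrative only and not the actual of-watchdog executor; the command line and control flow are assumptions.
package main

import (
	"log"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

func main() {
	// Stand-in for fprocess; the real watchdog builds this from the fprocess env variable.
	cmd := exec.Command("gunicorn", "index:app", "-b", "127.0.0.1:5000")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	if err := cmd.Start(); err != nil {
		log.Fatalf("could not start fprocess: %s", err)
	}

	// Forward SIGTERM so gunicorn can shut down its workers gracefully.
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGTERM)
	go func() {
		s := <-sig
		_ = cmd.Process.Signal(s)
	}()

	// Exit as soon as fprocess has terminated, with or without an error,
	// instead of only exiting on a non-zero exit code.
	if err := cmd.Wait(); err != nil {
		log.Printf("Forked function has terminated: %s", err)
		os.Exit(1)
	}
	log.Println("Forked function has terminated cleanly, exiting watchdog")
	os.Exit(0)
}
Whether exiting on a clean fprocess exit has side effects for other templates is exactly the open question above, so treat this as a starting point for discussion rather than a patch.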
Steps to Reproduce (for bugs)
I am using the python3-flask template, and fprocess is gunicorn.
faas template store pull python3-flask
faas new python3-flask-gracefully-shutdown --lang python3-flask
echo 'gunicorn' > python3-flask-gracefully-shutdown/requirements.txt
faas build -f python3-flask-gracefully-shutdown.yml
docker run --rm -ti -e fprocess='gunicorn index:app' -p 8080:8080 --name python3-flask-gracefully-shutdown python3-flask-gracefully-shutdown
docker kill --signal=SIGTERM python3-flask-gracefully-shutdown
Context
I am using OpenFaaS for some long-running tasks with auto-scaling on Kubernetes. One day I found that a lot of Pods had not terminated correctly (status stuck in Terminating), which means they were wasting resources.
Your Environment
docker version (e.g. Docker 17.0.05): Docker version 19.03.8, build afacb8b
Kubernetes
MacOS 10.13.6