[2023-04-28 01:58:09 +0000] [11] [CRITICAL] WORKER TIMEOUT (pid:15)
[2023-04-28 01:58:09,717] INFO in client: Got keepalive def03be1-9193-4219-be32-5c3caf806f6e in 10.36s
Exception ignored in: <function _ChannelCallState.__del__ at 0x7fd37fb905e0>
Traceback (most recent call last):
File "/app/__pypackages__/3.8/lib/grpc/_channel.py", line 1247, in __del__
self.channel.close(cygrpc.StatusCode.cancelled,
File "src/python/grpcio/grpc/_cython/_cygrpc/channel.pyx.pxi", line 513, in grpc._cython.cygrpc.Channel.close
File "src/python/grpcio/grpc/_cython/_cygrpc/channel.pyx.pxi", line 399, in grpc._cython.cygrpc._close
File "src/python/grpcio/grpc/_cython/_cygrpc/channel.pyx.pxi", line 420, in grpc._cython.cygrpc._close
File "/usr/local/lib/python3.8/threading.py", line 302, in wait
waiter.acquire()
File "/app/__pypackages__/3.8/lib/gevent/thread.py", line 121, in acquire
acquired = BoundedSemaphore.acquire(self, blocking, timeout)
File "src/gevent/_semaphore.py", line 180, in gevent._gevent_c_semaphore.Semaphore.acquire
File "src/gevent/_semaphore.py", line 259, in gevent._gevent_c_semaphore.Semaphore.acquire
File "src/gevent/_semaphore.py", line 249, in gevent._gevent_c_semaphore.Semaphore.acquire
File "src/gevent/_abstract_linkable.py", line 521, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait
File "src/gevent/_abstract_linkable.py", line 487, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
File "src/gevent/_abstract_linkable.py", line 490, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
File "src/gevent/_abstract_linkable.py", line 442, in gevent._gevent_c_abstract_linkable.AbstractLinkable._AbstractLinkable__wait_to_be_notified
File "src/gevent/_abstract_linkable.py", line 451, in gevent._gevent_c_abstract_linkable.AbstractLinkable._switch_to_hub
File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
File "src/gevent/_greenlet_primitives.py", line 65, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
File "src/gevent/_gevent_c_greenlet_primitives.pxd", line 35, in gevent._gevent_c_greenlet_primitives._greenlet_switch
gevent.exceptions.LoopExit: This operation would block forever
Hub: <Hub '' at 0x7fd38477b220 epoll default pending=0 ref=0 fileno=6 resolver=<gevent.resolver.thread.Resolver at 0x7fd383daf100 pool=<ThreadPool at 0x7fd380a6c740 tasks=0 size=0 maxsize=10 hub=<Hub at 0x7fd38477b220 thread_ident=0x7fd385f4b740>>> threadpool=<ThreadPool at 0x7fd380a6c740 tasks=0 size=0 maxsize=10 hub=<Hub at 0x7fd38477b220 thread_ident=0x7fd385f4b740>> thread_ident=0x7fd385f4b740>
Handles:
[]
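The service architecture is a WSGI server started by gunicorn, with Nginx in front of it as a reverse proxy. This is the Nginx + gunicorn + Flask stack commonly described online.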
The error log is the traceback shown above.
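In production I noticed that an HTTP request to a REST API service written in Python Flask was blocked for a long time. Increasing gunicorn's timeout setting did not help. I also tried the various approaches in https://stackoverflow.com/questions/10855197/frequent-worker-timeout, including setting preload to True, and none of them worked.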
The gunicorn configuration:
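Later I thought it over more carefully: why did the other HTTP REST API endpoints not block and time out like this? The difference is that this endpoint in turn calls the Stable Diffusion gRPC API, not the Stable Diffusion REST API. On top of that, my gunicorn worker_class is set to gevent; with the default sync worker the problem does not occur. For my server-side workload, however, an async worker such as gevent is the recommended choice. So I searched Google for keywords such as grpc gevent gunicorn and finally found the cause: gevent and grpc are simply not compatible with each other.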
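In your Flask entry point, for example app.py, add the following compatibility patch at the very top of the file, before importing any standard-library modules: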
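The original snippet is not preserved in this copy of the issue. A minimal sketch of the commonly cited workaround, assuming the patch in question is gevent's monkey patching followed by gRPC's experimental gevent support (`grpc.experimental.gevent.init_gevent()`), might look like the following. The helper name `apply_gevent_grpc_patch` and the `PATCHED` flag are hypothetical, and the import guards are only there so the file still loads in an environment without gevent or grpcio (for example under the sync worker).

```python
# app.py: this must run before any other import in the entry point.

def apply_gevent_grpc_patch():
    """Patch the standard library for gevent, then make gRPC gevent-aware.

    Returns True if both steps ran, False if gevent or grpcio is missing.
    (Hypothetical helper; the guarded imports are an assumption for portability.)
    """
    try:
        from gevent import monkey
    except ImportError:
        return False
    # Replace blocking stdlib primitives (socket, threading, ...) with
    # cooperative gevent versions.
    monkey.patch_all()

    try:
        import grpc.experimental.gevent as grpc_gevent
    except ImportError:
        return False
    # Tell gRPC's Cython core to cooperate with gevent instead of blocking
    # on native threads.
    grpc_gevent.init_gevent()
    return True


PATCHED = apply_gevent_grpc_patch()

# Only now import the rest of the application, e.g.:
# from flask import Flask
```

If both packages are always installed, the guards can be dropped and the two calls placed directly at the top of the file.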
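I found this code rather ugly, and the related compatibility issues are fairly old. My gevent and grpc versions should not be outdated, so in principle the open-source community should have fixed this long ago; they also said it had been resolved and that this was only a temporary patch. To my surprise, the patch still fixed the problem. Since it is now resolved, I did not look into which version contains the fix.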
After applying the patch, I recommend changing the gunicorn configuration to the following:
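The recommended configuration itself is missing from this copy of the issue. As a rough sketch only, a gunicorn.conf.py that uses the gevent worker described above could look like this. Every value here (bind address, worker count, connection limit, timeout) is an assumption chosen for illustration, not the author's actual setting.

```python
# gunicorn.conf.py: illustrative values only, not the author's original file.

bind = "0.0.0.0:8000"        # assumed address; Nginx proxies to this
workers = 2                  # assumed worker process count
worker_class = "gevent"      # async worker, as used in this issue
worker_connections = 1000    # max simultaneous clients per gevent worker
timeout = 120                # seconds before a silent worker is killed
preload_app = False          # the author reports preload=True did not help
```

It would be started with something like `gunicorn -c gunicorn.conf.py app:app`, assuming the Flask object is named `app` inside app.py.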