ACK timeout kills connection without getting restarted #106
Comments
Thanks for the report. The log you are seeing immediately causes the client to reconnect, so I am assuming there is something more at play here: https://github.com/dashbitco/broadway_rabbitmq/blob/master/lib/broadway_rabbitmq/producer.ex#L527-L536
The error after that is related to a GenServer call made while the GenServer is down:
So, we ack from a different process than the RabbitMQ producer (the processor or batcher acks). I don't think we can "save" the ack if the channel is down. What we can do, however, is have a better error message from Broadway, which is what I did with #122. I think for now that's pretty much it. 😞 Eventually the producer should reconnect.
We ran into a related issue: ack timeout -> channel closed by the RabbitMQ server -> while Broadway reconnects, there are several log messages about being unable to ack/reject messages because the channel is dead -> more ack timeouts, accumulating every 30 minutes (the default RabbitMQ consumer timeout) -> eventually we have a lot of channel reconnects. The worst part is that RabbitMQ appears to keep all Mnesia segments containing unacked messages; with a 30-minute timeout and high throughput, this eats disk space quickly. We are going to try a short timeout, as our ingestion is intended to be pretty fast.
Regarding the topic: does it make any sense to retry ack/reject several times when the channel is not alive? Another option would be to at least give some control over the messages Broadway is unable to ack/reject, something like handle_ack_error or so.
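For anyone wanting to try the shorter timeout mentioned above, here is a minimal sketch of the broker-side setting. It assumes a RabbitMQ version that accepts the consumer_timeout key in rabbitmq.conf. The 5-minute value is purely illustrative and should be sized to the longest processing time you expect.

```ini
# rabbitmq.conf
# Consumer acknowledgement timeout, in milliseconds.
# 300000 ms = 5 minutes. Illustrative value only (an assumption, not from this thread).
consumer_timeout = 300000
```

A shorter timeout only limits how long unacked deliveries are held before the channel is closed. It does not change what happens to acks sent on a channel that is already closed.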
versions:
broadway: 1.0.0
broadway_rabbitmq: 0.7.0
amqp: 2.1
elixir: 1.12
otp: 24.0.5
I have some long-running tasks that may sometimes exceed the consumer_timeout from RabbitMQ, with the message:
The expected behavior would be to re-establish the connection, kill the timed-out processors, and have RabbitMQ redeliver the messages.
The current behavior is that the GenServer is killed and Broadway can no longer send messages to RabbitMQ. This is fixed only by restarting the Broadway process.