NullPointerException caused zkClient to close socket with zookeeper and kafka broker not functioning #71
Comments
Looking at the logs and the code, I think this is what happens:
So much for the analysis. There are many things to improve here.
Does anyone want to work on this?
@jzillmann I am interested in taking this up.
@harshach So what would you need for that?
@jzillmann I am working on a patch and will open a PR. Thanks.
@jzillmann I opened a PR that resolves this issue when a DNS or cached-IP problem occurs.
I have a similar problem. Has it been solved?
Hello,
We use Kafka 0.10.2.1, which uses zkclient 0.10 (zkclient-0.10.jar). Our setup has 3 Kafka brokers and 3 ZooKeeper nodes, all running in Docker containers. On a few occasions one Kafka broker has remained running with no connection to ZooKeeper (netstat reports no socket connected to port 2181). This is probably caused by a transient network or DNS resolution issue, but the transient issue seems to become permanent: the broker never recovers and stays out of sync with the other 2 brokers. I can recover it only by restarting the broker.
I can see the following from the broker logs (unfortunately no debug logs are available):
I checked the last log ("Error handling event ..."). Looking at the code it seems that the NullPointerException is caught here: link
However, after the exception is caught the while loop seems to continue, possibly with no more events arriving. There do not seem to be any reconnection attempts, or even a process restart. This assumption matches what I saw in this environment: the Kafka broker remained running, but there was no connection to ZooKeeper (no connected socket reported by netstat) and the broker did not respond to any events. The broker was also out of sync for all topics/partitions. A broker restart fixed the problem.
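To illustrate the pattern I am describing, here is a minimal sketch. It is not the zkclient source: the class `EventLoopSketch` and the method `drain` are hypothetical names, and the loop uses a non-blocking `poll()` so that it terminates, whereas a real dispatch thread would presumably block waiting for the next event. It only shows how a loop that catches and logs an exception keeps running without ever attempting recovery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for an event-dispatch loop; not the actual zkclient code.
class EventLoopSketch {

    /** Runs every queued event, recording failures instead of propagating them. */
    static List<String> drain(BlockingQueue<Runnable> events) {
        List<String> log = new ArrayList<>();
        Runnable event;
        while ((event = events.poll()) != null) {
            try {
                event.run();
                log.add("handled");
            } catch (RuntimeException e) {
                // The failure is logged and the loop moves on.
                // Nothing in this branch reconnects or restarts anything.
                log.add("Error handling event: " + e.getClass().getSimpleName());
            }
        }
        return log;
    }

    public static void main(String[] args) {
        BlockingQueue<Runnable> events = new LinkedBlockingQueue<>();
        // Simulated handler failure, e.g. during a reconnect attempt.
        events.add(() -> { throw new NullPointerException("simulated failure"); });
        // A later event is still processed because the loop survived.
        events.add(() -> { });
        System.out.println(drain(events));
    }
}
```

If the failing event was the one responsible for re-establishing the session, the thread stays alive but the connection is never restored, which would match the behaviour I observed.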
Is the above assumption valid, or did I miss something? Do you think this is a bug in the zkClient code? How can we ensure that the broker will reconnect to ZooKeeper?
I found a similar issue already reported: KAFKA-2182. This issue appears to be resolved, which is puzzling.
Best regards,
Klearchos