This repository has been archived by the owner on Oct 6, 2018. It is now read-only.

Kafka Web Console release v2.0.0 is creating a high number of open file handles (against Kafka 0.8.1.1, ZooKeeper 3.3.4) #47

Closed
tonyfalabella opened this issue Jan 21, 2015 · 10 comments

Comments

@tonyfalabella

I'm running Kafka Web Console release v2.0.0 against Kafka 0.8.1.1 and ZooKeeper 3.3.4.

I consistently see the number of open file handles increase once I launch Kafka Web Console and navigate to a topic in ZooKeeper.
Once the file handles start to increase, they keep increasing without any further navigation in the browser: I only need to launch the web console, do nothing else besides monitor the number of open files, and I see the count go up every few seconds.
I've confirmed there are no other producers or consumers connecting to Kafka or ZooKeeper.

After this runs for a while you'll get either of these errors:

  • Run a Kafka command like this:
$INSTALLDIR/kafka/bin/kafka-topics.sh --zookeeper localhost:2181 --create --replication-factor 1 --partitions 4 --topic test2

You'll get an error like this:

Error: Exception thrown by the agent : java.rmi.server.ExportException: Port already in use: 0; nested exception is:
    java.net.BindException: Address already in use
  • Java clients might get an error like this (due to "Too many open files"):
java.io.FileNotFoundException: /src1/fos/dev-team-tools/var/kafka/broker-0/replication-offset-checkpoint.tmp

The ulimit for the user ID that my Kafka process runs under has a very large value for "open files":

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 610775
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 500000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 610775
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
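
Note that `ulimit -a` reports the limits of the shell it is run in, which may differ from the limits of an already running broker process. A minimal sketch for checking the limit a process actually has, assuming Linux and its `/proc/<pid>/limits` file (the function name `proc_nofile_limits` is made up for illustration):

```bash
#!/usr/bin/env bash
# Print the soft and hard "Max open files" limits of a running process.
# Usage: proc_nofile_limits PID
# Assumes Linux: reads /proc/<pid>/limits, where the line looks like
#   Max open files            1024                 4096                 files
proc_nofile_limits() {
  local pid=$1
  awk '/^Max open files/ {print "soft=" $4, "hard=" $5}' "/proc/${pid}/limits"
}

# Demo against the current shell; for the broker, pass the Kafka PID instead.
proc_nofile_limits "$$"
```

To inspect the broker, the PID could be looked up with something like `pgrep -f kafka.Kafka`, assuming exactly one process matches that pattern.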

Note: I've also tried the pull request from @ibanner56 (#40), which is related to issues #36 and #37 from @mungeol, but it did not fix the issue.

To reproduce on Linux do the following.

  1. Launch ZooKeeper
  2. Launch Kafka
  3. Create a topic with 4 partitions and a replication factor of 1:
    $INSTALLDIR/kafka/bin/kafka-topics.sh --zookeeper localhost:2181 --create --replication-factor 1 --partitions 4 --topic test2
  4. Open a PuTTY session and run this script in that window:
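# Every 5 seconds, print the number of open file descriptors held by the
# ZooKeeper and Kafka broker processes. Each count includes the one-line
# "total" header that ls -l prints.
while true; do
  date
  echo "zookeeper: $(ls -ltr /proc/$(ps -ef | grep zookeeper.server | grep -v grep | awk '{print $2}')/fd | wc -l)"
  echo "Kafka: $(ls -ltr /proc/$(ps -ef | grep kafka.Kafka | grep -v grep | awk '{print $2}')/fd | wc -l)"
  echo ""
  sleep 5
done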
  5. Launch Kafka Web Console.
  6. Browse to a topic.
  7. Notice that the "Kafka" count in the PuTTY session increases.
  8. Wait several seconds. Notice that the "Kafka" count in the PuTTY session increases again, without doing anything.

Sample output from the script in step 4 after running for a couple of hours (with 8 topics defined on the ZooKeeper instance, each with a replication factor of 1 and 4 partitions):
Wed Jan 21 18:44:29 EST 2015
zookeeper: 37
Kafka: 6013

Wed Jan 21 18:44:34 EST 2015
zookeeper: 37
Kafka: 6013

Wed Jan 21 18:44:39 EST 2015
zookeeper: 37
Kafka: 6045

...

Wed Jan 21 18:51:23 EST 2015
zookeeper: 37
Kafka: 6461
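
As a rough, back-of-the-envelope illustration of what the sample output implies: between 18:44:29 and 18:51:23 the Kafka count grew from 6013 to 6461. The sketch below extrapolates that rate linearly to the 500000 open-files limit shown earlier. It assumes the growth stays constant, which the thread does not establish, and the function name `seconds_until_limit` is made up for illustration.

```bash
#!/usr/bin/env bash
# Linear extrapolation: how many seconds until CURRENT reaches LIMIT
# if the count keeps growing by DELTA_FDS every DELTA_SECS seconds.
# Usage: seconds_until_limit CURRENT LIMIT DELTA_FDS DELTA_SECS
seconds_until_limit() {
  local current=$1 limit=$2 delta_fds=$3 delta_secs=$4
  if (( delta_fds <= 0 )); then
    echo "no growth observed" >&2
    return 1
  fi
  # Integer arithmetic; the result is rounded down to whole seconds.
  echo $(( (limit - current) * delta_secs / delta_fds ))
}

# Figures taken from the sample output: 18:44:29 -> 18:51:23, 6013 -> 6461.
elapsed=$(( (18*3600 + 51*60 + 23) - (18*3600 + 44*60 + 29) ))
grown=$(( 6461 - 6013 ))
secs=$(seconds_until_limit 6461 500000 "$grown" "$elapsed")
echo "elapsed=${elapsed}s grown=${grown} fds, limit reached in ~${secs}s (~$(( secs / 3600 ))h)"
```

The one visible step in the sample (6013 to 6045) is 32, which happens to equal 8 topics × 4 partitions; whether that is more than a coincidence is not confirmed anywhere in this thread.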
@gruaig

gruaig commented Jan 22, 2015

It's like the files are not being closed. I experience this issue too.
root@cerb ~ # sysctl fs.file-nr
fs.file-nr = 27424 0 6552758
root@cerb~ # sysctl fs.file-nr
fs.file-nr = 28864 0 6552758
root@cerb ~ # sysctl fs.file-nr
fs.file-nr = 29600 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 29600 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 29600 0 6552758
root@cerb~ # sysctl fs.file-nr
fs.file-nr = 29600 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 29760 0 6552758
root@cerb ~ # sysctl fs.file-nr
fs.file-nr = 30272 0 6552758
root@cerb~ # sysctl fs.file-nr
fs.file-nr = 30272 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 30272 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 30272 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 30272 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 30976 0 6552758
root@cerberus ~ # sysctl fs.file-nr
fs.file-nr = 30976 0 6552758

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 515011
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 60000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 515011
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
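
For readers unfamiliar with `fs.file-nr`: its three fields are the number of allocated file handles, the number of allocated but unused handles, and the system-wide maximum. It is a system-wide figure, so it is not directly comparable to the per-process `open files` limit above. A small sketch that labels the fields, assuming Linux (the function name `file_nr_summary` and the `headroom` label are made up for illustration):

```bash
#!/usr/bin/env bash
# Label the three fields of fs.file-nr and show the remaining headroom.
# Usage: file_nr_summary [FILE]   (FILE defaults to /proc/sys/fs/file-nr)
file_nr_summary() {
  local allocated unused max
  # The file holds three whitespace-separated integers on one line.
  read -r allocated unused max < "${1:-/proc/sys/fs/file-nr}"
  echo "allocated=${allocated} unused=${unused} max=${max} headroom=$(( max - allocated ))"
}

# Live system-wide values on the current machine.
file_nr_summary
```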

@foovungle

I wrote up a Stack Overflow question (http://stackoverflow.com/questions/28549868/kafka-web-console-using-twitter-finagle-not-responding) before I found this thread. The only thing I could do was restart the server. What have you been doing?

@gruaig

gruaig commented Feb 17, 2015

Hey

What we have been doing is setting the number of open files on our system to the max, "65355". The application no longer crashes.

Sean

@foovungle

Hi,
I contemplated that as well. It's a good stop gap, but eventually, it'll hit the limit (faster if there are more partitions). I was looking for a more permanent solution, but this'll have to do for now, I guess.
-F

@gruaig

gruaig commented Feb 17, 2015

Yeah, we have moved away and developed our own solution that's very similar to kafka-web-console.

@tonyfalabella
Author

This is really a major issue. Not only does Kafka become unstable, but it can wreak havoc on any other process that needs to use ports once the "open files" limit has been reached. I've also observed instability even when that maximum has not been reached.

To fix the issue we used to kill web-console. I can't remember whether we also occasionally had to rebuild some of the topic files.

You'll also notice a ton of messages being generated in your ZooKeeper log file. The log file can quickly grow quite large.

Due to this issue we've stopped using kafka-web-console and are also implementing our own solution. I love that @claudemamo created this and has offered it for others to use (it's a nice little GUI). Unfortunately, I don't think the Kafka Wiki should suggest people consider using kafka-web-console until this issue is closed. It really makes Kafka (and possibly your entire server) unstable.

@cjmamo
Owner

cjmamo commented Feb 18, 2015

Duplicate of #30

@gruaig

gruaig commented Feb 18, 2015

This isn't a duplicate.

@foovungle

I tried the fork in development, and open files are kept under control. I will roll it out to production in the next few days to see if this helps.

@foovungle

With https://github.com/ibanner56/kafka-web-console the system still hangs, but it takes longer and is not due to too many connections to Kafka. I get a bunch of these when I run sudo lsof:

java 16240 root 1535w FIFO 0,8 0t0 42163244 pipe
java 16240 root 1536u 0000 0,9 0 7808 anon_inode
java 16240 root 1537u 0000 0,9 0 7808 anon_inode
java 16240 root 1538u 0000 0,9 0 7808 anon_inode
java 16240 root 1539w FIFO 0,8 0t0 42193027 pipe
java 16240 root 1541r FIFO 0,8 0t0 42186896 pipe
java 16240 root 1542w FIFO 0,8 0t0 42186896 pipe
java 16240 root 1543r FIFO 0,8 0t0 42174664 pipe
java 16240 root 1544w FIFO 0,8 0t0 42174664 pipe
java 16240 root 1545u 0000 0,9 0 7808 anon_inode
java 16240 root 1546u 0000 0,9 0 7808 anon_inode
java 16240 root 1547r FIFO 0,8 0t0 42199219 pipe
java 16240 root 1548r FIFO 0,8 0t0 42176277 pipe
java 16240 root 1549w FIFO 0,8 0t0 42176277 pipe

Eventually, the system runs out of open files. I don't have time to debug this at the moment.
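
For anyone picking this up, a per-type breakdown of a process's descriptors can make a listing like the one above easier to track over time. The following is a minimal sketch, assuming Linux, where each entry in `/proc/<pid>/fd` is a symlink whose target starts with `socket:`, `pipe:` or `anon_inode:` for those descriptor types. The function name `fd_types` is made up for illustration.

```bash
#!/usr/bin/env bash
# Count the open file descriptors of a process, grouped by type.
# Usage: fd_types PID
# Assumes Linux: each /proc/<pid>/fd/<n> is a symlink to "socket:[...]",
# "pipe:[...]", "anon_inode:[...]", or a filesystem path.
fd_types() {
  local pid=$1 fd target
  for fd in /proc/"${pid}"/fd/*; do
    # A descriptor can close between listing and readlink; skip those.
    target=$(readlink "$fd") || continue
    case "$target" in
      socket:*)     echo socket ;;
      pipe:*)       echo pipe ;;
      anon_inode:*) echo anon_inode ;;
      *)            echo file ;;
    esac
  done | sort | uniq -c
}

# Demo against the current shell; pass the PID from the lsof output
# (16240 above) to inspect that Java process instead.
fd_types "$$"
```

Running this periodically and comparing the counts would show which type is growing. A steady rise in pipes and anon_inode entries together is sometimes associated with Java NIO selectors not being closed, but that is only a possibility to investigate, not something this thread confirms.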
