
consumer.Close() never finishes and always panics #140

Open
davidzeng opened this issue Jul 23, 2015 · 5 comments

Comments

@davidzeng

I'm trying to Close() the consumer under certain circumstances, but whenever I call the Close() function, the worker manager never finishes.

13:39:52.1 | 2015-07-23/13:39:52 [INFO] [ConsumerFetcherRoutine-TTV-0582.local:e989b4a8-8365-45f3-e3b1-eaf942271e68-0] Closing fetcher
13:39:52.1 | 2015-07-23/13:39:52 [INFO] [ConsumerFetcherRoutine-TTV-0582.local:e989b4a8-8365-45f3-e3b1-eaf942271e68-0] Stopped fetcher
13:39:52.1 | 2015-07-23/13:39:52 [INFO] [TTV-0582.local:e989b4a8-8365-45f3-e3b1-eaf942271e68-manager] Successfully closed all fetcher manager routines
13:39:52.1 | 2015-07-23/13:39:52 [INFO] [TTV-0582.local:e989b4a8-8365-45f3-e3b1-eaf942271e68] Stopping worker manager...
13:44:52.1 | panic: Graceful shutdown failed

Are there prerequisites that need to be satisfied before calling the function? I've tried raising the timeout to 5 minutes, but it makes no difference.

@davidzeng
Author

I did a bit more digging, and it seems my workerManagers aren't closing because of the lock taken in startBatch. My callbacks for WorkerFailedAttemptCallback and WorkerFailureCallback are both doNotCommitAndStop.

I only have the 10 default workers and 14 partitions running (14 worker managers). What seems to happen is this: a worker stops, but some partitions are inside startBatch, which waits for an available worker; since my callback stops the worker, no workers remain and the call blocks indefinitely. I bumped the worker count to 30 in an attempt to free up more workers, but that wasn't enough, and I'm not sure adding workers is the right solution anyway. Why doesn't the Close() call send a kill signal to the workers?

@davidzeng
Author

Another alternative would be to have startBatch block only for a bounded amount of time and then give up. However, I can see how that might not be the best solution.

@baconalot

I presume you want to close the consumer from the message handler. I ran into the same problems, and my fix was to write a wrapper around the library: the message handler puts a "message processed" item on a channel whenever it wants processing to continue, and a new main loop stops the consumer if nothing has been received on that channel for x seconds.

Not really nice, but for the most part it got rid of the fatals for me.

I agree this should be handled more gracefully.

@davidzeng
Author

I'm calling Close() on the consumer returned by kafka.NewConsumer(kafkaConfig). @baconalot What's the message handler you're talking about?

@teou

teou commented Aug 26, 2015

It's a deadlock: startBatch and Close both try to hold the stopLock in the worker manager, and at the same time Close is blocked on the managerStop channel, waiting for startBatch to finish and select on managerStop.

I don't see why the wm.stopLock is necessary here.
