BUG-fetchall handler fatal error when unable to connect to netscaler #10
Comments
I'm guessing it's o.Log.Fatal(err) in sdk_fork New
|
Looks like we're peppered with a number of
|
Log into the lbapi server and run More than likely, there is a netscaler instance that is no longer live, and the system is trying to connect to it. It will show up in the logs. If that is the case, remove the netscaler and its HA members from lbapi. |
That's going to be my short-term fix, but to be clear, it's a short-term fix to a long-term problem: lbapi is vulnerable to dying from a variety of potentially transitory issues that need to be handled more gracefully. A connection timeout should log a warning and move on, not kill the entire lbapi, and I imagine the same goes for pretty much every o.Log.Fatal(err) call outside of main.go. |
The reasoning for leaving it as is was to raise an alarm if lbapi was unable to talk to a load balancer. This would prevent a client from attempting to build a virtual service on that unit and force the support team to look into why a load balancer was timing out or not available, if that makes sense. But to your point, I completely agree that there are better ways to handle this and it should not be a total panic. I'll look into having Go recover automatically after the error, or possibly setting the Docker restart policy to restart lbapi when it exits. In either case, though, there has to be some mechanism to alert the support team that lbapi cannot talk to the destination resource. |
That's a fair point. Sounds a lot like needing a Prometheus exporter + Alertmanager ;) |
time="2020-10-22T05:00:31Z" level=fatal msg="timout connecting to 192.168.48.20" handler=fetchall route=loadbalancer user=foo
lbapi dies when this occurs and has to be restarted.