Business calls keep failing on non running instance using load-balancer=least-response-time #233
Isn't it an issue of not preserving the service instance ids?

```java
private List<ServiceInstance> map(String list) {
    return Arrays.stream(list.split(",")).map(this::createInstance).collect(Collectors.toList());
}

private ServiceInstance createInstance(String hostPort) {
    String[] split = hostPort.split(":");
    String host = split[0];
    int port = Integer.parseInt(split[1]);
    // A fresh id is generated on every refresh, even for the same host:port
    return new DefaultServiceInstance(ServiceInstanceIds.next(), host, port, false);
}
```
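One way to preserve ids across refreshes would be to derive them from the address rather than a counter. The following is only an illustrative sketch (`StableServiceInstanceIds` is a hypothetical helper, not a Stork class): it caches the id per `host:port`, so a refresh that returns the same address reuses the same serviceId and the load balancer keeps its per-instance statistics.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: maps each host:port string to a stable id so that
// a refresh returning the same address reuses the same serviceId.
class StableServiceInstanceIds {
    private static final AtomicLong COUNTER = new AtomicLong();
    private static final Map<String, Long> IDS = new ConcurrentHashMap<>();

    static long idFor(String hostPort) {
        // computeIfAbsent assigns a new id only the first time an address is seen
        return IDS.computeIfAbsent(hostPort, k -> COUNTER.incrementAndGet());
    }
}
```

`createInstance` could then call `StableServiceInstanceIds.idFor(hostPort)` instead of `ServiceInstanceIds.next()`.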
Hmm... with a 600s refresh period, the refresh should not be called at all.
What is the contract on service ids?
But the test keeps failing. On the first call (instance 0, which is up) the call succeeds.
I think maybe the issue is that the
I confirm that by changing that line in the

Note that caching the service id is probably a bad idea, because an instance that failed once would be ignored forever, I suspect. Changing the service ids after each refresh will lose the stats, but allows failed instances to be retried. The way we handled this in our impl was that a failed instance would go on a "not_ok" list for 1 minute with no calls dispatched to it (unless there were no other instances available at all), and come back into the "ok" list after spending this minute on the side. The difference is that our default refresh period (3 mins) was unrelated to our purgatory period (1 min). Anyway, I am going to do a PR for the nanos issue.
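The "not_ok" list policy described above can be sketched as follows. This is an illustration of the idea, not Stork code; the class and method names are made up: a failed instance is benched for a fixed penalty period and skipped while benched, unless every instance is benched, in which case all of them remain eligible.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a fixed-duration "penalty box" for failed instances.
class PenaltyBox<T> {
    private final Duration penalty;
    private final Map<T, Instant> benchedUntil = new HashMap<>();

    PenaltyBox(Duration penalty) {
        this.penalty = penalty;
    }

    // Called when a call to this instance fails: bench it for the penalty period.
    void reportFailure(T instance, Instant now) {
        benchedUntil.put(instance, now.plus(penalty));
    }

    // Returns the instances whose penalty has expired; if all are benched,
    // falls back to the full list so there is always something to call.
    List<T> eligible(List<T> all, Instant now) {
        List<T> ok = new ArrayList<>();
        for (T instance : all) {
            Instant until = benchedUntil.get(instance);
            if (until == null || !until.isAfter(now)) {
                ok.add(instance);
            }
        }
        return ok.isEmpty() ? all : ok;
    }
}
```

With this design, the retry cadence for a failed instance is the fixed penalty duration, independent of both the call rate and the discovery refresh period.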
It's up to the load balancer to decide when to reuse an instance that failed. The serviceIds should be preserved to give the LB a chance to make that decision.
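To illustrate why preserving ids matters: a response-time-based load balancer typically keeps its statistics keyed by serviceId. The sketch below uses invented names (it is not Stork's implementation); if a refresh assigns new ids to the same `host:port`, every lookup misses and the accumulated stats, including the memory of a recent failure, are silently discarded.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: per-instance response-time stats keyed by serviceId.
class ResponseTimeStats {
    private final Map<Long, Double> avgNanosById = new ConcurrentHashMap<>();

    // Record an observed call duration as an exponential moving average.
    void record(long serviceId, long nanos) {
        avgNanosById.merge(serviceId, (double) nanos,
                (old, latest) -> old * 0.9 + latest * 0.1);
    }

    // Returns the known average, or null when this id has never been seen,
    // which is what happens for every instance after ids are regenerated.
    Double average(long serviceId) {
        return avgNanosById.get(serviceId);
    }
}
```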
Great, thank you!
So I understand it is a must to preserve those ids. However,
Here is how least-response-time works now: #225. E.g. if your
Both parameters are configurable.
Interesting. So if you have a high rate of outgoing calls (e.g. 20 per sec), the failed instance will be retried more often than if you have a low rate. I like the predictable nature of the 1-minute penalty box better, but this is not a big concern for me. The ability to do automatic retries when possible (which means that the load balancer should be able to provide a list of prioritized eligible instances, and instruct on the situations where calls can be retried or not) is what I would be looking at next. I am not an expert on MP Fault Tolerance, but one thing I would like to avoid is having all developers annotate their

I will expand those thoughts in #232.
My discovery returns 2 service instances:
My client application repeatedly calls this service. The first call succeeds, the subsequent calls all fail.
To reproduce:
see https://github.com/vsevel/stork-issue/tree/issue_233