fix(envoy): create a CDS cluster per model #5916
In principle this looks like it will solve the reported 503 issue. However, I am not sure how this will scale with the number of models. Any thoughts on how Envoy can handle clusters in the 100s-1000s?
Supporting this scale might not be of immediate concern though (yet).
There's no limit on the number of clusters Envoy can handle, provided resources are allocated appropriately. Since these clusters don't have active health checking, the overhead of having one per model shouldn't be too expensive.
And for reference, performance will likely be impacted by the number of stats generated by the clusters.
LGTM! Should we also remove the computeHashKeyForList util, as it is not being used anymore (we can always recover it from history if required)?
It's gone - check the first change in the list.
Thanks, for some reason I missed it. I should have looked more carefully.
What this PR does / why we need it:
When scaling a model replica count up, existing clusters were removed and new ones were added in the delta response, as the cluster name changed. This resulted in the downstream receiving 503s due to no cluster being found.

Instead of changing the cluster name every time the number of replicas changes, just keep the cluster name static, so clusters will be updated in place.
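
To illustrate the difference, here is a minimal sketch, not the code from this PR. It assumes a simplified model in which a cluster snapshot is a map from cluster name to replica indices; `replicaDependentName`, `staticClusterName`, `deltaClusters` and the `_http` suffix are all hypothetical names made up for the example. It shows that a name derived from the replica set turns a scale-up into a remove plus an add, while a static name turns it into an in-place update.

```go
package main

import (
	"fmt"
	"sort"
)

// replicaDependentName is a hypothetical stand-in for the old behaviour:
// the name is derived from the replica set, so it changes on every scale event.
func replicaDependentName(model string, replicas []int) string {
	sorted := append([]int(nil), replicas...)
	sort.Ints(sorted)
	return fmt.Sprintf("%s_%v", model, sorted)
}

// staticClusterName depends only on the model, so it survives scale events.
// The "_http" suffix is an assumption for this sketch.
func staticClusterName(model string) string {
	return model + "_http"
}

// equalInts reports whether two slices hold the same values in the same order.
func equalInts(a, b []int) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// deltaClusters mimics, in simplified form, what a delta response carries:
// names that are new or changed (upserted) and names that disappeared (removed).
func deltaClusters(prev, next map[string][]int) (upserted, removed []string) {
	for name, replicas := range next {
		if old, ok := prev[name]; !ok || !equalInts(old, replicas) {
			upserted = append(upserted, name)
		}
	}
	for name := range prev {
		if _, ok := next[name]; !ok {
			removed = append(removed, name)
		}
	}
	sort.Strings(upserted)
	sort.Strings(removed)
	return upserted, removed
}

func main() {
	before, after := []int{0}, []int{0, 1}

	// Replica-dependent naming: scale-up yields one removal and one addition.
	up, rm := deltaClusters(
		map[string][]int{replicaDependentName("iris", before): before},
		map[string][]int{replicaDependentName("iris", after): after},
	)
	fmt.Println("replica-dependent name: upserted", up, "removed", rm)

	// Static naming: the same scale-up is a single in-place update.
	up, rm = deltaClusters(
		map[string][]int{staticClusterName("iris"): before},
		map[string][]int{staticClusterName("iris"): after},
	)
	fmt.Println("static name: upserted", up, "removed", rm)
}
```

With the static name, the removed list stays empty across the scale-up, so a route pointing at that cluster name never refers to a cluster that has just been deleted.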
Which issue(s) this PR fixes:
Fixes #
INFRA-1150
Special notes for your reviewer: