Add option to set connection pool size in redis clients. #298
Conversation
This is so we don't have to propagate all the options down the line, and it's easier to add new ones.
QQ: When you say host here, do you mean pod or node?
I mean the node. It's using runtime.GOMAXPROCS, which AFAIK will be the number of cores on the node. To give you some numbers, we've seen spikes capping at 70-80K connections to the primary (which is a bit weird since the max is 65K), and the number of rejected connections was >200K.
I'm not sure that will be the case, as everything should be using this: https://github.com/thought-machine/please-servers/blob/master/cli/cli.go#L43-L45 The redis refactor is still worth it, but I suspect the impact will be limited.
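For reference, a minimal sketch of what the linked cli.go lines presumably do (an assumption based on this discussion, not the repo's actual code): call uber-go/automaxprocs so that GOMAXPROCS tracks the container's CPU quota rather than the host's core count.

```go
// A minimal sketch, assuming the project wires in go.uber.org/automaxprocs
// (the linked cli.go lines appear to do something along these lines; the exact
// wiring may differ). GOMAXPROCS is set from the container's CPU quota instead
// of defaulting to the host's core count.
package main

import (
	"log"
	"runtime"

	"go.uber.org/automaxprocs/maxprocs"
)

func main() {
	// Set derives GOMAXPROCS from the cgroup CPU quota and returns an undo func.
	undo, err := maxprocs.Set(maxprocs.Logger(log.Printf))
	if err != nil {
		log.Printf("failed to set GOMAXPROCS from the cgroup: %v", err)
	}
	defer undo()

	// With no CPU quota (e.g. no K8s limit), this is still the host core count.
	log.Printf("GOMAXPROCS=%d", runtime.GOMAXPROCS(0))
}
```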
Interesting, I wasn't aware of this 🤔 The thing is that we scale up to 5K workers, so we should then see up to 50K connections to the primary redis node. |
Ooh, I think this has something to do with the removal of the CPU limits on the worker pods. I believe the limit is what sets the CPU quota in the cgroup, and since that quota is what automaxprocs uses to set GOMAXPROCS, I suspect it's currently either a no-op or setting it to the number of cores on the node.
Yes, that's it. I've just remembered that this thing prints a log line, and this is what I can see on a new worker:
Oh interesting. For some reason I thought it was using the cgroup CPU shares rather than the quota (so I thought it would pick up the K8s request rather than the limit).
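For context, a rough sketch of the distinction (assuming cgroup v1; this is illustrative, not code from this repo): the K8s CPU limit is what populates the CFS quota files that automaxprocs reads, while the request only sets cpu.shares, so with no limit there is no quota to derive GOMAXPROCS from.

```go
// A rough illustration only, assuming cgroup v1 paths (cgroup v2 exposes
// cpu.max instead). The K8s CPU limit populates cpu.cfs_quota_us; the request
// only sets cpu.shares, which is why removing the limit leaves nothing to
// derive GOMAXPROCS from.
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

func readInt(path string) (int64, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.ParseInt(strings.TrimSpace(string(b)), 10, 64)
}

func main() {
	quota, qErr := readInt("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")
	period, pErr := readInt("/sys/fs/cgroup/cpu/cpu.cfs_period_us")
	if qErr != nil || pErr != nil || quota <= 0 || period <= 0 {
		// No limit set: the quota is -1, so GOMAXPROCS stays at the host core count.
		fmt.Println("no CPU quota: GOMAXPROCS would stay at runtime.NumCPU()")
		return
	}
	procs := quota / period
	if procs < 1 {
		procs = 1 // automaxprocs enforces a minimum of 1
	}
	// e.g. a 2-CPU limit is quota=200000 over period=100000, i.e. GOMAXPROCS=2.
	fmt.Printf("quota of %.2f CPUs -> GOMAXPROCS %d\n",
		float64(quota)/float64(period), procs)
}
```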
I've realised today that the connection pool size on the redis client defaults to 10*runtime.GOMAXPROCS, so we think our pods set their pool size to 10 times the number of CPUs on the host, which is way too much. This would explain the recent big spikes in the number of connections to the primary redis node (and the number of rejected connections 😬 ).
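To make the default concrete, here is a minimal go-redis sketch (not code from this PR; the import path assumes go-redis v8, and the version used in the repo may differ) contrasting the implicit pool size with an explicit one:

```go
// A minimal sketch, not code from this PR. The import path assumes go-redis v8;
// the version used in the repo may differ. It contrasts the implicit default
// pool (10 connections per GOMAXPROCS) with an explicit PoolSize.
package main

import "github.com/go-redis/redis/v8"

func main() {
	// PoolSize left at zero: go-redis falls back to 10*runtime.GOMAXPROCS(0).
	// On a 64-core host with no CPU limit, that's 640 connections per process.
	_ = redis.NewClient(&redis.Options{
		Addr: "redis-primary:6379", // hypothetical address
	})

	// Explicit PoolSize: the pool is capped regardless of the host's core count.
	_ = redis.NewClient(&redis.Options{
		Addr:     "redis-primary:6379",
		PoolSize: 10, // deliberately small, fixed pool per worker
	})
}
```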
This PR refactors how we initialise the redis clients a bit, so it's easier to add new options, and adds two options to control the pool size on the primary redis client and the secondary (read) one.
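As a rough illustration of the shape such a refactor could take (all names, fields, and the package below are hypothetical, not the PR's actual API): bundle the Redis settings into one options struct so new knobs like the two pool sizes can be added without threading extra parameters through every call site.

```go
// A hypothetical sketch of such a refactor; the names, fields, and package are
// illustrative, not the PR's actual API. The import path assumes go-redis v8.
package redisclients

import "github.com/go-redis/redis/v8"

// Opts bundles the Redis settings so new knobs can be added in one place
// instead of being threaded through every call site.
type Opts struct {
	PrimaryURL      string
	ReadURL         string
	PrimaryPoolSize int // pool size for the primary (write) client
	ReadPoolSize    int // pool size for the secondary (read) client
}

// NewClients builds the primary and read clients from a single options struct.
func NewClients(o Opts) (primary, read *redis.Client) {
	primary = redis.NewClient(&redis.Options{
		Addr:     o.PrimaryURL,
		PoolSize: o.PrimaryPoolSize, // zero falls back to the go-redis default
	})
	read = redis.NewClient(&redis.Options{
		Addr:     o.ReadURL,
		PoolSize: o.ReadPoolSize,
	})
	return primary, read
}
```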