Add option to set connection pool size in redis clients. #298

Merged
merged 3 commits into thought-machine:master on Apr 17, 2024

Conversation

@fische (Contributor) commented Apr 16, 2024

I realised today that the default connection pool size on the redis client is 10 * runtime.GOMAXPROCS, so our pods are sizing their pools at 10 times the number of CPUs on the host, which is far too much. This would explain the recent big spikes in the number of connections to the primary redis node (and in the number of rejected connections 😬).

This PR refactors how we initialise the redis clients so that it's easier to add new options, and adds two options to control the pool size on the primary redis client and the secondary (read) one.
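For illustration only (not the PR's actual code; newPrimaryClient and the addresses are hypothetical), this is roughly what exposing a pool-size option looks like with go-redis, where Options.PoolSize otherwise defaults to 10 * runtime.GOMAXPROCS(0):

```go
package main

import (
	"fmt"

	"github.com/redis/go-redis/v9"
)

// newPrimaryClient is a hypothetical constructor: if poolSize > 0 it overrides
// go-redis's default of 10 * runtime.GOMAXPROCS(0) connections per client.
func newPrimaryClient(addr string, poolSize int) *redis.Client {
	opts := &redis.Options{Addr: addr}
	if poolSize > 0 {
		opts.PoolSize = poolSize // cap the pool explicitly
	}
	return redis.NewClient(opts)
}

func main() {
	client := newPrimaryClient("localhost:6379", 32)
	fmt.Println("configured pool size:", client.Options().PoolSize)
}
```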

fische added 3 commits April 16, 2024 17:52
This is so we don't have to propagate all options down the line
and it's easier to add new ones.
@fische fische self-assigned this Apr 16, 2024
@Garbett1 (Contributor) commented:
QQ: When you say host here, do you mean pod or node?

@fische (Contributor, Author) commented Apr 17, 2024

> QQ: When you say host here, do you mean pod or node?

I mean the node. It's using runtime.GOMAXPROCS which AFAIK will be the number of cores on the node.

To give you some numbers, we've seen spikes capping at 70-80K connections to the primary (which is a bit odd, since the max is 65K), and the number of rejected connections was >200K.
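For reference, a quick sketch (not from this repo) of how the default pool size is derived; with no CPU limit on the pod, GOMAXPROCS falls back to the node's core count, so on a 16-core node this prints a default of 160 connections per client:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// GOMAXPROCS(0) reads the current value without changing it.
	procs := runtime.GOMAXPROCS(0)
	fmt.Printf("GOMAXPROCS=%d -> default go-redis pool size=%d per client\n", procs, 10*procs)
}
```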

@Garbett1 (Contributor) commented:
I'm not sure that will be the case, as everything should be using this: https://github.com/thought-machine/please-servers/blob/master/cli/cli.go#L43-L45

The redis refactor is still worth it, but I suspect limited impact.
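For context, a hedged sketch of how automaxprocs is typically wired up at startup; the linked cli.go may differ in its exact logging and error handling:

```go
package main

import (
	"log"

	"go.uber.org/automaxprocs/maxprocs"
)

func main() {
	// maxprocs.Set adjusts GOMAXPROCS from the container's cgroup CPU quota;
	// if no quota is set it logs a notice and leaves GOMAXPROCS unchanged.
	undo, err := maxprocs.Set(maxprocs.Logger(log.Printf))
	if err != nil {
		log.Printf("failed to set GOMAXPROCS: %v", err)
	}
	defer undo()
	// ... rest of the process ...
}
```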

@fische (Contributor, Author) commented Apr 17, 2024

Interesting, I wasn't aware of this 🤔 The thing is that we scale up to 5K workers, so we should then see up to 50K connections to the primary redis node (i.e. 5K workers × the default pool of 10 connections each, if GOMAXPROCS were 1).

@fische (Contributor, Author) commented Apr 17, 2024

Ooh, I think this has something to do with the removal of the CPU limits on the worker pods. I believe the CPU limit is what sets the CPU quota in the cgroup, and since the quota is what automaxprocs uses to set GOMAXPROCS, I suspect it's currently either a no-op or setting GOMAXPROCS to the number of cores on the node.

@fische (Contributor, Author) commented Apr 17, 2024

> Ooh, I think this has something to do with the removal of the CPU limits on the worker pods. […]

Yes, that's it. I've just remembered that automaxprocs prints a log line, and this is what I can see on a new worker:

NOTICE: maxprocs: Leaving GOMAXPROCS=16: CPU quota undefined

@Garbett1 (Contributor) commented:
Oh, interesting. For some reason I thought it was using the cgroup CPU shares rather than the quota (so I thought it would pick up the K8s request rather than the limit).
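A hedged illustration of the distinction being discussed, assuming cgroup v1 paths (a cgroup v2 node lays this out differently): Kubernetes writes the CPU limit as a CFS quota and the CPU request as CPU shares, and automaxprocs reads the quota:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// readCgroup returns the trimmed contents of a cgroup file, or "unavailable".
func readCgroup(path string) string {
	b, err := os.ReadFile(path)
	if err != nil {
		return "unavailable"
	}
	return strings.TrimSpace(string(b))
}

func main() {
	// -1 means "no quota", i.e. no CPU limit on the pod, so automaxprocs no-ops.
	fmt.Println("cpu.cfs_quota_us :", readCgroup("/sys/fs/cgroup/cpu/cpu.cfs_quota_us"))
	fmt.Println("cpu.cfs_period_us:", readCgroup("/sys/fs/cgroup/cpu/cpu.cfs_period_us"))
	// Shares reflect the CPU request (~1024 shares per requested core).
	fmt.Println("cpu.shares       :", readCgroup("/sys/fs/cgroup/cpu/cpu.shares"))
}
```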

@fische fische merged commit d87013f into thought-machine:master Apr 17, 2024
5 checks passed
@fische fische deleted the redis-options branch April 17, 2024 09:23