Add option to set connection pool size in redis clients. #298

Merged
merged 3 commits into thought-machine:master on Apr 17, 2024

Conversation

@fische (Contributor) commented Apr 16, 2024

I realised today that the default connection pool size on the redis client is 10 * runtime.GOMAXPROCS, so our pods are sizing their pools at 10 times the number of CPUs on the host, which is far too much. This would explain the recent big spikes in the number of connections to the primary redis node (and in the number of rejected connections 😬).

This PR refactors how we initialise the redis clients so that it's easier to add new options, and adds two options to control the pool size on the primary redis client and the secondary (read) one.
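For illustration only (not the PR's actual code; newPrimaryClient and the addresses are hypothetical), this is roughly what exposing a pool-size option looks like with go-redis, where Options.PoolSize otherwise defaults to 10 * runtime.GOMAXPROCS(0):

```go
package main

import (
	"fmt"

	"github.com/redis/go-redis/v9"
)

// newPrimaryClient is a hypothetical constructor: if poolSize > 0 it overrides
// go-redis's default of 10 * runtime.GOMAXPROCS(0) connections per client.
func newPrimaryClient(addr string, poolSize int) *redis.Client {
	opts := &redis.Options{Addr: addr}
	if poolSize > 0 {
		opts.PoolSize = poolSize // cap the pool explicitly
	}
	return redis.NewClient(opts)
}

func main() {
	client := newPrimaryClient("localhost:6379", 32)
	fmt.Println("configured pool size:", client.Options().PoolSize)
}
```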

fische added 3 commits April 16, 2024 17:52
This is so we don't have to propagate all options down the line
and it's easier to add new ones.
@fische fische self-assigned this Apr 16, 2024
@Garbett1 (Contributor) commented:
QQ: When you say host here, do you mean pod or node?

@fische (Contributor, Author) commented Apr 17, 2024

> QQ: When you say host here, do you mean pod or node?

I mean the node. It's using runtime.GOMAXPROCS which AFAIK will be the number of cores on the node.

To give you some numbers, we've seen spikes capping at 70-80K connections to the primary (which is a bit odd, since the max is 65K), and the number of rejected connections was >200K.
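For reference, a quick sketch (not from this repo) of how the default pool size is derived; with no CPU limit on the pod, GOMAXPROCS falls back to the node's core count, so on a 16-core node this prints a default of 160 connections per client:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// GOMAXPROCS(0) reads the current value without changing it.
	procs := runtime.GOMAXPROCS(0)
	fmt.Printf("GOMAXPROCS=%d -> default go-redis pool size=%d per client\n", procs, 10*procs)
}
```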

@Garbett1 (Contributor) commented:
I'm not sure that will be the case, as everything should be using this: https://github.com/thought-machine/please-servers/blob/master/cli/cli.go#L43-L45

The redis refactor is still worth it, but I suspect limited impact.
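For context, a hedged sketch of how automaxprocs is typically wired up at startup; the linked cli.go may differ in its exact logging and error handling:

```go
package main

import (
	"log"

	"go.uber.org/automaxprocs/maxprocs"
)

func main() {
	// maxprocs.Set adjusts GOMAXPROCS from the container's cgroup CPU quota;
	// if no quota is set it logs a notice and leaves GOMAXPROCS unchanged.
	undo, err := maxprocs.Set(maxprocs.Logger(log.Printf))
	if err != nil {
		log.Printf("failed to set GOMAXPROCS: %v", err)
	}
	defer undo()
	// ... rest of the process ...
}
```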

@fische (Contributor, Author) commented Apr 17, 2024

Interesting, I wasn't aware of this 🤔 The thing is that we scale up to 5K workers, so we should then see up to 50K connections to the primary redis node (i.e. 5K workers × the default pool of 10 connections each, if GOMAXPROCS were 1).

@fische (Contributor, Author) commented Apr 17, 2024

Ooh, I think this has something to do with the removal of the CPU limits on the worker pods. I believe the CPU limit is what sets the CPU quota in the cgroup, and since the quota is what automaxprocs uses to set GOMAXPROCS, I suspect it's currently either a no-op or setting GOMAXPROCS to the number of cores on the node.

@fische (Contributor, Author) commented Apr 17, 2024

> Ooh, I think this has something to do with the removal of the CPU limits on the worker pods. […]

Yes, that's it. I've just remembered that automaxprocs prints a log line, and this is what I can see on a new worker:

NOTICE: maxprocs: Leaving GOMAXPROCS=16: CPU quota undefined

@Garbett1 (Contributor) commented:
Oh, interesting. For some reason I thought it was using the cgroup CPU shares rather than the quota (so I thought it would pick up the K8s request rather than the limit).
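A hedged illustration of the distinction being discussed, assuming cgroup v1 paths (a cgroup v2 node lays this out differently): Kubernetes writes the CPU limit as a CFS quota and the CPU request as CPU shares, and automaxprocs reads the quota:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// readCgroup returns the trimmed contents of a cgroup file, or "unavailable".
func readCgroup(path string) string {
	b, err := os.ReadFile(path)
	if err != nil {
		return "unavailable"
	}
	return strings.TrimSpace(string(b))
}

func main() {
	// -1 means "no quota", i.e. no CPU limit on the pod, so automaxprocs no-ops.
	fmt.Println("cpu.cfs_quota_us :", readCgroup("/sys/fs/cgroup/cpu/cpu.cfs_quota_us"))
	fmt.Println("cpu.cfs_period_us:", readCgroup("/sys/fs/cgroup/cpu/cpu.cfs_period_us"))
	// Shares reflect the CPU request (~1024 shares per requested core).
	fmt.Println("cpu.shares       :", readCgroup("/sys/fs/cgroup/cpu/cpu.shares"))
}
```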

@fische fische merged commit d87013f into thought-machine:master Apr 17, 2024
5 checks passed
@fische fische deleted the redis-options branch April 17, 2024 09:23