Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase fd limits (and warn as we approach them) #7237

Merged

Conversation

rustyrussell
Copy link
Contributor

On many systems, the default fd limit is 1024, and we've got users with over 900 channels. But that's a soft limit, and you can simply ask for an increase!

Also, we should log when we are running low, so people don't have to ask me what to do!

@rustyrussell rustyrussell added this to the v24.05 milestone Apr 19, 2024
Copy link
Member

@cdecker cdecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK adbf9bc

@endothermicdev
Copy link
Collaborator

Added the connecting map to the memleak detection callback in connectd. Also included a driveby fix for tracepoints memory initialization.

@rustyrussell
Copy link
Contributor Author

Fixing up test assumption that hard limit > soft limit (on CI it seems to be 65536 for both!).

@rustyrussell rustyrussell force-pushed the increase-fd-limits branch from f858d34 to e663658 Compare May 7, 2024 01:48
@endothermicdev
Copy link
Collaborator

This works fine for me locally, but the CI seems unable to raise the fd limit after pytest constrains it. I noticed if I run pytest as root, it exhibits the same behavior.

@benthecarman
Copy link
Contributor

would be nice to test this in docker, kubernetes, and the like to make sure it doesn't have the same constraints

@rustyrussell
Copy link
Contributor Author

would be nice to test this in docker, kubernetes, and the like to make sure it doesn't have the same constraints

Agreeed. Meanwhile, this doesn't make it worse if we can't increase fd limits, and at least now we'll get a warning if it seems limiting.

So we can log when we hit fd limits on accept/recvmsg.

Signed-off-by: Rusty Russell <[email protected]>
This can happen if we're totally out of fds, but previously we gave
no log message indicating this!

Signed-off-by: Rusty Russell <[email protected]>
I thought I was going to want to have a convenient way of counting
these, but it turns out unnecessary.  Still, this is slightly more
efficient and simple, so I am including it.

Signed-off-by: Rusty Russell <[email protected]>
We use a crude heuristic: if we were trying to contact them, it's a
"deliberate" connection, and should be preserved.

Changelog-Changed: connectd: prioritize peers with channels (and log!) if we run low on file descriptors.
Signed-off-by: Rusty Russell <[email protected]>
… channels.

1024 is a common limit, and people are starting to hit that many channels, so we should increase it: twice the number of channels seems reasonable, though we only do this at restart time.

Changelog-Changed: lightningd: we now try to increase the number of file descriptors, if it's less than twice the number of channels at startup (and log if we cannot!).
Signed-off-by: Rusty Russell <[email protected]>
They're cheap.  The 2x channels heuristic is nice, but does assume
they restart every so often.  If someone hits 64k connections I would
like to know anyway!

Signed-off-by: Rusty Russell <[email protected]>
@rustyrussell rustyrussell force-pushed the increase-fd-limits branch from e663658 to 995fefd Compare May 9, 2024 03:30
We expect:
	UNUSUAL.*WARNING: we have 1 channels but can file descriptors limited to 65536

We get:
	lightningd: WARNING: we have 1 channels but can file descriptors limited to 32768!

This is strange, since the first line is from Python's hard limit.  Presumably something is restricting the fd limit of children

Signed-off-by: Rusty Russell <[email protected]>
@rustyrussell rustyrussell force-pushed the increase-fd-limits branch from 995fefd to 71c039e Compare May 9, 2024 04:36
Copy link
Contributor

@vincenzopalazzo vincenzopalazzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 71c039e

Passing the github CI should also give us some clue about the limits in VMs?

@rustyrussell rustyrussell merged commit f0d9fc6 into ElementsProject:master May 9, 2024
35 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants