
bpftune doesn't seem to notice it's hit an external limit when attempting to increase TCP buffers #93

Open
wolfspyre opened this issue Oct 8, 2024 · 3 comments

Comments

@wolfspyre

First, I want to say that bpftune is super cool.

This is really good stuff, and I'm thankful for the effort put into autotuning.
I think a good companion tool would leverage a sibling host to help identify optimal interface settings w.r.t. NIC buffers, MTU, MSS, etc. ... but that's a tangent of a different flavor.

I have observed a peculiar behavior with bpftune on my Proxmox hosts that I suspect others may hit as well.

Ceph wants to schlep lots of data around, so bpftune tries to increase the TCP buffers accordingly.
...but it seems unaware of the 2 GB limit on rmem_max and keeps hitting a wall trying to increase the rmem max beyond 2 GB:

2024-10-07T18:41:48.055804-05:00 px-m-45 bpftune[2726]: Scenario 'need to increase TCP buffer size(s)' occurred for tunable 'net.ipv4.tcp_rmem' in global ns. Need to increase buffer size(s) to maximize throughput
2024-10-07T18:41:48.055927-05:00 px-m-45 bpftune[2726]: Due to need to increase max buffer size to maximize throughput change net.ipv4.tcp_rmem(min default max) from (67108864 134217728 2000000000) -> (67108864 134217728 2500000000)
2024-10-07T18:41:48.152832-05:00 px-m-45 bpftune[2726]: Scenario 'need to increase TCP buffer size(s)' occurred for tunable 'net.ipv4.tcp_rmem' in global ns. Need to increase buffer size(s) to maximize throughput
2024-10-07T18:41:48.152950-05:00 px-m-45 bpftune[2726]: Due to need to increase max buffer size to maximize throughput change net.ipv4.tcp_rmem(min default max) from (67108864 134217728 2000000000) -> (67108864 134217728 2500000000)
2024-10-07T18:41:48.216185-05:00 px-m-45 bpftune[2726]: Scenario 'need to increase TCP buffer size(s)' occurred for tunable 'net.ipv4.tcp_rmem' in global ns. Need to increase buffer size(s) to maximize throughput
2024-10-07T18:41:48.216286-05:00 px-m-45 bpftune[2726]: Due to need to increase max buffer size to maximize throughput change net.ipv4.tcp_rmem(min default max) from (67108864 134217728 2000000000) -> (67108864 134217728 2500000000)
2024-10-07T18:41:48.311124-05:00 px-m-45 bpftune[2726]: Scenario 'need to increase TCP buffer size(s)' occurred for tunable 'net.ipv4.tcp_rmem' in global ns. Need to increase buffer size(s) to maximize throughput
2024-10-07T18:41:48.311317-05:00 px-m-45 bpftune[2726]: Due to need to increase max buffer size to maximize throughput change net.ipv4.tcp_rmem(min default max) from (67108864 134217728 2000000000) -> (67108864 134217728 2500000000)
2024-10-07T18:41:48.480604-05:00 px-m-45 bpftune[2726]: Scenario 'need to increase TCP buffer size(s)' occurred for tunable 'net.ipv4.tcp_rmem' in global ns. Need to increase buffer size(s) to maximize throughput
2024-10-07T18:41:48.480783-05:00 px-m-45 bpftune[2726]: Due to need to increase max buffer size to maximize throughput change net.ipv4.tcp_rmem(min default max) from (67108864 134217728 2000000000) -> (67108864 134217728 2500000000)
2024-10-07T18:41:48.525093-05:00 px-m-45 bpftune[2726]: Scenario 'need to increase TCP buffer size(s)' occurred for tunable 'net.ipv4.tcp_rmem' in global ns. Need to increase buffer size(s) to maximize throughput
2024-10-07T18:41:48.525232-05:00 px-m-45 bpftune[2726]: Due to need to increase max buffer size to maximize throughput change net.ipv4.tcp_rmem(min default max) from (67108864 134217728 2000000000) -> (67108864 134217728 2500000000)
2024-10-07T18:41:48.620254-05:00 px-m-45 bpftune[2726]: Scenario 'need to increase TCP buffer size(s)' occurred for tunable 'net.ipv4.tcp_rmem' in global ns. Need to increase buffer size(s) to maximize throughput
2024-10-07T18:41:48.620394-05:00 px-m-45 bpftune[2726]: Due to need to increase max buffer size to maximize throughput change net.ipv4.tcp_rmem(min default max) from (67108864 134217728 2000000000) -> (67108864 134217728 2500000000)
2024-10-07T18:41:48.926975-05:00 px-m-45 bpftune[2726]: Scenario 'need to increase TCP buffer size(s)' occurred for tunable 'net.ipv4.tcp_rmem' in global ns. Need to increase buffer size(s) to maximize throughput
2024-10-07T18:41:48.927068-05:00 px-m-45 bpftune[2726]: Due to need to increase max buffer size to maximize throughput change net.ipv4.tcp_rmem(min default max) from (67108864 134217728 2000000000) -> (67108864 134217728 2500000000)
2024-10-07T18:41:48.978057-05:00 px-m-45 bpftune[2726]: Scenario 'need to increase TCP buffer size(s)' occurred for tunable 'net.ipv4.tcp_rmem' in global ns. Need to increase buffer size(s) to maximize throughput

This is 1 second of logs about this ;)

root@px-m-45:/var/log# uname -a
Linux px-m-45 6.8.12-2-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-2 (2024-09-05T10:03Z) x86_64 GNU/Linux
root@px-m-45:/var/log# sysctl net.ipv4.tcp_rmem
net.ipv4.tcp_rmem = 67108864	134217728	2000000000
sysctl -w net.ipv4.tcp_rmem="67108864 134217728 2000000000"
net.ipv4.tcp_rmem = 67108864 134217728 2000000000
root@px-m-45:/var/log# sysctl -w net.ipv4.tcp_rmem="67108864 134217728 2147483647"
net.ipv4.tcp_rmem = 67108864 134217728 2147483647
root@px-m-45:/var/log# sysctl -w net.ipv4.tcp_rmem="67108864 134217728 2147483648"
sysctl: setting key "net.ipv4.tcp_rmem": Invalid argument
root@px-m-45:/var/log#

This makes sense, as net.core.rmem_max / net.core.wmem_max (like the tcp_rmem max field) cap at 2 GiB - 1 (2147483647, i.e. INT_MAX):

root@px-m-45:/var/log# sysctl -w net.core.rmem_max=2147483647
net.core.rmem_max = 2147483647
root@px-m-45:/var/log# sysctl -w net.core.rmem_max=2147483648
sysctl: setting key "net.core.rmem_max": Invalid argument

root@px-m-45:/var/log# sysctl -w net.core.wmem_max=2147483647
net.core.wmem_max = 2147483647
root@px-m-45:/var/log# sysctl -w net.core.wmem_max=2147483648
sysctl: setting key "net.core.wmem_max": Invalid argument

This link isn't really relevant, except that it points out the 2 GB cap.

So... I suppose I'd expect bpftune to be aware that this is (seemingly?) a hard limit, or at least to recognize that it's hitting some limitation and intelligently search for a maximal value, rather than simply trying to increase by 25% over and over.
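
A hypothetical recovery path along those lines (illustrative only, not how bpftune applies tunables): when the kernel rejects a write, binary-search downward from the rejected value for the largest one it will accept. This assumes a single-value tunable such as net.core.rmem_max, and that lo starts at a value the kernel has already accepted; the net.ipv4.tcp_rmem triple would need all three fields written together.

#include <stdio.h>

static int try_write_sysctl(const char *path, long long val)
{
        FILE *f = fopen(path, "w");
        int ok;

        if (!f)
                return -1;
        ok = fprintf(f, "%lld\n", val) > 0;
        if (fclose(f) != 0)     /* the kernel's EINVAL often only surfaces on flush/close */
                ok = 0;
        return ok ? 0 : -1;
}

/* Binary-search between the last accepted value (lo) and the rejected target (hi). */
static long long probe_max_accepted(const char *path, long long lo, long long hi)
{
        while (lo < hi) {
                long long mid = lo + (hi - lo + 1) / 2;

                if (try_write_sysctl(path, mid) == 0)
                        lo = mid;       /* accepted: try higher */
                else
                        hi = mid - 1;   /* rejected: try lower */
        }
        return lo;                      /* largest value the kernel accepted */
}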

thoughts?

again, thanks for making this... it's slick.

@alan-maguire
Member

Thanks for the detailed report, it's really helpful! On the core issue, bpftune definitely should notice; the key question is what to do in the general case. In your particular case 2 GB seems like a good max limit, but in cases where rmem_max is too low, should we bump it too? I'd like to have better mechanisms to put the brakes on runaway increases, so any ideas are most welcome! Currently we look for correlations between buffer size increases and RTT increases as a signal that we're buffering too much, but something more sophisticated would be good here. I'll give it some thought, do some experiments at my end, and post when I have something. Again, thanks for filing this!
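
(For illustration, a minimal sketch of that correlation signal: a plain Pearson correlation over recent buffer-size and smoothed-RTT samples. This shows only the general idea, not bpftune's implementation.)

#include <math.h>

/* Pearson correlation of two sample series; near +1 suggests that growing
 * the buffer is mostly adding delay rather than throughput. */
static double pearson(const double *x, const double *y, int n)
{
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        int i;

        for (i = 0; i < n; i++) {
                sx += x[i]; sy += y[i];
                sxx += x[i] * x[i]; syy += y[i] * y[i];
                sxy += x[i] * y[i];
        }
        double cov = sxy - sx * sy / n;
        double vx  = sxx - sx * sx / n;
        double vy  = syy - sy * sy / n;

        return (vx > 0 && vy > 0) ? cov / sqrt(vx * vy) : 0.0;
}

/* e.g. if pearson(buf_sizes, rtts, n) > 0.7, stop proposing further increases. */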

@wolfspyre
Author

Well, maybe...

We all know that everything has a downside. I'm not sure what other implications or repercussions there are to setting the default per-connection allocation higher...

Do we hit some 'max amount of address space' that we can allocate or reference in a single operation, causing what could previously be done in one cycle to now take two or more?

(I don't know if that's a real thing; just theorizing about the possible ways that bigger mightn't be better :) )

We're talking about at least two problems here, so I'll separate them:

  1. rmem_max hard limit awareness

As that link references, 2 GB is the largest value that rmem_max may be set to.

Unless something changes in the kernel, this is a boundary we can't go past.

Does bpftune have a table of similar core truths and maximum value caps that it references, along the lines of 'we can't go past 10... err... 11'? :)

Now, in this case we should certainly go ahead and adjust rmem_max and wmem_max to their upper bounds... and then continue adjusting tcp_rmem upwards (roughly as sketched below).
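
A rough sketch of that ordering, assuming both caps are written through /proc/sys (illustrative, not a definitive implementation; the helper name is made up):

#include <limits.h>
#include <stdio.h>

/* Write a single integer value to a /proc/sys entry; a kernel rejection
 * (EINVAL) surfaces when the buffered write is flushed on close. */
static int write_cap(const char *path, long long val)
{
        FILE *f = fopen(path, "w");

        if (!f)
                return -1;
        fprintf(f, "%lld\n", val);
        return fclose(f) == 0 ? 0 : -1;
}

/* Push the net.core caps to their hard ceiling first; the tcp_rmem/tcp_wmem
 * maximums top out at the same INT_MAX value. */
static int raise_core_caps(void)
{
        if (write_cap("/proc/sys/net/core/rmem_max", INT_MAX) != 0)
                return -1;
        return write_cap("/proc/sys/net/core/wmem_max", INT_MAX);
}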

  2. the definition of insanity…

I would think that 'I tried to make this same adjustment but the operation didn't take for some reason' ought to be noticed...
Do we decide 'our stride is too wide here' and attempt a smaller incremental adjustment to see if it takes?

Do we decide 'this knob is not having the effect we had hoped' and look for other options?

Do we abstract further and have a 'local observation ratchet' mechanism plus a wider-lens 'desired end result solver' that we call on to suggest the next plan of action when we get into head-banging loops like this?

I.e., the 'I NEED AN ADULT!' call to suggest alternate adjustment strategies, seeing as we got stuck here?

Either way, if I failed to make this adjustment twice in a row, the third failure ought to trigger some sort of back-off, or a signal that 'We're doin' it Wrong!™', as the current 'bang wall harder with head' method is actually hurting performance, if from nothing else than the sheer volume of log messages being generated.

Maybe we have an acceptable 'adjustments per ~10s period' number that allows quick ratcheting behavior while still providing a safety net against suboptimal adjustment loops like this.
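
A rough sketch of that rate-limit idea (the struct, window length, and budget are made-up illustrations, not anything bpftune provides):

#include <time.h>

/* Per-tunable adjustment budget that refills every 10-second window. */
struct adjust_limiter {
        time_t window_start;
        int    count;
};

#define ADJUST_WINDOW_SECS      10
#define ADJUST_MAX_PER_WINDOW   3

static int adjustment_allowed(struct adjust_limiter *l)
{
        time_t now = time(NULL);

        if (now - l->window_start >= ADJUST_WINDOW_SECS) {
                l->window_start = now;  /* new window: reset the budget */
                l->count = 0;
        }
        if (l->count >= ADJUST_MAX_PER_WINDOW)
                return 0;               /* budget spent: back off, maybe log once */
        l->count++;
        return 1;
}

Bursts like the log above would then collapse to at most a few applied changes per window, with the rest dropped or summarized instead of looping.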

How do we assess which buffers, queues, modules, parameters, and external influences are related to the state we're in, so as to evaluate what else to change?

Fun things to ponder, certainly.

@alan-maguire
Member

Sorry, rereading this I misunderstood the original issue. It's not that tcp_[rw]mem exceeds the net.core limits; it's that we can't exceed the 2 GB limit in the parameter setting itself. I've pushed a potential fix #97 to the main branch - if you could test at your end, that would be great. Thanks!
