On deleting items from eBPF maps #550
Replies: 4 comments 4 replies
-
Hi, @gustavo-iniguez-goya , thank you for bringing up this important issue.
Unfortunately, I don't recall why I placed it at the end. I'm glad you found a way to avoid an endless loop.
Are you willing to research this direction?
-
Unfortunately, I don't think I have the knowledge to do it from inside eBPF. In any case, I think we'll keep hitting the maps' max capacity under extreme load, like nmapping localhost.
The problem is the gap between knowing that we have to delete n items and the action of removing them. While we're deleting items, new connections may be added that fill up the maps, so in the next iteration the maps are full or almost full again.
Even deleting 11k items every second is not enough to keep the maps from filling up under extreme load, so maybe increasing the maps' maximum capacity would help here, but I don't know to what value (24k, 50k, 100k?). As far as I can tell, the solution I've found works a little bit better: even if we fill up the maps, items are deleted on demand as expected and interception keeps working normally. But I want to test it this week in the same scenario where I reproduce the problem explained in the second paragraph of the post.
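That gap can be illustrated with a rough, deterministic simulation (plain Go maps standing in for the eBPF maps; the capacity, rates, and all names below are made up for the sketch, not taken from the OpenSnitch code):

```go
package main

import "fmt"

const mapMax = 12000 // assumed maximum capacity of the map

// simulateRound inserts newConns items into m, dropping inserts once the
// map is full (like an update on a full bpf hash map), then purges up to
// purgeN items. It returns how many inserts were dropped.
func simulateRound(m map[int]struct{}, nextID *int, newConns, purgeN int) (dropped int) {
	for i := 0; i < newConns; i++ {
		if len(m) >= mapMax {
			dropped++ // map full: this connection cannot be intercepted
			continue
		}
		m[*nextID] = struct{}{}
		*nextID++
	}
	for k := range m { // deleting while ranging is allowed in Go
		if purgeN == 0 {
			break
		}
		delete(m, k)
		purgeN--
	}
	return dropped
}

func main() {
	m := make(map[int]struct{})
	id := 0
	// Under extreme load, new connections arrive faster than a fixed
	// purge removes them, so some inserts are dropped every round.
	for round := 1; round <= 3; round++ {
		dropped := simulateRound(m, &id, 15000, 11000)
		fmt.Printf("round %d: size=%d dropped=%d\n", round, len(m), dropped)
	}
}
```

With 15k new connections per round, purging 11k each round still drops thousands of inserts every round, which is the behaviour described above.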
-
Mmh, what if, after processing an item in getPidFromEbpf(), we delete that item from the map if it matches an outgoing connection? opensnitch/daemon/procmon/ebpf/find.go Line 62 in 479b8de If it matches, from then on we don't need that item anymore, am I right?
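As a sketch of that idea with a plain Go map standing in for the eBPF map (the key type, fields, and function names here are hypothetical, not the actual find.go code):

```go
package main

import "fmt"

// connKey is a stand-in for the eBPF map key (connection tuple).
type connKey struct {
	srcPort, dstPort uint16
}

type conn struct{ pid int }

// lookupAndForget resolves an outgoing connection to a PID and, on a
// match, deletes the entry: once the connection is resolved we never
// need that item again, so reclaiming the slot early keeps the map
// from filling up.
func lookupAndForget(m map[connKey]conn, k connKey) (int, bool) {
	c, ok := m[k]
	if !ok {
		return 0, false
	}
	delete(m, k) // on a real eBPF map this would be a delete-element call
	return c.pid, true
}

func main() {
	m := map[connKey]conn{
		{srcPort: 54321, dstPort: 443}: {pid: 1234},
	}
	if pid, ok := lookupAndForget(m, connKey{54321, 443}); ok {
		fmt.Printf("pid=%d, remaining entries=%d\n", pid, len(m))
	}
}
```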
-
Interesting research, thanks for sharing. If you managed to fix the whole problem from userspace, that's even better than touching the eBPF code.
-
When the eBPF maps are full we cannot insert more connections, so if the eBPF interception method is in use, new outgoing connections cannot be intercepted with it and we have to fall back to ProcFS.
$0. I reproduce this problem every day by using some clients/daemons that connect to localhost. They produce so much traffic that after a while eBPF stops working and the interception method falls back to ProcFS (because the maps are full, I guess). After some hours the eBPF maps somehow get cleared up and the eBPF interception method starts working again. And after a while, goto $0.
So I decided to take a look at the code (it's about the 3rd time I've tried to fix this problem). Some things I've seen:
Infinite loop.
opensnitch/daemon/procmon/ebpf/monitor.go
Line 148 in 479b8de
After looping the maps' items several times, one of the items in the map can be duplicated, so when it reaches this point:
opensnitch/daemon/procmon/ebpf/monitor.go
Lines 169 to 170 in 479b8de
it continues with the next iteration with the same nextKey (because err == nil but ok is false), leading to an infinite loop. This problem can be fixed by checking if the looked-up key is ok:
opensnitch/daemon/procmon/ebpf/monitor.go
Line 153 in 479b8de
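As an illustration, here is a minimal in-memory simulation of that fix. The types and names are made up: the map and its get-next-key/lookup behaviour are faked with plain Go values, and the key point is that a key which appears in the iteration order but fails the lookup must not make the loop retry the same key.

```go
package main

import "fmt"

// fakeMap stands in for a bpf map: iteration order comes from
// getNextKey, values from lookup. A key can appear in the iteration
// order yet fail the lookup (ok == false) — the situation that caused
// the endless loop.
type fakeMap struct {
	order  []string          // keys as get-next-key would return them
	values map[string]string // keys that the lookup actually finds
}

func (m *fakeMap) getNextKey(key string) (string, bool) {
	for i, k := range m.order {
		if k == key && i+1 < len(m.order) {
			return m.order[i+1], true
		}
	}
	return "", false // no next key: iteration done
}

func (m *fakeMap) lookup(key string) (string, bool) {
	v, ok := m.values[key]
	return v, ok
}

// iterate walks the map. The fix: check ok immediately after the
// lookup and advance to the next key anyway, instead of continuing
// the loop with the same nextKey (which never advances).
func iterate(m *fakeMap) []string {
	var visited []string
	key := m.order[0]
	for {
		if v, ok := m.lookup(key); ok {
			visited = append(visited, v)
		} // if !ok: fall through and advance anyway
		next, ok := m.getNextKey(key)
		if !ok {
			break
		}
		key = next
	}
	return visited
}

func main() {
	m := &fakeMap{
		order:  []string{"a", "ghost", "b"}, // "ghost" fails the lookup
		values: map[string]string{"a": "conn-a", "b": "conn-b"},
	}
	fmt.Println(iterate(m)) // terminates despite the failing lookup
}
```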
However, the check (if !ok {}) is at the end of all the other operations. @themighty1, what was the reason to put it at the end? Did you have any problem with it right after
monitor.go:L153
?
The code doesn't take into account the last purged items; it's a fixed value (connections_counter - 5000). On the other hand, the connections counter doesn't reflect the number of items in the map.
opensnitch/daemon/procmon/ebpf/monitor.go
Lines 33 to 34 in 479b8de
Many times the number of purged items is random (489, 567, 4, 1029...). The problem I've seen is that sometimes only 4 items are purged for a long period of time, causing the maps to end up filling up, thus having to fall back to ProcFS.
After trying several things, the only way I've found to reliably delete items from the maps and avoid hitting the max capacity is by counting the items in the maps and then deleting half of the total. This way we adapt to the maps' size.
One thing to bear in mind is not to empty the map completely. Apparently if we delete all the items, the map can enter a state where only 4 items are deleted from then on, ending up filling the maps again. I don't have an explanation for this behaviour.
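A sketch of that adaptive purge, using a plain Go map as a stand-in for the eBPF map (the function name and the keep-at-least-one guard are my own; the attached monitor.go is the real implementation):

```go
package main

import "fmt"

// purgeHalf counts the items currently in the map and deletes half of
// them, so the purge adapts to the map's actual size instead of using
// a fixed value. It deliberately never empties the map completely,
// since fully emptying it seemed to leave the map in a bad state.
func purgeHalf(m map[int]struct{}) (deleted int) {
	toDelete := len(m) / 2
	for k := range m {
		if deleted >= toDelete || len(m) <= 1 {
			break // keep at least one item: never fully empty the map
		}
		delete(m, k)
		deleted++
	}
	return deleted
}

func main() {
	m := make(map[int]struct{})
	for i := 0; i < 12000; i++ { // a map at its assumed 12k capacity
		m[i] = struct{}{}
	}
	fmt.Printf("deleted %d, %d remain\n", purgeHalf(m), len(m))
}
```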
All these problems can be easily reproduced by scanning localhost with nmap:
nmap -sT -p1-65535 localhost
Note: at least once, I've seen one of the eBPF maps get stuck, not accepting more items, even with a total item count of 456 (far less than the maximum capacity of 12k elements).
I'd be grateful if someone could reproduce these problems and test the attached solution. Note that there are plenty of debug messages; it's not the final version:
monitor.go.txt ->
$ cp monitor.go.txt opensnitch/daemon/procmon/ebpf/monitor.go