-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Peer is flooded with new ingress streams #734
Comments
Is it the same |
Yes it is. Once I’m connected to more peers this happens with every peer. Also the Oid for the waiting room warning is always the same. |
I don’t know of anything which would trigger packets at a 200ms interval, except perhaps an amplification effect of something malfunctioning with the “waiting room” |
@geigerzaehler If you could conjure up debug logs from seed that correspond to the upstream peer that would help with narrowing down the interaction that is causing this. |
Does this then rule out a potential issue in the seed node/library? |
Here’s the seed log just before the issue starts. Afterwards the log basically repeats itself. |
A couple of things I noticed when I investigated this and that may contribute to the issue
|
Seems like radicle-dev/radicle-client-services#5 is related, which doesn't have a waiting room or complex behavior. It seems like peers are establishing large amounts of connections to other peers, something we've been seeing on the seed node for a while. It could either be malicious behavior, or some kind of feedback loop.. In any case it might help to limit the amount of ingress streams per-peerid, as a mitigation. |
Something I find weird about this log is that a peer keeps advertising the same
Doing |
But radicle-link recently added rate limiting, so this is even more curious 🤔 |
Rate limiting is limited due to the stream-per-message model. That was a bad idea, I'm changing it. At the end of the day, a peer flooding the network cannot really be defended against, unless we auto-ban them permanently. |
Any update on this? This issue often makes Upstream unusable :( |
I thought the last thing we talked about was Upstream updating |
Also heard from @cloudhead on matrix that it was sprout misbehaving. Did no
further action follow this statement?
|
That didn’t help unfortunately. We’re still seeing this issue.
I was also seeing this issue with setzling. And this was after I restarted it and was the only peer connected to it. |
And there was no sign of some peer repeatedly making requests? |
I thought the last thing we talked about was Upstream updating
librad to this commit [1]cb91ccd and seeing how that might improve
usage?
That didnât help unfortunately. Weâre still seeing this issue.
Yeah no, the effect of this can only kick in when most peers are running on that
version. Only then can we tighten the flow control limits to drop packets from a
logorrheic peer on the floor.
Also heard from ***@***.*** on matrix that it was sprout
misbehaving.
I was also seeing this issue with setzling. And this was after I
restarted it and was the only peer connected to it.
So, what are we seeing exactly? That `seed` nodes are flooding, or that
`upstream` nodes are flooding as soon as they are connected to a `seed`? Is it
perhaps that the `seed` should check if the project is already tracked here [0],
because the call to `track` is idempotent?
[0]: https://github.com/radicle-dev/radicle-link/blob/master/seed/src/lib.rs#L402
|
I’m not able to reproduce this issue anymore if I’m the only one connected to a seed node. But I still see this issue if other peers are connected. After adding some logging statements I see the following when I connect to sprout.
This log line is repeated every couple of milliseconds. As you can see, the message arrives from the seed but originates from another node. Could one way to limit the impact of this be to have the waiting room not react to uninteresting messages? |
The thing is that if you've requested an Why do you think this could be the waiting room anyway? 🤔 Maybe I'm missing something you see in the logs there? 👀 |
I’m not saying the waiting room causes the issue. But it seems odd to me that the messages put so much pressure on the CPU. I was just thinking about limiting the impact of messages that don’t have an effect. |
If anything this is probably the source of some unwanted traffic and we should definitely fix it. |
When calling `Node::track_project`, it might very well be that the project/peer is already tracked, which would result in unnecessary broadcast traffic. Thus, don't emit any events in this case. Also, change the order of `track` and `replicate` to ensure we do actually fetch the newly-tracked peer's view. Additionally, we can skip `replicate` if we already tracked the peer, as subsequent broadcast messages won't be considered uninteresting. Ref #734 Signed-off-by: Kim Altintop <[email protected]>
When calling `Node::track_project`, it might very well be that the project/peer is already tracked, which would result in unnecessary broadcast traffic. Thus, don't emit any events in this case. Also, change the order of `track` and `replicate` to ensure we do actually fetch the newly-tracked peer's view. Additionally, we can skip `replicate` if we already tracked the peer, as subsequent broadcast messages won't be considered uninteresting. Ref #734 Signed-off-by: Kim Altintop <[email protected]>
I must say that this waiting room logic escapes me mostly. It is probably the case that the bandaid of screaming "have" into the forest should just be removed -- unfortunately, #653 is still stalled, because replication v3 is more important right now. In the spirit of graceful centralisation, I could imagine the following workaround: since most clients ( |
When calling `Node::track_project`, it might very well be that the project/peer is already tracked, which would result in unnecessary broadcast traffic. Thus, don't emit any events in this case. Also, change the order of `track` and `replicate` to ensure we do actually fetch the newly-tracked peer's view. Additionally, we can skip `replicate` if we already tracked the peer, as subsequent broadcast messages won't be considered uninteresting. Ref #734 Signed-off-by: Kim Altintop <[email protected]>
That's kind of interesting - is this a soft-fork? |
is this a soft-fork?
I'm not sure what you mean by that.
|
Hehe I mean is it a backwards compatible change? |
Yeah sure. It would be an optional field, which should just be ignored by older
clients.
|
Don’t we have some rate limiting in place that should prevent a peer being flooded by messages from another peer? |
Donât we have some rate limiting in place that should prevent a peer
being flooded by messages from another peer?
Yes we do, and I saw in one of the earlier log captures you posted that it kicks
in. Might be worth playing with the thresholds.
|
When connecting to either
setzling
orsprout
we see the following log line every 200msIn addition, we also see the following log line for a while until it goes away (maybe after replicating a first project)
This happens even if I start Upstream with a clean slate and I don’t replicate any projects.
I’ve noticed that once my peer is connected to multiple peers I’m seeing this log line for every peer. This puts significant pressure on the CPU resulting in 50% utilization. This is a high severity issue for Upstream users since the app and their machines become unresponsive
Steps to reproduce
RAD_HOME
The text was updated successfully, but these errors were encountered: