Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connect to discovered peers concurrently #629

Closed
wants to merge 1 commit into from

Conversation

geigerzaehler
Copy link
Contributor

Before, connecting to a discovered peer would block the connecting to the peer discovered next. This would cause particular problems when we tried to connect to a peer that isn’t reachable and the connection would result in a timeout.

Before, connecting to a discovered peer would block the connecting to
the peer discovered next. This would cause particular problems when we
tried to connect to a peer that isn’t reachable and the connection would
result in a timeout.

Signed-off-by: Thomas Scholtes <[email protected]>
@geigerzaehler geigerzaehler requested a review from a team as a code owner April 21, 2021 13:14
@@ -28,7 +28,7 @@ where
D: futures::Stream<Item = (PeerId, Vec<SocketAddr>)>,
{
disco
.for_each(|(peer, addrs)| {
.for_each_concurrent(10, |(peer, addrs)| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see how this is useful.

However, there is a catch: when we discover a new peer, we'll try to add it to the active set. Which may cause already-active peers to get demoted.

This is problematic when the disco stream is not coming from a static (bounded) set of bootstrap nodes, but for example from some periodic (m)DNS query, or some datasource which remembers a list of seen peers which is larger than the active set: the active set will constantly get replaced, causing a lot of churn. It will also converge the node to only be connected to the peers which are discovered via this stream, not the membership shuffle.

I haven't been able to come up with a good solution to this, but now that you're raising it: maybe the disco future could just go into a pending state while the membership active count (available through state.membership.view_stats().0) is equal to net::protocol::membership::Params::max_active (not currently available to this function)? When it wakes up, and the count is smaller, it could pop max_active - active from the stream and try to connect to those concurrently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will also converge the node to only be connected to the peers which are discovered via this stream, not the membership shuffle.

To elaborate on why this is bad: specifically mDNS could yield many peers on the local network, so we'd converge to only be connected to the local network. Which is clearly not what we want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense then to handle the connection (io::connection::connect()) concurrently but the membership sequentially? In this case unreachable peers would not block membership rpc with other peers.

The motivation for this PR is seed node selection in Upstream. Concretely, we just want to tell link to connect to a set of nodes. As an alternative to what I described above we could also provide a better API to achieve the Upstream use case. We could have a function that tells the local peer to connect to another remote node or we could change the discovery stream to impl Stream<Item = Vec<PeerId, Vec<SocketAddr>>>. (I’d prefer the former).

@kim
Copy link
Contributor

kim commented Apr 22, 2021 via email

@geigerzaehler
Copy link
Contributor Author

it just gives the caller more liberty to interfere with the membership protocol, where the problem is that it should be up to the protocol to decide when resort to external discovery.

Sorry that I didn’t make this clear but I’m wondering whether to provide an API for the case where the user (e.g. Upstream) wants the peer to connect to a specific remote now. This API would be distinct from the discovery stream (and mDNS) which would only pull a new peer if an active slot is available. With this new API it would of course be the responsibility of the caller to ensure that this does not mess things up.

Would it make sense then to handle the connection (io::connection::connect()) concurrently
I was under the impression that select_ok is effectively concurrent, but that could be wrong.

io::connection::connect() concurrently connects to all sockets of one peer with select_ok. But we also want to try to connect to multiple peers concurrently to avoid an unreachable peer blocking other connections. This would still be a problem if we only pull from the discovery stream when there’s a free slot. But maybe sequentially connecting is also okay if we set a timeout for the connection.

I’ll update the PR incorporating your suggestion. That should shed more light on it.

@kim
Copy link
Contributor

kim commented Apr 23, 2021 via email

@xla
Copy link
Contributor

xla commented Jul 2, 2021

Can we close this in favour of #723?

@geigerzaehler geigerzaehler deleted the thomas/concurrent-discover-connect branch July 7, 2021 10:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants