Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transports/quic: Refactor listener handling (updated) #19

Merged
merged 32 commits into from
Aug 27, 2022

Conversation

elenaf9
Copy link

@elenaf9 elenaf9 commented Jul 10, 2022

Description

Replaces #17. Rebased onto latest kpp::libp2p-quic and merged latest libp2p::main. Apart form that, the only other change compared to #17 is to remove the Mutex in InAddr (ef7b823).

Copying over the description from #17:

Discussed in libp2p#2289 (comment).
This PR adapts the quic transport to the upstream listener changes of libp2p#2652. Additionally, it now allows the quic transport to listen on multiple addresses / endpoints. The changes include:

  • QuicTransport now owns and drives multiple endpoints. An endpoint is created each time Transport::listen_on is called.
  • When dialing a remote, a random existing listening endpoint is used. If none exists a new one is created that only supports outbound connections (i.e. server config is None). If then a new listening endpoint is created any following dials will use the new endpoint, but existing connections are not migrated.

Open Questions

  1. If a first listening endpoint is created and we already have an endpoint used for dialing (without server capabilities enabled): should we, instead of creating a new endpoint, enable server capabilities on the old endpoint and rebind it to the new listening address? The benefit of this would be that the remote peers could then update their address information once they notice the address change. For later dial attempts they would then know the correct address to dial.

Quoting response from @mxinden:

This sounds very neat to me, especially the part where the remote is notified of the address change. I am fine with this landing in a later iteration in case it is a big undertaking.

-> I will look into it and come back to this.


  1. Rare case: if the user starts listening on a socket address that we already randomly chose for an existing dialer-endpoint, what should happen?

Quoting again response from @mxinden:

I would be fine with not handling this case, that is erroring on the listen_on call, forcing the user to handle the case.

I am guessing re-using the existing dialer-endpoint is complicated, especially when handling listen requests across interfaces (i.e. 0.0.0.0) as we would need to do an additional listen_on call for the remaining interfaces with the specified port, rig

-> If we decide to add the logic mentioned in question 1. to migrate a dialing endpoint to be the new listener, we have to deal the case of one interface vs multiple anyway. So it probably makes sense to solve this together. Will look into it.

thomaseizinger and others added 25 commits June 24, 2022 08:26
…td::error::Error` (libp2p#2710)

* core/muxing: Remove `Into<io::Error>` bound from `StreamMuxer::Error`

This allows us to preserve the type information of a muxer's concrete
error as long as possible. For `StreamMuxerBox`, we leverage `io::Error`'s
capability of wrapping any error that implements `Into<Box<dyn Error>>`.

* Use `?` in `Connection::poll`

* Use `?` in `muxing::boxed::Wrap`

* Use `futures::ready!` in `muxing::boxed::Wrap`

* Fill PR number into changelog

* Put `Error + Send + Sync` bounds directly on `StreamMuxer::Error`

* Move `Send + Sync` bounds to higher layers

* Use `map_inbound_stream` helper

* Update changelog to match new implementation
…p#2726)

Print remote peer ID when seeing a second identify push stream coming in.
See e04e95a for the rational.

With tokio `v1.19.0` released, `TcStream` exposes `take_error`.

This commit applies the same fix from e04e95a to the tokio TCP provider.
…ibp2p#2731)

Bumps [styfle/cancel-workflow-action](https://github.com/styfle/cancel-workflow-action) from 0.9.1 to 0.10.0.
- [Release notes](https://github.com/styfle/cancel-workflow-action/releases)
- [Commits](styfle/cancel-workflow-action@a40b884...bb6001c)

---
updated-dependencies:
- dependency-name: styfle/cancel-workflow-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
* protocols/gossipsub: Fix a typo in error message

* protocols/gossipsub: Make the error message accurately
Allow for custom protocol ID via `GossipsubConfigBuilder::protocol_id()`.
…ect (libp2p#2744)

**Summary** of the plot of the `discover_peer_after_disconnect` test:

1. `swarm2` connects to `swarm1`.
2. `swarm2` requests an identify response from `swarm1`.
3. `swarm1` sends the response to `swarm2`.
4. `swarm2` disconnects from `swarm1`.
5. `swarm2` tries to disconnect.

**Problem**

`libp2p-identify` sets `KeepAlive::No` when it identified the remote. Thus `swarm1` might
identify` `swarm2` before `swarm2` identified `swarm1`. `swarm1` then sets `KeepAlive::No` and thus closes the
connection to `swarm2` before `swarm2` identified `swarm1`. In such case the unit test
`discover_peer_after_disconnect hangs indefinitely.

**Solution**

Add an initial delay to `swarm1` requesting an identification from `swarm2`, thus ensuring `swarm2`
is always able to identify `swarm1`.
…er (libp2p#2652)

Remove the concept of individual `Transport::Listener` streams from `Transport`.
Instead the `Transport` is polled directly via `Transport::poll`. The
`Transport` is now responsible for driving its listeners.
Adapt to the transport changes of libp2p#2652.
Note: this is only a draft "to make it work", and not a proper
implementation. It does not support listening on multiple addresses.
The listening logic with multiple Endpoints will need to be supported for
the upstream implementation.
Copy link
Author

@elenaf9 elenaf9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied over the open discussions from #17. Lmk if I forgot something.

transports/quic/src/endpoint.rs Outdated Show resolved Hide resolved
transports/quic/src/transport.rs Outdated Show resolved Hide resolved
transports/quic/src/transport.rs Outdated Show resolved Hide resolved
transports/quic/src/endpoint.rs Show resolved Hide resolved
@mxinden
Copy link

mxinden commented Jul 11, 2022

@elenaf9 thanks for porting all the past discussions over here. As far as I can tell there is no input needed from my side right now. Please correct me if I am wrong. Also please let me know in case you want me to review any of these changes again.

@kpp
Copy link
Owner

kpp commented Jul 18, 2022

I tested it:
libp2p-perf: works 👍
substrate: works but same issue: server is not notified when client drops

Also one minor issue: I got Error while sending on QUIC UDP socket: Os { code: 97, kind: Uncategorized, message: "Address family not supported by protocol" } when sending data from 0.0.0.0 to ::1.

LGTM. Other issues can be solved in follow up PRs.

@mxinden
Copy link

mxinden commented Jul 19, 2022

Also one minor issue: I got Error while sending on QUIC UDP socket: Os { code: 97, kind: Uncategorized, message: "Address family not supported by protocol" } when sending data from 0.0.0.0 to ::1.

Do I understand correctly that you only had the 0.0.0.0 listen address, or also ::/0 @kpp?

@kpp
Copy link
Owner

kpp commented Jul 19, 2022

I had two nodes, one is 0.0.0.0 only, the other is ::1 only.

@elenaf9
Copy link
Author

elenaf9 commented Jul 23, 2022

Thank you for the reviews @mxinden @kpp!

Also one minor issue: I got Error while sending on QUIC UDP socket: Os { code: 97, kind: Uncategorized, message: "Address family not supported by protocol" } when sending data from 0.0.0.0 to ::1.

Fixed the logic now for selecting an endpoint for dialing:
Analogous to how socket reuse is done in the tcp transport, we now look in our existing listeners for an endpoint with matching socket family (Ipv4 / Ipv6) and loopback interface (is_loopback). If none is found we create a new dialer that is listening on a wildcard address with the matching socket family. Thus we now have two dialing endpoints, one for Ipv4 and one for Ipv6 addresses.
@kpp thanks for adding the ipv4_dial_ipv6 test! It now passes.


From my side this is ready for merge. I can not merge the latest upstream master because of some more muxer changes. @kpp I can look into fixing them, unless you are already working on it?

Edit: CI is failing because of libp2p#2770 which is independent of libp2p-quic, but came up because we use libp2p-core as dependency.

@kpp
Copy link
Owner

kpp commented Jul 24, 2022

I can look into fixing them, unless you are already working on it?

Sure. I tried but it was a little bit hard because your branch was not merged yet.

You can take this issue while I am working on that issue with disconnects.

@elenaf9
Copy link
Author

elenaf9 commented Aug 3, 2022

Sorry for the force-push. I did not change anything, just reverted the merge of upstream master. Will do a separate PR that includes the latest master and adapts the QuicMuxer.


Referencing a comment in libp2p#2722:
@kpp

I believe I should try to move QuicMuxer to a new API to give it a try.

To avoid duplicating work: are you working on this right now? From your above comment (#19 (comment)) I assumed that you are currently working on the disconnect issues, but I'm happy to sync on this in case that you have now started looking into this. Otherwise I'd draft the PR and we can discuss the solution there.

@kpp
Copy link
Owner

kpp commented Aug 3, 2022

To avoid duplicating work: are you working on this right now?

No.

@elenaf9
Copy link
Author

elenaf9 commented Aug 3, 2022

Sorry for the force-push. I did not change anything, just reverted the merge of upstream master. Will do a separate PR that includes the latest master and adapts the QuicMuxer.

Referencing a comment in libp2p#2722: @kpp

I believe I should try to move QuicMuxer to a new API to give it a try.

To avoid duplicating work: are you working on this right now? From your above comment (#19 (comment)) I assumed that you are currently working on the disconnect issues, but I'm happy to sync on this in case that you have now started looking into this. Otherwise I'd draft the PR and we can discuss the solution there.

See elenaf9#6.

@elenaf9
Copy link
Author

elenaf9 commented Aug 19, 2022

For the sake of reducing the number of parallel open PRs related to quic, and also to ease reviewing:
I think we can already review and merge this PR independently of elenaf9#6, since they target different parts of the implementation. Wdyt @kpp? Even though it does not include all of the latest master changes, I think it is still an improvement.

Additional reviews would also be appreciated (cc @mxinden @thomaseizinger @MarcoPolo; relevent is only the diff within transports/quic).

Copy link

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am in favor of merging here as is. Thanks for the ping and thanks for staying on top of things @elenaf9.

keypair: &libp2p_core::identity::Keypair,
multiaddr: Multiaddr,
) -> Result<Self, tls::ConfigError> {
pub fn new(keypair: &libp2p_core::identity::Keypair) -> Result<Self, tls::ConfigError> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 great to no longer having to pass a Multiaddr.

@kpp
Copy link
Owner

kpp commented Aug 21, 2022

I think we should merge it. I will review it today/tomorrow, wait for an additional reviews till the end of Monday and merge it.

@kpp
Copy link
Owner

kpp commented Aug 22, 2022

bench `multiple-endpoints-2#bf06d967`
# Start Rust and Golang servers.
Local peer id: PeerId("Qmcqq9TFaYbb94uwdER1BXyGfCFY4Bb1gKozxNyVvLvTSw")
about to listen on "/ip4/127.0.0.1/udp/9992/quic"
Listening on "/ip4/127.0.0.1/udp/9992/quic".

# Rust -> Rust

Local peer id: PeerId("12D3KooWAmxKDBCLbYU8juAmRhUKGhEYcg79c4hCnvENraAosWkD")
IncomingConnection { local_addr: "/ip4/127.0.0.1/udp/9992/quic", send_back_addr: "/ip4/127.0.0.1/udp/40500/quic" }
ConnectionEstablished { peer_id: PeerId("12D3KooWAmxKDBCLbYU8juAmRhUKGhEYcg79c4hCnvENraAosWkD"), endpoint: Listener { local_addr: "/ip4/127.0.0.1/udp/9992/quic", send_back_addr: "/ip4/127.0.0.1/udp/40500/quic" }, num_established: 1, concurrent_dial_errors: None }
Behaviour(PerfRunDone(10.000694835s, 2232061300))
Interval	Transfer	Bandwidth
0 s - 10.00 s	2232 MBytes	1785.29 MBit/s
ConnectionClosed { peer_id: PeerId("Qmcqq9TFaYbb94uwdER1BXyGfCFY4Bb1gKozxNyVvLvTSw"), endpoint: Dialer { address: "/ip4/127.0.0.1/udp/9992/quic", role_override: Dialer }, num_established: 0, cause: None }

# Rust -> Golang

Local peer id: PeerId("12D3KooWPY3YCErc99qJ74h363qjHHf93DFuuK8U5n2SPGJHooxD")
Interval	Transfer	Bandwidth
0 s - 10.00 s	2056 MBytes	1644.61 MBit/s
ConnectionClosed { peer_id: PeerId("12D3KooWL3XJ9EMCyZvmmGXL2LMiVBtrVa2BuESsJiXkSj7333Jw"), endpoint: Dialer { address: "/ip4/127.0.0.1/udp/9993/quic", role_override: Dialer }, num_established: 0, cause: None }


# Golang -> Rust

2022/08/22 15:11:51 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/lucas-clemente/quic-go/wiki/UDP-Receive-Buffer-Size for details.
IncomingConnection { local_addr: "/ip4/127.0.0.1/udp/9992/quic", send_back_addr: "/ip4/127.0.0.1/udp/32942/quic" }
ConnectionEstablished { peer_id: PeerId("12D3KooWEiWwtDiNQcwynr4TUzY3m1zFVJ9YDVifDb7FQBajCt9Y"), endpoint: Listener { local_addr: "/ip4/127.0.0.1/udp/9992/quic", send_back_addr: "/ip4/127.0.0.1/udp/32942/quic" }, num_established: 1, concurrent_dial_errors: None }
Interval 	Transfer	Bandwidth
0s - 10.00 s 	1672 MBytes	1337.58 MBit/s
Behaviour(PerfRunDone(9.999930943s, 1672064000))

# Golang -> Golang

2022/08/22 15:12:01 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/lucas-clemente/quic-go/wiki/UDP-Receive-Buffer-Size for details.
Interval 	Transfer	Bandwidth
0s - 10.00 s 	1681 MBytes	1344.74 MBit/s

@thomaseizinger
Copy link

Additional reviews would also be appreciated (cc @mxinden @thomaseizinger @MarcoPolo; relevent is only the diff within transports/quic).

I am not gonna be able to fit a review in in time but I think this has had a good few pairs of eyes. Happy to see it merged. The numbers look good too :)

@mxinden
Copy link

mxinden commented Aug 25, 2022

Friendly ping @kpp. Would you mind hitting the merge button?

@kpp
Copy link
Owner

kpp commented Aug 25, 2022

@mxinden I am not quite sure whether we have to merge it right now because of elenaf9#6 (comment).

@elenaf9
Copy link
Author

elenaf9 commented Aug 25, 2022

@mxinden I am not quite sure whether we have to merge it right now because of elenaf9#6 (comment).

I think we can already merge this PR (if you have reviewed and approved) because it does not depend on elenaf9#6. The issue about not being compatible with the go-impl that appears on that other PR is not an issue here.
We don't per-se need to already merge it, but reducing the overall number of pending PRs related to QUIC makes it easier for folks on the outside to understand what is happening where. Any reason why you would prefer to wait @kpp?

@kpp kpp merged commit e0684ab into kpp:libp2p-quic Aug 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants