backport: Bring litep2p fixes and latest version to stable2409 #6497
Conversation
This release introduces several new features, improvements, and fixes to the litep2p library. Key updates include enhanced error handling, configurable connection limits, and a new API for managing public addresses. For a detailed set of changes, see the [litep2p changelog](https://github.com/paritytech/litep2p/blob/master/CHANGELOG.md#070---2024-09-05).

This PR makes use of:

- connection limits to optimize network throughput
- better errors that are propagated to substrate metrics
- the public addresses API to report healthy addresses to the Identify protocol

Measuring warp sync time is a bit inaccurate since the network is not deterministic and we might end up using faster peers (peers with more resources to handle our requests). However, I did not see warp sync times of 16 minutes; instead, they roughly stabilized between 8 and 10 minutes. For measuring warp-sync time, I've used [sub-triage-logs](https://github.com/lexnv/sub-triage-logs/?tab=readme-ov-file#warp-time).

Phase | Time
-|-
Warp | 426.999999919s
State | 99.000000555s
Total | 526.000000474s

Phase | Time
-|-
Warp | 731.999999837s
State | 71.000000882s
Total | 803.000000719s

Closes: #4986

After exposing the `litep2p::public_addresses` interface, we can report confirmed external addresses to litep2p. This should mitigate or at least improve: #4925. Will keep the issue around to confirm this. We are one step closer to exposing similar metrics as libp2p: #4681.

cc @paritytech/networking

- [x] Use public address interface to confirm addresses to identify protocol

---------

Signed-off-by: Alexandru Vasile <[email protected]>
…5998) This PR ensures that the `litep2p.public_addresses()` never grows indefinitely.

- effectively fixes subtle memory leaks
- fixes authority DHT records being dropped due to size limits being exceeded
- provides a healthier subset of public addresses to the `/identify` protocol

This PR adds a new `ExternalAddressExpired` event to the litep2p/discovery process. Substrate uses an LRU `address_confirmations` bounded to 32 address entries. The oldest entry is propagated via the `ExternalAddressExpired` event when a new address is added to the list (if capacity is exceeded). The expired address is then removed from the `litep2p.public_addresses()`, effectively limiting its size to 32 entries (the size of the `address_confirmations` LRU).

cc @paritytech/networking @alexggh

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Bastian Köcher <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
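To make the mechanism concrete, here is a minimal, self-contained Rust sketch of the idea described above: a bounded LRU of confirmed addresses whose evicted entry is mirrored into the public address list. The `AddressConfirmations` type, its capacity handling, and the plain-string addresses are illustrative assumptions, not the actual Substrate/litep2p types.

```rust
use std::collections::VecDeque;

/// Illustrative stand-in for the bounded `address_confirmations` cache:
/// at most `capacity` addresses are kept, and the oldest entry is handed
/// back to the caller so it can be dropped from the public address list
/// (the role played by the `ExternalAddressExpired` event above).
struct AddressConfirmations {
    capacity: usize,
    entries: VecDeque<String>, // oldest at the front, newest at the back
}

impl AddressConfirmations {
    fn new(capacity: usize) -> Self {
        Self { capacity, entries: VecDeque::new() }
    }

    /// Insert an address; returns the expired (evicted) address, if any.
    fn insert(&mut self, address: String) -> Option<String> {
        // Re-inserting a known address refreshes it (moves it to the back).
        if let Some(pos) = self.entries.iter().position(|a| *a == address) {
            let _ = self.entries.remove(pos);
        }
        self.entries.push_back(address);

        if self.entries.len() > self.capacity {
            self.entries.pop_front() // the oldest entry expires
        } else {
            None
        }
    }
}

fn main() {
    let mut confirmations = AddressConfirmations::new(32);
    let mut public_addresses: Vec<String> = Vec::new();

    for i in 0..40 {
        let addr = format!("/ip4/10.0.0.{i}/tcp/30333");
        public_addresses.push(addr.clone());
        if let Some(expired) = confirmations.insert(addr) {
            // Mirror the eviction into the public address list so it
            // stays bounded to the LRU capacity.
            public_addresses.retain(|a| *a != expired);
        }
    }

    assert_eq!(public_addresses.len(), 32);
}
```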
This PR updates litep2p to the latest release.

- `KademliaEvent::PutRecordSucess` is renamed to fix a word typo
- `KademliaEvent::GetProvidersSuccess` and `KademliaEvent::IncomingProvider` are needed for the bootnodes-on-DHT work and will be utilized later

Changes included in this release:

- kad: Providers part 8: unit, e2e, and `libp2p` conformance tests ([#258](paritytech/litep2p#258))
- kad: Providers part 7: better types and public API, public addresses & known providers ([#246](paritytech/litep2p#246))
- kad: Providers part 6: stop providing ([#245](paritytech/litep2p#245))
- kad: Providers part 5: `GET_PROVIDERS` query ([#236](paritytech/litep2p#236))
- kad: Providers part 4: refresh local providers ([#235](paritytech/litep2p#235))
- kad: Providers part 3: publish provider records (start providing) ([#234](paritytech/litep2p#234))
- transport_service: Improve connection stability by downgrading connections on substream inactivity ([#260](paritytech/litep2p#260))
- transport: Abort canceled dial attempts for TCP, WebSocket and Quic ([#255](paritytech/litep2p#255))
- kad/executor: Add timeout for writting frames ([#277](paritytech/litep2p#277))
- kad: Avoid cloning the `KademliaMessage` and use reference for `RoutingTable::closest` ([#233](paritytech/litep2p#233))
- peer_state: Robust state machine transitions ([#251](paritytech/litep2p#251))
- address_store: Improve address tracking and add eviction algorithm ([#250](paritytech/litep2p#250))
- kad: Remove unused serde cfg ([#262](paritytech/litep2p#262))
- req-resp: Refactor to move functionality to dedicated methods ([#244](paritytech/litep2p#244))
- transport_service: Improve logs and move code from tokio::select macro ([#254](paritytech/litep2p#254))
- tcp/websocket/quic: Fix cancel memory leak ([#272](paritytech/litep2p#272))
- transport: Fix pending dials memory leak ([#271](paritytech/litep2p#271))
- ping: Fix memory leak of unremoved `pending_opens` ([#274](paritytech/litep2p#274))
- identify: Fix memory leak of unused `pending_opens` ([#273](paritytech/litep2p#273))
- kad: Fix not retrieving local records ([#221](paritytech/litep2p#221))

See the release changelog for more details: https://github.com/paritytech/litep2p/releases/tag/v0.8.0

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
Signed-off-by: Alexandru Vasile <[email protected]>
…mmand (#6016) Previously, when receiving the `SetReservedPeers { reserved }` command, all peers not in the `reserved` set were removed. This is incorrect: the intention of `SetReservedPeers` is to change the active set of reserved peers and disconnect previously reserved peers that are not in the new set.

While at it, I have added a few other improvements to make the peerset more robust:

- `SetReservedPeers`: does not disconnect all peers
- `SetReservedPeers`: if a reserved peer is no longer reserved, the peerset tries to move the peer to the regular set if the slots allow this move. This ensures the (now regular) peer counts towards slot allocation.
- every second: if we don't have enough connected peers, add the reserved peers to the list that the peerstore ignores. Reserved peers are already connected, and the peerstore might otherwise return a reserved peer.

### Next Steps

- [x] More testing

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
Co-authored-by: Michal Kucharczyk <[email protected]>
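A rough sketch of the intended `SetReservedPeers` semantics described above, assuming a heavily simplified peerset (integer peer ids and a single regular-slot counter are illustrative assumptions, not the real peerset state):

```rust
use std::collections::HashSet;

/// Simplified peerset state; the real implementation tracks much more.
struct Peerset {
    reserved: HashSet<u64>,
    regular: HashSet<u64>,
    regular_slots: usize,
}

impl Peerset {
    /// Only peers that are no longer reserved are affected: they are moved
    /// to the regular set if slots allow, otherwise they are disconnected.
    /// Returns the peers that had to be disconnected.
    fn set_reserved_peers(&mut self, new_reserved: HashSet<u64>) -> Vec<u64> {
        let mut disconnected = Vec::new();

        // Peers that were reserved but are not in the new set.
        let no_longer_reserved: Vec<u64> =
            self.reserved.difference(&new_reserved).copied().collect();

        for peer in no_longer_reserved {
            if self.regular.len() < self.regular_slots {
                // Keep the connection; the peer now counts towards regular slots.
                self.regular.insert(peer);
            } else {
                disconnected.push(peer);
            }
        }

        // Peers promoted to reserved stop occupying regular slots.
        for peer in &new_reserved {
            self.regular.remove(peer);
        }

        self.reserved = new_reserved;
        disconnected
    }
}

fn main() {
    let mut peerset = Peerset {
        reserved: [1, 2].into_iter().collect(),
        regular: HashSet::new(),
        regular_slots: 1,
    };

    // Peer 1 stays reserved; peer 2 is no longer reserved but a regular
    // slot is free, so it is kept as a regular peer instead of disconnected.
    let disconnected = peerset.set_reserved_peers([1].into_iter().collect());
    assert!(disconnected.is_empty());
    assert!(peerset.regular.contains(&2));
}
```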
…6380) This PR ensures that external addresses with different PeerIDs are not propagated to the higher layers of the network code.

While at it, this ensures that libp2p only adds the `/p2p/peerid` part to the discovered address if it does not contain it already.

This is a follow-up from:
- #6298

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
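The `/p2p/peerid` handling can be illustrated with plain string matching; the real code operates on `Multiaddr` values, so the helper below and its string-based checks are only an assumption-laden sketch of the rule, not the actual implementation.

```rust
/// Append `/p2p/<peer_id>` to a discovered address only if it is not
/// already present, and drop addresses that embed a *different* peer id.
fn normalize_discovered_address(address: &str, local_peer_id: &str) -> Option<String> {
    match address.rfind("/p2p/") {
        // The address already carries a peer id component.
        Some(pos) => {
            let embedded = &address[pos + "/p2p/".len()..];
            if embedded == local_peer_id {
                Some(address.to_string())
            } else {
                // Different peer id: do not propagate it to higher layers.
                None
            }
        }
        // No peer id yet: add it exactly once.
        None => Some(format!("{address}/p2p/{local_peer_id}")),
    }
}

fn main() {
    let peer = "12D3KooWExample";
    assert_eq!(
        normalize_discovered_address("/ip4/192.0.2.1/tcp/30333", peer).as_deref(),
        Some("/ip4/192.0.2.1/tcp/30333/p2p/12D3KooWExample"),
    );
    assert!(
        normalize_discovered_address("/ip4/192.0.2.1/tcp/30333/p2p/OtherPeer", peer).is_none()
    );
}
```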
Signed-off-by: Alexandru Vasile <[email protected]>
Signed-off-by: Alexandru Vasile <[email protected]>
#6298) This PR's main goal is to add public listen addresses to the DHT authorities records. This change improves the discoverability of validators that did not provide the `--public-addresses` flag.

This PR populates the authority DHT records with public listen addresses, if any. The change effectively ensures that addresses are added to the DHT record in the following order:

1. Public addresses provided by the CLI `--public-addresses`
2. A maximum of 4 public (global) listen addresses (if any)
3. Any external addresses discovered from the network (i.e. from the `/identify` protocol)

While at it, this PR adds the following constraints on the number of addresses:

- The total number of addresses cached is bounded at 16 (increased from 10).
- A maximum of 32 addresses are published to DHT records (previously unbounded).
- A maximum of 4 global listen addresses are utilized.

This PR also removes the following warning: `WARNING: No public address specified, validator node may not be reachable.`

### Next Steps

- [ ] deploy and monitor in versi network

Closes: #6280
Part of: #5266

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
Co-authored-by: Bastian Köcher <[email protected]>
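A minimal sketch of the ordering and caps listed above, assuming illustrative constants, function names, and plain-string addresses rather than the actual authority-discovery types:

```rust
/// Illustrative caps matching the numbers described above.
const MAX_GLOBAL_LISTEN_ADDRESSES: usize = 4;
const MAX_ADDRESSES_PUBLISHED: usize = 32;

/// Append `addr` if it is not already present and the overall cap allows it.
fn push_unique(out: &mut Vec<String>, addr: &str) {
    if out.iter().all(|a| a != addr) && out.len() < MAX_ADDRESSES_PUBLISHED {
        out.push(addr.to_string());
    }
}

/// Compose the addresses published in the authority DHT record, in priority
/// order: CLI `--public-addresses`, then up to four global listen addresses,
/// then externally discovered addresses; de-duplicated and capped at 32.
fn addresses_to_publish(
    cli_public: &[String],
    global_listen: &[String],
    discovered_external: &[String],
) -> Vec<String> {
    let mut out = Vec::new();

    for addr in cli_public {
        push_unique(&mut out, addr);
    }
    for addr in global_listen.iter().take(MAX_GLOBAL_LISTEN_ADDRESSES) {
        push_unique(&mut out, addr);
    }
    for addr in discovered_external {
        push_unique(&mut out, addr);
    }
    out
}

fn main() {
    let cli = vec!["/dns/validator.example/tcp/30333".to_string()];
    let listen: Vec<String> =
        (0..6).map(|i| format!("/ip4/203.0.113.{i}/tcp/30333")).collect();
    // Duplicate of one of the listen addresses, discovered via identify.
    let external = vec!["/ip4/203.0.113.0/tcp/30333".to_string()];

    let published = addresses_to_publish(&cli, &listen, &external);
    // 1 CLI address + at most 4 of the 6 listen addresses; the duplicate is dropped.
    assert_eq!(published.len(), 5);
}
```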
This pull request is amending an existing release. Please proceed with extreme caution,
Emergency Bypass
If you really need to bypass this check: add
Do we need this commit for litep2p backend backporting? It seems without it the changes are contained to litep2p backend only.
edit: Aaah I see what you mean, yes I'll revert the authority-discovery PR to contain everything to litep2p. I think it would be better to backport 0.8.1 to
Signed-off-by: Alexandru Vasile <[email protected]>
This PR updates the litep2p backend to version 0.8.1 from 0.8.0.

- Check the [litep2p updates forum post](https://forum.polkadot.network/t/litep2p-network-backend-updates/9973/3) for performance dashboards.
- Check the [litep2p release notes](paritytech/litep2p#288).

The v0.8.1 release includes key fixes that enhance the stability and performance of the litep2p library. The focus is on long-running stability and improvements to polling mechanisms.

Addressed a bug in the connection limits functionality that incorrectly tracked connections due for rejection. This issue caused an artificial increase in inbound peers, which were not being properly removed from the connection limit count, and led long-running nodes to reject all incoming connections, impacting overall stability. This fix ensures more accurate tracking and management of peer connections ([#286](paritytech/litep2p#286)).

This release provides multiple fixes to the polling mechanism, improving how connections and events are processed:

- Resolved an overflow issue in TransportContext's polling index for streams, preventing potential crashes ([#283](paritytech/litep2p#283)).
- Fixed a delay in the manager's poll_next function that prevented immediate polling of newly added futures ([#287](paritytech/litep2p#287)).
- Corrected an issue where the listener did not return Poll::Ready(None) when it was closed, ensuring proper signal handling ([#285](paritytech/litep2p#285)).

Full changelog:

- manager: Fix connection limits tracking of rejected connections ([#286](paritytech/litep2p#286))
- transport: Fix waking up on filtered events from `poll_next` ([#287](paritytech/litep2p#287))
- transports: Fix missing Poll::Ready(None) event from listener ([#285](paritytech/litep2p#285))
- manager: Avoid overflow on stream implementation for `TransportContext` ([#283](paritytech/litep2p#283))
- manager: Log when polling returns Ready(None) ([#284](paritytech/litep2p#284))

Started kusama nodes running side by side with a higher number of inbound and outbound connections (500). We previously tested with peers bounded at 50. This testing surfaced the fixes included in the latest release.

With this high-connection testing setup, litep2p outperforms libp2p in almost every domain, from performance to the warnings/errors encountered while operating the nodes.

TLDR: this is the version we need to test on kusama validators next.

- Litep2p

Repo | Count | Level | Triage report
-|-|-|-
polkadot-sdk | 409 | warn | Report .*: .* to .*. Reason: .*. Banned, disconnecting. ( Peer disconnected with inflight after backoffs. Banned, disconnecting. )
litep2p | 128 | warn | Refusing to add known address that corresponds to a different peer ID
litep2p | 54 | warn | inbound identify substream opened for peer who doesn't exist
polkadot-sdk | 7 | error | 💔 Called `on_validated_block_announce` with a bad peer ID .*
polkadot-sdk | 1 | warn | ❌ Error while dialing .*: .*
polkadot-sdk | 1 | warn | Report .*: .* to .*. Reason: .*. Banned, disconnecting. ( Invalid justification. Banned, disconnecting. )

- Libp2p

Repo | Count | Level | Triage report
-|-|-|-
polkadot-sdk | 1023 | warn | 💔 Ignored block \(#.* -- .*\) announcement from .* because all validation slots are occupied.
polkadot-sdk | 472 | warn | Report .*: .* to .*. Reason: .*. Banned, disconnecting. ( Unsupported protocol. Banned, disconnecting. )
polkadot-sdk | 379 | error | 💔 Called `on_validated_block_announce` with a bad peer ID .*
polkadot-sdk | 163 | warn | Report .*: .* to .*. Reason: .*. Banned, disconnecting. ( Invalid justification. Banned, disconnecting. )
polkadot-sdk | 116 | warn | Report .*: .* to .*. Reason: .*. Banned, disconnecting. ( Peer disconnected with inflight after backoffs. Banned, disconnecting. )
polkadot-sdk | 83 | warn | Report .*: .* to .*. Reason: .*. Banned, disconnecting. ( Same block request multiple times. Banned, disconnecting. )
polkadot-sdk | 4 | warn | Re-finalized block #.* \(.*\) in the canonical chain, current best finalized is #.*
polkadot-sdk | 2 | warn | Report .*: .* to .*. Reason: .*. Banned, disconnecting. ( Genesis mismatch. Banned, disconnecting. )
polkadot-sdk | 2 | warn | Report .*: .* to .*. Reason: .*. Banned, disconnecting. ( Not requested block data. Banned, disconnecting. )
polkadot-sdk | 2 | warn | Can't listen on .* because: .*
polkadot-sdk | 1 | warn | ❌ Error while dialing .*: .*

---------

Signed-off-by: Alexandru Vasile <[email protected]>
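The connection-limits fix ([#286](paritytech/litep2p#286)) can be illustrated with a tiny counter: the point is that a connection rejected by the limiter must never remain in the count of established inbound connections. The `InboundLimits` type below is an illustrative stand-in, not litep2p's actual internals.

```rust
/// Minimal stand-in for an inbound connection limiter.
struct InboundLimits {
    max_inbound: usize,
    established_inbound: usize,
}

impl InboundLimits {
    fn new(max_inbound: usize) -> Self {
        Self { max_inbound, established_inbound: 0 }
    }

    /// Returns `true` if the connection is accepted. The bug described
    /// above amounted to counting connections that were in fact rejected,
    /// inflating the inbound count over time.
    fn on_incoming(&mut self) -> bool {
        if self.established_inbound < self.max_inbound {
            self.established_inbound += 1;
            true
        } else {
            // Rejected connections must not be counted.
            false
        }
    }

    fn on_closed(&mut self) {
        self.established_inbound = self.established_inbound.saturating_sub(1);
    }
}

fn main() {
    let mut limits = InboundLimits::new(2);
    assert!(limits.on_incoming());
    assert!(limits.on_incoming());
    assert!(!limits.on_incoming()); // over the limit: rejected, not counted
    limits.on_closed();
    assert!(limits.on_incoming()); // a slot freed up again
    assert_eq!(limits.established_inbound, 2);
}
```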
Signed-off-by: Alexandru Vasile <[email protected]>
Signed-off-by: Alexandru Vasile <[email protected]>
Yes, makes all the sense. Let's keep 2409 untouched, as this is not a production critical bugfix or similar.
Let's close this PR and rely on
I've noticed that litep2p had an older version on the stable release branches.
This PR backports the litep2p-related changes to the latest release. Since litep2p is not enabled by default, it should not affect in-flight testing so far:
- `litep2p.public_addresses()` #5998
- `SetReservedPeers` command #6016

This commit has been added for ease of cherry-picking: e606a38
cc @paritytech/networking