Peer Sharing Implementation Plan
NOTE: Peer Sharing is a released feature. This document, which was previously an implementation plan, is now a reference document for the planning and implementation process. It was updated after the release to reflect the changes (and fixes) required for release.
- Introduction
- Plan and High Level Design
- Low Level Design
- Post-Implementation Notes
Cardano nodes and the interactions between them are combined together within a networking layer, which distributes information about transactions and block creation among all active nodes. This is known as the diffusion layer. This is how the system carries out the Ouroboros family of protocols, specifically by diffusing, validating, and adding new blocks to the chain, as well as verifying transactions. Any such network of nodes must be resilient enough to cope with connectivity and node failures, and must adapt to capacity restrictions while seeking to minimize communication delays. In the Shelley network design, two separate flavors of connections may be identified:
- Upstream nodes provide blocks minted elsewhere in the network, by actively following the chain on those nodes;
- Downstream nodes receive blocks that are relayed from upstream nodes and those that are minted locally, by actively following the chain on this node.
Note, nodes pull information from other nodes typically by placing an outstanding request against the next piece of information. This ensures that the node has control over the amount of work it can be required to do at any one time.
It is deemed a protocol violation to forward invalid blocks. Therefore, there is a need to validate received blocks before forwarding them, which is a resource-intensive operation. Everyone following the chain needs to have a copy of the produced blocks, but only Stake Pool Operators (SPOs) generate blocks. There is a large asymmetry between block producers (a few thousand) and block consumers (hundreds of thousands to millions).
To meet both the scale and the timeliness of distribution, there needs to be a large fan-out in the direction from block producers to block consumers. It is envisioned that a typical node might have 10 to 20 upstream peers as well as 50 to 100 downstream peers.
The network topology is established iteratively by some node A requesting to become a downstream node for some other node B. This raises the question of how node A knows the address of node B in order to initiate the connection. There are three possible ways:
- By manual configuration, to ensure connectivity to designated nodes;
- By sampling from DNS names/addresses recorded on the blockchain;
- By sampling from addresses obtained from other nodes at runtime (peer sharing).
This document is about enhancing the above process, by replacing the existing peer sharing approach with a more scalable lightweight solution. This approach, when combined with eclipse evasion, provides for a scalable network while containing the operational load on SPO peers recorded in the blockchain.
Nodes in the Byron federated system were connected by a static configuration provided in a topology file. Since Shelley was introduced, the system has been operating in a hybrid state. In other words, SPO nodes can communicate with both federated relay nodes and SPO-run relay nodes. Although this connectivity is not automated, it allows for the exchange of block and transaction information without the need of federated nodes.
If only major stakeholder nodes (whose numbers are limited by economic incentives) can be upstream peers, the network's scalability could be constrained. There is clearly a limit to how many downstream peers any relay can handle, even though serving blocks to downstream peers is substantially less expensive than validating blocks obtained from upstream peers. Network capacity can be boosted, and the load on SPO relays lightened, by permitting automated connections between SPO relays and allowing non-stake-holding nodes to take part in block forwarding.
Currently, the high-level architecture of P2P is made up of four major components: the Connection Manager, the Peer Selection Governor, the Inbound Governor, and the Server. These components collaborate to control each node's outbound and inbound connections, ensuring optimal network and safety properties, resource utilization, and efficiency.
The Peer Selection Governor (P2P Governor), which is closely tied to the Connection Manager, handles the automatic establishment of connections to peers, as well as monitoring and running mini-protocols as needed. It is in charge of outbound peer connection management; it determines which peers are useful for connecting to and which should be promoted or demoted. The primary goal of the P2P Governor is to manage outbound connections, ensuring that the target number of cold, warm, and hot peers is met, thus building and maintaining a globally connected topology.
Cold peers are known but have no active outbound connection; Warm peers have an active connection (bearer) but are solely used for network measurements and not for any application level consensus protocol; and Hot peers are actively used for application level consensus protocols. These sets of peers also satisfy some other implicit purposes, such as warm peers serving as a churn set for hot peers, allowing potentially better warm peers to take over from existing hot ones, or maintaining a diversity in hop distances to aid recovery from network events that may disrupt normal network operation. Sources for these Cold, Warm and Hot peer sets come from promoting/demoting the so called Root Peers, which can be separated into two groups: Local Peers and Public Peers. Public Peers consist of both manually configured addresses and/or ledger peers. Promoting/Demoting Root Peers establishes the Known, Established and Active peer sets. More details in the image below:
All these sets ought to have targets and/or policies that the P2P Governor seeks to maintain. Targets and policies serve multiple purposes such as resource management and making sure the node can make progress towards an optimum configuration as well as safeguard the node against adversarial behavior.
As mentioned in the first section, there are currently two ways a node can learn about other peers. When a node starts, it will look into the topology file referenced from the local configuration for root peers, i.e. either public peers, coming from high veracity sources like IOHK relays, or local peers, which represent peers of specific significance for this node. The existing default is that the node will only use these manually configured sources of peers. Alternatively, it can be configured to get peers from the ledger as well.
The P2P Governor will try to maintain the target numbers for each given set, which means it will try to fetch more Known peers and to promote a given Cold peer to Warm. If it can't fulfil its targets, it will retry after some delay.
More details can be found in the Shelley Network Design document, however more relevant details will be added in this section as needed.
The aim of the Peer Sharing protocol is to facilitate the discovery of potential peers within the overall Cardano network. There is a requesting side and replying side to this process. The requesting side communicates with its Established Peers, requesting a number of addresses from the remote peer's Known Peers set. New addresses are added to the local Known Peer Set (specifically as Cold Peers). On the replying side a peer responds to a request by supplying addresses from its Known Peer set, to which it has previously established a successful connection.
- A peer has to be willing to share (as indicated in handshake)
- Manually configured addresses can be optionally shared (as recorded in configuration files)
- Learnt addresses that are obviously from ledger peers will not be shared (i.e. as derived from the chain)
This Peer Sharing process is designed to work in conjunction with Ledger Peers from the chain. There is no assumption that the Peer Sharing process provides a robust defense against Sybil/eclipse attacks. Resistance to such attacks is derived from a connection to (Big) Ledger Peers. Consequently, the P2P Governor will have a target number of Ledger Peers to maintain contact with. The plan regarding Eclipse Evasion is detailed in the Eclipse Evasion document that was referenced above.
-
How to integrate the Peer Sharing into the Governor operation?
- Use the existing Peer Selection Governor or have separate structure
- Design MiniProtocol state machine
- Is simple Request-Reply enough
- Design MiniProtocol implementation
- Should requests be triggered by the Peer Selection Governor? If not, how?
- How should responses be filtered?
-
- Is asking only upstream peers sufficient?
- Should we ask Cold peers?
- Should we ask Established peers?
- More ?
-
How is the reply to a share request calculated
- How to identify peers to share?
- Should we verify they are/were contactable/online?
- Should we know about the peer's server hard limit?
- Should they be picked at random?
- Should we let others know about adversarial nodes too?
- More ?
-
Node handling of shared information
- Should we have targets for shared Peers
- In what context does it make sense to perform Peer Sharing (i.e. while bootstrapping, syncing, caught up, all the time)?
- Should any type of node not perform Peer Sharing (BP, Relay, Wallet, etc..)?
- Should we churn shared peers?
- Should we have a target for hot shared peers?
- More ?
In essence there are 3 phases to Peer Sharing:
- Asking (requesting) peers
- Sharing (replying) peers
- Receiving and handling the shared peer response set
The ideal method appears to be to create a unique GitHub issue for each question, so people can discuss it and further develop the strategy in a transparent, open-source manner. This Wiki page should be updated with a brief explanation of what was discussed/decided in each topic. With that in mind, below are the issues that the networking team will need to resolve in order to implement Peer Sharing:
- Set out the arguments that support the thesis that the appropriate preconditions are understood and in place so that Peer Sharing can work
- Which peers do we ask
- How is the reply to a share request calculated
- Node handling of shared information
For now, Peer Sharing is envisioned as a Request-Response type of protocol that will help the node obtain more known peers.
The initial stage of the Peer Sharing protocol. The Peer Selection Governor should determine when a node should perform Peer Sharing. Currently, the Peer Selection Governor's legacy sharing mechanism will consider the target number of Known Peers and some rate limit of share requests variable to decide when to ask for peers. We can reuse this, however there may be additional conditions, such as:
- Is Peer Sharing enabled?
- Are we in Bulk-Sync?
The next step is to choose which peers to ask, since the Peer Selection Governor already provides a method for deciding how many peers need fetching (the old system depends on the `policyMaxInProgressGossipReqs` and `policyPickKnownPeersForGossip` variables). To make share requests, we need to know which peers are available, e.g. those with positive willingness values configured. We can obtain this information through configuration files or the handshake. Changes to the Handshake and to the Node Configuration and Topology files are implied by this.
We know which peers are available to ask based on their willingness information. Only established peers should be asked (i.e. start a request-response protocol), as their valency is sufficiently high (if, in the future, we decide that we do need to ask cold peers, we can make them 'warm' anyway).
With this resolved, all that remains is to select a random set of peers from the established to-ask set, and a share request will be sent to all of the selected peers. It should be noted that the present legacy sharing mechanism will utilise the target number of Known Peers to decide when to ask for peers; ideally, it will ask for enough peers to make the node meet this target, therefore we should divide the number of peers requested by the number of to-ask peers sampled.
It should be noted that the protocol should establish a global maximum number of peers that can be requested on the client side, so that we can protect ourselves against malicious nodes that try to OOM nodes by responding with GB worth of peers. This limit should most likely be determined by the target value.
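The splitting and capping described above can be sketched as follows. This is a minimal illustration, not the governor's actual code: the function name and `MAX_PEERS_PER_REQUEST` are hypothetical, and the value 255 is an assumption that simply mirrors a byte-sized `amount` field.

```python
import math

# Hypothetical client-side cap on a single request (assumption: amount fits in a byte).
MAX_PEERS_PER_REQUEST = 255


def peers_per_request(known_target: int, known_now: int, num_asked: int) -> int:
    """Number of addresses to request from each sampled peer.

    The shortfall against the Known Peers target is divided across the
    peers being asked, then clamped to the global maximum.
    """
    missing = max(0, known_target - known_now)
    if missing == 0 or num_asked <= 0:
        return 0
    return min(MAX_PEERS_PER_REQUEST, math.ceil(missing / num_asked))
```

With a target of 100 known peers, 40 already known and 4 peers to ask, each request would ask for 15 addresses; a very large shortfall is clamped to the cap.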
The replying side of the Peer Sharing protocol merely requires us to choose which peers to share with the requesting side. The request includes an upper limit on the number of peers requested by the node. We don't need to know whether we have recently answered this peer, because share requests should have a reasonable retry delay for each peer.
We should only share peers that:
- are not known-to-be-ledger peers;
- we managed to connect to at some point;
- are advertisable (as per the local configuration).
This implies that the node must keep track of which peers are ledger peers, that root peers must be properly configured with advertising flags in order to prevent the possibility of disclosing sensitive information, and that every time we successfully establish a connection with a peer we tag it accordingly. The to-share set should be picked at random.
It should be noted that there will be a limit to how large a response can be, and the server must not provide more data than that. So, even if the client requests 100000 addresses, it will only receive, say, 50 (if only that many addresses fit within the limit).
After receiving the result set, we considered conducting some sort of peer validation, such as confirming that the addresses are indeed contactable, in order to prevent the spread of incorrect addresses through Peer Sharing. That said, we are aware that there are certain adversarial behaviors that could potentially take advantage of the Peer Sharing protocol. For now the design follows the simplest approach, since it does not have any critical performance objectives and rates of convergence/divergence can be very slow; furthermore, we already have mechanisms to slow down the impact of such adversarial behavior in our P2P stack. There are of course other ideas, such as:
- Keep track of who informed us about which peer, and if we see that peer gave us bad addresses, further extend the timeout period before we may ask that peer for more addresses;
However, we deemed this not worth implementing in the first iteration.
As described in this section, a node's configuration files will require a new set of flags. These flags indicate a node's desire to participate in Peer Sharing.
The two edge cases among node types are the Block Producer and the Wallet, with the Relay node being the normal case. Each node type's view on Peer Sharing is as follows:
- Relay nodes should have no problem participating in Peer Sharing and having their addresses forwarded to other nodes;
- Block Producers should not be known (that's why they should always be behind relays), so they can't participate in Peer Sharing;
- Wallet addresses are not very useful for Relays, but it is useful for Wallets to participate in the network and know more addresses, hence they should participate in Peer Sharing.
With these use cases in mind a new flag in the node configuration file should be added, allowing the user to specify the following options:
- `PeerSharingDisabled` - Peer Sharing is disabled globally
- `PeerSharingEnabled` - Peer Sharing is enabled
Another use case is when a node indicates in its topology file that it wants to engage in Peer Sharing but does not want to share a specific configured peer. There is already an "advertise" flag available for this purpose, which indicates whether or not it is appropriate to share any information about this address.
NOTE: This section was updated to reflect the Bug fixes and Design changes.
- One of the topics that was also discussed for necessary future work was caching known peers so that a node can recover more rapidly across reboots/failures. For this a node could serialize its Known Set to disk so it could be reinitialised as soon as it starts.
- Record information about the effectiveness of Peer Sharing and associated analysis (service assurance)
The Peer Sharing MiniProtocol will be a simple Request-Reply protocol. Peer Sharing Protocol is used by nodes to perform share requests to upstream peers. Requested peers will share a subset of their Known Peers.
Following the Shelley Networking Protocol document, it should be easy enough to adapt the already existing Request-Response protocol to our needs:
Protocol Messages (note that these should be refined from the Request-Response protocol above):
- `MsgShareRequest amount`: The client requests a maximum number of peers to be shared (amount). Ideally this amount should be limited by a protocol level constant to disallow a bad actor from requesting too many peers.
- `MsgSharePeers [peerAddress]`: The server replies with a set of peers. Ideally the amount of information (e.g. reply byte size) should be limited by a protocol level constant to disallow a bad actor from sending too much information.
- `MsgDone`: Terminating message.
Transition Table

| From State | Message | Parameters | To State |
|---|---|---|---|
| StIdle | MsgShareRequest | amount | StBusy |
| StBusy | MsgSharePeers | [peerAddress] | StIdle |
| StIdle | MsgDone | | StDone |
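The transition table can be read as a small state machine. The sketch below encodes it as a lookup table and rejects any message that is not allowed in the current state. The Python names (`State`, `step`, `run`) are illustrative only and do not correspond to the typed-protocols implementation.

```python
from enum import Enum


class State(Enum):
    ST_IDLE = "StIdle"
    ST_BUSY = "StBusy"
    ST_DONE = "StDone"


# (current state, message) -> next state, copied from the transition table.
TRANSITIONS = {
    (State.ST_IDLE, "MsgShareRequest"): State.ST_BUSY,
    (State.ST_BUSY, "MsgSharePeers"): State.ST_IDLE,
    (State.ST_IDLE, "MsgDone"): State.ST_DONE,
}


def step(state: State, message: str) -> State:
    """Apply one message, or raise if the table has no such transition."""
    try:
        return TRANSITIONS[(state, message)]
    except KeyError:
        raise ValueError(f"{message} is not allowed in {state.value}") from None


def run(messages, state: State = State.ST_IDLE) -> State:
    """Fold a sequence of messages over the state machine."""
    for message in messages:
        state = step(state, message)
    return state
```

Note that `MsgDone` can only be sent from `StIdle`, so a client cannot terminate while a reply is outstanding.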
The initiator side will have to be running indefinitely, since protocol termination means either an error or peer demotion. Because of this, the protocol cannot be run as a simple request-response protocol. To overcome this, the client side implementation will use a registry so that each connected peer gets registered and assigned a controller with a request mailbox. This controller will be used to issue requests to the client implementation, which waits for a request to arrive in the mailbox before sending a `MsgShareRequest`. After sending a request, the result is put into a local result mailbox.
If a peer gets disconnected, it should get unregistered.
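A rough sketch of the registry and controller idea, using standard-library queues as the two mailboxes. All names here (`Registry`, `Controller`, `client_loop`) are hypothetical, and the `None` stop signal exists only so the demo loop can terminate; the real client runs for the lifetime of the connection.

```python
import queue
import threading


class Controller:
    """Per-peer handle: a request mailbox plus a result mailbox."""

    def __init__(self):
        self.requests = queue.Queue()
        self.results = queue.Queue()


class Registry:
    """Maps each connected peer to its controller."""

    def __init__(self):
        self._lock = threading.Lock()
        self._controllers = {}

    def register(self, peer):
        with self._lock:
            return self._controllers.setdefault(peer, Controller())

    def unregister(self, peer):
        with self._lock:
            self._controllers.pop(peer, None)

    def request(self, peer, amount, timeout=1.0):
        """Issue a share request to a registered peer and wait for the result."""
        with self._lock:
            ctrl = self._controllers.get(peer)
        if ctrl is None:
            return None  # peer is not connected (or was unregistered)
        ctrl.requests.put(amount)
        return ctrl.results.get(timeout=timeout)


def client_loop(ctrl, send_share_request):
    """Block on the request mailbox; each request becomes one MsgShareRequest."""
    while True:
        amount = ctrl.requests.get()
        if amount is None:  # demo-only stop signal
            return
        ctrl.results.put(send_share_request(amount))
```

In this sketch the governor side calls `Registry.request`, while one `client_loop` per connection plays the role of the mini-protocol client that stays in its idle state between requests.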
First of all peer sharing requests should only be issued if:
- The current number of known peers is less than the target for known peers;
- The rate limit value for peer sharing requests isn't exceeded;
- There are available peers to issue requests to.
If these conditions hold then we can pick a set of peers to issue requests to. Ideally this set respects the rate limit value for peer sharing requests.
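The three preconditions can be expressed as a single predicate over a snapshot of the governor's state. The `GovernorView` record and its field names are hypothetical, and modelling the rate limit as an in-progress counter plus an earliest-retry time is an assumption made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GovernorView:
    """Hypothetical snapshot of the state relevant to peer sharing."""
    known_peers: int
    target_known_peers: int
    in_progress_requests: int
    max_in_progress_requests: int
    now: float
    next_request_allowed_at: float
    candidates: frozenset  # established peers that are willing to share


def should_request_peers(view: GovernorView) -> bool:
    """True only when all three preconditions hold."""
    below_target = view.known_peers < view.target_known_peers
    within_rate_limit = (
        view.in_progress_requests < view.max_in_progress_requests
        and view.now >= view.next_request_allowed_at
    )
    has_candidates = len(view.candidates) > 0
    return below_target and within_rate_limit and has_candidates
```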
If a peer has the `PeerSharingDisabled` flag value, do not ask it for peers. This peer won't even have the Peer Sharing MiniProtocol server running.
The number of peers to request from each upstream peer should aim to fulfil the target for known peers. The number still needed to reach the current target should be split across all peer sharing candidates for efficiency and diversity reasons.
Apart from managing the Outbound Governor state correctly, the final result set should be a random sample of the original set.
This selection should be done in such a way that when the same initial PRNG state is used, the selected set does not significantly vary with small perturbations in the set of published peers.
The intention of this selection method is that the selection should give approximately the same replies to the same peers over the course of multiple requests from the same peer. This is to deliberately slow the rate at which peers can discover and map out the entire network.
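One way to obtain this stability property, shown here only as an illustrative sketch and not necessarily the approach taken in the implementation, is to rank every address by a salted hash and return the lowest-ranked entries. With a fixed salt, adding or removing one published peer changes at most one member of the reply. The function name and the use of SHA-256 are assumptions.

```python
import hashlib


def stable_sample(peers, amount: int, salt: bytes):
    """Pick `amount` peers by ranking each address with a salted hash.

    The rank of an address depends only on the salt and the address itself,
    so the result is independent of input order and of unrelated peers.
    """
    def rank(addr: str) -> bytes:
        return hashlib.sha256(salt + addr.encode("utf-8")).digest()

    return sorted(peers, key=rank)[: max(0, amount)]
```

Keeping the salt fixed for a while gives the same requester approximately the same answer on repeated requests; rotating it occasionally would change which subset is exposed.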
As soon as the server receives a share request it needs to pick a subset no bigger than the value specified in the request's parameter. The reply set needs to be sampled randomly from the Known Peer set according to the following constraints:
- Only pick peers that we managed to connect to at some point
- Only pick peers that are not known to be ledger peers
- Pick peers that have public willingness information (e.g. `DoAdvertisePeer`)
- Pick peers that haven't behaved badly (e.g. `PeerFailCount == 0`)
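The four constraints amount to a filter followed by a bounded random sample. In the sketch below the `KnownPeer` record and its field names are hypothetical stand-ins for the information the node tracks, and plain uniform sampling is used for simplicity.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownPeer:
    address: str
    connected_before: bool  # we managed to connect to it at some point
    is_ledger_peer: bool    # known to come from the ledger
    advertise: bool         # True stands for DoAdvertisePeer
    fail_count: int         # stands for PeerFailCount


def shareable(peer: KnownPeer) -> bool:
    """All four constraints from the list above."""
    return (
        peer.connected_before
        and not peer.is_ledger_peer
        and peer.advertise
        and peer.fail_count == 0
    )


def compute_reply(known, amount: int, rng: random.Random):
    """Sample at most `amount` shareable addresses."""
    candidates = sorted(p.address for p in known if shareable(p))
    return rng.sample(candidates, min(max(0, amount), len(candidates)))
```

Passing the random generator in explicitly keeps the function deterministic under test, e.g. `compute_reply(known, 10, random.Random(0))`.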
Computing the result (i.e. random sampling of available peers) needs access to the `PeerSelectionState`, which is specific to the `peerSelectionGovernorLoop`. However, when initializing the server side of the protocol we have to provide the result computing function early, on the consensus side. This means we will have to find a way to delay the function application all the way to diffusion and share the relevant parts of `PeerSelectionState` with this function via a `TVar`.
```cddl
;
; Peer Sharing MiniProtocol
;
peerSharingMessage = msgShareRequest
                   / msgSharePeers
                   / msgDone
msgShareRequest = [0, byte]
msgSharePeers = [1, peerAddresses]
msgDone = [2]
peerAddresses = [* peerAddress]
byte = 0..255
peerAddress = [0, word32, portNumber] ; ipv4 + portNumber
            / [1, word32, word32, word32, word32, portNumber] ; ipv6 + portNumber
portNumber = word16
```
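To make the shape of these terms concrete, the sketch below builds the arrays for each message as plain Python lists, before any CBOR serialisation. The helper names are hypothetical, and the ordering of the IPv6 words (most significant 32 bits first) is an assumption; the byte order used by the real codec is not stated here.

```python
import ipaddress


def peer_address_term(host: str, port: int) -> list:
    """Build the array for one peerAddress."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("portNumber must fit in a word16")
    ip = ipaddress.ip_address(host)
    if ip.version == 4:
        return [0, int(ip), port]
    value = int(ip)
    # Split the 128-bit address into four word32 values (assumed order).
    words = [(value >> shift) & 0xFFFFFFFF for shift in (96, 64, 32, 0)]
    return [1, *words, port]


def msg_share_request(amount: int) -> list:
    if not 0 <= amount <= 255:
        raise ValueError("amount must fit in a byte")
    return [0, amount]


def msg_share_peers(addresses) -> list:
    return [1, [peer_address_term(host, port) for host, port in addresses]]


MSG_DONE = [2]
```

For example, `peer_address_term("127.0.0.1", 3001)` yields `[0, 2130706433, 3001]`.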
As mentioned in the section Node Configuration and Topology Changes, the node configuration file will need a new flag. This flag will indicate a node's desire to participate in Peer Sharing. Given this, the following is going to be necessary:
- Add a new configuration option (in `cardano-node/../Configuration/POM.hs`) called `PeerSharing` with 2 possible values: `PeerSharingDisabled`, `PeerSharingEnabled`
- Propagate this change all the way to the Peer Selection Governor.
- Track `PeerAdvertise` in public roots, i.e. propagate this from topology files all the way to `RootPeersDNS.hs`
  - This should be done by resolving the domain name and tagging all resolved IPs with the configured `advertise` value
- Update documentation files
- If the P2P flag is disabled then ignore the `PeerSharing` flag, overwriting it to `PeerSharingDisabled`
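The last rule, where a disabled P2P flag overrides the `PeerSharing` setting, can be sketched as below. The dictionary keys `EnableP2P` and `PeerSharing` and the default of disabled when the option is absent are assumptions made for illustration, not the actual configuration schema.

```python
from enum import Enum


class PeerSharing(Enum):
    PeerSharingDisabled = 0
    PeerSharingEnabled = 1


def effective_peer_sharing(config: dict) -> PeerSharing:
    """Resolve the flag a node actually runs with.

    The keys 'EnableP2P' and 'PeerSharing' are illustrative names only.
    """
    if not config.get("EnableP2P", False):
        # Without P2P the PeerSharing option is ignored.
        return PeerSharing.PeerSharingDisabled
    name = config.get("PeerSharing", "PeerSharingDisabled")
    return PeerSharing[name]
```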
The handshake mini protocol is a generic protocol that can negotiate any kind of protocol parameters. It only assumes that protocol parameters can be encoded to, and decoded from, CBOR terms. Given this, one just needs to add `PeerSharing` flag values to the codec as an extra protocol parameter. This will require:
- Adding a CBOR encoder/decoder for the `PeerSharing` type
- Adding a new NodeToNode version
- Extending the Handshake protocol to accommodate this extra protocol parameter
- Changing the `nodeToNodeCodecCBORTerm` function to deal with this new protocol parameter. A simple solution would be to populate the missing parameter with `PeerSharingDisabled` by default.
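The "populate the missing parameter by default" idea can be illustrated with a decoder that accepts both an older and a newer version-data layout. The two- and three-element layouts, the field names, and the 0/1 encoding of the flag are all hypothetical; the sketch only shows the defaulting behaviour, not the real `nodeToNodeCodecCBORTerm`.

```python
# Hypothetical wire encoding of the flag.
PEER_SHARING_VALUES = {0: "PeerSharingDisabled", 1: "PeerSharingEnabled"}


def decode_version_data(term: list) -> dict:
    """Decode an already-parsed version-data term.

    Older versions send two parameters; newer ones append the PeerSharing
    flag. A missing flag is treated as PeerSharingDisabled.
    """
    if len(term) == 2:
        network_magic, initiator_only = term
        peer_sharing = 0  # parameter absent in older versions
    elif len(term) == 3:
        network_magic, initiator_only, peer_sharing = term
    else:
        raise ValueError("unexpected number of version parameters")
    if peer_sharing not in PEER_SHARING_VALUES:
        raise ValueError("unknown PeerSharing value")
    return {
        "network_magic": network_magic,
        "initiator_only": initiator_only,
        "peer_sharing": PEER_SHARING_VALUES[peer_sharing],
    }
```

Defaulting to disabled means a peer running an older version is simply never asked for peers, which matches the rule that such a peer would not be running the Peer Sharing server.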
As mentioned the Peer Selection Governor already has implemented most of the decision mechanisms to perform Peer Sharing. However, this implementation is set to ask Known Peers and we want to change it to Established Peers. Known Peers know nothing about Established Peers so this will require some work and refactoring. Also, the whole testing infrastructure has this particular detail in mind, so one would also have to change the test suite to make sure the refactor is successful.
When receiving the reply to the issued share request one needs to filter the response set against the known-to-be-ledger peers before adding to the Known Peers set, to make sure we don't add any ledger peers.
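A minimal sketch of this client-side filtering, assuming addresses are comparable values and that the sets of known and ledger peers are available locally. The function name is hypothetical; truncating the reply to the requested amount and skipping duplicates are extra defensive steps added for the example.

```python
def accept_shared_peers(reply, requested: int, known: set, ledger_peers: set) -> list:
    """Return the addresses from `reply` that may be added as new cold peers."""
    accepted = []
    seen = set()
    # Never consider more addresses than were asked for.
    for addr in reply[: max(0, requested)]:
        if addr in ledger_peers or addr in known or addr in seen:
            continue
        seen.add(addr)
        accepted.append(addr)
    return accepted
```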
To summarize, the low level design decisions for the Peer Selection Governor consist of:
- Change the Known Peers `belowTarget` Peer Selection Governor action:
  - Only ask Established Peers
  - Only ask peers with an advertise value of `DoAdvertisePeer`
  - Keep the other built-in metrics (such as not asking the same peer twice too often, etc.)
  - If the local peer's `PeerSharing` value is `PeerSharingDisabled`, meaning Peer Sharing is disabled, no Peer Sharing requests should be issued.
For the change above, moving some of the infrastructure from PeerSelection/KnownPeers.hs to PeerSelection/EstablishedPeers.hs will be needed, as well as refactoring all the associated tests.
Finding a way to adapt `jobPhase2` to include a check for ledger peers (this requires Changes to Known Peers) will also be needed.
Known Peers will need to be extended with extra information in order to implement Peer Sharing. As could already be inferred from the sections above, Known Peers will need to track:
- Peer advertise information
- If they come from ledger
- If at some point we managed to connect to it.
There might be tasks that can be done in parallel but I'll try to come up with a sequential order that tries to optimize for dependencies:
- Changes to Known Peers
- Changes to Peer Selection Governor
  - Refactor
  - Include Peer Sharing changes
- Changes to Handshake
- Peer Sharing MiniProtocol
- Changes to Configuration Files (needs to change `cardano-node`)
This document has already been updated considering these changes. This section is left for reference.
Here's the main peer sharing implementation PR: #4019
Here's a related PR that implements light peer sharing, a way for inbound connections to be made known to the peer selection governor:
Here's a PR that adds Peer Sharing protocol to wireshark dissector:
After finding bug #4642, the team went through an extensive discussion about how one could simplify the current design to both fix and mitigate problems like this one.
We ended up noticing that there is no real need for `PeerSharingPrivate`. The use case we had in mind (see Node Configuration and Topology Changes) is not really worth the added complexity, but what really made us remove this flag option was the fact that it cannot be enforced on the remote side of the protocol and there is no way to punish bad actors. There is still a way for a user to not share an address, via the `AdvertisePeer` flag on the local roots configuration.
In a nutshell, #4644 removes the `PeerSharingPrivate` flag and greatly simplifies the handshake logic, making it truly symmetric.