From 25abd6d37e8407e895c390a357198d1433a7b9eb Mon Sep 17 00:00:00 2001
From: Will Scott
Date: Sat, 12 Nov 2022 10:32:53 +0100
Subject: [PATCH 01/13] Propose a spec change for automatic discovery of content routers.

This follows the previously circulated proposal outline at
https://hackmd.io/bh4-SCWfTBG2vfClG0NUFg

A basic motivation is included in the PR - but essentially this is the best
path I've heard for reducing our dependence on hydras as a centrally operated
choke point for moving the bulk of the IPFS network beyond sole reliance on
the current KAD DHT.
---
 IPIP/0000-content-router-discovery.md | 250 ++++++++++++++++++++++++++
 1 file changed, 250 insertions(+)
 create mode 100644 IPIP/0000-content-router-discovery.md

diff --git a/IPIP/0000-content-router-discovery.md b/IPIP/0000-content-router-discovery.md
new file mode 100644
index 000000000..635c884c4
--- /dev/null
+++ b/IPIP/0000-content-router-discovery.md
@@ -0,0 +1,250 @@
+# IPIP 0000: Content Router Ambient Discovery
+
+
+
+- Start Date: 2022-11-11
+- Related Issues:
+  - https://hackmd.io/bh4-SCWfTBG2vfClG0NUFg
+  - https://github.com/ipfs/kubo/issues/9150
+  - https://github.com/filecoin-project/storetheindex/issues/823
+
+## Summary
+
+The Interplanetary stack has slowly opened itself to support extensibility of
+the content routing subsystem. This extensibility is used today by network
+indexers, like https://cid.contact/, to bridge content from large providers
+that cannot practically provide all content to the IPFS DHT. A missing piece
+of this story is that there is not a process by which IPFS nodes can discover
+these alernative content routing systems automatically. This IPIP proposes
+a mechanism by which IPFS nodes can discover and make use of content routing
+systems.
+
+## Motivation
+
+There is currently not a process by which IPFS nodes can discover alernative
+content routing systems automatically. This has led to a reliance on
+centralized systems, like the hydra boosters, to fill the gap and offer
+content only available in network indexer to current IPFS nodes. This strategy
+is also insufficient long term because:
+1. It limits speed to the use of a globally distributed kademlia DHT
+2. It is insufficient for providing content in applications where content grows
+   super-linearly to peers, such that the burden on a traditional DHT would
+   become unsustainable.
+
+
+## Detailed design
+
+### 0. content-router discovery state tracking
+
+Nodes will conceptually track a registry about known content routers.
+This registry will be able to understand for a given content router two
+properties:
+* reliability - how many good vs bad responses has this router responded
+with. This statistic should be windowed, such that the client can calculate
+it in terms of the last week or month.
+* performance - how quickly does this router respond.
+
+This protocol expects nodes to be able to keep reliability (a metric
+capturing both availability and correctness) separate from performance
+for the purpose of propagating content routing information.
+
+In addtion, nodes may wish to track the most recent time they have learned
+content routing information from the other peers they are and have been
+connected with.
+
+### 1. content-routing as a libp2p protocol
+
+IPFS nodes will advertise and coordinate discover of content routers using a
+new libp2p protocol advertised as "".
+
+The protocol will follow a request-response model.
+A node will open a a stream on the protocol when it wants to discover new
+content routers it does not already know.
+It will send a bloom filter as it's query.
+* The size of the bloom filter is chosen by the client, and is sized such
+that it receives a greater than 99% certainly that it receives a useful
+response. The maximum size of a query may be capped by the server, but can be
+effectively considered to be under 10kb.
+* The client will hash it's known content routers into the bloom filter
+to set bits in the filter at the locations to which these known routers
+hash.
+* The server will have a parameter for a number of servers it wants to return
+to content routing queries. By default this will be 10. (This default is picked
+as the result of modeling router propagation). It will iterate through it's
+list of known content routers, hashing htem against the bloom filter and
+selecting the top routers that are not already known to the client. It will
+return this list, along with it's reliability score for each. This response
+is structured as an IPLD list lists, conceptually:
+```json
+[
+  ["https://cid.contact/", 0.95],
+  ["https://dev.cid.contact/", 0.90],
+]
+```
+
+### 2. probing of the discovery protocol
+
+A node will probe it's connected peers for content routing updates in two
+situations:
+
+1. When it needs to perform a content routing query, and has not
+successfully performed a sync in over a day.
+2. When it's auto-nat status indicates it is eligible to be a DHT server, and
+it has not successfully performed a synce in over a day.
+
+These parameters are also set through modeling.
+
+To perform a probe, the node will consider the set of peers it is currently
+connected to. It will order peers. The specific ordering is left to the
+node, but it should strive for diversity - an example ordering would be to
+rank peers by how recently a content routing discovery query has been make
+to that peer, with tie breaking preference for LAN nodes and for boostrap
+nodes.
+
+### 3. selection of routers
+
+Nodes are free to make content routing queries across content routing
+systems they are aware of as they wish. An example strategy balancing
+user experience and discovery is described.
+
+The node maintains two thresholds:
+* good (reliability > 99%, performance < 100ms)
+* uncertain (queries < 5)
+
+Content routers meeting the good reliability threshold are ordered by
+perforamnce. the top one is queried, as is an 'uncertain' router if
+one exists.
+
+These threshold values are maintained for a year for the purposes
+of local selection.
+They are maintained for a month for the purpose of admitting
+knowledge of routers to others - so a client will no longer set bits for
+routers it is aware of but which do not meet it's threshold for 'good'
+after a month. If peers then subseuqently respond with these nodes
+on discovery probes, the local node may use that to consider the
+node as again 'uncertain' and attempt additional probes against it less than
+a year later.
+
+Nodes which participate as DHT servers should also consider if they
+are being used only in an infrastructural capacity. If they are
+receiving content routing requests from other peers, but there have been
+no direct requests from the node itself that can be used to move
+known content routers past the 'uncertain' threshold, the node may
+choose to issue content routing queries for a fraction of the DHT
+lookup queries it receives as a way to maintain a more accurate
+table of content routers.
+
+## Test fixtures
+
+TK is a CID currently only available through the content routing system,
+and not through the IPFS DHT. This is a piece of content that can be queried
+to validate the presence of alternative content routing systems.
+
+## Design rationale
+
+As expressed in the motivation section, we need to design a system through
+which nodes can discover content routers without a centralized point of
+failure, and can use these routers to improve user performance for content
+routing to levels faster than the current DHT.
+
+This design is self-contained - it does not require standing up additional
+infrastructure or making additional connections for discovery but rather
+gossips routers over existing peer connections.
+
+The design limits the ability of an adversary to impact user experience:
+1. it does not propose at this stage to replace DHT queries, but only to
+supplement them with content routing queries, which minimized user
+noticable impact.
+2. nodes will only propogate content routers they believe to work,
+limiting the spread of spam / unavailable content routers to the directly
+connected peers of an adversary.
+
+With the exception of LAN tables, the other connections made by IPFS
+nodes do not have geographic locality. As a result, performance is
+separated in the tracking of content routers because it will not be
+effective as a ranking factor in the non-geographically-aware
+gossip system described here. As an optimization, nodes may choose to
+prioritize 'fast' content routers when responding to queries from peers
+where sharded latency observations may be relevant. For example:
+* Peers on the local LAN
+* Peers in the local /16 IPv4 subnet
+* Peers with observed latency less than 25ms
+
+### User benefit
+
+Users will benefit from faster discovery of content providers.
+Users will also benefit from access to more CIDs than they currently do through
+queries limited to the IPFS DHT
+
+### Compatibility
+
+Nodes which do not upgrade to support this IPIP will be limited to the sub-set of
+content available in the DHT. this will potentially degrade over time as more
+large providers limit their publishing per the IPNI ingestion protocol.
+
+Nodes may limit their complexity through a hard-coded list of known content
+routers, essentially limiting their implementation to design section 3 of this
+IPIP. In doing so, they may limit their risk of exposure to malicious parties.
+They risk being out of date and to offer sub-optimal performance through their
+failure to discover additional near-by content routing instances.
+
+### Security
+
+TODO: this section provides a rough sketch of arguments, but has not been fully
+developed into prose at this time. At present, it is most useful for
+comments and suggestions of other security considerations that should be
+included as this draft develops.
+
+#### 1. Malicious Content Routers
+##### a. Providing Bad Content Routing Records
+
+* records under double hashing are signed, so can't provide a record for a real peer
+* if you provide non-working records, you are down-ranked
+
+##### b. Availability Attacks / failing to provide records
+
+* if list of records insufficient, client will get more from other providers in subsequent queries, leading to downranking
+
+#### 2. Exposure of IPFS Clients (enumeration of network participants)
+
+* a new provider is only visible to directly connected peers. they only forward it to peers asking them if it meets their bar
+for reliability. This means propogation through the network is only posisble for routers that behave correctly.
+* because clients only propogate their 'top' routers, latency is also relevant, and with sufficient number of routers, the would only
+propogate in their local geographic area before becoming uncompetitive on latencyk
+
+### Alternatives
+
+#### Ambient discovery in the style of circuit relays
+
+Circuit relays are discovered ambiently by nodes during protocol enumeration.
+When connecting with another libp2p node, IPFS nodes will probe
+supported protocols. If they notice circut relay support at this time, they
+make use of such aggregated knowledge when making connections needing the
+support of relays.
+
+This is not considered sufficient for content routing, because most content
+routers will not act as general peers within the IPFS mesh, so they would
+not be directly discovered. Instead, the gossip discovery protocol is
+ambiently discovered in much the same way as circuit relays.
+
+#### Advertisement in the DHT
+
+This suffers from one of two problems depending on tuning: Either it results in
+a global list that all clients see new providers, or it takes an inordinant
+amount of querying before a client happens to run into a provider, leading to
+degraded experiences for most clients. The single global list that a provider
+can automatically add itself to leads to issues for how to mitigate an
+enumeration of all network participants by a malicious content router.
+
+#### Static list of known routers distributed with IPFS clients
+
+This has worked for the current IPFS bootstrap node, but leads to the need for
+policies around how to decide which content routers will be included in such a
+list, and fails to evolve efficiently as new content routers are added to the
+system.
+
+### Copyright
+
+Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).

From 25855784967eba053bfdec0580ce5bf0d9c44fcf Mon Sep 17 00:00:00 2001
From: Will
Date: Sun, 13 Nov 2022 14:14:59 +0000
Subject: [PATCH 02/13] Update IPIP/0000-content-router-discovery.md

Co-authored-by: Max Inden
---
 IPIP/0000-content-router-discovery.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/IPIP/0000-content-router-discovery.md b/IPIP/0000-content-router-discovery.md
index 635c884c4..5071ed67f 100644
--- a/IPIP/0000-content-router-discovery.md
+++ b/IPIP/0000-content-router-discovery.md
@@ -73,7 +73,7 @@ hash.
 * The server will have a parameter for a number of servers it wants to return
 to content routing queries. By default this will be 10. (This default is picked
 as the result of modeling router propagation). It will iterate through it's
-list of known content routers, hashing htem against the bloom filter and
+list of known content routers, hashing them against the bloom filter and
 selecting the top routers that are not already known to the client. It will
 return this list, along with it's reliability score for each. This response
 is structured as an IPLD list lists, conceptually:

From 1202cdf59510c9b29502fa6729ee73c59d3763b6 Mon Sep 17 00:00:00 2001
From: Will Scott
Date: Tue, 15 Nov 2022 10:10:42 +0100
Subject: [PATCH 03/13] update with ipip number

---
 ...router-discovery.md => 0342-content-router-discovery.md} | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)
 rename IPIP/{0000-content-router-discovery.md => 0342-content-router-discovery.md} (97%)

diff --git a/IPIP/0000-content-router-discovery.md b/IPIP/0342-content-router-discovery.md
similarity index 97%
rename from IPIP/0000-content-router-discovery.md
rename to IPIP/0342-content-router-discovery.md
index 635c884c4..8859ab1da 100644
--- a/IPIP/0000-content-router-discovery.md
+++ b/IPIP/0342-content-router-discovery.md
@@ -1,8 +1,4 @@
-# IPIP 0000: Content Router Ambient Discovery
-
-
+# IPIP 0342: Content Router Ambient Discovery
 
 - Start Date: 2022-11-11
 - Related Issues:

From 07710d578703b969aff570d393b98db865db29db Mon Sep 17 00:00:00 2001
From: Will Scott
Date: Thu, 17 Nov 2022 16:44:05 +0100
Subject: [PATCH 04/13] Some updates in response to @ajnavarro's review

---
 IPIP/0342-content-router-discovery.md | 41 ++++++++++++++++++++-------
 1 file changed, 31 insertions(+), 10 deletions(-)

diff --git a/IPIP/0342-content-router-discovery.md b/IPIP/0342-content-router-discovery.md
index 3d9c06a1e..dc86fadce 100644
--- a/IPIP/0342-content-router-discovery.md
+++ b/IPIP/0342-content-router-discovery.md
@@ -39,7 +39,10 @@ This registry will be able to understand for a given content router two
 properties:
 * reliability - how many good vs bad responses has this router responded
 with. This statistic should be windowed, such that the client can calculate
-it in terms of the last week or month.
+it in terms of the last week or month. This will in practice be stored as
+daily buckets of successful and unsuccessful queries against a router, where
+success indicates that the router was queried, and the data was subsequently
+retrieved from a node returned as a provider by that router.
 * performance - how quickly does this router respond.
 
 This protocol expects nodes to be able to keep reliability (a metric
@@ -50,29 +53,40 @@ In addtion, nodes may wish to track the most recent time they have learned
 content routing information from the other peers they are and have been
 connected with.
 
+Conceptually, propagation of content routers will look like nodes gossiping
+their knowledge of router existance to each other. Initially, we expect that
+the current topology will look a bit more like a feedback loop over a
+bipartite graph - where one side of the graph is the set of general purpose
+IPFS nodes, and the other side are the bootstrap and core-infrastructural
+nodes with high connectivity in the network.
+
 ### 1. content-routing as a libp2p protocol
 
 IPFS nodes will advertise and coordinate discover of content routers using a
-new libp2p protocol advertised as "".
+new libp2p protocol advertised as "/ipfs/content-router-discovery/1.0.0".
 
 The protocol will follow a request-response model.
-A node will open a a stream on the protocol when it wants to discover new
+A node will open a stream on the protocol when it wants to discover new
 content routers it does not already know.
-It will send a bloom filter as it's query.
-* The size of the bloom filter is chosen by the client, and is sized such
-that it receives a greater than 99% certainly that it receives a useful
+The node wants to request the best set of known content routers from it's peer
+that it does not already know. The query will make use of a bloom filter to
+support this prioritization without leaking the exact list of known content
+routers that the client already knows.
+
+* The size of the bloom filter is chosen by the client. It is sized such
+that it has a greater than 99% certainly that it will receive a useful
 response. The maximum size of a query may be capped by the server, but can be
 effectively considered to be under 10kb.
 * The client will hash it's known content routers into the bloom filter
 to set bits in the filter at the locations to which these known routers
 hash.
 * The server will have a parameter for a number of servers it wants to return
-to content routing queries. By default this will be 10. (This default is picked
-as the result of modeling router propagation). It will iterate through it's
-list of known content routers, hashing them against the bloom filter and
+to content discovery queries. By default this will be 10. (This default is
+picked as the result of modeling router propagation). It will iterate through
+it's list of known content routers, hashing them against the bloom filter and
 selecting the top routers that are not already known to the client. It will
 return this list, along with it's reliability score for each. This response
-is structured as an IPLD list lists, conceptually:
+is structured as a list, conceptually:
 ```json
 [
   ["https://cid.contact/", 0.95],
@@ -234,6 +248,13 @@ degraded experiences for most clients. The single global list that a provider
 can automatically add itself to leads to issues for how to mitigate an
 enumeration of all network participants by a malicious content router.
 
+Pros:
+* Network is already there, no need to create a new protocol to "provide" new providers instead of CIDs.
+* You could potentially associate a provider with a specific root CID content.
+Cons:
+* Nodes cannot drop use of the DHT / other content routing options always are 'second tier'.
+
+
 #### Static list of known routers distributed with IPFS clients
 
 This has worked for the current IPFS bootstrap node, but leads to the need for

From 8ac57d9ce6000089ffe7b678f3725f625da9528b Mon Sep 17 00:00:00 2001
From: Will Scott
Date: Fri, 18 Nov 2022 13:04:47 +0100
Subject: [PATCH 05/13] Address code review comments

---
 IPIP/0342-content-router-discovery.md | 49 +++++++++++++++++++++------
 1 file changed, 38 insertions(+), 11 deletions(-)

diff --git a/IPIP/0342-content-router-discovery.md b/IPIP/0342-content-router-discovery.md
index dc86fadce..19e1d62b4 100644
--- a/IPIP/0342-content-router-discovery.md
+++ b/IPIP/0342-content-router-discovery.md
@@ -32,6 +32,20 @@ is also insufficient long term because:
 
 ## Detailed design
 
+This spec is designed for the ability of IPFS nodes to automatically discover
+and make use of 'content routers'. Content routers are services which are able
+to fulfill libp2p's [ContentRouting](https://github.com/libp2p/go-libp2p/blob/master/core/routing/routing.go#L26)
+API. These routers currently are considered to directly support queries using
+the protocols specified by
+[IPIP-337](https://github.com/ipfs/specs/pulls)
+and/or
+[IPIP-327](https://github.com/ipfs/specs/pull/327).
+
+In addition, this protocol expects that content routers that may be considered
+for auto-configuration/discovery by IPFS nodes will have knowledge of the
+entire CID space - in other words a delegation to such a router may be
+considered 'exhaustive'.
+
 ### 0. content-router discovery state tracking
 
 Nodes will conceptually track a registry about known content routers.
@@ -110,8 +124,13 @@ To perform a probe, the node will consider the set of peers it is currently
 connected to. It will order peers. The specific ordering is left to the
 node, but it should strive for diversity - an example ordering would be to
 rank peers by how recently a content routing discovery query has been make
-to that peer, with tie breaking preference for LAN nodes and for boostrap
-nodes.
+to that peer, with tie breaking preference for LAN nodes and for nodes
+with explicit peering agreements.
+
+Other factors that may be considered include:
+* Reputation of the peer, including how long it has been connected and if it
+  has served useful content in the past.
+* Latency / ping time of the peer.
 
 ### 3. selection of routers
 
@@ -124,7 +143,7 @@ The node maintains two thresholds:
 * uncertain (queries < 5)
 
 Content routers meeting the good reliability threshold are ordered by
-perforamnce. the top one is queried, as is an 'uncertain' router if
+performance. the top one is queried, as is an 'uncertain' router if
 one exists.
 
 These threshold values are maintained for a year for the purposes
@@ -167,7 +186,7 @@ The design limits the ability of an adversary to impact user experience:
 1. it does not propose at this stage to replace DHT queries, but only to
 supplement them with content routing queries, which minimized user
 noticable impact.
-2. nodes will only propogate content routers they believe to work,
+2. nodes will only propagate content routers they believe to work,
 limiting the spread of spam / unavailable content routers to the directly
 connected peers of an adversary.
 
@@ -184,20 +203,28 @@ where sharded latency observations may be relevant. For example:
 
 ### User benefit
 
-Users will benefit from faster discovery of content providers.
-Users will also benefit from access to more CIDs than they currently do through
+- Users will benefit from faster discovery of content providers.
+- Users will also benefit from access to more CIDs than they currently do through
 queries limited to the IPFS DHT
+- Router discovery and reputation mechanism improves relisience.
+- IPFS user agents will not be tied to static set of hard-coded HTTP endpoints
+  that may stop working at any time.
+- Users will benefit from replacing misbehaving (censorship, DoS, hardware
+  failure) routers with useful ones without having to upgrade their software.
+
 
 ### Compatibility
 
 Nodes which do not upgrade to support this IPIP will be limited to the sub-set of
 content available in the DHT. this will potentially degrade over time as more
-large providers limit their publishing per the IPNI ingestion protocol.
+large providers limit their publishing per the [IPNI](https://github.com/ipni)
+ingestion protocol.
 
 Nodes may limit their complexity through a hard-coded list of known content
 routers, essentially limiting their implementation to design section 3 of this
-IPIP. In doing so, they may limit their risk of exposure to malicious parties.
-They risk being out of date and to offer sub-optimal performance through their
+IPIP. This comes at a price: (1) hard-coded routers become easy targets
+for denial of service attacks, decreasing the resilliency of the entire setup;
+(2) nodes risk being out of date and to offer sub-optimal performance through their
 failure to discover additional near-by content routing instances.
 
 ### Security
@@ -221,8 +248,8 @@ included as this draft develops.
 
 * a new provider is only visible to directly connected peers. they only forward it to peers asking them if it meets their bar
 for reliability. This means propogation through the network is only posisble for routers that behave correctly.
-* because clients only propogate their 'top' routers, latency is also relevant, and with sufficient number of routers, the would only
-propogate in their local geographic area before becoming uncompetitive on latencyk
+* because clients only propagate their 'top' routers, latency is also relevant, and with sufficient number of routers, the would only
+propagate in their local geographic area before becoming uncompetitive on latencyk
 
 ### Alternatives
 

From e558e66996b4fcd67be305f24f0aea1cd90d3e3e Mon Sep 17 00:00:00 2001
From: Will Scott
Date: Sun, 20 Nov 2022 17:12:55 +0100
Subject: [PATCH 06/13] further elaboration of protocol. Clarify generality potential of protocol

---
 IPIP/0342-content-router-discovery.md | 68 ++++++++++++++++++++++-----
 1 file changed, 57 insertions(+), 11 deletions(-)

diff --git a/IPIP/0342-content-router-discovery.md b/IPIP/0342-content-router-discovery.md
index 19e1d62b4..bc805c7b9 100644
--- a/IPIP/0342-content-router-discovery.md
+++ b/IPIP/0342-content-router-discovery.md
@@ -17,6 +17,14 @@ these alernative content routing systems automatically. This IPIP proposes
 a mechanism by which IPFS nodes can discover and make use of content routing
 systems.
 
+The mechanism proposed by this IPIP, where nodes gossip preferred routers
+to their connected peers, can also have broader applications. The same
+mechnism could be used for external IPNS, peer routers, relays, or DNS
+resolvers. We point out the label allowing re-use of this mechanism for
+other systems in the (protocol design)[#1-content-routing-as-a-libp2p-protocol],
+but otherwise leave the concerete design for other systems to subsequent
+IPIPs.
+
 ## Motivation
 
 There is currently not a process by which IPFS nodes can discover alernative
@@ -34,10 +42,10 @@ is also insufficient long term because:
 
 This spec is designed for the ability of IPFS nodes to automatically discover
 and make use of 'content routers'. Content routers are services which are able
-to fulfill libp2p's [ContentRouting](https://github.com/libp2p/go-libp2p/blob/master/core/routing/routing.go#L26)
+to fulfill IPFS's [ContentRouting](https://github.com/libp2p/go-libp2p/blob/master/core/routing/routing.go#L26)
 API. These routers currently are considered to directly support queries using
 the protocols specified by
-[IPIP-337](https://github.com/ipfs/specs/pulls)
+[IPIP-337](https://github.com/ipfs/specs/pulls/337)
 and/or
 [IPIP-327](https://github.com/ipfs/specs/pull/327).
 
@@ -77,15 +85,15 @@ nodes with high connectivity in the network.
 ### 1. content-routing as a libp2p protocol
 
 IPFS nodes will advertise and coordinate discover of content routers using a
-new libp2p protocol advertised as "/ipfs/content-router-discovery/1.0.0".
+new libp2p protocol advertised as "/ipfs/router-discovery/1.0.0".
 
 The protocol will follow a request-response model.
 A node will open a stream on the protocol when it wants to discover new
 content routers it does not already know.
-The node wants to request the best set of known content routers from it's peer
-that it does not already know. The query will make use of a bloom filter to
-support this prioritization without leaking the exact list of known content
-routers that the client already knows.
+The node will request routers from the peer that it does not already know.
+To express what it does know, it will query with a bloom filter. The
+statistical data structure provides a minimal amount of deniability around
+the routers that the client already knows.
 
 * The size of the bloom filter is chosen by the client. It is sized such
 that it has a greater than 99% certainly that it will receive a useful
@@ -99,12 +107,50 @@ to content discovery queries. By default this will be 10. (This default is
 picked as the result of modeling router propagation). It will iterate through
 it's list of known content routers, hashing them against the bloom filter and
 selecting the top routers that are not already known to the client. It will
-return this list, along with it's reliability score for each. This response
-is structured as a list, conceptually:
+return this list, along with it's reliability score for each.
+
+#### protocol messages
+
+Protocol messages are encoded using *cbor*. The following protocol examples demonstrate
+the schemas of requests and responses if they were to be encoded with JSON.
+
+A query on the "/ipfs/router-discovery/1.0.0" protocol will look like:
+```json
+{
+  "router": "string",
+  "filter": "bytes of the bloom filter"
+}
+```
+
+A concrete example would be:
+```json
+{
+  "router": "content-routing",
+  "filter": {"/": {"Bytes": "xhCakxnfIHbzeOjqlbZjawUKf7uvCXAkp0L5z9jF3actECFyCzriAuS1xiyhBCailtsYEwoy/hanhiIHqTZgnA=="}}
+}
+```
+
+A response is a list of entries, which looks like:
+```json
+[
+  {
+    "peer": "multiaddr.MultiAddr",
+    "score": float
+  }
+]
+```
+
+A concrete example would be:
 ```json
 [
-  ["https://cid.contact/", 0.95],
-  ["https://dev.cid.contact/", 0.90],
+  {
+    "peer": "/dns4/cid.contact/tcp/443/https",
+    "score": 0.95
+  },
+  {
+    "peer": "/dns4/dev.cid.contact/tcp/443/https",
+    "score": 0.90
+  },
 ]
 ```
 

From d79bb4068a90e634fb152e7c7378db181b8aa1c4 Mon Sep 17 00:00:00 2001
From: Will
Date: Wed, 23 Nov 2022 15:57:21 +0000
Subject: [PATCH 07/13] Apply suggestions from gui

Co-authored-by: Guillaume Michel - guissou
---
 IPIP/0342-content-router-discovery.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/IPIP/0342-content-router-discovery.md b/IPIP/0342-content-router-discovery.md
index bc805c7b9..2cfd5d67f 100644
--- a/IPIP/0342-content-router-discovery.md
+++ b/IPIP/0342-content-router-discovery.md
@@ -162,7 +162,7 @@ situations:
 1. When it needs to perform a content routing query, and has not
 successfully performed a sync in over a day.
 2. When it's auto-nat status indicates it is eligible to be a DHT server, and
-it has not successfully performed a synce in over a day.
+it has not successfully performed a sync in over a day.
 
 These parameters are also set through modeling.
 
@@ -294,7 +294,7 @@ included as this draft develops.
 
 * a new provider is only visible to directly connected peers. they only forward it to peers asking them if it meets their bar
 for reliability. This means propogation through the network is only posisble for routers that behave correctly.
-* because clients only propagate their 'top' routers, latency is also relevant, and with sufficient number of routers, the would only
+* because clients only propagate their 'top' routers, latency is also relevant, and with sufficient number of routers, they would only
 propagate in their local geographic area before becoming uncompetitive on latencyk
 
 ### Alternatives
@@ -324,6 +324,7 @@ enumeration of all network participants by a malicious content router.
 Pros:
 * Network is already there, no need to create a new protocol to "provide" new providers instead of CIDs.
 * You could potentially associate a provider with a specific root CID content.
+
 Cons:
 * Nodes cannot drop use of the DHT / other content routing options always are 'second tier'.
 

From d99131a11a0b34d75c52e933a626de1328c93225 Mon Sep 17 00:00:00 2001
From: Will
Date: Wed, 30 Nov 2022 16:44:24 +0000
Subject: [PATCH 08/13] Apply suggestions from code review

Co-authored-by: Gus Eggert
---
 IPIP/0342-content-router-discovery.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/IPIP/0342-content-router-discovery.md b/IPIP/0342-content-router-discovery.md
index 2cfd5d67f..8e5d194ef 100644
--- a/IPIP/0342-content-router-discovery.md
+++ b/IPIP/0342-content-router-discovery.md
@@ -19,10 +19,10 @@ systems.
 
 The mechanism proposed by this IPIP, where nodes gossip preferred routers
 to their connected peers, can also have broader applications. The same
-mechnism could be used for external IPNS, peer routers, relays, or DNS
+mechanism could be used for external IPNS, peer routers, relays, or DNS
 resolvers. We point out the label allowing re-use of this mechanism for
 other systems in the (protocol design)[#1-content-routing-as-a-libp2p-protocol],
-but otherwise leave the concerete design for other systems to subsequent
+but otherwise leave the concrete design for other systems to subsequent
 IPIPs.
 
 ## Motivation
@@ -30,7 +30,7 @@ IPIPs.
 There is currently not a process by which IPFS nodes can discover alernative
 content routing systems automatically. This has led to a reliance on
 centralized systems, like the hydra boosters, to fill the gap and offer
-content only available in network indexer to current IPFS nodes. This strategy
+content only available in network indexers to current IPFS nodes. This strategy
 is also insufficient long term because:
 1. It limits speed to the use of a globally distributed kademlia DHT
 2. It is insufficient for providing content in applications where content grows

From b05960aa9122ef564b7215dd1fa9b8233b1b0a94 Mon Sep 17 00:00:00 2001
From: Will Scott
Date: Fri, 2 Jun 2023 14:45:01 +0200
Subject: [PATCH 09/13] update format

---
 IPIP/0342-content-router-discovery.md | 36 +++++++++++++++++++--------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/IPIP/0342-content-router-discovery.md b/IPIP/0342-content-router-discovery.md
index 8e5d194ef..61000d700 100644
--- a/IPIP/0342-content-router-discovery.md
+++ b/IPIP/0342-content-router-discovery.md
@@ -1,13 +1,26 @@
-# IPIP 0342: Content Router Ambient Discovery
-
-- Start Date: 2022-11-11
-- Related Issues:
+---
+title: "IPIP-0342: Content Router Ambient Discovery"
+date: 2022-11-11
+ipip: proposal
+editors:
+  - name: Will Scott
+    github: willscott
+  - name: Masih Derkani
+    github: masih
+relatedIssues:
   - https://hackmd.io/bh4-SCWfTBG2vfClG0NUFg
   - https://github.com/ipfs/kubo/issues/9150
   - https://github.com/filecoin-project/storetheindex/issues/823
+order: 342
+tags: ['ipips']
+---
 
 ## Summary
 
+Design discovery and ranking of public contnet routers on the network.
+
+## Motivation
+
 The Interplanetary stack has slowly opened itself to support extensibility of
 the content routing subsystem. This extensibility is used today by network
 indexers, like https://cid.contact/, to bridge content from large providers
@@ -25,9 +38,7 @@ other systems in the (protocol design)[#1-content-routing-as-a-libp2p-protocol],
 but otherwise leave the concrete design for other systems to subsequent
 IPIPs.
 
-## Motivation
-
-There is currently not a process by which IPFS nodes can discover alernative
+There is currently not a process by which IPFS nodes can use alernative
 content routing systems automatically. This has led to a reliance on
 centralized systems, like the hydra boosters, to fill the gap and offer
 content only available in network indexers to current IPFS nodes. This strategy
@@ -213,9 +224,12 @@ table of content routers.
 
 ## Test fixtures
 
-TK is a CID currently only available through the content routing system,
-and not through the IPFS DHT. This is a piece of content that can be queried
-to validate the presence of alternative content routing systems.
+A random CID should be generated that is only available through a
+non-DHT content routing system. This is a piece of content can then be
+queried to validate the presence of alternative content routing systems.
+
+An example service that can be used as part of testing is at
+https://github.com/willscott/ipni-minimal-publisher
 
 ## Design rationale
 
@@ -261,7 +275,7 @@ queries limited to the IPFS DHT
 
 ### Compatibility
 
-Nodes which do not upgrade to support this IPIP will be limited to the sub-set of
+Nodes which do not implment this IPIP will be limited to the sub-set of
 content available in the DHT. this will potentially degrade over time as more
 large providers limit their publishing per the [IPNI](https://github.com/ipni)
 ingestion protocol.

From b6a36116226c20324618490c2010ad36aa43c363 Mon Sep 17 00:00:00 2001
From: Will Scott
Date: Fri, 2 Jun 2023 14:50:26 +0200
Subject: [PATCH 10/13] lint

---
 IPIP/0342-content-router-discovery.md | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/IPIP/0342-content-router-discovery.md b/IPIP/0342-content-router-discovery.md
index 61000d700..e5af5d1df 100644
--- a/IPIP/0342-content-router-discovery.md
+++ b/IPIP/0342-content-router-discovery.md
@@ -34,7 +34,7 @@ The mechanism proposed by this IPIP, where nodes gossip preferred routers
 to their connected peers, can also have broader applications. The same
 mechanism could be used for external IPNS, peer routers, relays, or DNS
 resolvers. We point out the label allowing re-use of this mechanism for
-other systems in the (protocol design)[#1-content-routing-as-a-libp2p-protocol],
+other systems in the [protocol design](#1-content-routing-as-a-libp2p-protocol),
 but otherwise leave the concrete design for other systems to subsequent
 IPIPs.
 
@@ -48,7 +48,6 @@ is also insufficient long term because:
    super-linearly to peers, such that the burden on a traditional DHT would
    become unsustainable.
 
-
 ## Detailed design
 
 This spec is designed for the ability of IPFS nodes to automatically discover
@@ -126,6 +125,7 @@ Protocol messages are encoded using *cbor*. The following protocol examples demo
 the schemas of requests and responses if they were to be encoded with JSON.
A query on the "/ipfs/router-discovery/1.0.0" protocol will look like: + ```json { "router": "string", @@ -134,6 +134,7 @@ A query on the "/ipfs/router-discovery/1.0.0" protocol will look like: ``` A concrete example would be: + ```json { "router": "content-routing", @@ -142,6 +143,7 @@ A concrete example would be: ``` A response is a list of entries, which looks like: + ```json [ { @@ -152,6 +154,7 @@ A response is a list of entries, which looks like: ``` A concrete example would be: + ```json [ { @@ -225,7 +228,7 @@ table of content routers. ## Test fixtures A random CID should be generated that is only available through a -non-DHT content routing system. This is a piece of content can then be +non-DHT content routing system. This is a piece of content can then be queried to validate the presence of alternative content routing systems. An example service that can be used as part of testing is at @@ -266,13 +269,12 @@ where sharded latency observations may be relevant. For example: - Users will benefit from faster discovery of content providers. - Users will also benefit from access to more CIDs than they currently do through queries limited to the IPFS DHT -- Router discovery and reputation mechanism improves relisience. +- Router discovery and reputation mechanism improves relisience. - IPFS user agents will not be tied to static set of hard-coded HTTP endpoints that may stop working at any time. - Users will benefit from replacing misbehaving (censorship, DoS, hardware failure) routers with useful ones without having to upgrade their software. - ### Compatibility Nodes which do not implment this IPIP will be limited to the sub-set of @@ -282,7 +284,7 @@ ingestion protocol. Nodes may limit their complexity through a hard-coded list of known content routers, essentially limiting their implementation to design section 3 of this -IPIP. This comes at a price: (1) hard-coded routers become easy targets +IPIP. 
This comes at a price: (1) hard-coded routers become easy targets for denial of service attacks, decreasing the resilliency of the entire setup; (2) nodes risk being out of date and to offer sub-optimal performance through their failure to discover additional near-by content routing instances. @@ -295,6 +297,7 @@ comments and suggestions of other security considerations that should be included as this draft develops. #### 1. Malicious Content Routers + ##### a. Providing Bad Content Routing Records * records under double hashing are signed, so can't provide a record for a real peer @@ -342,7 +345,6 @@ Pros: Cons: * Nodes cannot drop use of the DHT / other content routing options always are 'second tier'. - #### Static list of known routers distributed with IPFS clients This has worked for the current IPFS bootstrap node, but leads to the need for From 8174cea3c9ec5e922715620ce7be83f1acb4deb9 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Tue, 8 Aug 2023 16:27:42 +0200 Subject: [PATCH 11/13] chore: editorial tweaks to enable HTML render --- .../ipips/ipip-0342.md | 34 +++++++++++-------- 1 file changed, 20 insertions(+), 14 deletions(-) rename IPIP/0342-content-router-discovery.md => src/ipips/ipip-0342.md (93%) diff --git a/IPIP/0342-content-router-discovery.md b/src/ipips/ipip-0342.md similarity index 93% rename from IPIP/0342-content-router-discovery.md rename to src/ipips/ipip-0342.md index e5af5d1df..cc99d1b79 100644 --- a/IPIP/0342-content-router-discovery.md +++ b/src/ipips/ipip-0342.md @@ -5,8 +5,14 @@ ipip: proposal editors: - name: Will Scott github: willscott + affiliation: + name: Protocol Labs + url: https://protocol.ai/ - name: Masih Derkani github: masih + affiliation: + name: Protocol Labs + url: https://protocol.ai/ relatedIssues: - https://hackmd.io/bh4-SCWfTBG2vfClG0NUFg - https://github.com/ipfs/kubo/issues/9150 @@ -17,7 +23,7 @@ tags: ['ipips'] ## Summary -Design discovery and ranking of public contnet routers on the network. 
+Design discovery and ranking of public content routers on the network. ## Motivation @@ -26,7 +32,7 @@ the content routing subsystem. This extensibility is used today by network indexers, like https://cid.contact/, to bridge content from large providers that cannot practically provide all content to the IPFS DHT. A missing piece of this story is that there is not a process by which IPFS nodes can discover -these alernative content routing systems automatically. This IPIP proposes +these alternative content routing systems automatically. This IPIP proposes a mechanism by which IPFS nodes can discover and make use of content routing systems. @@ -38,7 +44,7 @@ other systems in the [protocol design](#1-content-routing-as-a-libp2p-protocol), but otherwise leave the concrete design for other systems to subsequent IPIPs. -There is currently not a process by which IPFS nodes can use alernative +There is currently not a process by which IPFS nodes can use alternative content routing systems automatically. This has led to a reliance on centralized systems, like the hydra boosters, to fill the gap and offer content only available in network indexers to current IPFS nodes. This strategy @@ -81,12 +87,12 @@ This protocol expects nodes to be able to keep reliability (a metric capturing both availability and correctness) separate from performance for the purpose of propagating content routing information. -In addtion, nodes may wish to track the most recent time they have learned +In addition, nodes may wish to track the most recent time they have learned content routing information from the other peers they are and have been connected with. Conceptually, propagation of content routers will look like nodes gossiping -their knowledge of router existance to each other. Initially, we expect that +their knowledge of router existence to each other. 
Initially, we expect that the current topology will look a bit more like a feedback loop over a bipartite graph - where one side of the graph is the set of general purpose IPFS nodes, and the other side are the bootstrap and core-infrastructural @@ -175,7 +181,7 @@ situations: 1. When it needs to perform a content routing query, and has not successfully performed a sync in over a day. -2. When it's auto-nat status indicates it is eligible to be a DHT server, and +2. When it's AutoNAT status indicates it is eligible to be a DHT server, and it has not successfully performed a sync in over a day. These parameters are also set through modeling. @@ -211,7 +217,7 @@ of local selection. They are maintained for a month for the purpose of admitting knowledge of routers to others - so a client will no longer set bits for routers it is aware of but which do not meet it's threshold for 'good' -after a month. If peers then subseuqently respond with these nodes +after a month. If peers then subsequently respond with these nodes on discovery probes, the local node may use that to consider the node as again 'uncertain' and attempt additional probes against it less than a year later. @@ -248,7 +254,7 @@ gossips routers over existing peer connections. The design limits the ability of an adversary to impact user experience: 1. it does not propose at this stage to replace DHT queries, but only to supplement them with content routing queries, which minimized user -noticable impact. +noticeable impact. 2. nodes will only propagate content routers they believe to work, limiting the spread of spam / unavailable content routers to the directly connected peers of an adversary. @@ -269,7 +275,7 @@ where sharded latency observations may be relevant. For example: - Users will benefit from faster discovery of content providers. 
- Users will also benefit from access to more CIDs than they currently do through queries limited to the IPFS DHT -- Router discovery and reputation mechanism improves relisience. +- Router discovery and reputation mechanism improves resilience. - IPFS user agents will not be tied to static set of hard-coded HTTP endpoints that may stop working at any time. - Users will benefit from replacing misbehaving (censorship, DoS, hardware @@ -285,7 +291,7 @@ ingestion protocol. Nodes may limit their complexity through a hard-coded list of known content routers, essentially limiting their implementation to design section 3 of this IPIP. This comes at a price: (1) hard-coded routers become easy targets -for denial of service attacks, decreasing the resilliency of the entire setup; +for denial of service attacks, decreasing the resiliency of the entire setup; (2) nodes risk being out of date and to offer sub-optimal performance through their failure to discover additional near-by content routing instances. @@ -305,14 +311,14 @@ included as this draft develops. ##### b. Availability Attacks / failing to provide records -* if list of records insufficient, client will get more from other providers in subsequent queries, leading to downranking +* if list of records insufficient, client will get more from other providers in subsequent queries, leading to down-ranking #### 2. Exposure of IPFS Clients (enumeration of network participants) * a new provider is only visible to directly connected peers. they only forward it to peers asking them if it meets their bar -for reliability. This means propogation through the network is only posisble for routers that behave correctly. +for reliability. This means propagation through the network is only possible for routers that behave correctly. 
* because clients only propagate their 'top' routers, latency is also relevant, and with sufficient number of routers, they would only -propagate in their local geographic area before becoming uncompetitive on latencyk +propagate in their local geographic area before becoming uncompetitive on latency. ### Alternatives @@ -320,7 +326,7 @@ propagate in their local geographic area before becoming uncompetitive on latenc Circuit relays are discovered ambiently by nodes during protocol enumeration. When connecting with another libp2p node, IPFS nodes will probe -supported protocols. If they notice circut relay support at this time, they +supported protocols. If they notice circuit relay support at this time, they make use of such aggregated knowledge when making connections needing the support of relays. From 6c837cc00da75a06874bf3d712ee00d1d1db1f7c Mon Sep 17 00:00:00 2001 From: Will Date: Tue, 19 Sep 2023 15:11:30 +0000 Subject: [PATCH 12/13] Update ipip-0342.md Explicit response "router" field expressing the content router type, and explicit example of the "ipni" use case. 
--- src/ipips/ipip-0342.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/ipips/ipip-0342.md b/src/ipips/ipip-0342.md index cc99d1b79..931fb5530 100644 --- a/src/ipips/ipip-0342.md +++ b/src/ipips/ipip-0342.md @@ -143,7 +143,7 @@ A concrete example would be: ```json { - "router": "content-routing", + "router": "ipni", "filter": {"/": {"Bytes": "xhCakxnfIHbzeOjqlbZjawUKf7uvCXAkp0L5z9jF3actECFyCzriAuS1xiyhBCailtsYEwoy/hanhiIHqTZgnA=="}} } ``` @@ -154,6 +154,7 @@ A response is a list of entries, which looks like: [ { "peer": "multiaddr.MultiAddr", + "router": "string", "score": float } ] @@ -165,10 +166,12 @@ A concrete example would be: [ { "peer": "/dns4/cid.contact/tcp/443/https", + "router": "ipni", "score": 0.95 }, { "peer": "/dns4/dev.cid.contact/tcp/443/https", + "router": "ipni", "score": 0.90 }, ] From 88dd4de32340d27e4598842b50b24f560b279d98 Mon Sep 17 00:00:00 2001 From: Guillaume Michel - guissou Date: Wed, 20 Sep 2023 10:33:09 +0200 Subject: [PATCH 13/13] correcting typos --- src/ipips/ipip-0342.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/src/ipips/ipip-0342.md b/src/ipips/ipip-0342.md index 931fb5530..e6697ce1e 100644 --- a/src/ipips/ipip-0342.md +++ b/src/ipips/ipip-0342.md @@ -115,15 +115,15 @@ the routers that the client already knows. that it has a greater than 99% certainly that it will receive a useful response. The maximum size of a query may be capped by the server, but can be effectively considered to be under 10kb. -* The client will hash it's known content routers into the bloom filter +* The client will hash its known content routers into the bloom filter to set bits in the filter at the locations to which these known routers hash. * The server will have a parameter for a number of servers it wants to return to content discovery queries. By default this will be 10. (This default is picked as the result of modeling router propagation). 
It will iterate through -it's list of known content routers, hashing them against the bloom filter and +its list of known content routers, hashing them against the bloom filter and selecting the top routers that are not already known to the client. It will -return this list, along with it's reliability score for each. +return this list, along with its reliability score for each. #### protocol messages @@ -179,12 +179,12 @@ A concrete example would be: ### 2. probing of the discovery protocol -A node will probe it's connected peers for content routing updates in two +A node will probe its connected peers for content routing updates in two situations: 1. When it needs to perform a content routing query, and has not successfully performed a sync in over a day. -2. When it's AutoNAT status indicates it is eligible to be a DHT server, and +2. When its AutoNAT status indicates it is eligible to be a DHT server, and it has not successfully performed a sync in over a day. These parameters are also set through modeling. @@ -219,7 +219,7 @@ These threshold values are maintained for a year for the purposes of local selection. They are maintained for a month for the purpose of admitting knowledge of routers to others - so a client will no longer set bits for -routers it is aware of but which do not meet it's threshold for 'good' +routers it is aware of but which do not meet its threshold for 'good' after a month. If peers then subsequently respond with these nodes on discovery probes, the local node may use that to consider the node as again 'uncertain' and attempt additional probes against it less than