Proposal: IPFS Content Providing #31
Conversation
- Make IPFS public DHT `put`s take <3 seconds (i.e. come close to `get` performance)
- Some techniques available include:
  - Decreasing DHT message timeouts to more reasonable levels
  - [Not requiring](https://github.com/libp2p/go-libp2p-kad-dht/issues/532) the "followup" phase for puts
  - Not requiring responses from all 20 peers before returning to the user
  - Not requiring responses from the 3 closest peers before aborting the query (e.g. perhaps 5 of the closest 10)
having this framed as "do these things" rather than "get to these goals" will make this easier to scope / make it feel more concrete
Are you referring to just `Make IPFS public DHT puts take <3 seconds` or more of this section? The "take <3 seconds" part is mostly because we don't have to do all of them if we hit our target with just a few of the optimizations. I listed them in order from what seems easiest to what seems hardest.
I can be more precise in this section, although I don't want to overly prescribe how this could be implemented.
right. the 'puts take <3 seconds' seems like a 'how do we know we're done', rather than a 'plan for work'
Good news: with some lessons learned from libp2p/go-libp2p-kad-dht#709, it turns out that we have a prototype that seems to do the job and already hits under 3s.
The big wins were:
- Having large routing tables we intermittently refresh means lookups take 0 network hops
- By changing the number of peers we wait on from 20 to a more flexible rule, like wait for 30% of the 20 responses and then wait a few hundred ms for no new responses, we dealt with the long-tail slowness issues (rough sketch below)
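To make that termination rule concrete, here is a minimal sketch of the idea in Go (names and thresholds are illustrative assumptions, not the actual go-libp2p-kad-dht code):

```go
// Sketch of the flexible put-termination rule: stop waiting once a minimum
// fraction of the target peers have answered and responses have gone quiet,
// instead of blocking on all 20 peers. Thresholds here are assumptions.
package provider

import (
	"context"
	"time"
)

func waitForPuts(ctx context.Context, responses <-chan error, target int) int {
	minResponses := (target*30 + 99) / 100 // ~30% of the replication target
	quiet := 500 * time.Millisecond        // "no new responses" window
	received := 0
	for {
		// Until we hit the minimum, wait indefinitely (a nil channel never fires).
		var timeout <-chan time.Time
		if received >= minResponses {
			timeout = time.After(quiet)
		}
		select {
		case <-responses:
			received++
			if received == target {
				return received // every peer answered; nothing left to wait for
			}
		case <-timeout:
			return received // enough answers and the long tail has gone quiet
		case <-ctx.Done():
			return received
		}
	}
}
```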
- The work is useful even though a more comprehensive solution will eventually be put forward, meaning either:
  - Users are not willing to wait, or ecosystem growth is throttled, until we build a more comprehensive content routing solution
  - The changes made here are either useful independent of major content routing changes, or the changes are able to inform or build towards a more comprehensive routing solution
I think these projects are also useful for some byproducts they will have (worth counting):
- They will probably entail designing/implementing extensible provider records (needed for payment systems, etc.)
- They will probably entail upgrading the blockstore to a ref-counted, timestamped partial DAG store, which is integral going forward for (i) any content routing caching algorithm and (ii) garbage collection.
This would be nice, but I'm shrinking the scope here so we don't necessarily have to tackle these together
Probably the most visible primitive in the web3 dev stack is content addressing, which allows someone to retrieve data via its CID no matter who has it. However, while content addressing allows a user to retrieve data from **anyone**, it is still critical that there are systems in place that allow a user to find **someone** who has the data (i.e. content routing).

Executing well here would make it easier for users to utilize the IPFS public DHT, the most widely visible content routing solution in the IPFS space. This would dramatically improve usability and the onboarding experience for new users, as well as the experience of existing users, likely leading to ecosystem growth.
It would presumably also meet a specific ask from Pinata.
Many of the components of this proposal increase development velocity by either exposing more precise tooling for debugging or working with users, or by directly enabling future work.
These projects will also likely further decouple content routing (and the complex caching algorithms it utilizes) from specific applications like bitswap and graphsync.
Thus enabling higher app developer velocity.
This might be true, but isn't necessarily the case in the MVP here.
_How would a developer or user use this new capability?_
<!--(short paragraph)-->

Users who use go-ipfs would be able to tell what percentage of their provider records have made it out to the network in a given interval and would notice more of their content being discoverable via the IPFS public DHT. Additionally, users would have a number of configurable options available to them, both to modify the throughput of their provider record advertisements and to advertise fewer provider records (e.g. only advertising pin roots).
I remember discussing this one time. It would be a huge improvement for most real-world uses (package managers, Wikipedia snapshots).
Suggested change: "… to advertise fewer provider records (e.g. only advertising pin roots, or only the root of each file if unixfs)"
I'd like to add this in too, but it might be out of scope for this project. It's an extra feature which, while valuable, might not be as high value as the other ones here.
These alternatives are not exclusive with the proposal.

1. Focus on decreasing the number of provider records
   - e.g. Add more options for data reproviding, such as for UnixFS files only advertising Files and Directories
💯 we should add this as a new `Reprovider.Strategy` (thinking.. `pinned+files-roots`)
Agreed, that would be nice. Maybe only announce a file if the node has the whole file in cache?
Maybe worth a discussion whether that should be the default, for example for browser integrations (like Brave) and ipfs-desktop. If someone just wants to share some files, I don't see a reason to announce all chunks. Hunting for nodes which have just some individual blocks of a file because of deduplication is probably not worth the effort of connecting to them.
Thanks for putting this together - good stuff!
_How sure are we that this impact would be realized? Label from [this scale](https://medium.com/@nimay/inside-product-introduction-to-feature-priority-using-ice-impact-confidence-ease-and-gist-5180434e5b15)_.

<!--Explain why this rating-->
2. We don't have direct market research demonstrating that improving the resiliency of content routing will definitely lead to more people choosing IPFS or working with the stack. However, this is a pain point for many of our users (as noted on the IPFS Matrix, Discuss, and GitHub) and something we have encountered as an issue experienced by various major ecosystem members (Protocol Labs infra, Pinata, Infura, etc.).
Do we have more data on:
- How this pain point has impacted them (e.g., has it prevented certain use cases)?
- How have they worked around it?
- What kind of performance they're expecting?
- It's been a problem for some use cases like package management (e.g. ipfs and pacman ipfs/notes#84, IPFS and Gentoo Portage (distfiles) ipfs/notes#296), and pinning services have had difficulty as well.
- Applications can sort of get around this by advertising application names (e.g. `myApp`) instead of data CIDs. However, this falls apart as the number of application users gets larger. For certain use cases ipfs-cluster could come in handy as well. Pinning services have a few different approaches that are basically 1) build a custom reprovider that tries to be a bit faster (although mostly by throwing more resources + parallelism at the problem and not tweaking the underlying DHT client usage) and 2) have really high connection limits so they're connected to tons of people, and permanently connect to major gateways.
- I'm not sure, but mostly they just want data added to go-ipfs to just be made available for downloading without worrying about it and without it being crazy expensive to run
Thanks.
- It's been a problem for some use cases like package management (e.g. ipfs and pacman ipfs/notes#84)
If you have any questions on this, @BigLep feel free to ask :)
## Project definition
#### Brief plan of attack
Are there any new test scenarios that we'd need to develop? For example, as part of CI, should we have a test that asserts X advertisements can be made within Y seconds?
It'd be nice to do in CI, especially if those tests are publicly viewable. However, it wouldn't be so bad to just check in on our metrics, since they report performance on go-ipfs master + the latest release and already include metrics on provide speed. However, if we want to test some of the massive providing strategies (e.g. huge routing tables + many provides) we'll likely need some more testing.
Got it. I don't know the landscape well enough to have more input. A couple more thoughts:
- If there is fear of regression here, then having a test that can catch that seems reasonable.
- If we are going to advertise that customers with massive providing strategies will see improved performance, I think we'll want to verify this in some way and should include that in the work plan.
#### What does done look like?
_What specific deliverables should be completed to consider this project done?_

The project is done when users can see how much of their provide queue is complete, are able to allocate resources to increase their provide throughput until satisfied, and allocating those resources is either not prohibitively expensive, or it is deemed too much work to further decrease the resource requirements.
Thumbs up for "continuous transparency": seeing the state of providing at all times.
_Why might this project be lower impact than expected? How could this project fail to complete, or fail to be successful?_

- People have other issues that the DHT put performance is just masking, which means we will not immediately be able to see the impact from this project alone
- Users will not want to spend the raw bandwidth of emitting their records even if lookups are instant
n00b question: Do any customers complain about bandwidth today?
Not that I've heard of (although @Stebalien might have more info), but providing is pretty heavily limited so the DHT provide bandwidth is unlikely to be a problem today.
The question is around what happens next, i.e. once putting data in the DHT is fast there will still be users who aren't really able to use it.
Some back of the envelope math here is:
A user with 100M provider records, where each record is 100 bytes (this is a large overestimate, it's more like 40, but we may want to add some more data to the records), who puts each record to 20 nodes every 24hrs uses roughly 200GB/day of upload bandwidth. AWS egress prices are around $0.09/GB, so around $20/day.
Again this is an overestimate and might be dwarfed by the egress costs of serving the actual data or other associated costs, but it's not 0.
https://archive.org/ has 538B webpages. If every one of those webpages (the vast majority of which I assume are not normally accessed) was to be individually addressed and advertised in the DHT daily it would be quite expensive.
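For clarity, the back-of-the-envelope arithmetic above can be spelled out as follows (a sketch using the assumed figures from this thread, not measurements):

```go
// Rough arithmetic for the provider-record bandwidth estimate above.
// All inputs are assumptions: 100M records, ~100 bytes each (overestimate),
// each put to 20 DHT peers per day, ~$0.09/GB egress.
package main

import "fmt"

func main() {
	const (
		records     = 100_000_000 // provider records advertised
		recordBytes = 100.0       // bytes per record
		replication = 20          // DHT peers each record is put to daily
		egressPerGB = 0.09        // USD per GB (rough AWS egress price)
	)
	gbPerDay := records * recordBytes * replication / 1e9
	fmt.Printf("~%.0f GB/day uploaded, ~$%.0f/day (~$%.0f/month) in egress\n",
		gbPerDay, gbPerDay*egressPerGB, gbPerDay*egressPerGB*30)
	// Prints: ~200 GB/day uploaded, ~$18/day (~$540/month) in egress
}
```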
Thanks for the explanation and back-of-envelope math; makes sense. Given this info, I'm assuming most (something like 99%?) of customers won't care. I assume huge dataset customers have other special requirements/needs/setup that we'll have other work to make their journey delightful anyways. Given the desire to make IPFS an exceptional tool for developers, the bandwidth increase seems acceptable to take given the benefit.
- Not requiring responses from all 20 peers before returning to the user
- Not requiring responses from the 3 closest peers before aborting the query (e.g. perhaps 5 of the closest 10)
- Add a function to the DHT for batch providing (and putting) and utilize it in go-ipfs
  - Tests with https://github.com/libp2p/go-libp2p-kad-dht/pull/709 showed tremendous speedups even in a single threaded provide loop if the provider records were sorted in XOR space
With a very small number of failures we were able to reach around 3 puts per second for 1k puts and 20 provides per second for 60k puts. The provides per second should increase the more we do at a time. This is as opposed to 1 provide per 30 seconds.
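For a sense of what "sorted in XOR space" means above, here is a minimal illustrative sketch (not the actual go-libp2p-kad-dht implementation): ordering keys by their SHA-256 position in the Kademlia keyspace keeps XOR-close keys adjacent, so consecutive provides target overlapping sets of closest peers and can reuse lookups and connections.

```go
// Illustrative sketch: sort provider keys by their Kademlia keyspace position
// (SHA-256 of the key). Adjacent keys in this order share long common prefixes,
// i.e. are close in XOR distance, so their closest DHT peers largely overlap.
package provider

import (
	"bytes"
	"crypto/sha256"
	"sort"
)

func sortKeysInXORSpace(keys [][]byte) {
	type entry struct{ key, kad []byte }
	entries := make([]entry, len(keys))
	for i, k := range keys {
		h := sha256.Sum256(k) // keyspace position of this key
		entries[i] = entry{key: k, kad: h[:]}
	}
	// Lexicographic order over keyspace positions groups keys by shared prefix.
	sort.Slice(entries, func(i, j int) bool {
		return bytes.Compare(entries[i].kad, entries[j].kad) < 0
	})
	for i := range entries {
		keys[i] = entries[i].key
	}
}
```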
- Enable downloading sub-DAGs when a user already has the root node, but is only advertising the root node
  - e.g. have Bitswap sessions know about the graph structure and walk up the graph to find providers when low on peers
- Add a new command to `go-ipfs` (e.g. `ipfs provide`) that at minimum allows users to see how many of their total provider records have been published (or failed) in the last 24 hours
- Add an option to go-libp2p-kad-dht for very large routing tables that are stored on disk and are periodically updated by scanning the network
PR for this libp2p/go-libp2p-kad-dht#709
The functionality here is happening as an experimental feature in go-ipfs 0.9 (see ipfs/kubo#8058)
@BigLep most of it is, however "Enable downloading sub-DAGs when a user already has the root node, but is only advertising the root node" is not done yet. If you wanted to, we could reasonably close this issue and open a new one aimed at decreasing the number of provider records that need to be advertised in the system.
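As a rough illustration of the "walk up the graph" idea mentioned in the plan, here is a hedged sketch; the `ParentIndex` and `ProviderFinder` types are hypothetical helpers, not existing Bitswap or go-ipfs APIs:

```go
// Illustrative sketch only: when a peer advertises just the root of a DAG,
// climb from the wanted block towards the root and look for providers of
// each ancestor. Interfaces below are hypothetical.
package provider

import (
	"context"

	"github.com/ipfs/go-cid"
	"github.com/libp2p/go-libp2p-core/peer"
)

// ParentIndex maps a block to the block that links to it (hypothetical).
type ParentIndex interface {
	Parent(c cid.Cid) (cid.Cid, bool)
}

// ProviderFinder finds peers advertising a given CID (e.g. a DHT client).
type ProviderFinder interface {
	FindProviders(ctx context.Context, c cid.Cid) ([]peer.AddrInfo, error)
}

// findProvidersViaAncestors queries providers for each ancestor of the wanted
// block, since large datasets may only be advertised by their root CID.
func findProvidersViaAncestors(ctx context.Context, want cid.Cid, idx ParentIndex, f ProviderFinder) ([]peer.AddrInfo, error) {
	for c, ok := want, true; ok; c, ok = idx.Parent(c) {
		provs, err := f.FindProviders(ctx, c)
		if err != nil {
			return nil, err
		}
		if len(provs) > 0 {
			return provs, nil // someone advertises this ancestor; ask them for the sub-DAG
		}
	}
	return nil, nil // no providers found for any ancestor
}
```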
Taking a stab at a content routing proposal. cc @Stebalien @petar for some thoughts.
My take on the high level content providing issues is that `ResourcesPerProvide * NumberOfProvides * ProvideFrequency` is too high. Decreasing any of these is valuable and this issue focuses primarily on decreasing the number of resources required per provide and enabling our existing work on decreasing the number of things to provide (e.g. the `roots` Reprovider strategy). I'm open to discussion on putting focus on other parts of the equation though.