Proposal for a 'per-request prefetch' #62
Conversation
n := 0
for {
	ss.writeAheadSynchronizer.ReqAdd()
IIUC, this is to limit reading of incoming blocks from the CAR stream to keep memory usage in check, right?
Given that the cache size is bounded, I don't think we need this to limit how many blocks we put in the cache.
Basically, what would happen if we didn't use a writeAheadSynchronizer here?
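(For context, a minimal sketch of what such a write-ahead synchronizer could look like; the names follow the snippet above but this is an assumed shape, not this PR's actual implementation. ReqAdd stalls the prefetch loop once too many prefetched blocks sit unread, and Dec releases it as the reader catches up, which bounds how far the CAR stream is read ahead of the consumer.)

package stoker // sketch only

import "sync"

// writeAheadSynchronizer (sketch): a bounded counter that gives the CAR
// prefetch loop backpressure against the reader.
type writeAheadSynchronizer struct {
	mu      sync.Mutex
	cond    *sync.Cond
	current int // blocks prefetched but not yet read
	max     int // how far ahead the prefetcher may run
}

func newWriteAheadSynchronizer(max int) *writeAheadSynchronizer {
	w := &writeAheadSynchronizer{max: max}
	w.cond = sync.NewCond(&w.mu)
	return w
}

// ReqAdd is called by the prefetch loop before pulling the next block off
// the CAR stream; it blocks while the reader is too far behind.
func (w *writeAheadSynchronizer) ReqAdd() {
	w.mu.Lock()
	defer w.mu.Unlock()
	for w.current >= w.max {
		w.cond.Wait()
	}
	w.current++
}

// Dec is called when the reader consumes a prefetched block, potentially
// unblocking the prefetch loop.
func (w *writeAheadSynchronizer) Dec() {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.current > 0 {
		w.current--
	}
	w.cond.Signal()
}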
if err != nil {
	return nil, err
}
ss.cache.Add(it, blk)
Why do we NOT need to increment the count on the writeAheadSynchronizer here?
const SessionCacheSize = 1024

func (s *Stoker) NewSession(ctx context.Context, path string) ipfsblockstore.Blockstore {
Will bifrost directly use the Stoker blockstore instead of the Caboose blockstore by giving it a path here?
@lidel Is it easy for Bifrost to pass the path here that it gets from its clients? Will the "affinity" key be blank in this case? Because from the code, I can see that we only pass the path to Lassie when the affinity key is empty.
When does passing the path to Lassie make more sense than passing the "affinity key" to Lassie?
Also @lidel -> Would it make sense to share the block cache across requests given IPFS GW access patterns?
Will bifrost directly use the Stoker blockstore instead of the Caboose blockstore by giving it a path here?
My understanding is we won't be able to use anything other than the blockstore interface until we finish the go-libipfs/gateway refactor.
When done, bifrost-gateway may use Get* functions from the IPFSBackend interface for talking with caboose instead of getBlock for some types of requests. This is how Caboose will get higher-resolution requests for a specific depth or byte range.
Is it easy for Bifrost to pass the path here that it gets from its clients?
Caboose already reads and uses the requested Content Path as the Affinity key here. This is the parent, normalized content path that the block request comes from (so you get uniform behavior across path, subdomain, and dnslink gateway requests based on the Host HTTP header too).
The problem is, you don't know how much you need to prefetch to satisfy the client's request. I guess the best you can do, to avoid overfetching the entire wikipedia when /ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze/ is requested, is to do all CAR requests with ?depth=1, and effectively fetch level-by-level instead of block-by-block.
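(To illustrate the level-by-level idea: each round asks for one level of the DAG, e.g. via a ?depth=1 CAR request, and queues the returned links for the next round. The fetchLevel callback below is hypothetical, since the exact CAR request API isn't shown in this thread.)

package stoker // sketch only

import (
	"context"

	"github.com/ipfs/go-cid"
)

// fetchLevelByLevel descends a DAG one level per round instead of
// block-by-block. fetchLevel is a hypothetical callback that issues a
// single CAR request (e.g. with ?depth=1) and returns the child links.
func fetchLevelByLevel(ctx context.Context, root cid.Cid,
	fetchLevel func(context.Context, cid.Cid) ([]cid.Cid, error)) error {
	queue := []cid.Cid{root}
	for len(queue) > 0 {
		var next []cid.Cid
		for _, c := range queue {
			children, err := fetchLevel(ctx, c)
			if err != nil {
				return err
			}
			next = append(next, children...)
		}
		queue = next // one DAG level fetched per iteration
	}
	return nil
}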
When does passing the path to Lassie make more sense than passing the "affinity key" to Lassie?
Probably when you know the context (full file OR only dir enumeration vs ALWAYS full dag). If you don't know the context, sending the path to Lassie will force a preload of the full dag (too much) or only the root level (too little if the request was for a file), and most likely impact earnings of L1s.
Would it make sense to share the block cache across requests given IPFS GW access patterns?
We already have a global block cache in bifrost-gateway – the 2Q cache is based on frequency+recency to match gateway usage patterns, and has a hit ratio of 40%-50%. If you add any additional cache, add a metric which will let us see if it brings any value (prior art in blockstore_cache.go).
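(If a per-request cache is added on top of the global one, a hedged sketch of the kind of instrumentation meant here, loosely following the prior art mentioned for blockstore_cache.go; the metric and type names are assumptions, not existing code.)

package stoker // sketch; metric and type names are assumptions

import (
	"context"

	lru "github.com/hashicorp/golang-lru"
	blocks "github.com/ipfs/go-block-format"
	"github.com/ipfs/go-cid"
	"github.com/prometheus/client_golang/prometheus"
)

// Counts lookups by result so the extra cache's value shows up in metrics.
var sessionCacheLookups = prometheus.NewCounterVec(prometheus.CounterOpts{
	Name: "stoker_session_cache_lookups_total", // hypothetical metric name
	Help: "Per-request session cache lookups by result (hit/miss).",
}, []string{"result"})

func init() { prometheus.MustRegister(sessionCacheLookups) }

type cachedSession struct {
	cache    *lru.TwoQueueCache                                    // 2Q, matching the global cache mentioned above
	getBlock func(context.Context, cid.Cid) (blocks.Block, error) // backing fetch
}

func (s *cachedSession) Get(ctx context.Context, c cid.Cid) (blocks.Block, error) {
	if v, ok := s.cache.Get(c); ok {
		sessionCacheLookups.WithLabelValues("hit").Inc()
		return v.(blocks.Block), nil
	}
	sessionCacheLookups.WithLabelValues("miss").Inc()
	blk, err := s.getBlock(ctx, c)
	if err != nil {
		return nil, err
	}
	s.cache.Add(c, blk)
	return blk, nil
}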
// from pre-fetching. that said, this will more likely read out the
// full pre-fetch stream, which is still a reasonable choice.
ss.writeAheadSynchronizer.Dec()
What does calling Dec here signify? We're not really removing anything from the cache here. Is it like a "yeah, your prefetching is useful to me, keep doing what you are doing"? Are we optimizing for memory usage here?
// decrement 1 from the queue, potentially unblocking writers
func (w *writeAheadSynchronizer) Dec() {
	curr := w.current.Add(-1)
	if curr < 0 {
This can go below zero because the same cached block can get read multiple times by Bifrost, right? Do we want to decrement the count in this case where the same cached block gets read multiple times?
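(If repeated reads of the same cached block shouldn't be able to push the counter negative, one way to guard the snippet above, assuming current is a sync/atomic Int64, is a compare-and-swap loop:)

// Dec, sketched so it never drops below zero: an extra read of an
// already-counted block becomes a no-op instead of going negative.
func (w *writeAheadSynchronizer) Dec() {
	for {
		curr := w.current.Load()
		if curr <= 0 {
			return // nothing outstanding; ignore the extra decrement
		}
		if w.current.CompareAndSwap(curr, curr-1) {
			// (whatever unblocks writers waiting in ReqAdd would still run here)
			return
		}
	}
}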
type stokerSession struct {
	c     *caboose.Caboose
	cache *lru.ARCCache
nit: avoid using ARC, it comes with uncertainty related to patent by IBM and someone will ask for removal: ipfs/kubo#6590
I believe we can close this since we now have GRAPH_BACKEND from ipfs-inactive/bifrost-gateway#61.
This allows a sub-blockstore interface to be used at a request level, specifying an initial expectation of the path -> blocks needed.
I suspect that this can initially be the inbound path to the gateway, with '?depth=1' added in the unixfs handler and '?depth=all' for the tar/car handlers.
That will trigger loading of the requested CAR into a local cache in front of the blockstore, and hold back reading of the stream from the backing request so it stays reasonably in line with reads.
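(A rough sketch of that per-handler hint; the function is hypothetical and the '?depth=' query parameters follow the convention used in this thread rather than a final API.)

// sessionPathFor picks the prefetch hint for the request-level session.
func sessionPathFor(contentPath string, handler string) string {
	switch handler {
	case "unixfs":
		// Serving a file or directory listing: one level at a time is usually enough.
		return contentPath + "?depth=1"
	case "tar", "car":
		// Archive handlers walk the whole DAG, so ask for everything up front.
		return contentPath + "?depth=all"
	default:
		return contentPath
	}
}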