Proposal for a 'per-request prefetch' #62
Conversation
n := 0
for {
	ss.writeAheadSynchronizer.ReqAdd()
IIUC, this is to limit reading of incoming blocks from the CAR stream to keep memory usage in check, right?
Given that the cache size is bounded, I don't think we need this to limit how many blocks we put in the cache.
Basically, what would happen if we didn't use a writeAheadSynchronizer here?
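(For context, a minimal sketch of what such a write-ahead synchronizer could look like; the names follow the snippet above but this is an assumed shape, not this PR's actual implementation. ReqAdd stalls the prefetch loop once too many prefetched blocks sit unread, and Dec releases it as the reader catches up, which bounds how far the CAR stream is read ahead of the consumer.)

package stoker // sketch only

import "sync"

// writeAheadSynchronizer (sketch): a bounded counter that gives the CAR
// prefetch loop backpressure against the reader.
type writeAheadSynchronizer struct {
	mu      sync.Mutex
	cond    *sync.Cond
	current int // blocks prefetched but not yet read
	max     int // how far ahead the prefetcher may run
}

func newWriteAheadSynchronizer(max int) *writeAheadSynchronizer {
	w := &writeAheadSynchronizer{max: max}
	w.cond = sync.NewCond(&w.mu)
	return w
}

// ReqAdd is called by the prefetch loop before pulling the next block off
// the CAR stream; it blocks while the reader is too far behind.
func (w *writeAheadSynchronizer) ReqAdd() {
	w.mu.Lock()
	defer w.mu.Unlock()
	for w.current >= w.max {
		w.cond.Wait()
	}
	w.current++
}

// Dec is called when the reader consumes a prefetched block, potentially
// unblocking the prefetch loop.
func (w *writeAheadSynchronizer) Dec() {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.current > 0 {
		w.current--
	}
	w.cond.Signal()
}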
if err != nil {
	return nil, err
}
ss.cache.Add(it, blk)
Why do we NOT need to increment the count on the writeAheadSynchronizer here?
const SessionCacheSize = 1024

func (s *Stoker) NewSession(ctx context.Context, path string) ipfsblockstore.Blockstore {
Will bifrost directly use the Stoker blockstore instead of the Caboose blockstore by giving it a path here?
@lidel Is it easy for Bifrost to pass the path here that it gets from its clients? Will the "affinity" key be blank in this case? Because from the code, I can see that we only pass the path to Lassie when the affinity key is empty.
When does passing the path to Lassie make more sense than passing the "affinity key" to Lassie?
Also @lidel -> Would it make sense to share the block cache across requests given IPFS GW access patterns?
Will bifrost directly use the Stoker blockstore instead of the Caboose blockstore by giving it a path here?
My understanding is we won't be able to use anything other than the blockstore interface until we finish the go-libipfs/gateway refactor.
When done, bifrost-gateway may use Get* functions from the IPFSBackend interface for talking with caboose instead of getBlock for some types of requests. This is how Caboose will get higher-resolution requests for a specific depth or byte range.
Is it easy for Bifrost to pass the path here that it gets from its clients?
Caboose already reads and uses the requested Content Path as the Affinity key here. This is the parent, normalized content path that the block request comes from (so you get uniform behavior across path, subdomain, and dnslink gateway requests based on the Host HTTP header too).
The problem is, you don't know how much you need to prefetch to satisfy the client's request. I guess the best you can do, to avoid overfetching the entire wikipedia when /ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze/ is requested, is to do all CAR requests with ?depth=1, and effectively fetch level-by-level instead of block-by-block.
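(To illustrate the level-by-level idea: each round asks for one level of the DAG, e.g. via a ?depth=1 CAR request, and queues the returned links for the next round. The fetchLevel callback below is hypothetical, since the exact CAR request API isn't shown in this thread.)

package stoker // sketch only

import (
	"context"

	"github.com/ipfs/go-cid"
)

// fetchLevelByLevel descends a DAG one level per round instead of
// block-by-block. fetchLevel is a hypothetical callback that issues a
// single CAR request (e.g. with ?depth=1) and returns the child links.
func fetchLevelByLevel(ctx context.Context, root cid.Cid,
	fetchLevel func(context.Context, cid.Cid) ([]cid.Cid, error)) error {
	queue := []cid.Cid{root}
	for len(queue) > 0 {
		var next []cid.Cid
		for _, c := range queue {
			children, err := fetchLevel(ctx, c)
			if err != nil {
				return err
			}
			next = append(next, children...)
		}
		queue = next // one DAG level fetched per iteration
	}
	return nil
}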
When does passing the path to Lassie make more sense than passing the "affinity key" to Lassie?
Probably when you know the context (full file OR only dir enumeration vs ALWAYS full dag). If you don't know the context, sending the path to Lassie will force a preload of the full dag (too much) or only the root level (too little if the request was for a file), and most likely impact earnings of L1s.
Would it make sense to share the block cache across requests given IPFS GW access patterns?
We already have a global block cache in bifrost-gateway – the 2Q cache is based on frequency+recency to match gateway usage patterns, and has a hit ratio of 40%-50%. If you add any additional cache, add a metric which will let us see if it brings any value (prior art in blockstore_cache.go).
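(If a per-request cache is added on top of the global one, a hedged sketch of the kind of instrumentation meant here, loosely following the prior art mentioned for blockstore_cache.go; the metric and type names are assumptions, not existing code.)

package stoker // sketch; metric and type names are assumptions

import (
	"context"

	lru "github.com/hashicorp/golang-lru"
	blocks "github.com/ipfs/go-block-format"
	"github.com/ipfs/go-cid"
	"github.com/prometheus/client_golang/prometheus"
)

// Counts lookups by result so the extra cache's value shows up in metrics.
var sessionCacheLookups = prometheus.NewCounterVec(prometheus.CounterOpts{
	Name: "stoker_session_cache_lookups_total", // hypothetical metric name
	Help: "Per-request session cache lookups by result (hit/miss).",
}, []string{"result"})

func init() { prometheus.MustRegister(sessionCacheLookups) }

type cachedSession struct {
	cache    *lru.TwoQueueCache                                    // 2Q, matching the global cache mentioned above
	getBlock func(context.Context, cid.Cid) (blocks.Block, error) // backing fetch
}

func (s *cachedSession) Get(ctx context.Context, c cid.Cid) (blocks.Block, error) {
	if v, ok := s.cache.Get(c); ok {
		sessionCacheLookups.WithLabelValues("hit").Inc()
		return v.(blocks.Block), nil
	}
	sessionCacheLookups.WithLabelValues("miss").Inc()
	blk, err := s.getBlock(ctx, c)
	if err != nil {
		return nil, err
	}
	s.cache.Add(c, blk)
	return blk, nil
}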
// from pre-fetching. that said, this will more likely read out the
// full pre-fetch stream, which is still a reasonable choice.
ss.writeAheadSynchronizer.Dec()
What does calling Dec here signify? We're not really removing anything from the cache here. Is it like a "yeah, your prefetching is useful to me, keep doing what you are doing"? Are we optimizing for memory usage here?
// decrement 1 from the queue, potentially unblocking writers
func (w *writeAheadSynchronizer) Dec() {
	curr := w.current.Add(-1)
	if curr < 0 {
This can go below zero because the same cached block can get read multiple times by Bifrost, right? Do we want to decrement the count in this case where the same cached block gets read multiple times?
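(If repeated reads of the same cached block shouldn't be able to push the counter negative, one way to guard the snippet above, assuming current is a sync/atomic Int64, is a compare-and-swap loop:)

// Dec, sketched so it never drops below zero: an extra read of an
// already-counted block becomes a no-op instead of going negative.
func (w *writeAheadSynchronizer) Dec() {
	for {
		curr := w.current.Load()
		if curr <= 0 {
			return // nothing outstanding; ignore the extra decrement
		}
		if w.current.CompareAndSwap(curr, curr-1) {
			// (whatever unblocks writers waiting in ReqAdd would still run here)
			return
		}
	}
}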
type stokerSession struct {
	c     *caboose.Caboose
	cache *lru.ARCCache
nit: avoid using ARC, it comes with uncertainty related to patent by IBM and someone will ask for removal: ipfs/kubo#6590
I believe we can close this since we now have GRAPH_BACKEND from ipfs-inactive/bifrost-gateway#61.
This allows a sub-blockstore interface to be used at a request level, specifying an initial expectation of the path -> blocks needed.
I suspect that this can initially be the inbound path to the gateway, with '?depth=1' added in the unixfs handler and '?depth=all' for the tar/car handlers.
That will trigger loading of the requested CAR into a local cache in front of the blockstore, and hold back reading of the stream from the backing request so it stays reasonably in line with reads.
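(A rough sketch of that per-handler hint; the function is hypothetical and the '?depth=' query parameters follow the convention used in this thread rather than a final API.)

// sessionPathFor picks the prefetch hint for the request-level session.
func sessionPathFor(contentPath string, handler string) string {
	switch handler {
	case "unixfs":
		// Serving a file or directory listing: one level at a time is usually enough.
		return contentPath + "?depth=1"
	case "tar", "car":
		// Archive handlers walk the whole DAG, so ask for everything up front.
		return contentPath + "?depth=all"
	default:
		return contentPath
	}
}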