This repository has been archived by the owner on Apr 19, 2024. It is now read-only.
Have you considered any optimizations to how GetRateLimits uses FanOut (e.g., not fanning out for local cache hits)?
I've been trying to run a gub cluster that takes in ~40-80k QPS, with each request carrying ~5 items in the requests list, and I keep hitting a ceiling (image below).
I tried a number of things: load balancing with Envoy, various cluster sizes (from 1 to 5 machines with 16 cores each), and so on. However, I wasn't able to saturate those machines, so I went hunting for blocking points. I initially suspected the global mutex on the cache and tried a sync.Map alternative, but saw no improvement.
I took some blocking profiles, and a significant amount of time is spent in FanOut/ChanRecv, even for local requests (it's expected for remote ones).
As a quick and dirty WIP hack, I eliminated the FanOut for local cache hits (and disabled remote lookups). I only tested this locally on a single instance, which made the most sense given that I stripped out all GetPeerRatelimit calls for this quick proof of concept. Throughput went from 25k QPS to 40k QPS, which suggests that a complete FanOut optimization is worth trying.
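The local fast path described above can be sketched roughly as follows. This is not gub's actual GetRateLimits: RateLimitReq, RateLimitResp, Service, and the isLocal/local/remote hooks are hypothetical simplifications. Locally owned items are answered inline in the calling goroutine, and only items owned by other peers are fanned out. Each goroutine writes its own slot of the result slice, so no channel is needed to collect results.

```go
package main

import (
	"fmt"
	"sync"
)

// RateLimitReq and RateLimitResp are hypothetical, simplified request items.
type RateLimitReq struct {
	Key  string
	Hits int64
}

type RateLimitResp struct {
	Key       string
	Remaining int64
}

// Service is a hypothetical stand-in for one rate-limit instance.
type Service struct {
	isLocal func(key string) bool            // does this instance own key?
	local   func(RateLimitReq) RateLimitResp // answer from the local cache
	remote  func(RateLimitReq) RateLimitResp // forward to the owning peer
}

// GetRateLimits answers locally owned items inline in the caller's goroutine
// and only fans out (one goroutine per item) for items owned by other peers.
// Results are returned in request order.
func (s *Service) GetRateLimits(reqs []RateLimitReq) []RateLimitResp {
	resps := make([]RateLimitResp, len(reqs))
	var wg sync.WaitGroup
	for i, r := range reqs {
		if s.isLocal(r.Key) {
			resps[i] = s.local(r) // fast path: no goroutine, no channel receive
			continue
		}
		wg.Add(1)
		go func(i int, r RateLimitReq) {
			defer wg.Done()
			resps[i] = s.remote(r) // each goroutine writes only its own index
		}(i, r)
	}
	wg.Wait()
	return resps
}

func main() {
	var mu sync.Mutex
	used := map[string]int64{}
	s := &Service{
		isLocal: func(key string) bool { return key != "remote" },
		local: func(r RateLimitReq) RateLimitResp {
			mu.Lock()
			defer mu.Unlock()
			used[r.Key] += r.Hits
			return RateLimitResp{Key: r.Key, Remaining: 10 - used[r.Key]}
		},
		remote: func(r RateLimitReq) RateLimitResp {
			return RateLimitResp{Key: r.Key, Remaining: 5} // pretend peer reply
		},
	}
	out := s.GetRateLimits([]RateLimitReq{{"a", 1}, {"remote", 1}, {"a", 2}})
	for _, r := range out {
		fmt.Println(r.Key, r.Remaining)
	}
}
```

When every item is local, as in the single-instance test, this version spawns no goroutines at all.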
Not sure if there's light at the end of this tunnel, but that ~1.6x increase in QPS (25k to 40k) on a single machine definitely caught my attention.
Thank you for doing this analysis! I kept seeing FanOut show up in my CPU profiles but never followed up on it. Avoiding fan-out for local cache hits is a great optimization! My current optimization research is looking into how we can use GLOBAL behavior to avoid network requests to owning peers, but that work has stalled because other priorities haven't left me free time for it. If you are interested, a PR with this optimization would be most welcome!
Purpose: Behavior=GLOBAL. Reference our implementation with that of https://ipfs.io. TODO