Investigate why success rate gets worse over time #92
Restarting bifrost-gateway on staging initially produces a success rate very close to what we expect, but over time it erodes into a worse and worse state:

Inspect yourself:
Summary from the latter:
Some ideas/thoughts on why:

- An in-memory block cache perf regression is unlikely: the cache size is symbolic and only aims to limit roundtrips per request. Staging runs with BLOCK_CACHE_SIZE=16k (Adjust size of in-memory block cache #47 (comment)), the slowness happens well after that cache has been filled up multiple times, and the next graph shows the increase in CAR fetch duration happening on the Caboose side (see the cache sketch after this list):
- Saturn L1 pool health gets worse for some reason:
- Saturn per-L1 CAR fetch durations increase while other durations stay the same:
- HTTP 499s suggest clients are giving up before they get our response, which is consistent with things getting slower over time and more and more clients giving up while waiting for a response. This is not specific to Rhea: the old mirrored node that runs Kubo also sees more 499s over time, but it is less prominent (see the context-cancellation sketch after this list):
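To make the cache point concrete, here is a minimal sketch of a bounded in-memory block cache sitting in front of a remote block source, the kind of thing the first bullet describes. It is an illustration only, not the actual bifrost-gateway code: the `blockFetcher` interface, the `newCachingFetcher` name, and the use of `hashicorp/golang-lru` are assumptions for the example.

```go
package blockcache

import (
	"context"
	"fmt"

	lru "github.com/hashicorp/golang-lru/v2"
)

// blockFetcher is a hypothetical interface standing in for the remote
// (Caboose/Saturn) block source the gateway fetches from.
type blockFetcher interface {
	GetBlock(ctx context.Context, cid string) ([]byte, error)
}

// cachingFetcher wraps a blockFetcher with a small, fixed-size LRU cache.
// The cache only needs to absorb repeated block lookups within (roughly)
// a single request, which is why its size is "symbolic".
type cachingFetcher struct {
	inner blockFetcher
	cache *lru.Cache[string, []byte]
}

func newCachingFetcher(inner blockFetcher, size int) (*cachingFetcher, error) {
	c, err := lru.New[string, []byte](size) // e.g. 16 * 1024 for BLOCK_CACHE_SIZE=16k
	if err != nil {
		return nil, err
	}
	return &cachingFetcher{inner: inner, cache: c}, nil
}

func (f *cachingFetcher) GetBlock(ctx context.Context, cid string) ([]byte, error) {
	// Serve from cache when possible to avoid a remote roundtrip.
	if blk, ok := f.cache.Get(cid); ok {
		return blk, nil
	}
	blk, err := f.inner.GetBlock(ctx, cid)
	if err != nil {
		return nil, fmt.Errorf("remote fetch for %s: %w", cid, err)
	}
	f.cache.Add(cid, blk)
	return blk, nil
}
```

Because such a cache is bounded and churns constantly, a performance problem in it would be visible right after a restart rather than emerging hours later, which is the reasoning for ruling it out above.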
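On the HTTP 499 point: 499 is the status nginx records when a client closes the connection before the upstream answers, so the gateway never sends it itself. Inside a Go handler the same event shows up as the request context being canceled. Below is a minimal sketch of how a handler can detect and log clients that gave up; the `fetchCAR` helper, the route, and the timings are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"log"
	"net/http"
	"time"
)

// fetchCAR stands in for a slow upstream CAR fetch; name and behaviour
// are assumptions for this example.
func fetchCAR(ctx context.Context, path string) ([]byte, error) {
	select {
	case <-time.After(5 * time.Second): // pretend the upstream is slow
		return []byte("CAR bytes for " + path), nil
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}

func handler(w http.ResponseWriter, r *http.Request) {
	body, err := fetchCAR(r.Context(), r.URL.Path)
	if err != nil {
		// If the client hung up, the request context is canceled; nginx in
		// front of us would log this request as a 499.
		if errors.Is(err, context.Canceled) {
			log.Printf("client gave up waiting for %s", r.URL.Path)
			return // nothing useful to write; the connection is gone
		}
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	w.Header().Set("Content-Type", "application/vnd.ipld.car")
	_, _ = w.Write(body)
}

func main() {
	http.HandleFunc("/ipfs/", handler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Counting these cancellations separately from real upstream errors makes "clients giving up" visible as its own trend instead of folding it into generic failures.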
Any feedback / thoughts / hypotheses are welcome. 🙏
Comments

@lidel The new Caboose wasn't there when we reported this issue two weeks back. So I'd be more suspicious of something going wrong with

I've restarted staging with

Closed. Related PR: #102

github-project-automation bot moved this from 🏗 In progress to ✅ Done in bifrost-gateway on Apr 26, 2023