Syncing stalls for a couple of hundred seconds #1225

Closed
shufps opened this issue Apr 27, 2023 · 2 comments · May be fixed by #1226
Labels
bug Something isn't working

Comments


shufps commented Apr 27, 2023

We have the problem that syncing stalls now and then for a couple of hundred seconds.

Example (~400s):

handle_ledger_update{milestone_index=4855283 created=7 consumed=5}: inx_chronicle::inx: close time.busy=13.8ms time.idle=374ms
handle_ledger_update{milestone_index=4855284 created=8 consumed=8}: inx_chronicle::inx: close time.busy=13.6ms time.idle=292ms
handle_ledger_update{milestone_index=4855285 created=4 consumed=5}: inx_chronicle::inx: close time.busy=15.7ms time.idle=149s
tower_http::trace::on_failure: response failed classification=Status code: 503 Service Unavailable latency=23239 ms
handle_ledger_update{milestone_index=4855286 created=4 consumed=3}: inx_chronicle::inx: close time.busy=13.1ms time.idle=273s
tower_http::trace::on_failure: response failed classification=Status code: 503 Service Unavailable latency=19715 ms
handle_ledger_update{milestone_index=4855287 created=7 consumed=5}: inx_chronicle::inx: close time.busy=12.2ms time.idle=34.8s
handle_ledger_update{milestone_index=4855288 created=7 consumed=6}: inx_chronicle::inx: close time.busy=16.9ms time.idle=433ms
handle_ledger_update{milestone_index=4855289 created=4 consumed=5}: inx_chronicle::inx: close time.busy=12.6ms time.idle=389ms

We think this could be connected to the analytics endpoint /ledger/richest-addresses, which seems to be quite heavy on the database:

[image: long-running richest-addresses queries on the database]

(at the time the screenshot was taken, the chronicle-service was not available on the load-balancer and there were about 20 such queries running).
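
To make the suspicion concrete, this is roughly the kind of aggregation such an endpoint has to run on every request. The connection string, collection and field names below are guesses for illustration (not Chronicle's actual schema), and the mongodb 2.x Rust driver API is assumed:

```rust
use futures::stream::TryStreamExt;
use mongodb::{
    bson::{doc, Document},
    Client,
};

#[tokio::main]
async fn main() -> mongodb::error::Result<()> {
    // Hypothetical connection string, database, and collection names.
    let client = Client::with_uri_str("mongodb://localhost:27017").await?;
    let outputs = client.database("chronicle").collection::<Document>("outputs");

    // Group every unspent output by address, sum the amounts, sort, and keep
    // the top 100. This touches a large part of the collection on every
    // request, which matches the long-running queries seen above.
    let pipeline = vec![
        doc! { "$match": { "metadata.spent_metadata": { "$exists": false } } },
        doc! { "$group": { "_id": "$details.address", "balance": { "$sum": "$output.amount" } } },
        doc! { "$sort": { "balance": -1 } },
        doc! { "$limit": 100 },
    ];

    let mut cursor = outputs.aggregate(pipeline, None).await?;
    while let Some(row) = cursor.try_next().await? {
        println!("{row}");
    }
    Ok(())
}
```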

This is the view from the load-balancer (3.00 members means all Chronicle instances are healthy):

[image: load-balancer healthy-member count over time]

The CPU profile on the primary MongoDB node looks roughly like this:

[image: CPU profile of the primary MongoDB node]

We think that changing the implementation from range queries to an incremental approach could solve the problem, because it would lower the load significantly.

Naively, we imagine it would work like this (a sketch follows the list):

  • run the query once at startup
  • only update the top 100 richest addresses in memory on each new milestone
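
A minimal sketch of that incremental idea, assuming a plain in-memory map of address balances (the types and names are made up for illustration and are not Chronicle's actual code):

```rust
use std::collections::HashMap;

/// Hypothetical in-memory tracker for the richest-addresses analytic.
struct RichestAddresses {
    balances: HashMap<String, u64>,
}

impl RichestAddresses {
    /// Seed the tracker once at startup from a single full query.
    fn new(initial: HashMap<String, u64>) -> Self {
        Self { balances: initial }
    }

    /// Apply the created/consumed outputs of one milestone instead of
    /// re-running the full query for every request.
    fn apply_milestone(&mut self, created: &[(String, u64)], consumed: &[(String, u64)]) {
        for (address, amount) in created {
            *self.balances.entry(address.clone()).or_insert(0) += amount;
        }
        for (address, amount) in consumed {
            if let Some(balance) = self.balances.get_mut(address) {
                *balance = balance.saturating_sub(*amount);
            }
        }
        // Drop addresses whose balance reached zero so the map does not grow unboundedly.
        self.balances.retain(|_, balance| *balance > 0);
    }

    /// Current top 100 addresses by balance, computed from the in-memory state.
    fn top100(&self) -> Vec<(&str, u64)> {
        let mut entries: Vec<_> = self
            .balances
            .iter()
            .map(|(address, balance)| (address.as_str(), *balance))
            .collect();
        entries.sort_unstable_by(|a, b| b.1.cmp(&a.1));
        entries.truncate(100);
        entries
    }
}
```

Seeding once and then applying only the per-milestone deltas would replace a full collection scan per request with a small in-memory update per milestone, at the cost of holding the balance map in memory.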

The token-distribution analytics are probably similar (but I don't know).

Chronicle version

Chronicle rc.1


shufps commented Jun 2, 2023

I tried caching the heavy endpoints for an hour, but it seems to be only a temporary measure:

[image: database load with hourly spikes after enabling the cache]

The spikes are exactly 1 h apart, and the situation seems to slowly get worse over time 🤔
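
A simplified sketch of the hourly caching idea (made-up names, not the actual change):

```rust
use std::time::{Duration, Instant};

/// Hypothetical cache for one expensive analytics result.
struct CachedResult<T> {
    ttl: Duration,
    value: Option<(Instant, T)>,
}

impl<T: Clone> CachedResult<T> {
    fn new(ttl: Duration) -> Self {
        Self { ttl, value: None }
    }

    /// Serve the cached value while it is fresh; otherwise run `compute`
    /// (e.g. the heavy richest-addresses aggregation) and cache the result.
    fn get_or_compute(&mut self, compute: impl FnOnce() -> T) -> T {
        if let Some((cached_at, value)) = &self.value {
            if cached_at.elapsed() < self.ttl {
                return value.clone();
            }
        }
        let value = compute();
        self.value = Some((Instant::now(), value.clone()));
        value
    }
}

fn main() {
    // Cache the result for one hour, as in the experiment above.
    let mut cache = CachedResult::new(Duration::from_secs(3600));
    let top = cache.get_or_compute(|| vec![("iota1q...".to_string(), 42u64)]);
    println!("{} cached entries", top.len());
}
```

This also shows why the spikes line up: every instance re-runs the full aggregation as soon as its hour expires, so the heavy query still hits the primary on every refresh.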

DaughterOfMars (Collaborator) commented

This is fixed in 2.0 using new collections to track the latest address analytics.
