Syncing stalls for a couple of hundred seconds #1225

Closed
shufps opened this issue Apr 27, 2023 · 2 comments · May be fixed by #1226
Labels
bug Something isn't working

Comments


shufps commented Apr 27, 2023

We have the problem that syncing stalls now and then for a couple of hundred seconds.

Example (~400s):

handle_ledger_update{milestone_index=4855283 created=7 consumed=5}: inx_chronicle::inx: close time.busy=13.8ms time.idle=374ms
handle_ledger_update{milestone_index=4855284 created=8 consumed=8}: inx_chronicle::inx: close time.busy=13.6ms time.idle=292ms
handle_ledger_update{milestone_index=4855285 created=4 consumed=5}: inx_chronicle::inx: close time.busy=15.7ms time.idle=149s
tower_http::trace::on_failure: response failed classification=Status code: 503 Service Unavailable latency=23239 ms
handle_ledger_update{milestone_index=4855286 created=4 consumed=3}: inx_chronicle::inx: close time.busy=13.1ms time.idle=273s
tower_http::trace::on_failure: response failed classification=Status code: 503 Service Unavailable latency=19715 ms
handle_ledger_update{milestone_index=4855287 created=7 consumed=5}: inx_chronicle::inx: close time.busy=12.2ms time.idle=34.8s
handle_ledger_update{milestone_index=4855288 created=7 consumed=6}: inx_chronicle::inx: close time.busy=16.9ms time.idle=433ms
handle_ledger_update{milestone_index=4855289 created=4 consumed=5}: inx_chronicle::inx: close time.busy=12.6ms time.idle=389ms

We think this could be connected to the analytics endpoint /ledger/richest-addresses, which seems to be quite heavy on the database:

[image: long-running richest-addresses queries on the database]

(at the time the screenshot was taken, the chronicle-service was not available on the load-balancer and there were about 20 such queries running).
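
To make the suspicion concrete, this is roughly the kind of aggregation such an endpoint has to run on every request. The connection string, collection and field names below are guesses for illustration (not Chronicle's actual schema), and the mongodb 2.x Rust driver API is assumed:

```rust
use futures::stream::TryStreamExt;
use mongodb::{
    bson::{doc, Document},
    Client,
};

#[tokio::main]
async fn main() -> mongodb::error::Result<()> {
    // Hypothetical connection string, database, and collection names.
    let client = Client::with_uri_str("mongodb://localhost:27017").await?;
    let outputs = client.database("chronicle").collection::<Document>("outputs");

    // Group every unspent output by address, sum the amounts, sort, and keep
    // the top 100. This touches a large part of the collection on every
    // request, which matches the long-running queries seen above.
    let pipeline = vec![
        doc! { "$match": { "metadata.spent_metadata": { "$exists": false } } },
        doc! { "$group": { "_id": "$details.address", "balance": { "$sum": "$output.amount" } } },
        doc! { "$sort": { "balance": -1 } },
        doc! { "$limit": 100 },
    ];

    let mut cursor = outputs.aggregate(pipeline, None).await?;
    while let Some(row) = cursor.try_next().await? {
        println!("{row}");
    }
    Ok(())
}
```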

This is the view from the load-balancer (3.00 members means all Chronicle instances are healthy):

[image: load-balancer healthy-member count over time]

The CPU profile on the primary MongoDB node looks roughly like this:

[image: CPU profile of the primary MongoDB node]

We think that changing the implementation from range queries to an incremental approach could solve the problem, because it would lower the load significantly.

Naively, we imagine it would work like this (a sketch follows the list):

  • run the query once at startup
  • only update the top 100 richest addresses in memory on each new milestone
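
A minimal sketch of that incremental idea, assuming a plain in-memory map of address balances (the types and names are made up for illustration and are not Chronicle's actual code):

```rust
use std::collections::HashMap;

/// Hypothetical in-memory tracker for the richest-addresses analytic.
struct RichestAddresses {
    balances: HashMap<String, u64>,
}

impl RichestAddresses {
    /// Seed the tracker once at startup from a single full query.
    fn new(initial: HashMap<String, u64>) -> Self {
        Self { balances: initial }
    }

    /// Apply the created/consumed outputs of one milestone instead of
    /// re-running the full query for every request.
    fn apply_milestone(&mut self, created: &[(String, u64)], consumed: &[(String, u64)]) {
        for (address, amount) in created {
            *self.balances.entry(address.clone()).or_insert(0) += amount;
        }
        for (address, amount) in consumed {
            if let Some(balance) = self.balances.get_mut(address) {
                *balance = balance.saturating_sub(*amount);
            }
        }
        // Drop addresses whose balance reached zero so the map does not grow unboundedly.
        self.balances.retain(|_, balance| *balance > 0);
    }

    /// Current top 100 addresses by balance, computed from the in-memory state.
    fn top100(&self) -> Vec<(&str, u64)> {
        let mut entries: Vec<_> = self
            .balances
            .iter()
            .map(|(address, balance)| (address.as_str(), *balance))
            .collect();
        entries.sort_unstable_by(|a, b| b.1.cmp(&a.1));
        entries.truncate(100);
        entries
    }
}
```

Seeding once and then applying only the per-milestone deltas would replace a full collection scan per request with a small in-memory update per milestone, at the cost of holding the balance map in memory.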

The token-distribution analytics are probably similar (but I don't know).

Chronicle version

Chronicle rc.1


shufps commented Jun 2, 2023

I tried caching the heavy endpoints for an hour, but it seems to be only a temporary measure:

[image: database load with hourly spikes after enabling the cache]

The spikes are exactly 1 h apart, and the situation seems to slowly get worse over time 🤔
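
A simplified sketch of the hourly caching idea (made-up names, not the actual change):

```rust
use std::time::{Duration, Instant};

/// Hypothetical cache for one expensive analytics result.
struct CachedResult<T> {
    ttl: Duration,
    value: Option<(Instant, T)>,
}

impl<T: Clone> CachedResult<T> {
    fn new(ttl: Duration) -> Self {
        Self { ttl, value: None }
    }

    /// Serve the cached value while it is fresh; otherwise run `compute`
    /// (e.g. the heavy richest-addresses aggregation) and cache the result.
    fn get_or_compute(&mut self, compute: impl FnOnce() -> T) -> T {
        if let Some((cached_at, value)) = &self.value {
            if cached_at.elapsed() < self.ttl {
                return value.clone();
            }
        }
        let value = compute();
        self.value = Some((Instant::now(), value.clone()));
        value
    }
}

fn main() {
    // Cache the result for one hour, as in the experiment above.
    let mut cache = CachedResult::new(Duration::from_secs(3600));
    let top = cache.get_or_compute(|| vec![("iota1q...".to_string(), 42u64)]);
    println!("{} cached entries", top.len());
}
```

This also shows why the spikes line up: every instance re-runs the full aggregation as soon as its hour expires, so the heavy query still hits the primary on every refresh.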

DaughterOfMars (Collaborator) commented

This is fixed in 2.0 using new collections to track the latest address analytics.
