
services/horizon/internal/ingest: reap lookup tables without blocking ingestion #5405

Merged: 18 commits into stellar:master from concurrent-reap on Sep 6, 2024

Conversation

tamirms (Contributor) commented Jul 30, 2024

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with the name of the package that is most changed in the
    PR, e.g. services/friendbot, or all or doc if the changes are broad or impact
    many packages.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated any docs (developer docs, .md files, etc.) affected by this
    change. Take a look in the docs folder for a given service, like this one.

Release planning

  • I've updated the relevant CHANGELOG (here for Horizon) if
    needed with deprecations, added features, breaking changes, and DB schema changes.
  • I've decided if this PR requires a new major/minor version according to
    semver, or if it's mainly a patch change. The PR is targeted at the next
    release branch if it's not a patch change.

What

Close #4870

This PR improves reaping of history lookup tables (e.g. history_accounts, history_claimable_balances) so that it can run safely in parallel with ingestion. Currently, reaping of history lookup tables blocks ingestion, so if the reaping queries take too long, ingestion lags behind. With this PR, reaping of history lookup tables runs concurrently with ingestion with minimal contention. It is also worth noting that this PR introduces no performance degradation for either reingestion or live ingestion.

When reviewing this PR it would be helpful to read this design doc:

https://docs.google.com/document/d/1CGfBCS99MTEZDP4mMhV1o6Z5NE_Tlg7ENCcWTwzhlio/edit
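For intuition, here is a minimal sketch of the general technique, not the actual implementation in this PR: delete orphaned rows from a lookup table in small, id-bounded batches so that each DELETE stays short and never holds locks long enough to stall ingestion. All names (reaper, reapAccountBatch, batchSize) are hypothetical, and the orphan check is simplified to a single referencing table; the real query has to check every history table that can reference the lookup id.

```go
// Sketch only: a simplified, hypothetical batch reaper for one lookup
// table. This is not the code added by this PR.
package reaper

import (
	"context"
	"database/sql"
)

// batchSize is a hypothetical tuning knob: small enough that each DELETE
// finishes quickly and avoids contending with ingestion.
const batchSize = 10000

// reapAccountBatch deletes orphaned history_accounts rows whose ids fall
// in [offset, offset+batchSize) and reports how many rows were removed.
// Bounding the DELETE by id is what keeps each reaping step short.
func reapAccountBatch(ctx context.Context, db *sql.DB, offset int64) (int64, error) {
	// Simplified orphan check: the real query must verify that no history
	// table (transaction participants, operation participants, effects,
	// trades, ...) still references the account id.
	res, err := db.ExecContext(ctx, `
		DELETE FROM history_accounts
		WHERE id >= $1 AND id < $2
		AND NOT EXISTS (
			SELECT 1 FROM history_transaction_participants p
			WHERE p.history_account_id = history_accounts.id
		)`, offset, offset+batchSize)
	if err != nil {
		return 0, err
	}
	return res.RowsAffected()
}
```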

Known limitations

After running a full vacuum (Postgres VACUUM FULL) on history_accounts, the reaping query sped up dramatically. Previously, the duration of reaping the history_accounts table peaked at ~1.9 seconds:

https://grafana.stellar-ops.com/d/x8xDSQQIk/stellar-horizon?orgId=1&from=1722295775773&to=1722400061302&var-environment=stg&var-cluster=pubnet&var-network=All&var-route=All&viewPanel=2531

[screenshot: Grafana panel showing history_accounts reap duration peaking at ~1.9 s]

After the vacuum, the average duration for reaping history_accounts is ~20 ms and the peak duration was ~400 ms:

https://grafana.stellar-ops.com/d/x8xDSQQIk/stellar-horizon?orgId=1&from=1724782666959&to=1724869066959&var-environment=stg&var-cluster=pubnet&var-network=All&var-route=All&viewPanel=2531

[screenshot: Grafana panel showing history_accounts reap duration after the vacuum]

This means the risk of lookup-table reaping taking long enough to introduce ingestion lag is much less of a concern.

Update:

After running reaping of the history lookup tables on staging for 24 hours, I observed that the peak duration actually reaches ~600 ms.

https://grafana.stellar-ops.com/d/x8xDSQQIk/stellar-horizon?orgId=1&from=1724866821793&to=1724953221793&var-environment=stg&var-cluster=pubnet&var-network=All&var-route=All&viewPanel=2531

[screenshot: Grafana panel showing peak reap duration of ~600 ms]

tamirms force-pushed the concurrent-reap branch 3 times, most recently from 37b3e5b to c671a7a on August 15, 2024 10:31
tamirms marked this pull request as ready for review August 28, 2024 18:16
tamirms requested a review from a team August 29, 2024 17:47
sreuland (Contributor) commented:

One edge case I wanted to check on: if a user reingests an older range that goes further back than the retention-period cutoff, and reaping of the data and lookup tables has already completed for that retention period, will the next iteration of the lookup reaper detect those rows and delete the qualified (orphaned) lookup ids in that case? I ask because the reaper offsets are stored in the key-value table; it seems like once those advance, the reaper won't inspect that older id range anymore.

tamirms (Contributor, Author) commented Sep 3, 2024

@sreuland

> will the next iteration of the lookup reaper detect those rows and delete the qualified (orphaned) lookup ids in that case? I ask because the reaper offsets are stored in the key-value table; it seems like once those advance, the reaper won't inspect that older id range anymore.

No, in that scenario those rows will not be deleted in the next iteration. However, the reaper will eventually traverse all rows of the history lookup tables, and once it does, it starts again from 0. So the reaper will eventually wrap around and pick up those orphaned rows (though for very large tables like history_claimable_balances this might take a long time).
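To make the wrap-around concrete, here is a continuation of the earlier hypothetical sketch (same caveats: runReaperIteration and the key_value_store schema shown here are assumptions, not the actual Horizon code). One iteration reads the persisted offset, reaps a batch, and resets the offset to 0 once it has passed the table's maximum id, which is how rows orphaned behind the cursor eventually get reaped:

```go
// runReaperIteration advances the hypothetical history_accounts reaper by
// one batch. The offset is persisted in a key-value table (assumed here
// to have a bigint value column) so it survives restarts.
func runReaperIteration(ctx context.Context, db *sql.DB) error {
	// Read the persisted cursor; COALESCE(MAX(...), 0) yields 0 when the
	// key does not exist yet.
	var offset int64
	if err := db.QueryRowContext(ctx,
		`SELECT COALESCE(MAX(value), 0) FROM key_value_store
		 WHERE key = 'account_reap_offset'`,
	).Scan(&offset); err != nil {
		return err
	}

	// Reap one id-bounded batch starting at the cursor (see the earlier sketch).
	if _, err := reapAccountBatch(ctx, db, offset); err != nil {
		return err
	}

	// Advance the cursor and wrap to 0 once the whole table has been
	// traversed, so rows orphaned behind the cursor (e.g. by reingesting
	// an older range) are picked up on a later pass.
	var maxID int64
	if err := db.QueryRowContext(ctx,
		`SELECT COALESCE(MAX(id), 0) FROM history_accounts`,
	).Scan(&maxID); err != nil {
		return err
	}
	offset += batchSize
	if offset > maxID {
		offset = 0
	}

	_, err := db.ExecContext(ctx, `
		INSERT INTO key_value_store (key, value)
		VALUES ('account_reap_offset', $1)
		ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value`, offset)
	return err
}
```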

tamirms (Contributor, Author) commented Sep 3, 2024

@sreuland I believe I have addressed your feedback. PTAL, thanks!

sreuland (Contributor) left a review comment:


looks great, nice work!

tamirms merged commit eb4b2ab into stellar:master on Sep 6, 2024
23 checks passed
tamirms deleted the concurrent-reap branch on September 6, 2024 06:44
Linked issue: [Epic] Improving Reap Performance of History Lookup Tables (#4870)