Ingest batches of ledgers in-memory before flushing to DB #5099
Reduced scope slightly: removed verifyRangeState from the list of states that need ranged-ledger enablement, because that state also invokes change processors, which we want to keep at single-ledger scope. The ranged-ledger scope should be limited to the tx processors only.
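For illustration, a minimal sketch of the distinction drawn here, using hypothetical interface and type names (changeProcessor, transactionProcessor, Change, LedgerTransaction) rather than the actual stellar/go definitions:

```go
// Illustrative only: hypothetical interfaces contrasting the two processor
// scopes; these do not mirror the real definitions in stellar/go.
package sketch

import "context"

// Change and LedgerTransaction stand in for the ingest package's types.
type (
	Change            struct{}
	LedgerTransaction struct{}
)

// changeProcessor keeps its original single-ledger lifetime, so states such
// as verifyRangeState that invoke change processors are unaffected.
type changeProcessor interface {
	ProcessChange(ctx context.Context, change Change) error
	Commit(ctx context.Context) error
}

// transactionProcessor is the only kind that gains the ranged-ledger scope:
// one instance may accumulate records from many ledgers before flushing.
type transactionProcessor interface {
	ProcessTransaction(ctx context.Context, tx LedgerTransaction) error
	Flush(ctx context.Context) error
}
```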
In #4909 we updated all the transaction processors to use the FastBatchInsertBuilder to insert rows into the history tables with the Postgres COPY command. We also refactored the transaction processor interface so that a single processor can accumulate records across multiple ledgers.
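As a rough illustration of that refactor, here is a minimal sketch of a processor that accumulates rows across ledgers and flushes them in one bulk write. All names (effectsBatch, batchInserter, row, ledgerTx) are hypothetical, and batchInserter only stands in for a COPY-based helper like FastBatchInsertBuilder; it is not the real API.

```go
// A minimal sketch, not the real stellar/go code: a processor that buffers
// rows from many ledgers in memory and writes them in a single bulk insert.
package sketch

import "context"

// row stands in for one history-table row.
type row struct {
	LedgerSeq uint32
	// real columns for the history table would live here
}

// batchInserter stands in for a COPY-based bulk insert helper.
type batchInserter interface {
	Add(r row)
	Exec(ctx context.Context) error
}

// ledgerTx is a placeholder for the ingest package's transaction type.
type ledgerTx struct {
	LedgerSeq uint32
}

// effectsBatch buffers rows from many ledgers in memory.
type effectsBatch struct {
	inserter batchInserter
	pending  []row
}

// ProcessTransaction may now be called with transactions belonging to
// different ledgers within the same processor lifetime.
func (p *effectsBatch) ProcessTransaction(ctx context.Context, tx ledgerTx) error {
	p.pending = append(p.pending, row{LedgerSeq: tx.LedgerSeq})
	return nil
}

// Flush hands all accumulated rows to the bulk inserter in one shot and
// resets the in-memory buffer.
func (p *effectsBatch) Flush(ctx context.Context) error {
	for _, r := range p.pending {
		p.inserter.Add(r)
	}
	p.pending = p.pending[:0]
	return p.inserter.Exec(ctx)
}
```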
Now we can build on this work to improve the performance of reingestion. #4909 improved the performance of ingesting a single ledger. But we can extract more performance gains by ingesting multiple ledgers within a single processor lifetime.
In the spike branch, the dataflow ingests batches of ledgers within a single processor lifetime.
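Since the spike branch itself is not reproduced here, the following is only a hedged sketch of what such a dataflow could look like: ledgers are streamed through long-lived processors, and the in-memory batches are flushed every N ledgers. The placeholder types (ledger, ledgerTx, processor) and the reingestRange function are assumptions, not the spike branch's actual code.

```go
// A hypothetical batched dataflow: one processor lifetime spans many ledgers,
// and accumulated rows are flushed to the DB once per batch.
package sketch

import "context"

// Placeholder types standing in for the real ledger and transaction types.
type (
	ledgerTx struct{ LedgerSeq uint32 }
	ledger   struct {
		Seq          uint32
		Transactions []ledgerTx
	}
)

// processor mirrors the accumulate-then-flush shape discussed above.
type processor interface {
	ProcessTransaction(ctx context.Context, tx ledgerTx) error
	Flush(ctx context.Context) error
}

// reingestRange streams ledgers [start, end] through every processor and
// flushes the in-memory batches every flushEvery ledgers (and at the end of
// the range). flushEvery is assumed to be >= 1.
func reingestRange(
	ctx context.Context,
	start, end, flushEvery uint32,
	getLedger func(uint32) (ledger, error),
	procs []processor,
) error {
	var buffered uint32
	for seq := start; seq <= end; seq++ {
		l, err := getLedger(seq)
		if err != nil {
			return err
		}
		for _, tx := range l.Transactions {
			for _, p := range procs {
				if err := p.ProcessTransaction(ctx, tx); err != nil {
					return err
				}
			}
		}
		buffered++
		if buffered == flushEvery || seq == end {
			for _, p := range procs {
				if err := p.Flush(ctx); err != nil {
					return err
				}
			}
			buffered = 0
		}
	}
	return nil
}
```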
We will need to implement this new dataflow in several states of the ingestion state machine. Note that the resume state only ingests a single ledger, so it is already covered by #4909.
Also, the following bugs will need to be addressed when implementing this issue:
RebuildTradeAggregationBuckets() cannot be invoked concurrently during parallel ingestion because two workers invoking the function on adjacent buckets will hit duplicate key constraint errors. The buckets fall on minute boundaries, so two adjacent ledger ranges can share the same trade aggregation bucket. We can fix this by modifying parallel reingestion so that the trade aggregation buckets are built once all the workers have completed their ingestion jobs (see the sketch below). - services/horizon/ingest: RebuildTradeAggregationBuckets cannot be invoked concurrently during parallel ingestion #5127
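A hedged sketch of that ordering fix, assuming hypothetical helpers: reingestWorker and rebuildTradeAggregations stand in for the real reingestion job and RebuildTradeAggregationBuckets, and ledgerRange is a placeholder type. The only point illustrated is the sequencing: buckets are rebuilt once, after every worker has finished.

```go
// Hypothetical sketch: run the per-range workers in parallel, then rebuild
// the trade aggregation buckets exactly once over the whole range.
package sketch

import (
	"context"

	"golang.org/x/sync/errgroup"
)

type ledgerRange struct {
	From, To uint32
}

func parallelReingest(
	ctx context.Context,
	ranges []ledgerRange, // assumed sorted and non-empty
	reingestWorker func(context.Context, ledgerRange) error,
	rebuildTradeAggregations func(context.Context, uint32, uint32) error,
) error {
	g, gctx := errgroup.WithContext(ctx)
	for _, r := range ranges {
		r := r // capture loop variable for the goroutine
		g.Go(func() error {
			// Workers fill the history tables only; they skip the trade
			// aggregation buckets so two workers never touch the shared
			// minute-boundary bucket at a range border.
			return reingestWorker(gctx, r)
		})
	}
	if err := g.Wait(); err != nil {
		return err
	}
	// Rebuild the buckets once, over the full range, after all workers
	// have completed their ingestion jobs.
	return rebuildTradeAggregations(ctx, ranges[0].From, ranges[len(ranges)-1].To)
}
```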
We should remove the force flag because it is incompatible with the new dataflow of ingestion, where we batch multiple ledgers in a single transaction. - services/horizon/ingest: remove the force flag on reingestion cmds #5128