services/horizon/internal/db2/history: Implement account loader and future account ids #5015

tamirms · 2023-08-11T17:15:21Z

PR Checklist

PR Structure

This PR has reasonably narrow scope (if not, break it down into smaller PRs).
This PR avoids mixing refactoring changes with feature changes (split into two PRs
otherwise).
This PR's title starts with name of package that is most changed in the PR, ex.
services/friendbot, or all or doc if the changes are broad or impact many
packages.

Thoroughness

This PR adds tests for the most critical parts of the new functionality or fixes.
I've updated any docs (developer docs, .md
files, etc... affected by this change). Take a look in the docs folder for a given service,
like this one.

Release planning

I've updated the relevant CHANGELOG (here for Horizon) if
needed with deprecations, added features, breaking changes, and DB schema changes.
I've decided if this PR requires a new major/minor version according to
semver, or if it's mainly a patch change. The PR is targeted at the next
release branch if it's not a patch change.

What

Some of the history tables have columns referencing accounts. However, these columns do not include the text representation of an account address. Instead, they include an integer id which points to the account address in the history_accounts table. When we ingest into history tables that have account integer id columns we first have to lookup the integer id from the history_accounts table and then we can construct and insert the row into the history table.

This PR introduces the concept of an account loader which encapsulates the management of account integer ids. Using the account loader we can more easily refactor the ingestion data flow. Whenever the history processor encounters an account string in ProcessTransaction(), the processor can register the account string in the loader component and store the resulting FutureAccountID in a FastBatchInsertBuilder. We can ensure that Exec() is called on the account loader before it's called on the FastBatchInsertBuilder.

	accountLoader := history.NewAccountLoader()
	processors := buildTransactionProcessors(
		s.historyQ,
		accountLoader,
	)

       // apply all the ledgers in the batch on the processors
       for _, ledger := range ledgers {
		if err = s.runner.ApplyProcessorsOnLedger(processors, ledgerCloseMeta); err != nil {
			return err
		}
       }
	if err := s.historyQ.Begin(); err != nil {
		return errors.Wrap(err, "Error starting a transaction")
	}
	defer s.historyQ.Rollback()

       // use the loader to lookup all the accounts registered by the processors
        if err := accountLoader.Exec(s.ctx, s.historyQ); err != nil {
	        return err
        }

        // flush the rows to the db, the processors will be able to obtain the integer ids from the loaders
	if err := processors.Flush(s.ctx, s.historyQ); err != nil {
		return err
	}
	if err := s.historyQ.Commit(); err != nil {
		return errors.Wrap(err, commitErrMsg)
	}

Why

This PR is required to implement #4909

Known limitations

[N/A]

sreuland · 2023-08-11T18:00:39Z

services/horizon/internal/db2/history/account_loader.go

+	for address := range a.set {
+		addresses = append(addresses, address)
+	}
+	// sort entries before inserting rows to prevent deadlocks on acquiring a ShareLock


is there a prior issue that explains this aspect further, since ingestion is single threaded and all this happens in same db tx, trying to understand where it could happen?

ingestion is single threaded. however, it is possible to do parallel reingestion with multiple workers where each worker has a separate but concurrent transaction

ok, then there is potential that same accounts could be processed from different ledger ranges of different worker threads at same time, how does sorting in that case further avoid db deadlock, I'm not suggesting to remove it, just to understand. It sticks out in the application code as performing unrelated complexity that would have expected to be mitigated at db level with tx repeatable read isolation?

let's say there are two workers ingesting different ledger ranges. In both the ledger ranges the workers have to insert the same set of accounts into the history_accounts table. If the workers insert the accounts in the same order they will avoid a deadlock because the worker who wins the race will acquire the lock and the other worker will block until the transaction is complete. Consider the worst case scenario if worker 1 inserts accounts A, B, C and worker 2 inserts accounts C, B, A. Let's say worker 1 is faster so it inserts accounts A and B. Then worker 2 inserts account C. When worker 1 tries to insert account C there will be a deadlock because worker 2 already has a lock on that row.

I don't think changing the transaction isolation level could avoid the deadlock

sreuland · 2023-08-11T18:07:32Z

services/horizon/internal/db2/history/account_loader.go

+// GetNow should only be called on values which were registered by
+// GetFuture() calls. Also, Exec() must be called before any GetNow
+// call can succeed.
+func (a AccountLoader) GetNow(address string) (int64, error) {


could follow the map interface and return bool for indication of presence rather than error?

If the loader is used correctly then the absence of an account is always an error. You should never lookup an address which you have never inserted.

I changed this so that GetNow() panics instead of returning an error

sreuland · 2023-08-11T18:15:40Z

services/horizon/internal/db2/history/account_loader.go

+
+	insert := 0
+	for _, address := range addresses {
+		if _, ok := a.ids[address]; ok {


~~can you apply this filter that skips 'set' addresses which are already resolved in ids up in the loop that builds the query on line 100, to avoid it going through db i/o?~~

nvm, I don't think there could be a case where set initially overlaps with any keys in ids at the start of Exec.

sreuland · 2023-08-11T18:30:48Z

services/horizon/internal/db2/history/account_loader.go

+	if insert == 0 {
+		return nil
+	}
+	addresses = addresses[:insert]


neat, re-used the same array in place

sreuland · 2023-08-11T19:01:06Z

services/horizon/internal/db2/history/account_loader.go

+		)
+	}
+
+	sql := `


would be compelling at some point if we have time to consider evaluating sql code options that provide compile time type safety on sql statements, less embedded string fragments/concat, like gojet. it may not even be viable for our setup, but worth keeping in mind.

sreuland · 2023-08-11T19:11:58Z

services/horizon/internal/db2/history/account_loader.go

+		return err
+	}
+
+	return a.lookupKeys(ctx, q, addresses)


should the AccountLoader.set be cleared at this point, since those 'future' requests have now been realized into AccountLoader.ids?

I intended that the account loader should only be used once and not be reused. I will add some code to enforce that intention.

sreuland · 2023-08-11T19:17:00Z

We can ensure that Exec() is called on the account loader before it's called on the FastBatchInsertBuilder.

it looks like in example code processors would have a ref to accountLoader, so the processors.Flush could invoke accountLoader.Exec at appropriate time internally, and caller would not have to orchestrate that ordering?

sreuland

great design!

tamirms · 2023-08-14T09:18:05Z

it looks like in example code processors would have a ref to accountLoader, so the processors.Flush could invoke accountLoader.Exec at appropriate time internally, and caller would not have to orchestrate that ordering?

We will actually want to orchestrate the ordering because in parallel reingestion we want the account loader to execute in a separate db transaction from the rest of ingestion. The reason behind that is because parallel workers are likely to have their db transactions block on insertions to the history_account tables (e.g. when multiple workers try to insert the same row). So if we isolate the call to accountLoader.Exec() to a separate transaction that will reduce the time where workers are blocked due to row locks on the history_account tables.

tamirms requested a review from a team August 11, 2023 17:15

Implement account loader and future account ids

6853ee0

tamirms force-pushed the account-loader branch from 65ee6e4 to 6853ee0 Compare August 11, 2023 17:23

sreuland reviewed Aug 11, 2023

View reviewed changes

sreuland approved these changes Aug 11, 2023

View reviewed changes

Make loader panic instead of return error

737d373

tamirms merged commit a630fcb into stellar:ingestion-next Aug 14, 2023
1 of 14 checks passed

tamirms deleted the account-loader branch August 14, 2023 14:57

tamirms mentioned this pull request Aug 16, 2023

services/horizon/internal/db2/history: Implement loaders for assets, claimable balances, and liquidity pools #5019

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

services/horizon/internal/db2/history: Implement account loader and future account ids #5015

services/horizon/internal/db2/history: Implement account loader and future account ids #5015

tamirms commented Aug 11, 2023

sreuland Aug 11, 2023

tamirms Aug 11, 2023

sreuland Aug 11, 2023

tamirms Aug 14, 2023

tamirms Aug 14, 2023

sreuland Aug 11, 2023

tamirms Aug 14, 2023

tamirms Aug 14, 2023

sreuland Aug 11, 2023 •

edited

Loading

sreuland Aug 11, 2023

sreuland Aug 11, 2023

sreuland Aug 11, 2023

tamirms Aug 14, 2023

sreuland commented Aug 11, 2023

sreuland left a comment

tamirms commented Aug 14, 2023

services/horizon/internal/db2/history: Implement account loader and future account ids #5015

services/horizon/internal/db2/history: Implement account loader and future account ids #5015

Conversation

tamirms commented Aug 11, 2023

PR Structure

Thoroughness

Release planning

What

Why

Known limitations

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sreuland Aug 11, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sreuland commented Aug 11, 2023

sreuland left a comment

Choose a reason for hiding this comment

tamirms commented Aug 14, 2023

sreuland Aug 11, 2023 •

edited

Loading