
Ability to persist hashed source data to reconciliation DB #6

Closed
chadlwilson opened this issue Oct 1, 2021 · 0 comments
Assignees
Labels
enhancement New feature or request size:M medium items

Comments

@chadlwilson
Collaborator

chadlwilson commented Oct 1, 2021

Context / Goal

A core feature of the design is the ability to take data from a source DB, load the rows, hash the data, and then persist it to a reconciliation DB in hashed form, so that it can later be compared with the target.

We can start by being able to stream data.

Expected Outcome

  • Be able to execute an arbitrary dataset query, expressed as SQL, against a configured source DB
  • Hash the column values other than a designated MigrationKey using SHA256
    • start off with handling for columns that can be mapped easily to core Java types (boolean, integer numerics, floating points, strings)
  • Be able to persist the (migrationKey, sourceHashValue) pair to a third "reconciliation DB" owned by the reconciliation tool
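The hashing step in the outcome above can be sketched roughly as follows. This is a hypothetical illustration, not the tool's actual code: the `RowHasher` class, its method names, and the naive string-based canonical form are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: hash every column except the designated migration key
// with SHA-256, yielding a (migrationKey, sourceHashValue) pair to persist.
// Assumes columns are already mapped to core Java types and arrive in a
// stable (returned) order, hence the LinkedHashMap.
public class RowHasher {
    public static String hashRow(Map<String, Object> row, String migrationKeyColumn) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        for (Map.Entry<String, Object> col : row.entrySet()) {
            if (col.getKey().equals(migrationKeyColumn)) continue; // key itself is not hashed
            // Naive canonical form: every column value as its UTF-8 string
            digest.update(String.valueOf(col.getValue()).getBytes(StandardCharsets.UTF_8));
        }
        return HexFormat.of().formatHex(digest.digest());
    }

    public static void main(String[] args) throws Exception {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1);
        row.put("name", "alice");
        row.put("active", true);
        System.out.println("(" + row.get("id") + ", " + hashRow(row, "id") + ")");
    }
}
```

The string-based canonical form is only a placeholder; as the later commits note, real type handling needs more care.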

Out of Scope

Additional context / implementation notes

@chadlwilson chadlwilson added the enhancement New feature or request label Oct 1, 2021
@chadlwilson chadlwilson self-assigned this Oct 5, 2021
chadlwilson added a commit that referenced this issue Oct 6, 2021
…on for datasets during start-up

This possibly isn't the optimal/canonical way of doing this, but I struggled to get the Micronaut `ConfigurationProperties` working with nested custom property types. I also cannot find an elegant way to augment `ConfigurationProperties` with beans elsewhere in the context. This seems to work for now.

Also
- tried to make tests start a bit faster by avoiding so much work in the default `test` Micronaut environment (although needs more work!)
- added mockito-kotlin to make it a bit easier to work with Mockito in Kotlin
chadlwilson added a commit that referenced this issue Oct 7, 2021
chadlwilson added a commit that referenced this issue Oct 7, 2021
…s name

- Currently automatically starts a hard-coded stream of a dataset at start-up
- Each row is just toStringed right now; obviously not a real implementation
- Doesn't go anywhere; just logged
chadlwilson added a commit that referenced this issue Oct 7, 2021
The initial attempt goes through the columns in returned order and adds each value to the hash according to its type.
- This may need to be adjusted later since different DB/driver implementations may return short vs int vs long and these may be added to the hasher differently, perhaps these minor differences need to be abstracted away?
- currently haven't added support for date/times etc
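One hypothetical way to abstract away the short-vs-int-vs-long differences noted above is to widen all integral values to `long` before feeding them to the hasher. This is an illustrative sketch under that assumption, not the commit's actual implementation:

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.util.HexFormat;

// Sketch: widen integral values to long, so that drivers returning Short,
// Integer, or Long for the same column produce identical hash input.
public class IntegralNormalizer {
    public static byte[] canonicalBytes(Object value) {
        if (value instanceof Short || value instanceof Integer || value instanceof Long) {
            return ByteBuffer.allocate(Long.BYTES)
                    .putLong(((Number) value).longValue())
                    .array();
        }
        throw new IllegalArgumentException("unsupported type: " + value.getClass());
    }

    public static String hash(Object value) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        return HexFormat.of().formatHex(digest.digest(canonicalBytes(value)));
    }

    public static void main(String[] args) throws Exception {
        // A Short 42 and a Long 42 hash identically after normalization
        System.out.println(hash((short) 42).equals(hash(42L))); // prints "true"
    }
}
```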
chadlwilson added a commit that referenced this issue Oct 7, 2021
chadlwilson added a commit that referenced this issue Oct 7, 2021
chadlwilson added a commit that referenced this issue Oct 11, 2021
…B row-by-row

Currently
- no tracking of the overall reconciliation
- schema design needs work to introduce a "migration run" rather than using the data set name
- error handling needs work
- will need some batching of inserts
chadlwilson added a commit that referenced this issue Oct 11, 2021
We only need the interfaces at the moment, as the code is not using JPA itself right now, only simple Micronaut Data queries. We can evaluate whether we need full JPA a bit later.
chadlwilson added a commit that referenced this issue Oct 11, 2021
chadlwilson added a commit that referenced this issue Oct 13, 2021
…e rows together

Allows running reconciliation for the same dataset multiple times
chadlwilson added a commit that referenced this issue Oct 13, 2021
…e rows together

Allows running reconciliation for the same dataset multiple times
chadlwilson added a commit that referenced this issue Oct 13, 2021
chadlwilson added a commit that referenced this issue Oct 13, 2021
…rings with toString

May need to be re-evaluated later if this doesn't turn out to be convenient for real use cases.
chadlwilson added a commit that referenced this issue Oct 13, 2021
Allows for more thorough testing than with a single row
chadlwilson added a commit that referenced this issue Oct 14, 2021
Nulls of different implied Java types will be considered unequal, even though the raw database types can differ.
This handling may need to be varied on dataset level later, however this seems a reasonable assumption to continue with for now.
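The semantics described above (nulls of different implied Java types comparing unequal) could be achieved by hashing a type tag alongside the value. This is a hypothetical illustration of that behaviour, not the project's code; the class and the sentinel byte for null are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;

// Sketch: include the implied Java type name in the hash input, so a null
// String column and a null Integer column do not reconcile as equal.
public class TypedNullHasher {
    public static String hash(Class<?> impliedType, Object value) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        digest.update(impliedType.getName().getBytes(StandardCharsets.UTF_8));
        // Sentinel byte for null keeps it distinct from any real string value
        digest.update(value == null
                ? new byte[] {0}
                : String.valueOf(value).getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(digest.digest());
    }
}
```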
chadlwilson added a commit that referenced this issue Oct 15, 2021
…e service and set `completedTime` on runs

Not sure if I have this right, but I am trying to give the `ReconciliationService` more responsibility for setting up the pipeline rather than for tracking/managing its lifecycle.
@chadlwilson chadlwilson removed their assignment Oct 18, 2021
@chadlwilson chadlwilson self-assigned this Nov 5, 2021
@chadlwilson chadlwilson added the size:M medium items label Nov 9, 2021