
Ability to persist hashed source data to reconciliation DB #6

Closed
chadlwilson opened this issue Oct 1, 2021 · 0 comments
Assignees
Labels
enhancement New feature or request size:M medium items

Comments

@chadlwilson
Collaborator

chadlwilson commented Oct 1, 2021

Context / Goal

A core feature of the design is the ability to take data from a source DB, load the rows, hash the data, and then persist it to a reconciliation DB in hashed form, so that it can later be compared with the target.

We can start by being able to stream data.

Expected Outcome

  • Be able to execute an arbitrary dataset query, expressed as SQL, against a configured source DB
  • Hash the column values other than a designated MigrationKey using SHA256
    • start off with handling for columns that can be mapped easily to core Java types (boolean, integer numerics, floating points, strings)
  • Be able to persist the (migrationKey, sourceHashValue) pair to a third "reconciliation DB" owned by the reconciliation tool
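The hashing step in the outcome above can be sketched roughly as follows. This is a hypothetical illustration, not the tool's actual code: the `RowHasher` class, its method names, and the naive string-based canonical form are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: hash every column except the designated migration key
// with SHA-256, yielding a (migrationKey, sourceHashValue) pair to persist.
// Assumes columns are already mapped to core Java types and arrive in a
// stable (returned) order, hence the LinkedHashMap.
public class RowHasher {
    public static String hashRow(Map<String, Object> row, String migrationKeyColumn) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        for (Map.Entry<String, Object> col : row.entrySet()) {
            if (col.getKey().equals(migrationKeyColumn)) continue; // key itself is not hashed
            // Naive canonical form: every column value as its UTF-8 string
            digest.update(String.valueOf(col.getValue()).getBytes(StandardCharsets.UTF_8));
        }
        return HexFormat.of().formatHex(digest.digest());
    }

    public static void main(String[] args) throws Exception {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1);
        row.put("name", "alice");
        row.put("active", true);
        System.out.println("(" + row.get("id") + ", " + hashRow(row, "id") + ")");
    }
}
```

The string-based canonical form is only a placeholder; as the later commits note, real type handling needs more care.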

Out of Scope

Additional context / implementation notes

@chadlwilson chadlwilson added the enhancement New feature or request label Oct 1, 2021
@chadlwilson chadlwilson self-assigned this Oct 5, 2021
chadlwilson added a commit that referenced this issue Oct 6, 2021
…on for datasets during start-up

This possibly isn't the optimal/canonical way of doing this, but I struggled to get the Micronaut `ConfigurationProperties` working with nested custom property types. I also cannot find an elegant way to augment `ConfigurationProperties` with beans elsewhere in the context. This seems to work for now.

Also
- tried to make tests start a bit faster by avoiding so much work in the default `test` Micronaut environment (although needs more work!)
- added mockito-kotlin to make it a bit easier to work with Mockito in Kotlin
chadlwilson added a commit that referenced this issue Oct 7, 2021
chadlwilson added a commit that referenced this issue Oct 7, 2021
…s name

- Currently automatically starts a hard-coded stream of a dataset at start-up
- Each row is just toStringed right now; obviously not a real implementation
- Doesn't go anywhere; just logged
chadlwilson added a commit that referenced this issue Oct 7, 2021
The initial attempt goes through the columns in returned order and adds each value to the hash according to its type.
- This may need to be adjusted later since different DB/driver implementations may return short vs int vs long and these may be added to the hasher differently, perhaps these minor differences need to be abstracted away?
- currently haven't added support for date/times etc
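One hypothetical way to abstract away the short-vs-int-vs-long differences noted above is to widen all integral values to `long` before feeding them to the hasher. This is an illustrative sketch under that assumption, not the commit's actual implementation:

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.util.HexFormat;

// Sketch: widen integral values to long, so that drivers returning Short,
// Integer, or Long for the same column produce identical hash input.
public class IntegralNormalizer {
    public static byte[] canonicalBytes(Object value) {
        if (value instanceof Short || value instanceof Integer || value instanceof Long) {
            return ByteBuffer.allocate(Long.BYTES)
                    .putLong(((Number) value).longValue())
                    .array();
        }
        throw new IllegalArgumentException("unsupported type: " + value.getClass());
    }

    public static String hash(Object value) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        return HexFormat.of().formatHex(digest.digest(canonicalBytes(value)));
    }

    public static void main(String[] args) throws Exception {
        // A Short 42 and a Long 42 hash identically after normalization
        System.out.println(hash((short) 42).equals(hash(42L))); // prints "true"
    }
}
```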
chadlwilson added a commit that referenced this issue Oct 7, 2021
chadlwilson added a commit that referenced this issue Oct 7, 2021
chadlwilson added a commit that referenced this issue Oct 11, 2021
…B row-by-row

Currently
- no tracking of the overall reconciliation
- schema design needs work to introduce a "migration run" rather than using the data set name
- error handling needs work
- will need some batching of inserts
chadlwilson added a commit that referenced this issue Oct 11, 2021
We only need the interfaces at the moment, as the code is not using JPA itself right now, only simple Micronaut Data queries. We can evaluate whether we need full JPA a bit later.
chadlwilson added a commit that referenced this issue Oct 11, 2021
chadlwilson added a commit that referenced this issue Oct 13, 2021
…e rows together

Allows running reconciliation for the same dataset multiple times
chadlwilson added a commit that referenced this issue Oct 13, 2021
…e rows together

Allows running reconciliation for the same dataset multiple times
chadlwilson added a commit that referenced this issue Oct 13, 2021
chadlwilson added a commit that referenced this issue Oct 13, 2021
…rings with toString

May need to be re-evaluated later if this doesn't turn out to be convenient for real use cases.
chadlwilson added a commit that referenced this issue Oct 13, 2021
Allows for more thorough testing than with a single row
chadlwilson added a commit that referenced this issue Oct 14, 2021
Nulls of different implied Java types will be considered unequal, even though the raw database types can differ.
This handling may need to be varied on dataset level later, however this seems a reasonable assumption to continue with for now.
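The semantics described above (nulls of different implied Java types comparing unequal) could be achieved by hashing a type tag alongside the value. This is a hypothetical illustration of that behaviour, not the project's code; the class and the sentinel byte for null are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;

// Sketch: include the implied Java type name in the hash input, so a null
// String column and a null Integer column do not reconcile as equal.
public class TypedNullHasher {
    public static String hash(Class<?> impliedType, Object value) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        digest.update(impliedType.getName().getBytes(StandardCharsets.UTF_8));
        // Sentinel byte for null keeps it distinct from any real string value
        digest.update(value == null
                ? new byte[] {0}
                : String.valueOf(value).getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(digest.digest());
    }
}
```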
chadlwilson added a commit that referenced this issue Oct 15, 2021
…e service and set `completedTime` on runs

Not sure if I have this right, but I am trying to give the `ReconciliationService` more responsibility for setting up the pipeline rather than for tracking/managing its lifecycle.
@chadlwilson chadlwilson removed their assignment Oct 18, 2021
@chadlwilson chadlwilson self-assigned this Nov 5, 2021
@chadlwilson chadlwilson added the size:M medium items label Nov 9, 2021