Each migration script and each batch of data is synchronized in a separate transaction. So if we introduce a bug like the unique index recently added to the telemetry view, the process aborts while synchronizing the first batch of data, but the changes already applied are not rolled back. In that case there were actually 2 bugs: the index definition was wrong, and the process should have rolled back the last migrations and set of data.
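For reference, a minimal sketch of the current behaviour with node-postgres (the table, scripts and wiring are illustrative, not the real medic-couch2pg code): every statement auto-commits on its own, so a failure while writing the first batch cannot undo the migrations that already ran.

```js
const { Client } = require('pg');

// Illustrative only, not the real medic-couch2pg flow.
const syncOnce = async (migrations, batches) => {
  const client = new Client(); // reads the usual PG* env vars
  await client.connect();
  try {
    for (const sql of migrations) {
      await client.query(sql); // each script auto-commits on its own
    }
    for (const doc of batches.flat()) {
      // each insert auto-commits too: an error here (e.g. a bad unique
      // index rejecting a row) aborts the run, but the migrations and
      // the rows already written stay in place
      await client.query('INSERT INTO example (doc) VALUES ($1)',
        [JSON.stringify(doc)]);
    }
  } finally {
    await client.end();
  }
};
```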
How it should work
1. A connection and a transaction with PG are created and shared across all the "postgrator" objects that execute the migrations.
2. The same transaction object is then passed to the couch2pg object to perform the first batch of synchronization.
3. If there is an error in either step 1 or step 2, the transaction is aborted; otherwise, it is committed. See the sketch after this list.
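A minimal sketch of steps 1–3 with node-postgres; `runMigrations` and `syncFirstBatch` are hypothetical stand-ins for the postgrator and couch2pg calls. The point is that both run on the same client inside one BEGIN/COMMIT/ROLLBACK:

```js
const { Client } = require('pg');

// runMigrations(client) and syncFirstBatch(client) are hypothetical
// stand-ins for the postgrator and couch2pg steps.
const migrateAndSync = async (runMigrations, syncFirstBatch) => {
  const client = new Client();
  await client.connect();
  try {
    await client.query('BEGIN');
    await runMigrations(client);  // step 1: all migration scripts
    await syncFirstBatch(client); // step 2: first batch of data
    await client.query('COMMIT'); // step 3: both steps succeeded
  } catch (err) {
    await client.query('ROLLBACK'); // any failure undoes migrations + data
    throw err;
  } finally {
    await client.end();
  }
};
```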
At some point medic-couch2pg will try to synchronize data again, and a new transaction needs to be created, because we cannot keep the previously created transaction open indefinitely. That transaction should also be shared among all the couch2pg objects that sync data, because we are synchronizing the same PG database against 3 different Couch databases (medic, medic-sentinel and medic-users-meta), and each synchronization is performed by a different couch2pg object.
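A sketch of what a recurring sync cycle could look like under that scheme; `syncers` and its `replicate(client)` method are assumptions standing in for the real couch2pg objects:

```js
const { Client } = require('pg');

const COUCH_DBS = ['medic', 'medic-sentinel', 'medic-users-meta'];

// One fresh transaction per cycle, shared by the three couch2pg objects.
const syncCycle = async (syncers) => {
  const client = new Client();
  await client.connect();
  try {
    await client.query('BEGIN');
    for (const dbName of COUCH_DBS) {
      await syncers[dbName].replicate(client); // same transaction for all three
    }
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    await client.end(); // nothing stays open until the next cycle
  }
};
```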
Considerations
This may slow down the sync process, so we should perform some stress tests. On the other hand, it shouldn't be a big problem if the sync runs at a time when the data is little used.
It may also require more disk space in the PG instances, especially when materialized views with large amounts of data are recreated.
I'm not 100% sure that materialized view operations are transactional, but at first glance it looks like they are supported within transactions.
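One way to check that assumption: create and refresh a materialized view inside a transaction, then roll back. Plain CREATE/REFRESH MATERIALIZED VIEW should be transactional in PostgreSQL (REFRESH ... CONCURRENTLY is the known exception that cannot run inside a transaction block):

```js
const { Client } = require('pg');

const checkMatviewsInTx = async () => {
  const client = new Client();
  await client.connect();
  try {
    await client.query('BEGIN');
    await client.query('CREATE MATERIALIZED VIEW tx_check AS SELECT 1 AS n');
    await client.query('REFRESH MATERIALIZED VIEW tx_check');
    await client.query('ROLLBACK'); // the view should vanish with the rollback
    const res = await client.query(
      `SELECT 1 FROM pg_matviews WHERE matviewname = 'tx_check'`);
    console.log(res.rowCount === 0 ? 'transactional' : 'leaked outside the tx');
  } finally {
    await client.end();
  }
};
```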
Another important point in favour of moving to transactions: if we implement it, and we stop performing CASCADE migrations and instead DROP the elements we control one by one in the migrations, we will avoid the problem of cascading undesired drops. E.g. if a partner adds a view that depends on a view maintained here, the process will fail because the partner view depends on the view being dropped, the transaction will be rolled back, and no undesired deletions will happen.
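A sketch of that migration pattern (the view name and list are illustrative): dropping without CASCADE defaults to RESTRICT, so a partner view depending on ours makes the DROP fail, and the shared transaction then rolls the whole migration back:

```js
// `ownedViews` is illustrative; in practice it would be the list of
// views these migrations maintain.
const dropOwnedViews = async (client) => {
  const ownedViews = ['useview_telemetry'];
  for (const view of ownedViews) {
    // no CASCADE: DROP defaults to RESTRICT, so a partner view that
    // depends on this one raises an error instead of being dropped;
    // the surrounding transaction then rolls everything back
    await client.query(`DROP VIEW IF EXISTS ${view}`);
  }
};
```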
However, this won't avoid other undesired consequences. E.g. if the partner edited a view controlled here, nothing will prevent that change from being lost when medic-couch2pg recreates the view with new changes; that kind of issue needs to be addressed by handling outside changes in an accountable manner.
CC @garethbowen @kennsippell