Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Couch to SQL Repeat Records PR 1: Write to SQL #33978

Merged
merged 32 commits into from
Feb 22, 2024
Merged

Conversation

millerdev
Copy link
Contributor

@millerdev millerdev commented Jan 15, 2024

This is the first PR in the Couch to SQL migration process of Repeat Records. Commits for PR 2 (and eventually PR 3) can be found on dm/sql-repeatrecord.

🐠 Review by commit.

https://dimagi-dev.atlassian.net/browse/SAAS-14299

Safety Assurance

Safety story

Automated tests have been added where necessary to cover new functionality and to verify that Couch Repeat Records are synced to SQL as expected. In general, TDD was used to develop this PR. Repeater functionality should be unchanged.

Automated test coverage

Yes.

QA Plan

https://dimagi-dev.atlassian.net/browse/QA-6025

Rollback instructions

A decision will need to be made about what to do with data in SQLRepeatRecord and SQLRepeatRecordAttempt tables If this PR is reverted. Otherwise, this PR can be reverted after deploy with no further considerations.

cached_property is less complicated and easier to patch.
Syncing submodels using the Couch document model is expensive. It would be wildly inefficient to rewrite all attempts every time a repeat record is saved, not to mention when a new attempt is being added. However, proper submodel syncing is needed when copying old documents from Couch to SQL. New attempts are only added in non-migration code in one place, RepeatRecord.add_attempt(), which will be addressed in another commit.
@millerdev millerdev added awaiting QA QA in progress. Do not merge product/invisible Change has no end-user visible impact labels Jan 15, 2024
@millerdev millerdev requested a review from kaapstorm as a code owner January 15, 2024 21:52
@millerdev millerdev changed the title Couch to SQL Repeat Records: PR 1 Couch to SQL Repeat Records PR 1: Write to SQL Jan 15, 2024
@@ -50,7 +67,6 @@ def setUpClass(cls):
domain=cls.domain,
succeeded=True,
repeater_id=cls.repeater_id,
next_check=before,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this line removed/how was it breaking the tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not valid to have a next_check date in SQL when state == Success.

models.CheckConstraint(
name="next_check_pending_or_null",
check=(
models.Q(next_check__isnull=True)
| models.Q(next_check__isnull=False, state=State.Pending)
| models.Q(next_check__isnull=False, state=State.Fail)
)
),

repeater=cls.repeater,
registered_at=now,
))
cls.sql_records = list(SQLRepeatRecord.objects.filter(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: You can get rid of line 534 (cls.sql_records = [])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This gets refactored in a later commit (it will be in the next PR).

Copy link
Contributor

@AmitPhulera AmitPhulera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good overall. Just a minor comment on tests.

I am assuming this is already in process of QA and is deployed on staging?

self.succeeded = False
self.cancelled = False
try:
reason = self.failure_reason
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why will this raise AssertionError?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jsonobject's JsonProperty raises AssertionError if the record does not have a failure_reason. That's annoying. It should raise AttributeError, which would allow standard use of getattr.

# NOTE a 204 status here could mean either
# 1. the request was not sent, in which case response is
# probably RepeaterResponse(204, "No Content")
# 2. the request was sent, and the remote end responded
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it is required to specify this in user facing docs as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly if we want to keep it this way. I think a better long-term fix would be to change the way we internally signal that the request was not sent so there is no ambiguity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point (see docstring) we internally signaled that the request was not sent by passing True.

How about ...

if response is True or (
    isinstance(response, RepeaterResponse)
    and response.status_code == 204
):
    state = State.Empty
else:
    state = State.Success

... and in the future we can come up with a better, and consistent, internal signal?

Copy link
Contributor Author

@millerdev millerdev Jan 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kaapstorm can you point to the place(s) where response is True originates? I don't see anywhere that does that. Is it possible that they were all changed from return True to return RepeaterResponse(204, "No content")?

Edit: a test asserts that a (SQL) record created with response=True results in state == Success (not Empty). Should that test be changed?

self.repeat_record.add_success_attempt(response=True)
self.assertEqual(self.repeat_record.state, RECORD_SUCCESS_STATE)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that they were all changed from return True to return RepeaterResponse(204, "No content")?

I checked. They were all changed:

Repeater.send_request() -> requests.Response
Dhis2Repeater.send_request() -> requests.Response | RepeaterResponse
FHIRRepeater.send_request() -> requests.Response | RepeaterResponse
ReferCaseRepeater.send_request() -> requests.Response | RepeaterResponse
DataRegistryRepeater.send_request() -> requests.Response | RepeaterResponse
OpenmrsRepeater.send_request() -> RepeaterResponse
Dhis2EntityRepeater.send_request() -> RepeaterResponse

We should update that docstring, in a separate PR.

a test asserts that a (SQL) record created with response=True results in state == Success (not Empty). Should that test be changed?

I guess that test can be removed in the PR that updates the docstring.

@millerdev
Copy link
Contributor Author

I am assuming this is already in process of QA and is deployed on staging?

Correct.

Copy link
Member

@dannyroberts dannyroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. For parts I couldn't follow super well, I looked for tests—and the way you paired small test changes with each commit made that easy!

@@ -1049,8 +1049,7 @@ def wrap(cls, data):
)]
return self

@property
@memoized
@cached_property
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to note that cached_property and memoized + property have somewhat different semantics; if you try to directly write to a memoized property it raises an error, whereas if you try to directly write to a cached_property it will let you override the value defined by the decorated method. https://docs.python.org/3/library/functools.html#functools.cached_property

Not sure if that is disqualifying, but wanted to make sure you'd considered that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, not disqualifying.


May include repeaters that have been created since the migration
started, whose records are already migrated. Also ignore records
associated with deleted repeaters.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds probably fine, but to confirm, what is the behavior now of records for deleted repeaters? Are they hidden from the UI and any other places the user might be able to see that they're still in the system? Or are the effectively abandoned and never shown to the user?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "deleted" is referring to hard-deleted repeaters. Such repeat records cannot exist in SQL because of the foreign key relationship between (SQL)Repeater and SQLRepeatRecord. In Couch they could exist, but would be orphaned.

To answer your question more directly, it is possible to find them in the repeat record report, for example by searching by payload ID, but attempting to view responses or the payload result in an error:

Repeater with id ... could not be found

Clicking the button to Resend Payload or Requeue Payload appears to succeed, but when the record is processed it will not be forwarded because the repeater does not exist.

It seems like a bug that repeat records for deleted repeaters can be viewed at all since the information about where they were sent was deleted with the repeater.

def iter_domain_docs(domain):
return iter_docs(chunk_size, startkey=[domain], endkey=[domain, {}])
for domain in domains:
yield from iter_domain_docs(domain)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are these two new methods called? Couldn't find it in the PR diff

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're called by the Couch to SQL migration framework code when migrating individual domains.

doc_count = self._get_couch_doc_count_for_domains(domains)
sql_doc_count = self._get_sql_doc_count_for_domains(domains)
docs = self._iter_couch_docs_for_domains(domains, chunk_size)

# NOTE a 204 status here could mean either
# 1. the request was not sent, in which case response is
# probably RepeaterResponse(204, "No Content")
# 2. the request was sent, and the remote end responded
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point (see docstring) we internally signaled that the request was not sent by passing True.

How about ...

if response is True or (
    isinstance(response, RepeaterResponse)
    and response.status_code == 204
):
    state = State.Empty
else:
    state = State.Success

... and in the future we can come up with a better, and consistent, internal signal?

return super().save(*args, **kw)

def _migration_sync_submodels_to_sql(self, sql_object):
if self._should_sync_attempts:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't follow the paths of what calls __migration_sync_to_sql(), but I'm guessing that you are sure that this cannot be reached outside the enable_attempts_sync_to_sql() context manager?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is only called within the context of .save(), which uses the context manager. Having the attribute not set outside the context manager will alert us (hopefully in tests, if not then in Sentry) if any code path arises that violates that precondition.

@millerdev
Copy link
Contributor Author

Latest commits address data states discovered while running populate_repeatrecords on staging.

corehq/motech/repeaters/tests/test_couchsqlmigration.py Outdated Show resolved Hide resolved
self.assertEqual(len(obj.attempts), len(doc.attempts))
self.assertTrue(obj.attempts)

def test_migrate_record_with_unynced_sql_attempts2(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_migrate_record_with_unynced_sql_attempts2(self):
def test_migrate_record_with_unsynced_sql_attempts2(self):

What is the difference between this test and the test above it? What extra behaviour is it testing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See 9e7d4d3, in which I renamed the test and added a comment.

@millerdev millerdev added QA Passed and removed awaiting QA QA in progress. Do not merge labels Feb 22, 2024
@millerdev millerdev merged commit edee01c into master Feb 22, 2024
13 checks passed
@millerdev millerdev deleted the dm/sql-repeatrecord-pr1 branch February 22, 2024 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
product/invisible Change has no end-user visible impact QA Passed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants