Skip to content

Architectural Decision Records

fzhao99 edited this page Jun 21, 2022 · 17 revisions

Multiplex changes for Flu A and B

Date: 06/2022

Status: Accepted

Additional discussion available here

Context

As of February 2022, SimpleReport was a COVID-19 specific tool. In order to become more flexible, we looked to expand SimpleReport’s capabilities to start recording Flu results. We wanted to support the growing number of SimpleReport users testing on devices that test for COVID-19, Flu A & B with a single patient sample (multiplex).

Over 65,000 tests were recorded in SimpleReport using these devices as of February 2022. We wanted users to be able to report flu results from these devices in tandem with COVID-19 results for three reasons:

  • To better support users as they track disease outbreaks in their facilities and organizations.
  • To provide a proof-of-concept to the CDC and other public health partners that flu data can be collected and is useful.
  • To begin the engineering work required to add additional diseases.

Decision

For this iteration of multiplex implementation, we did not focus on how to report these results to public health departments through ReportStream, as many departments do not want this data yet. Instead, we focused on how to collect and store these results within the SimpleReport system. We believe this will provide value to our end users by allowing them to review the test results and provide a more complete test result to the patient via SimpleReport’s SMS and SMTP test result delivery.

  • The “results” data schema has the following pieces:

    • A mutable TestOrder object with correction/removal status representing “a test for a person at a time has occurred”.
    • An immutable TestEvent object that we send to ReportStream representing “something related to testing has happened in SR”. If a TestOrder is corrected or removed, another TestEvent with the correction/removal flag is created to capture the change.
  • Previous to multiplex, we tracked test outcomes as columns in the TestEvent/TestOrder tables. To implement multiplex, we replaced these columns with a Results table joined to the TestOrder and TestEvent tables to maintain the mapping of Result <> TestOrder <> TestEvent. The Results table has a "Test Result" column taking one of "positive"/"inconclusive"/"negative", similar to the columns that existed previously in the TestOrder/Event tables.

  • For previous results that weren't corrected or removed, we backfilled the Results table with entries that copied their respective results’ outcomes with joins to the existing TestOrder/Events rows. These entries also got new columns like disease ID needed for multiplex.

  • For each corrected and/or removed test, we made new TestEvent entities. Each new TestEvent was mapped to a new Result entity. Both new entities were mapped back to the original TestOrder that was updated with correction/removal status.

    • In this process, the old TestEvents made prior to the TestOrder update are no longer associated with any Result entities. Therefore, we made new Result entities to map the old TestEvents back to the relevant TestOrder and to ensure each TestEvent has a Result.
    • In other words, TestOrders are one-to-many on Results. Results are one-to-one on TestEvents. TestEvents are many-to-one on TestOrders 🙃.

Consequences

The multiplex implementation has enabled SimpleReport to begin accepting Flu A and B results, as well as laying the groundwork for easier addition of future diseases. These changes have surfaced the potential need to refactor the Results flow given the complicated workarounds to TestEvents/Orders that were needed to make multiplex work.

Flexible Database Migration

Date: 02/2022

Status: Accepted

Context

During the Omicron surge in late 2021 through the beginning of 2022, the volume of daily tests SimpleReport processed increased exponentially. The system frequently alerted on database connection exhaustion errors, as there were not enough available connections to process the number of simultaneous users. Initial remediation attempts included changing the number of connections requested by the application's connection pool, along with increasing the number of application replicas present at a given point in time. These solutions proved temporary, however, as SimpleReport's rate of adoption continued to outstrip its available capacity. A more permanent solution was established by changing the SKU of our existing database to increase the fixed number of available connections, along with the available DB-specific compute resources. This still did not solve our issues with connection pool sizing, and required careful management of application replicas to prevent inadvertent connection starvation. The issue resurfaced as attempts were made to migrate the audit log functionality away from the database, and to Splunk, a third-party log analytics provider.

Decision

Ultimately, the team decided to move to the Azure Database for PostgreSQL - Flexible Server product.

Movement to the Flexible Server SKU would provide a number of advantages, many of which are covered in this documentation. Specifically, the following provide the greatest impact to SimpleReport:

  • High availability of database instances, ensuring minimal downtime in the event of a datacenter or machine outage.
  • Automated patching with a managed maintenance window, ensuring that Azure does not attempt to perform maintenance of our DB during peak SR usage hours
  • Rapid scaling and performance management, enabling rapid response to customer demand
  • The integration of PgBouncer connection pooling, which increases resource efficiency and minimizes the need for manual connection management

Consequences

The Flexible DB rollout has helped reduce response time according to the Azure metrics.

Local development

Setup

How to

Development process and standards

Oncall

Technical resources

How-to guides

Environments/Azure

Misc

?

Clone this wiki locally