-
Notifications
You must be signed in to change notification settings - Fork 53
Architectural Decision Records
Date: 06/2022
Status: Accepted
Additional discussion available here
As of February 2022, SimpleReport was a COVID-19 specific tool. In order to become more flexible, we looked to expand SimpleReport’s capabilities to start recording Flu results. We wanted to support the growing number of SimpleReport users testing on devices that test for COVID-19, Flu A & B with a single patient sample (multiplex).
Over 65,000 tests were recorded in SimpleReport using these devices as of February 2022. We wanted users to be able to report flu results from these devices in tandem with COVID-19 results for three reasons:
- To better support users as they track disease outbreaks in their facilities and organizations.
- To provide a proof-of-concept to the CDC and other public health partners that flu data can be collected and is useful.
- To begin the engineering work required to add additional diseases.
For this iteration of multiplex implementation, we did not focus on how to report these results to public health departments through ReportStream, as many departments do not want this data yet. Instead, we focused on how to collect and store these results within the SimpleReport system. We believe this will provide value to our end users by allowing them to review the test results and provide a more complete test result to the patient via SimpleReport’s SMS and SMTP test result delivery.
-
The “results” data schema has the following pieces:
- A mutable TestOrder object with correction/removal status representing “a test for a person at a time has occurred”.
- An immutable TestEvent object that we send to ReportStream representing “something related to testing has happened in SR”. If a TestOrder is corrected or removed, another TestEvent with the correction/removal flag is created to capture the change.
-
Previous to multiplex, we tracked test outcomes as columns in the TestEvent/TestOrder tables. To implement multiplex, we replaced these columns with a Results table joined to the TestOrder and TestEvent tables to maintain the mapping of Result <> TestOrder <> TestEvent. The Results table has a "Test Result" column taking one of "positive"/"inconclusive"/"negative", similar to the columns that existed previously in the TestOrder/Event tables.
-
For previous results that weren't corrected or removed, we backfilled the Results table with entries that copied their respective results’ outcomes with joins to the existing TestOrder/Events rows. These entries also got new columns like disease ID needed for multiplex.
-
For each corrected and/or removed test, we made new TestEvent entities. Each new TestEvent was mapped to a new Result entity. Both new entities were mapped back to the original TestOrder that was updated with correction/removal status.
- In this process, the old TestEvents made prior to the TestOrder update are no longer associated with any Result entities. Therefore, we made new Result entities to map the old TestEvents back to the relevant TestOrder and to ensure each TestEvent has a Result.
- In other words, TestOrders are one-to-many on Results. Results are one-to-one on TestEvents. TestEvents are many-to-one on TestOrders 🙃.
The multiplex implementation has enabled SimpleReport to begin accepting Flu A and B results, as well as laying the groundwork for easier addition of future diseases. These changes have surfaced the potential need to refactor the Results flow given the complicated workarounds to TestEvents/Orders that were needed to make multiplex work.
Date: 02/2022
Status: Accepted
During the Omicron surge in late 2021 through the beginning of 2022, the volume of daily tests SimpleReport processed increased exponentially. The system frequently alerted on database connection exhaustion errors, as there were not enough available connections to process the number of simultaneous users. Initial remediation attempts included changing the number of connections requested by the application's connection pool, along with increasing the number of application replicas present at a given point in time. These solutions proved temporary, however, as SimpleReport's rate of adoption continued to outstrip its available capacity. A more permanent solution was established by changing the SKU of our existing database to increase the fixed number of available connections, along with the available DB-specific compute resources. This still did not solve our issues with connection pool sizing, and required careful management of application replicas to prevent inadvertent connection starvation. The issue resurfaced as attempts were made to migrate the audit log functionality away from the database, and to Splunk, a third-party log analytics provider.
Ultimately, the team decided to move to the Azure Database for PostgreSQL - Flexible Server product.
Movement to the Flexible Server SKU would provide a number of advantages, many of which are covered in this documentation. Specifically, the following provide the greatest impact to SimpleReport:
- High availability of database instances, ensuring minimal downtime in the event of a datacenter or machine outage.
- Automated patching with a managed maintenance window, ensuring that Azure does not attempt to perform maintenance of our DB during peak SR usage hours
- Rapid scaling and performance management, enabling rapid response to customer demand
- The integration of PgBouncer connection pooling, which increases resource efficiency and minimizes the need for manual connection management
The Flexible DB rollout has helped reduce response time according to the Azure metrics.
- Getting Started
- [Setup] Docker and docker compose development
- [Setup] IntelliJ run configurations
- [Setup] Running DB outside of Docker (optional)
- [Setup] Running nginx locally (optional)
- [Setup] Running outside of docker
- Accessing and testing weird parts of the app on local dev
- Accessing patient experience in local dev
- API Testing with Insomnia
- Cypress
- How to run e2e locally for development
- E2E tests
- Database maintenance
- MailHog
- Running tests
- SendGrid
- Setting up okta
- Sonar
- Storybook and Chromatic
- Twilio
- User roles
- Wiremock
- CSV Uploader
- Log local DB queries
- Code review and PR conventions
- SimpleReport Style Guide
- How to Review and Test Pull Requests for Dependabot
- How to Review and Test Pull Requests with Terraform Changes
- SimpleReport Deployment Process
- Adding a Developer
- Removing a developer
- Non-deterministic test tracker
- Alert Response - When You Know What is Wrong
- What to Do When You Have No Idea What is Wrong
- Main Branch Status
- Maintenance Mode
- Swapping Slots
- Monitoring
- Container Debugging
- Debugging the ReportStream Uploader
- Renew Azure Service Principal Credentials
- Releasing Changelog Locks
- Muting Alerts
- Architectural Decision Records
- Backend Stack Overview
- Frontend Overview
- Cloud Architecture
- Cloud Environments
- Database ERD
- External IDs
- GraphQL Flow
- Hibernate Lazy fetching and nested models
- Identity Verification (Experian)
- Spring Profile Management
- SR Result bulk uploader device validation logic
- Test Metadata and how we store it
- TestOrder vs TestEvent
- ReportStream Integration
- Feature Flag Setup
- FHIR Resources
- FHIR Conversions
- Okta E2E Integration
- Deploy Application Action
- Slack notifications for support escalations
- Creating a New Environment Within a Resource Group
- How to Add and Use Environment Variables in Azure
- Web Application Firewall (WAF) Troubleshooting and Maintenance
- How to Review and Test Pull Requests with Terraform Changes