Create QC Checks for Data Relay Server Data Ingestion #454

Open
summer-mothwood opened this issue Oct 30, 2024 · 0 comments

As noted in #423 and based on findings in #453, we suggest adding QC checks to the data relay server to alert the team to potential failures in the data pipeline. Some suggestions from the initial analysis include:

  • build an alert when a given district has any missing days in the S3 bucket
  • build an alert in Snowflake or dbt when the number of observations for a given station ID and/or district is lower than an expected threshold
  • build a model to check data relay data against clearinghouse data (this is a temporary fix meant to alert us to new data issue patterns that may arise, and to act as a secondary validation check until we turn the clearinghouse data collection off)
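
As a starting point for discussion, the first two checks could look roughly like the sketch below. It is only an illustration: the S3 key layout (`district=.../year=.../month=.../day=.../`), the function names, and the threshold parameter are all assumptions made for this example, not the actual bucket structure or an agreed design. The key list would come from however we enumerate the bucket, and the station IDs from a query against the ingested table.

```python
import re
from collections import Counter
from datetime import date, timedelta

# Hypothetical key layout; adjust the pattern to the bucket's real partitioning.
KEY_PATTERN = re.compile(
    r"district=(?P<district>[^/]+)/year=(?P<y>\d{4})/"
    r"month=(?P<m>\d{1,2})/day=(?P<d>\d{1,2})/"
)


def missing_days(keys, district, start, end):
    """Return the dates in [start, end] that have no object for the district."""
    seen = set()
    for key in keys:
        match = KEY_PATTERN.search(key)
        if match and match["district"] == district:
            seen.add(date(int(match["y"]), int(match["m"]), int(match["d"])))
    n_days = (end - start).days + 1
    all_days = (start + timedelta(days=i) for i in range(n_days))
    return [day for day in all_days if day not in seen]


def low_count_stations(station_ids, expected_stations, min_observations):
    """Return {station_id: count} for expected stations below the threshold.

    station_ids holds one entry per observation row. Stations with no rows
    at all are reported with a count of 0.
    """
    counts = Counter(station_ids)
    return {
        station: counts[station]
        for station in expected_stations
        if counts[station] < min_observations
    }
```

Either function returning a non-empty result would be the trigger for an alert. The threshold check could equally be written as a dbt test or a Snowflake query; the Python version is only meant to pin down the logic.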