Updates to v5.0.0 of the schedule validator #3314

Merged: 8 commits, Mar 26, 2024

@vevetron (Contributor) commented Mar 25, 2024

Description

Updates the schedule validator to v5.0.0. Although the documentation here mentions breaking changes, my investigation suggests they will not affect any of our codebase, so I simply mirrored a previous PR.

Resolves #3305

Type of change

  • New feature

How has this been tested?

docker-compose run airflow tasks test download_gtfs_schedule_v2 download_schedule_feeds 2024-03-22T00:00:00

docker-compose run airflow tasks test create_external_tables validation_notices 2024-03-22T18:00:00

I was able to create new tables at cal-itp-data-infra-staging.external_gtfs_schedule.validation_notices and confirmed that they report version v5.0.0 with this query:

SELECT distinct(metadata.gtfs_validator_version) FROM cal-itp-data-infra-staging.external_gtfs_schedule.validation_notices LIMIT 1000

Did not test with DBT yet.

Post-merge follow-ups

  • Actions required (specified below)

Check if we see v5 validation errors in a few days.

Make sure the new dockerfile image gets pushed to production.
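
One way to run the first follow-up check is to group the validator versions seen on each date and confirm that v5.0.0 appears from the switchover onward. The sketch below is only an illustration: the function name `versions_by_date`, the `dt` field, the row shape, and the sample rows are all assumptions made for the example, not the warehouse's actual schema or data.

```python
from collections import defaultdict


def versions_by_date(rows):
    """Map each date to the sorted list of validator versions seen on it."""
    seen = defaultdict(set)
    for row in rows:
        # "dt" and "gtfs_validator_version" are assumed field names.
        seen[row["dt"]].add(row["gtfs_validator_version"])
    return {dt: sorted(versions) for dt, versions in sorted(seen.items())}


# Made-up sample rows standing in for query results; not real warehouse data.
sample_rows = [
    {"dt": "2024-03-30", "gtfs_validator_version": "prior-version"},
    {"dt": "2024-04-02", "gtfs_validator_version": "v5.0.0"},
    {"dt": "2024-04-02", "gtfs_validator_version": "v5.0.0"},
]
print(versions_by_date(sample_rows))
```

A date that lists more than one version would suggest that some feeds were validated before and some after the switchover took effect.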

@vevetron vevetron linked an issue Mar 25, 2024 that may be closed by this pull request

github-actions bot commented Mar 25, 2024

Warehouse report 📦

DAG

Legend (in order of precedence)

Resource type | Indicator | Resolution
Large table-materialized model | Orange | Make the model incremental
Large model without partitioning or clustering | Orange | Add partitioning and/or clustering
View with more than one child | Yellow | Materialize as a table or incremental
Incremental | Light green |
Table | Green |
View | White |

@vevetron (Contributor, Author) commented

I initially used a past date but changed it to a future date, following @SorenSpicknall's logic:

Because we start utilizing the newest validator at an arbitrary date following its release, we typically set the targeted date for the switchover to a future date at the time of merge, rather than backdating to the release date for the validator version. This is because, by the time we move to the new validator version, validation has already taken place for previous dates under the prior version we had in use.

If we applied a new validator version to past dates we could rerun the full ground-up data validation flow for those dates, but that could change outcomes that may have already been surfaced to other parties who lack context for the exact validation logic (in monthly reports, quality communications to agencies, etc.). Philosophically, we prefer to mess with old data as little as possible - our models, our evaluation outcomes, and so on should capture how things were being evaluated at the time that data came into the warehouse, so that we can accurately measure changes in approach through time and avoid having to rerun updated logic on older data holdings. The only significant exception to that is when we need to change modeling to enhance old raw data with a new field or a new mapping to other data.
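
The switchover reasoning above can be sketched as a date-keyed lookup: each validator version applies from its switchover date forward, and earlier dates keep the version they were originally validated with. This is a minimal illustration, not the repository's actual code. The names `SWITCHOVERS` and `validator_version_for`, every date in the table, and the `prior-version` label are hypothetical placeholders.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical switchover table: (first extract date using the version, version).
# Dates and the "prior-version" label are placeholders, not the project's values.
SWITCHOVERS = [
    (date(2023, 1, 1), "prior-version"),
    (date(2024, 4, 1), "v5.0.0"),  # chosen to fall after the merge date
]


def validator_version_for(extract_date: date) -> str:
    """Return the validator version in effect for a given extract date."""
    starts = [start for start, _ in SWITCHOVERS]
    # Index of the last switchover that starts on or before extract_date.
    idx = bisect_right(starts, extract_date) - 1
    if idx < 0:
        raise ValueError(f"no validator version configured for {extract_date}")
    return SWITCHOVERS[idx][1]
```

Under these placeholder dates, `validator_version_for(date(2024, 3, 22))` resolves to the prior version, so data already validated before the switchover is left alone, while `validator_version_for(date(2024, 4, 1))` resolves to `v5.0.0`.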

@SorenSpicknall (Contributor) left a comment

Tentatively approving because suggested changes are not logic-breaking

jobs/gtfs-schedule-validator/README.md (outdated, resolved)
jobs/gtfs-schedule-validator/Dockerfile (outdated, resolved)
jobs/gtfs-schedule-validator/README.md (outdated, resolved)
@vevetron vevetron merged commit 39ba4b5 into main Mar 26, 2024
7 checks passed
@vevetron vevetron deleted the vb-3305-gtfs-schedule-validator-v50-update branch March 26, 2024 17:09
Linked issue: GTFS Schedule Validator v5.0 update