Rename ntd api tables #3546

Merged
merged 7 commits into from
Nov 21, 2024

charlie-costanzo (Member) commented Nov 13, 2024

Description

Recently, we created a new data pipeline to ingest NTD annual reporting data from the DOT data portal API. The data portal API is new, so the only annual reporting data that existed within it at the time was for the year 2022.

The documentation gave no indication that future years' data would be appended to the same endpoint, so we assumed each future year would get its own endpoint, to be added as a new Airflow DAG task. When the 2023 reporting data was released, DOT instead appended it to the 2022 endpoint and updated the documentation to cover the years 2022-2023.

When we ingested the initial 2022 endpoints, the naming conventions we used were very 2022-centric. This PR makes the naming conventions used in the data ingestion and in the external, source, and staging tables year-agnostic. It also updates the mart tables to reference the new table names.
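
As a rough illustration of the rename described above, the sketch below strips a year token from a table name. The legacy names are not shown in this PR, so the example input `stg_ntd_annual_data_2022__breakdowns` and the helpers `to_year_agnostic` and `build_rename_map` are hypothetical, and the real rename scripts may have worked differently.

```python
import re

# Assumption: legacy names carried the year as an underscore-delimited token.
# The actual pre-rename table names are not shown in this PR.
YEAR_TOKEN = re.compile(r"_(?:19|20)\d{2}(?=_|$)")


def to_year_agnostic(name: str) -> str:
    """Remove an embedded four-digit year token from a table or model name."""
    return YEAR_TOKEN.sub("", name)


def build_rename_map(names):
    """Map each old name to its year-agnostic form, skipping unchanged names."""
    renames = {}
    for old in names:
        new = to_year_agnostic(old)
        if new != old:
            renames[old] = new
    return renames


# Hypothetical legacy name; the result matches a model name listed in this PR.
print(to_year_agnostic("stg_ntd_annual_data_2022__breakdowns"))
# prints "stg_ntd_annual_data__breakdowns"
```

A mapping like this could also be used to find references to the old names in mart models.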

Resolves #3523

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How has this been tested?

Locally with dbt.
[Screenshot, 2024-11-18 9:35:12 PM]

Post-merge follow-ups

  • Actions required (specified below)
    Observe post-merge for expected behavior, update any outstanding or new models to use the new schemas, and communicate the new schema.

github-actions bot commented Nov 13, 2024

Warehouse report 📦

Checks/potential follow-ups

Checks indicate the following action items may be necessary.

  • For new models, do they all have a surrogate primary key that is tested to be not-null and unique?
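
The check above can be sketched outside the warehouse as follows. This is a minimal illustration, not the project's actual dbt test: the column names `ntd_id` and `report_year` and the helpers `surrogate_key` and `check_keys` are assumptions made for the example.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Optional


def surrogate_key(row: Dict, columns: List[str]) -> Optional[str]:
    """Hash the chosen columns into one key; return None if any part is missing."""
    values = [row.get(col) for col in columns]
    if any(value is None for value in values):
        return None
    joined = "-".join(str(value) for value in values)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def check_keys(rows: List[Dict], columns: List[str]) -> Dict:
    """Count null keys and list keys that occur more than once."""
    keys = [surrogate_key(row, columns) for row in rows]
    counts = Counter(key for key in keys if key is not None)
    return {
        "null_count": sum(key is None for key in keys),
        "duplicates": sorted(key for key, n in counts.items() if n > 1),
    }


# Hypothetical rows: one agency reporting in two years yields two distinct keys.
rows = [
    {"ntd_id": "90001", "report_year": 2022},
    {"ntd_id": "90001", "report_year": 2023},
]
report = check_keys(rows, ["ntd_id", "report_year"])
assert report == {"null_count": 0, "duplicates": []}
```

Including the report year in the key matters here because a single endpoint now carries more than one year of data per agency.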

New models 🌱

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__breakdowns

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__breakdowns_by_agency

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__capital_expenses_by_capital_use

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__capital_expenses_by_mode

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__capital_expenses_for_existing_service

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__capital_expenses_for_expansion_of_service

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__employees_by_agency

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__employees_by_mode

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__employees_by_mode_and_employee_type

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__fuel_and_energy

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__fuel_and_energy_by_agency

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__funding_sources_by_expense_type

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__funding_sources_directly_generated

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__funding_sources_federal

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__funding_sources_local

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__funding_sources_state

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__funding_sources_taxes_levied_by_agency

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__maintenance_facilities

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__maintenance_facilities_by_agency

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__metrics

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__operating_expenses_by_function

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__operating_expenses_by_function_and_agency

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__operating_expenses_by_type

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__operating_expenses_by_type_and_agency

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__service_by_agency

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__service_by_mode

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__service_by_mode_and_time_period

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__stations_and_facilities_by_agency_and_facility_type

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__stations_by_mode_and_age

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__track_and_roadway_by_agency

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__track_and_roadway_by_mode

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__track_and_roadway_guideway_age_distribution

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__vehicles_age_distribution

calitp_warehouse.staging.ntd_annual_data_tables.stg_ntd_annual_data__vehicles_type_count_by_agency

DAG

Legend (in order of precedence)

| Resource type | Indicator | Resolution |
| --- | --- | --- |
| Large table-materialized model | Orange | Make the model incremental |
| Large model without partitioning or clustering | Orange | Add partitioning and/or clustering |
| View with more than one child | Yellow | Materialize as a table or incremental |
| Incremental | Light green | |
| Table | Green | |
| View | White | |

charlie-costanzo self-assigned this Nov 14, 2024
Contributor

Maybe GitHub got confused here. Can you check whether the metrics table still exists after this change?

Contributor

Found it.
It is here, but inverted.

Contributor

It is nicer when it shows up as "replaced by", though. That helps when you need to find the history, but since these models are new and there is not much history to show, just having the old ones removed and the new ones added sounds good.

Member Author

Ah yeah, it was such a bulk action, and I generated all the new stuff with scripts, so it was way easier to delete and add. That definitely is not as nice as when it shows up as "replaced by", sorry about that. Hopefully, because it is new, it's not too much of an issue.

charlie-costanzo marked this pull request as ready for review November 14, 2024 21:38
charlie-costanzo marked this pull request as draft November 14, 2024 21:39
charlie-costanzo force-pushed the rename-ntd-api-tables branch 2 times, most recently from 4a538a4 to 282b6a2 November 19, 2024 02:30
charlie-costanzo marked this pull request as ready for review November 19, 2024 02:36
erikamov self-requested a review November 21, 2024 19:45
erikamov (Contributor) left a comment

Thank you for renaming it all!

charlie-costanzo merged commit c280e48 into main Nov 21, 2024
4 checks passed
charlie-costanzo deleted the rename-ntd-api-tables branch November 21, 2024 21:13
charlie-costanzo added a commit that referenced this pull request Nov 22, 2024
* make ntd table API sync for annual data year-agnostic

* revise sync dag ymls

* rename paths and destination tables in external table config, rename files

* replace all staging tables with new year-agnostic naming

* replace the staging tables referenced in ntd mart tables with new year-agnostic naming convention

* revise testing for year values to include the year 2023

* revise ntd annual reporting mart table documentation to no longer include references to the year 2022 specifically
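
The revised year test mentioned in the commit list can be pictured as an accepted-values check. The sketch below is only an illustration under assumptions: the column name `report_year` and the helper `unexpected_years` are hypothetical, and the project's real test lives in its dbt configuration rather than in Python.

```python
# Years the endpoint is documented to cover after the 2023 release.
ACCEPTED_YEARS = frozenset({2022, 2023})


def unexpected_years(rows, year_column="report_year", accepted=ACCEPTED_YEARS):
    """Return the set of year values that fall outside the accepted set."""
    seen = {row[year_column] for row in rows}
    return seen - set(accepted)


# Hypothetical rows: both documented years pass, a later year is flagged.
rows = [{"report_year": 2022}, {"report_year": 2023}, {"report_year": 2024}]
print(unexpected_years(rows))
# prints {2024}
```

When another year is appended to the endpoint, a check like this fails until the accepted set is extended, which is a useful signal that new data has arrived.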
charlie-costanzo added the data-pipeline-ingestion-and-modeling label Nov 26, 2024
Labels
data-pipeline-ingestion-and-modeling Ingesting, parsing and modeling data. Evan Siroky is product owner.
Development

Successfully merging this pull request may close these issues.

Rename 2022 NTD Annual Reporting Models since they now include 2023 data
2 participants