🎉 Autoupdate snapshots #3799

Status: Open (15 commits to merge into base: master)

Marigold commented Jan 7, 2025

Motivation

We don't have time to keep most of our data up to date with the upstream provider. We focus on major updates that are scheduled in advance, but if the provider publishes more recent data or corrects existing data, we won’t know.

On the other end of the spectrum are automatic daily updates that are not reviewed and can fail when the provider's link stops working (or, even worse, returns bad data, which has happened recently).

The idea is to have something that sits between fully automatic unreviewed updates and fully manual updates. If a snapshot sets autoupdate: true in its metadata, a scheduled script would try to run the snapshot every day and, if the data has changed, create a pull request that can be reviewed and merged with one click.
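The decision step could look roughly like the sketch below. It assumes that "different data" means a different content checksum and that the metadata can be reduced to a small record; `SnapshotMeta`, `needs_update_pr`, and the `md5` field are hypothetical names for illustration, not taken from the ETL codebase.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class SnapshotMeta:
    """Hypothetical, simplified view of a snapshot's metadata."""

    uri: str
    autoupdate: bool  # assumed to mirror `autoupdate: true` in the metadata
    md5: str  # checksum of the data currently committed


def content_md5(data: bytes) -> str:
    """Return the hex MD5 digest of the raw snapshot bytes."""
    return hashlib.md5(data).hexdigest()


def needs_update_pr(meta: SnapshotMeta, fresh_data: bytes) -> bool:
    """Return True if a review PR should be opened for this snapshot.

    Snapshots that have not opted in are skipped; opted-in snapshots
    qualify only when the freshly downloaded bytes differ from what is stored.
    """
    if not meta.autoupdate:
        return False
    return content_md5(fresh_data) != meta.md5
```

A scheduled job would call `needs_update_pr` once per snapshot after re-running it and open a pull request only when the function returns `True`. If the provider's link fails, no fresh data exists and nothing is proposed.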

This could be used right away for datasets that are updated monthly by Veronika, e.g., Epoch AI or surface temperature. Another use case could be detecting datasets with potential updates, which could then be picked up and completed manually if the update is worth doing.

Notes

  • We don’t create new versions for updated datasets; we only change the date_accessed attribute. Updating the version would make it much more complex (and likely not worth it unless a code change is needed).
  • I tried running all snapshots, and there are around 60 potential updates.
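As a rough illustration of the first note, the sketch below refreshes only the access date and leaves the version untouched. The nested layout, the `origin` and `date_accessed` key names, and the function name `refresh_access_date` are assumptions made for this example. The values are illustrative only.

```python
import copy
from datetime import date


def refresh_access_date(meta: dict, today: date) -> dict:
    """Return a copy of `meta` in which only the access date has changed."""
    updated = copy.deepcopy(meta)  # leave the caller's metadata untouched
    updated["origin"]["date_accessed"] = today.isoformat()
    return updated


# Illustrative metadata, not taken from a real snapshot.
meta = {"version": "2024-01-01", "origin": {"date_accessed": "2024-01-01"}}
new_meta = refresh_access_date(meta, date(2025, 1, 7))
```

Because the version stays the same, steps that depend on the snapshot keep pointing at it and need no code changes.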

TODO before merging

  • Undo the changes to snapshots; they were made only to check how many can be updated.
  • Close all hanging automatically created PRs.

owidbot commented Jan 7, 2025

Quick links (staging server):

Site Dev | Site Preview | Admin | Wizard | Docs

Login: ssh owid@staging-site-snapshots-autoupdate

chart-diff: ✅ No charts for review.
data-diff: ❌ Found differences
= Dataset garden/democracy/2024-03-07/eiu
  = Table eiu
    ~ Dim country
-       - Removed values: 19 / 2765 (0.69%)
           year country
           2023 Belarus
           2023    Laos
           2023 Myanmar
           2023   Syria
           2023   Yemen
    ~ Dim year
-       - Removed values: 19 / 2765 (0.69%)
          country  year
          Belarus  2023
             Laos  2023
          Myanmar  2023
            Syria  2023
            Yemen  2023
    ~ Column civlib_eiu (changed data)
-       - Removed values: 19 / 2765 (0.69%)
          country  year  civlib_eiu
          Belarus  2023        1.47
             Laos  2023        0.29
          Myanmar  2023         0.0
            Syria  2023         0.0
            Yemen  2023        0.88
    ~ Column dem_culture_eiu (changed data)
-       - Removed values: 19 / 2765 (0.69%)
          country  year  dem_culture_eiu
          Belarus  2023             4.38
             Laos  2023             3.75
          Myanmar  2023             3.13
            Syria  2023             4.38
            Yemen  2023              5.0
    ~ Column democracy_eiu (changed data)
-       - Removed values: 19 / 2765 (0.69%)
          country  year  democracy_eiu
          Belarus  2023           1.99
             Laos  2023           1.71
          Myanmar  2023           0.85
            Syria  2023           1.43
            Yemen  2023           1.95
        ~ Changed values: 4 / 2765 (0.14%)
          country  year  democracy_eiu -  democracy_eiu +
           Africa  2023           3.9674         4.335116
             Asia  2023         4.164255         4.963333
           Europe  2023          7.39775         7.536411
            World  2023         5.225329         5.686757
    ~ Column elect_freefair_eiu (changed data)
-       - Removed values: 19 / 2765 (0.69%)
          country  year  elect_freefair_eiu
          Belarus  2023                 0.0
             Laos  2023                 0.0
          Myanmar  2023                 0.0
            Syria  2023                 0.0
            Yemen  2023                 0.0
    ~ Column funct_gov_eiu (changed data)
-       - Removed values: 19 / 2765 (0.69%)
          country  year  funct_gov_eiu
          Belarus  2023           0.79
             Laos  2023           2.86
          Myanmar  2023            0.0
            Syria  2023            0.0
            Yemen  2023            0.0
    ~ Column pol_part_eiu (changed data)
-       - Removed values: 19 / 2765 (0.69%)
          country  year  pol_part_eiu
          Belarus  2023          3.33
             Laos  2023          1.67
          Myanmar  2023          1.11
            Syria  2023          2.78
            Yemen  2023          3.89
    ~ Column regime_eiu (changed data)
-       - Removed values: 19 / 2765 (0.69%)
          country  year regime_eiu
          Belarus  2023          0
             Laos  2023          0
          Myanmar  2023          0
            Syria  2023          0
            Yemen  2023          0
  = Table num_countries
    ~ Column num_regime_eiu (changed data)
        ~ Changed values: 4 / 448 (0.89%)
          country  year             category  num_regime_eiu -  num_regime_eiu +
           Africa  2023 authoritarian regime                26                19
             Asia  2023 authoritarian regime                27                16
           Europe  2023 authoritarian regime                 2                 1
            World  2023 authoritarian regime                59                40
  = Table num_people
  = Table avg_pop
    ~ Column democracy_eiu_weighted (changed data)
        ~ Changed values: 4 / 112 (3.57%)
          country  year  democracy_eiu_weighted -  democracy_eiu_weighted +
           Africa  2023                   3.88442                  4.213404
             Asia  2023                  4.666835                  4.929935
           Europe  2023                  6.657641                  6.718221
            World  2023                  4.978599                  5.235252
= Dataset garden/demography/2024-12-03/fertility_rate
  = Table fertility_rate
  = Table fertility_rate_by_age
= Dataset garden/demography/2024-12-18/mean_age_childbearing
  = Table mean_age_childbearing
= Dataset garden/gapminder/2023-09-22/total_fertility_rate
  = Table fertility_rate
    ~ Column children_dying_before_five_per_woman (changed metadata)
-       -   attribution: Gapminder (2020); UN Inter-agency Group for Child Mortality Estimation (2024)
        ?                ^ ^^  ^^^^^^^^^^^^^^
+       +   attribution: United Nations Inter-agency Group for Child Mortality Estimation (2024)
        ?                ^^^^^^^^ ^ + ^
    ~ Column children_surviving_past_five_per_woman (changed metadata)
-       -   attribution: Gapminder (2020); UN Inter-agency Group for Child Mortality Estimation (2024)
        ?                ^ ^^  ^^^^^^^^^^^^^^
+       +   attribution: United Nations Inter-agency Group for Child Mortality Estimation (2024)
        ?                ^^^^^^^^ ^ + ^
= Dataset garden/hmd/2024-11-19/hfd
- - Table cohort_share_women
-   - Column share_women
  = Table period_ages
  = Table period
  = Table period_ages_years
  = Table cohort_ages
  = Table cohort
  = Table cohort_ages_years
= Dataset garden/un/2024-09-16/long_run_child_mortality
  = Table long_run_child_mortality_selected
    ~ Column under_five_mortality (changed metadata)
-       -   attribution: Gapminder (2020); UN Inter-agency Group for Child Mortality Estimation (2024)
        ?                ^ ^^  ^^^^^^^^^^^^^^
+       +   attribution: United Nations Inter-agency Group for Child Mortality Estimation (2024)
        ?                ^^^^^^^^ ^ + ^
  = Table long_run_child_mortality
    ~ Column share_dying_first_five_years (changed metadata)
-       -   attribution: Gapminder (2020); UN Inter-agency Group for Child Mortality Estimation (2024)
        ?                ^ ^^  ^^^^^^^^^^^^^^
+       +   attribution: United Nations Inter-agency Group for Child Mortality Estimation (2024)
        ?                ^^^^^^^^ ^ + ^
    ~ Column share_surviving_first_five_years (changed metadata)
-       -   attribution: Gapminder (2020); UN Inter-agency Group for Child Mortality Estimation (2024)
        ?                ^ ^^  ^^^^^^^^^^^^^^
+       +   attribution: United Nations Inter-agency Group for Child Mortality Estimation (2024)
        ?                ^^^^^^^^ ^ + ^
    ~ Column under_five_mortality (changed metadata)
-       -   attribution: Gapminder (2020); UN Inter-agency Group for Child Mortality Estimation (2024)
        ?                ^ ^^  ^^^^^^^^^^^^^^
+       +   attribution: United Nations Inter-agency Group for Child Mortality Estimation (2024)
        ?                ^^^^^^^^ ^ + ^


Legend: +New  ~Modified  -Removed  =Identical  Details
Hint: Run this locally with etl diff REMOTE data/ --include yourdataset --verbose --snippet

Automatically updated datasets matching weekly_wildfires|excess_mortality|covid|fluid|flunet|country_profile|garden/ihme_gbd/2019/gbd_risk are not included

Edited: 2025-01-08 15:00:49 UTC
Execution time: 22.90 seconds

Marigold force-pushed the snapshots-autoupdate branch 2 times, most recently from 21a594a to 819c24d (January 10, 2025 11:52)
Marigold force-pushed the snapshots-autoupdate branch from 3d1e7a0 to 37e0da4 (January 13, 2025 08:38)