🎉 Autoupdate snapshots #3799
Open: Marigold wants to merge 15 commits into master from snapshots-autoupdate
Quick links (staging server):
Login:
chart-diff: ✅ No charts for review.
data-diff: ❌ Found differences
= Dataset garden/democracy/2024-03-07/eiu
= Table eiu
~ Dim country
- - Removed values: 19 / 2765 (0.69%)
year country
2023 Belarus
2023 Laos
2023 Myanmar
2023 Syria
2023 Yemen
~ Dim year
- - Removed values: 19 / 2765 (0.69%)
country year
Belarus 2023
Laos 2023
Myanmar 2023
Syria 2023
Yemen 2023
~ Column civlib_eiu (changed data)
- - Removed values: 19 / 2765 (0.69%)
country year civlib_eiu
Belarus 2023 1.47
Laos 2023 0.29
Myanmar 2023 0.0
Syria 2023 0.0
Yemen 2023 0.88
~ Column dem_culture_eiu (changed data)
- - Removed values: 19 / 2765 (0.69%)
country year dem_culture_eiu
Belarus 2023 4.38
Laos 2023 3.75
Myanmar 2023 3.13
Syria 2023 4.38
Yemen 2023 5.0
~ Column democracy_eiu (changed data)
- - Removed values: 19 / 2765 (0.69%)
country year democracy_eiu
Belarus 2023 1.99
Laos 2023 1.71
Myanmar 2023 0.85
Syria 2023 1.43
Yemen 2023 1.95
~ Changed values: 4 / 2765 (0.14%)
country year democracy_eiu - democracy_eiu +
Africa 2023 3.9674 4.335116
Asia 2023 4.164255 4.963333
Europe 2023 7.39775 7.536411
World 2023 5.225329 5.686757
~ Column elect_freefair_eiu (changed data)
- - Removed values: 19 / 2765 (0.69%)
country year elect_freefair_eiu
Belarus 2023 0.0
Laos 2023 0.0
Myanmar 2023 0.0
Syria 2023 0.0
Yemen 2023 0.0
~ Column funct_gov_eiu (changed data)
- - Removed values: 19 / 2765 (0.69%)
country year funct_gov_eiu
Belarus 2023 0.79
Laos 2023 2.86
Myanmar 2023 0.0
Syria 2023 0.0
Yemen 2023 0.0
~ Column pol_part_eiu (changed data)
- - Removed values: 19 / 2765 (0.69%)
country year pol_part_eiu
Belarus 2023 3.33
Laos 2023 1.67
Myanmar 2023 1.11
Syria 2023 2.78
Yemen 2023 3.89
~ Column regime_eiu (changed data)
- - Removed values: 19 / 2765 (0.69%)
country year regime_eiu
Belarus 2023 0
Laos 2023 0
Myanmar 2023 0
Syria 2023 0
Yemen 2023 0
= Table num_countries
~ Column num_regime_eiu (changed data)
~ Changed values: 4 / 448 (0.89%)
country year category num_regime_eiu - num_regime_eiu +
Africa 2023 authoritarian regime 26 19
Asia 2023 authoritarian regime 27 16
Europe 2023 authoritarian regime 2 1
World 2023 authoritarian regime 59 40
= Table num_people
= Table avg_pop
~ Column democracy_eiu_weighted (changed data)
~ Changed values: 4 / 112 (3.57%)
country year democracy_eiu_weighted - democracy_eiu_weighted +
Africa 2023 3.88442 4.213404
Asia 2023 4.666835 4.929935
Europe 2023 6.657641 6.718221
World 2023 4.978599 5.235252
= Dataset garden/demography/2024-12-03/fertility_rate
= Table fertility_rate
= Table fertility_rate_by_age
= Dataset garden/demography/2024-12-18/mean_age_childbearing
= Table mean_age_childbearing
= Dataset garden/gapminder/2023-09-22/total_fertility_rate
= Table fertility_rate
~ Column children_dying_before_five_per_woman (changed metadata)
- - attribution: Gapminder (2020); UN Inter-agency Group for Child Mortality Estimation (2024)
? ^ ^^ ^^^^^^^^^^^^^^
+ + attribution: United Nations Inter-agency Group for Child Mortality Estimation (2024)
? ^^^^^^^^ ^ + ^
~ Column children_surviving_past_five_per_woman (changed metadata)
- - attribution: Gapminder (2020); UN Inter-agency Group for Child Mortality Estimation (2024)
? ^ ^^ ^^^^^^^^^^^^^^
+ + attribution: United Nations Inter-agency Group for Child Mortality Estimation (2024)
? ^^^^^^^^ ^ + ^
= Dataset garden/hmd/2024-11-19/hfd
- - Table cohort_share_women
- - Column share_women
= Table period_ages
= Table period
= Table period_ages_years
= Table cohort_ages
= Table cohort
= Table cohort_ages_years
= Dataset garden/un/2024-09-16/long_run_child_mortality
= Table long_run_child_mortality_selected
~ Column under_five_mortality (changed metadata)
- - attribution: Gapminder (2020); UN Inter-agency Group for Child Mortality Estimation (2024)
? ^ ^^ ^^^^^^^^^^^^^^
+ + attribution: United Nations Inter-agency Group for Child Mortality Estimation (2024)
? ^^^^^^^^ ^ + ^
= Table long_run_child_mortality
~ Column share_dying_first_five_years (changed metadata)
- - attribution: Gapminder (2020); UN Inter-agency Group for Child Mortality Estimation (2024)
? ^ ^^ ^^^^^^^^^^^^^^
+ + attribution: United Nations Inter-agency Group for Child Mortality Estimation (2024)
? ^^^^^^^^ ^ + ^
~ Column share_surviving_first_five_years (changed metadata)
- - attribution: Gapminder (2020); UN Inter-agency Group for Child Mortality Estimation (2024)
? ^ ^^ ^^^^^^^^^^^^^^
+ + attribution: United Nations Inter-agency Group for Child Mortality Estimation (2024)
? ^^^^^^^^ ^ + ^
~ Column under_five_mortality (changed metadata)
- - attribution: Gapminder (2020); UN Inter-agency Group for Child Mortality Estimation (2024)
? ^ ^^ ^^^^^^^^^^^^^^
+ + attribution: United Nations Inter-agency Group for Child Mortality Estimation (2024)
? ^^^^^^^^ ^ + ^
Legend: +New ~Modified -Removed =Identical
Hint: Run this locally with etl diff REMOTE data/ --include yourdataset --verbose --snippet
Automatically updated datasets matching weekly_wildfires|excess_mortality|covid|fluid|flunet|country_profile|garden/ihme_gbd/2019/gbd_risk are not included.
Edited: 2025-01-08 15:00:49 UTC
Marigold force-pushed the snapshots-autoupdate branch 2 times, most recently from 21a594a to 819c24d on January 10, 2025 11:52
Marigold force-pushed the snapshots-autoupdate branch from 3d1e7a0 to 37e0da4 on January 13, 2025 08:38
Motivation
We don't have time to keep most of our data up to date with the upstream provider. We focus on major updates that are scheduled in advance, but if the provider publishes more recent data or corrects existing data, we won’t know.
On the other end of the spectrum are automatic daily updates that are not reviewed and can fail when the provider's link stops working (or, even worse, returns bad data, which has happened recently).
The idea is to have something that sits between fully automatic unreviewed updates and fully manual updates. If a snapshot sets autoupdate: true in its metadata, a scheduled script would try to run the snapshot every day, and if the data is different, it would create a pull request that could be reviewed and merged with one click.
This could be used right away for datasets that are updated monthly by Veronika, e.g., Epoch AI or surface temperature. Another use case could be detecting datasets with potential updates, which could be picked up and completed manually if the update is worth doing.
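The daily check described above can be sketched roughly as follows. This is a minimal illustration, not the script in this PR: the Snapshot class, the find_updates function, and the fetch callback are hypothetical names, and comparing MD5 checksums is only an assumed way to decide that "the data is different".

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Snapshot:
    """Hypothetical stand-in for a snapshot and its metadata."""

    name: str
    autoupdate: bool  # mirrors `autoupdate: true` in the snapshot metadata
    checksum: str  # checksum of the data currently stored for this snapshot


def md5(data: bytes) -> str:
    """Hex MD5 digest, used here only to detect that the content changed."""
    return hashlib.md5(data).hexdigest()


def find_updates(
    snapshots: Iterable[Snapshot],
    fetch: Callable[[Snapshot], bytes],
) -> List[Tuple[str, str]]:
    """Return (name, new_checksum) for opted-in snapshots whose upstream data changed."""
    changed = []
    for snap in snapshots:
        if not snap.autoupdate:
            continue  # only snapshots that opted in are re-run
        try:
            data = fetch(snap)
        except Exception as exc:
            # A broken provider link should not stop the remaining checks.
            print(f"skipping {snap.name}: {exc}")
            continue
        new_checksum = md5(data)
        if new_checksum != snap.checksum:
            changed.append((snap.name, new_checksum))
    return changed
```

Each entry returned by find_updates would then become a candidate for a pull request. That step is left out because it depends on the repository's own tooling. Keeping the download behind a fetch callback also makes the comparison easy to exercise without network access.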
Notes
data_accessed attribute. Updating the version would make it much more complex (and likely not worth it unless a code change is needed).
TODO before merging