Commit

feat: ⚡ Update CI cron schedule, README enhancements, and refactor asset definitions

- Updated the CI cron schedule to run every 3 days for optimized resource usage.
- Enhanced the README with a link to learn more about the Datadex approach, providing additional context and value to the project documentation.
- Refactored asset definitions across various modules (`huggingface.py`, `indicators.py`, `others.py`, `spain.py`) by removing the `group_name` parameter, simplifying asset management and aligning with a more streamlined project structure.
- Updated the Spain weather data asset to fetch data since 1970 instead of 1990, significantly extending the dataset's historical reach for richer analysis.
- Pinned Dagster to version 1.7.3 in `pyproject.toml` to ensure compatibility and stability across the project's dependencies.
- Removed the meta configuration related to Dagster groups in the DBT schema files (`country_year_indicators_schema.yml` and `spain_aemet_historical_weather_schema.yml`), further aligning with the simpler asset management approach.
davidgasquez committed May 1, 2024
1 parent 7c7f221 commit 7b273a2
Showing 9 changed files with 15 additions and 24 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -5,7 +5,7 @@ on:
     branches:
       - main
   schedule:
-    - cron: "0 0 * * *"
+    - cron: "0 0 */3 * *"
   workflow_dispatch:

 jobs:
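The new schedule `0 0 */3 * *` puts the step in the day-of-month field, so the pattern restarts on the 1st of each month rather than counting three days from the previous run. A minimal Python sketch of that behaviour, assuming standard five-field cron semantics (the helper `days_matching_step` is illustrative and not part of the repository):

```python
import calendar


def days_matching_step(year: int, month: int, step: int = 3) -> list[int]:
    """Days of the month matched by a `*/step` day-of-month cron field.

    Assumes standard cron semantics: `*/step` expands over the field's
    full range, starting at its minimum (1 for day-of-month).
    """
    days_in_month = calendar.monthrange(year, month)[1]
    return list(range(1, days_in_month + 1, step))


# May has 31 days, so a run on the 31st is followed by one on June 1st.
may_runs = days_matching_step(2024, 5)
june_runs = days_matching_step(2024, 6)
print(may_runs)
print(june_runs)
```

Runs are three days apart within a month, but the gap can shrink at month boundaries, for example to a single day after a 31-day month.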
2 changes: 1 addition & 1 deletion README.md
@@ -13,7 +13,7 @@

 Datadex is a fully open-source, serverless, and local-first Data Platform that improves how communities collaborate on Open Data. Datadex is not a new tool, it is a pattern showing an opinionated bridge between existing ones.

-You can check other real-world production implementations of the Datadex pattern working in the following repositories:
+You can [learn more about the approach in this post](https://davidgasquez.com/modern-open-data-portals/) or check other real-world production implementations of the Datadex pattern working in the following repositories:

 - [Gitcoin Grants Data Portal](https://github.com/davidgasquez/gitcoin-grants-data-portal). Data hub for Gitcoin Grants data. Improves data access and empowers data scientists to conduct research and helps to guide community-driven analysis and decisions.
 - [Arbitrum Grants Data Portal](https://github.com/davidgasquez/arbitrum-data-portal). Data hub for Arbitrum Grants data. Improves data access and empowers data scientists to conduct research and helps to guide community-driven analysis and decisions.
3 changes: 1 addition & 2 deletions datadex/assets/huggingface.py
@@ -7,8 +7,7 @@
 def create_hf_asset(dataset_name: str):
     @asset(
         name="huggingface_" + dataset_name,
-        ins={"data": AssetIn(dataset_name)},
-        group_name="huggingface",
+        ins={"data": AssetIn(dataset_name)}
     )
     def hf_asset(data: pd.DataFrame, hf: HuggingFaceResource) -> None:
         """
6 changes: 3 additions & 3 deletions datadex/assets/indicators.py
@@ -8,7 +8,7 @@
 from slugify import slugify


-@asset(group_name="country_indicators", io_manager_key="polars_io_manager")
+@asset(io_manager_key="polars_io_manager")
 def owid_energy_data() -> pl.DataFrame:
     """
     Raw Energy data from Our World in Data.
@@ -20,7 +20,7 @@ def owid_energy_data() -> pl.DataFrame:
     return pl.read_csv(energy_owid_url)


-@asset(group_name="country_indicators", io_manager_key="polars_io_manager")
+@asset(io_manager_key="polars_io_manager")
 def owid_co2_data() -> pl.DataFrame:
     """
     Raw CO2 data from Our World in Data.
@@ -32,7 +32,7 @@ def owid_co2_data() -> pl.DataFrame:
     return pl.read_csv(co2_owid_url)


-@asset(group_name="country_indicators")
+@asset()
 def world_bank_wdi() -> pd.DataFrame:
     """
     World Development Indicators (WDI) is the World Bank's premier compilation of cross-country comparable data on development.
4 changes: 2 additions & 2 deletions datadex/assets/others.py
@@ -7,7 +7,7 @@
 from ..resources import IUCNRedListAPI


-@asset(group_name="others")
+@asset()
 def threatened_animal_species(iucn_redlist_api: IUCNRedListAPI) -> pd.DataFrame:
     """
     Threatened animal species data from the IUCN Red List API.
@@ -27,7 +27,7 @@ def threatened_animal_species(iucn_redlist_api: IUCNRedListAPI) -> pd.DataFrame:
     )


-@asset(group_name="others")
+@asset()
 def wikidata_asteroids() -> pd.DataFrame:
     """
     Wikidata asteroids data.
12 changes: 6 additions & 6 deletions datadex/assets/spain.py
@@ -9,7 +9,7 @@
 from ..resources import AEMETAPI


-@asset(group_name="spain_open_data")
+@asset()
 def spain_energy_demand(context: AssetExecutionContext) -> pd.DataFrame:
     """
     Spain energy demand data.
@@ -49,7 +49,7 @@ def spain_energy_demand(context: AssetExecutionContext) -> pd.DataFrame:
     return df


-@asset(group_name="spain_open_data")
+@asset()
 def spain_ipc() -> pd.DataFrame:
     """
     Spain IPC data from INE. Downloaded from datos.gob.es (https://datos.gob.es/es/apidata).
@@ -73,7 +73,7 @@ def spain_ipc() -> pd.DataFrame:
     return df


-@asset(group_name="spain_open_data")
+@asset()
 def spain_aemet_stations_data(aemet_api: AEMETAPI) -> pd.DataFrame:
     """
     Spain AEMET stations data.
@@ -106,15 +106,15 @@ def spain_aemet_stations_data(aemet_api: AEMETAPI) -> pd.DataFrame:
     return df


-@asset(group_name="spain_open_data")
+@asset()
 def spain_aemet_weather_data(
     context: AssetExecutionContext, aemet_api: AEMETAPI
 ) -> pd.DataFrame:
     """
-    Spain weather data since 1990.
+    Spain weather data since 1970.
     """

-    start_date = pd.to_datetime("1990-01-01")
+    start_date = pd.to_datetime("1970-01-01")

     end_date = datetime.now() - timedelta(days=1)

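Moving the start date from 1990 to 1970 adds twenty years of daily history to a full backfill. A rough Python sketch of the size of that change, using only the standard library; the 30-day window size and the `date_windows` helper are assumptions for illustration, since the actual fetch loop is not shown in this diff:

```python
from datetime import date, timedelta
from typing import Iterator


def date_windows(
    start: date, end: date, days: int = 30
) -> Iterator[tuple[date, date]]:
    """Yield consecutive inclusive (window_start, window_end) pairs covering [start, end].

    Hypothetical helper: the window length is an assumed value, not one
    taken from the repository or from the AEMET API.
    """
    current = start
    while current <= end:
        window_end = min(current + timedelta(days=days - 1), end)
        yield current, window_end
        current = window_end + timedelta(days=1)


old_start = date(1990, 1, 1)
new_start = date(1970, 1, 1)

# Number of additional days of history requested by the new start date.
extra_days = (old_start - new_start).days

# Additional windows needed to cover 1970-01-01 through 1989-12-31.
extra_windows = sum(
    1 for _ in date_windows(new_start, old_start - timedelta(days=1))
)
print(extra_days, extra_windows)
```

Under these assumptions the change adds 7305 days, or 244 extra 30-day windows, to each full run.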
4 changes: 0 additions & 4 deletions dbt/models/country_year_indicators_schema.yml
@@ -3,7 +3,3 @@ version: 2
 models:
   - name: country_year_indicators
     description: "Country-year indicators"
-    config:
-      meta:
-        dagster:
-          group: country_indicators
4 changes: 0 additions & 4 deletions dbt/models/spain_aemet_historical_weather_schema.yml
@@ -3,7 +3,3 @@ version: 2
 models:
   - name: spain_aemet_historical_weather
     description: "Historical weather data for Spain. Cleaned and augmented with station metadata."
-    config:
-      meta:
-        dagster:
-          group: spain_open_data
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -7,7 +7,7 @@ dependencies = [
     "dagster-dbt",
     "dagster-duckdb-pandas",
     "dagster-duckdb-polars",
-    "dagster",
+    "dagster==1.7.3",
     "datasets",
     "dbt-core",
     "dbt-duckdb",
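Pinning with `==` means only that exact release satisfies the requirement. A small Python sketch of a check one could run to confirm an environment matches the pin; the `matches_pin` and `installed_matches` helpers are hypothetical, and they handle only the plain `name==version` form with a literal string comparison, not the full dependency specifier grammar or version normalization:

```python
from importlib import metadata


def matches_pin(requirement: str, installed_version: str) -> bool:
    """Return True if installed_version equals the version in a `name==version` pin."""
    _name, separator, pinned = requirement.partition("==")
    if not separator:
        raise ValueError(f"not an exact pin: {requirement!r}")
    return installed_version.strip() == pinned.strip()


def installed_matches(requirement: str) -> bool:
    """Compare the pin against whatever version is installed, if any."""
    name = requirement.partition("==")[0].strip()
    try:
        return matches_pin(requirement, metadata.version(name))
    except metadata.PackageNotFoundError:
        # A package that is not installed cannot satisfy the pin.
        return False


print(matches_pin("dagster==1.7.3", "1.7.3"))
print(matches_pin("dagster==1.7.3", "1.7.4"))
```

Calling `installed_matches("dagster==1.7.3")` inside the project environment would then report whether the resolved Dagster release is the pinned one.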
