wizard: fixes #1567

Merged
merged 93 commits into from
Sep 5, 2023
93 commits
72121f4
docs
lucasrodes Aug 24, 2023
45f17b5
import
lucasrodes Aug 24, 2023
1d74712
migrate
lucasrodes Aug 24, 2023
220e496
config
lucasrodes Aug 24, 2023
9f72ea5
remove walkthrough from make
lucasrodes Aug 24, 2023
e3cffa2
move
lucasrodes Aug 24, 2023
040367f
update readme
lucasrodes Aug 24, 2023
36348cb
imports
lucasrodes Aug 24, 2023
ecb8ff3
config
lucasrodes Aug 24, 2023
35929b3
migrate
lucasrodes Aug 24, 2023
25d7cae
move folder
lucasrodes Aug 24, 2023
80a3544
imports
lucasrodes Aug 24, 2023
5fbc388
docs
lucasrodes Aug 24, 2023
445ee1e
config
lucasrodes Aug 24, 2023
604a7e5
lint
lucasrodes Aug 24, 2023
ecae5b2
Merge branch 'master' of https://github.com/owid/etl into enhance/rep…
lucasrodes Aug 24, 2023
56e5b0b
fix test
lucasrodes Aug 24, 2023
9949000
tests
lucasrodes Aug 24, 2023
046989a
tests
lucasrodes Aug 24, 2023
bfcb39d
typo
lucasrodes Aug 24, 2023
268ab63
:tada: add helper function for reading JSON schema
Marigold Aug 25, 2023
23c8d37
Merge branch 'master' into enhance/walkthrough-2
lucasrodes Aug 25, 2023
7f858a9
Merge branch 'read-json-schema' into enhance/walkthrough-2
lucasrodes Aug 25, 2023
3284d73
add titles, descriptions, requirement_level
lucasrodes Aug 25, 2023
796c36e
add required fields
lucasrodes Aug 25, 2023
6ca9d86
Merge branch 'master' into enhance/walkthrough-2
lucasrodes Aug 25, 2023
2d0dd96
add path to schema
lucasrodes Aug 25, 2023
7ad142d
draft
lucasrodes Aug 25, 2023
4e1c12e
re-order fields
lucasrodes Aug 25, 2023
d6700f5
remove print
lucasrodes Aug 25, 2023
6a35ecb
tests
lucasrodes Aug 25, 2023
ffdaec0
lint
lucasrodes Aug 25, 2023
75c0a8b
ignore when formatting
lucasrodes Aug 25, 2023
07908bd
wizard app
lucasrodes Aug 25, 2023
24dc31e
ignore wizard files
lucasrodes Aug 25, 2023
d29cb74
remove pywebio
lucasrodes Aug 25, 2023
e938cc0
lint
lucasrodes Aug 25, 2023
e2eb803
lint
lucasrodes Aug 25, 2023
b6b76dd
enhance schema
lucasrodes Aug 25, 2023
e1447e8
chore
lucasrodes Aug 26, 2023
faae21e
default date values
lucasrodes Aug 26, 2023
3b1ca4e
meadow step
lucasrodes Aug 26, 2023
848b124
add main cli
lucasrodes Aug 26, 2023
407b847
meadow step
lucasrodes Aug 26, 2023
16a6d9b
wip
lucasrodes Aug 27, 2023
d4997ac
wip
lucasrodes Aug 27, 2023
e8b1243
dependencies
lucasrodes Aug 28, 2023
48a2632
validate & co
lucasrodes Aug 28, 2023
6cf693c
simplify
lucasrodes Aug 28, 2023
25a64be
wip
lucasrodes Aug 28, 2023
977f79c
update streamlit version
lucasrodes Aug 28, 2023
9210f28
enhance validation
lucasrodes Aug 28, 2023
f06d0dd
cmd cli
lucasrodes Aug 28, 2023
1878731
add charts
lucasrodes Aug 28, 2023
a08b6da
add charts, phase option being used
lucasrodes Aug 28, 2023
a4454ac
remove header
lucasrodes Aug 28, 2023
6e50352
lint
lucasrodes Aug 28, 2023
86f039a
default values
lucasrodes Aug 28, 2023
4b063ec
condition to run check
lucasrodes Aug 28, 2023
8a37b6c
add dummy data to wizard
lucasrodes Aug 28, 2023
0c5147c
fix links
lucasrodes Aug 28, 2023
c2e377b
remove spurious file
lucasrodes Aug 29, 2023
e81fb07
fixes in charts
lucasrodes Aug 29, 2023
4d0b21c
add test to db
lucasrodes Aug 29, 2023
228acd3
re-structure entry page
lucasrodes Aug 29, 2023
ba153c9
lint
lucasrodes Aug 29, 2023
c2784fb
missing indentation, add validation
lucasrodes Aug 29, 2023
c0157f8
indentation
lucasrodes Aug 29, 2023
de42896
lint (wip)
lucasrodes Aug 29, 2023
7d7c967
lint (wip)
lucasrodes Aug 29, 2023
6f49efb
chore: typing_extensions
lucasrodes Aug 29, 2023
97c6c53
🐛 walkthrough: add to dag path
lucasrodes Aug 29, 2023
d6bcfb6
Merge branch 'fix/add-to-dag' into enhance/walkthrough-2
lucasrodes Aug 29, 2023
8b1df46
Merge branch 'master' into enhance/walkthrough-2
lucasrodes Aug 29, 2023
2233b5a
no required field in grapher_config
lucasrodes Aug 29, 2023
a4ad968
wip
lucasrodes Aug 30, 2023
4c0767d
move modules
lucasrodes Aug 30, 2023
a606023
minor tweaks
lucasrodes Aug 30, 2023
20123b8
change paths
lucasrodes Aug 30, 2023
e52cc0a
render guidelines
lucasrodes Aug 30, 2023
6111195
nested list structure
lucasrodes Aug 30, 2023
1de01c8
Merge branch 'master' into enhance/walkthrough-2
lucasrodes Aug 30, 2023
0c56823
add guidelines
lucasrodes Aug 30, 2023
628c15d
fix paths
lucasrodes Aug 30, 2023
a71bc5f
indent in dag
lucasrodes Aug 30, 2023
cd10c23
Merge branch 'master' into enhance/walkthrough-2
lucasrodes Sep 1, 2023
28351cc
adapt dag io functions
lucasrodes Sep 1, 2023
2e4b00e
enhance: default values from config
lucasrodes Sep 1, 2023
8ba6699
enhance: default values from config
lucasrodes Sep 1, 2023
4d833cc
lint
lucasrodes Sep 1, 2023
7469a6b
lint
lucasrodes Sep 1, 2023
539cbbd
set license also at root level
lucasrodes Sep 1, 2023
8e4479b
try removing warning
lucasrodes Sep 1, 2023
2 changes: 2 additions & 0 deletions .gitignore
@@ -44,3 +44,5 @@ docs/.vscode
 site/
 
 .sanity-check
+
+.wizard
4 changes: 4 additions & 0 deletions apps/wizard/cli.py
@@ -4,6 +4,7 @@
 
 python cli.py
 """
+import logging
 import sys
 from typing import Iterable
 
@@ -13,6 +14,9 @@
 
 from apps.wizard.utils import CURRENT_DIR, PHASES
 
+# Disable streamlit cache data API logging
+# ref: @kajarenc from https://github.com/streamlit/streamlit/issues/6620#issuecomment-1564735996
+logging.getLogger("streamlit.runtime.caching.cache_data_api").setLevel(logging.ERROR)
 
 # NOTE: Any new arguments here need to be in sync with the arguments defined in
 # wizard.utils.APP_STATE.args property method
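The cli.py hunk silences Streamlit's cache-data logger by raising the level of that one named logger. A minimal stdlib sketch of the effect, assuming a hypothetical stand-in logger name (`demo.caching.cache_data_api`, not Streamlit's real one) and an in-memory handler so the emitted records can be inspected:

```python
import logging
from typing import List


class ListHandler(logging.Handler):
    """Collect emitted records in memory so the effect is observable."""

    def __init__(self) -> None:
        super().__init__()
        self.records: List[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


handler = ListHandler()
root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.DEBUG)

# Hypothetical stand-in for "streamlit.runtime.caching.cache_data_api".
noisy = logging.getLogger("demo.caching.cache_data_api")

noisy.warning("emitted before silencing")

# Same call pattern as in cli.py: from now on only ERROR and above pass.
noisy.setLevel(logging.ERROR)
noisy.warning("suppressed")
noisy.error("still emitted")

messages = [r.getMessage() for r in handler.records if r.name == noisy.name]
print(messages)  # ['emitted before silencing', 'still emitted']
```

Setting the level on the named logger (rather than on the root) keeps every other logger's output unchanged.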
11 changes: 8 additions & 3 deletions apps/wizard/templating/garden.py
@@ -39,6 +39,8 @@ def load_instructions() -> str:
 class GardenForm(utils.StepForm):
     """Garden step form."""
 
+    step_name: str = "garden"
+
     short_name: str
     namespace: str
     version: str
@@ -235,13 +237,13 @@ def _fill_dummy_metadata_yaml(metadata_path: Path) -> None:
 APP_STATE.st_widget(
     st.toggle,
     label="Generate playground notebook",
-    key="garden.generate_notebook",
-    default_last=True,
+    key="generate_notebook",
+    default_last=False,
 )
 APP_STATE.st_widget(
     st.toggle,
     label="Make dataset private",
-    key="garden.is_private",
+    key="is_private",
     default_last=False,
 )
 
@@ -362,6 +364,9 @@ def _fill_dummy_metadata_yaml(metadata_path: Path) -> None:
 
     # User message
     st.toast("Templates generated. Read the next steps.", icon="✅")
+
+    # Update config
+    utils.update_wizard_config(form=form)
 else:
     st.write(form.errors)
     st.error("Form not submitted! Check errors!")
5 changes: 5 additions & 0 deletions apps/wizard/templating/grapher.py
@@ -36,6 +36,8 @@ def load_instructions() -> str:
 class GrapherForm(utils.StepForm):
     """Grapher form."""
 
+    step_name: str = "grapher"
+
     short_name: str
     namespace: str
     version: str
@@ -269,6 +271,9 @@ def update_state() -> None:
 
     # User message
     st.toast("Templates generated. Read the next steps.", icon="✅")
+
+    # Update config
+    utils.update_wizard_config(form=form)
 else:
     st.write(form.errors)
     st.error("Form not submitted! Check errors!")
10 changes: 7 additions & 3 deletions apps/wizard/templating/meadow.py
@@ -39,6 +39,8 @@ def load_instructions() -> str:
 class MeadowForm(utils.StepForm):
     """Meadow step form."""
 
+    step_name: str = "meadow"
+
     short_name: str
     namespace: str
     version: str
@@ -52,7 +54,7 @@ class MeadowForm(utils.StepForm):
     def __init__(self: Self, **data: str | bool) -> None:
         """Construct class."""
         data["add_to_dag"] = data["dag_file"] != utils.ADD_DAG_OPTIONS[0]
-        super().__init__(**data, step_name="meadow")
+        super().__init__(**data)
 
     def validate(self: "MeadowForm") -> None:
         """Check that fields in form are valid.
@@ -153,7 +155,7 @@ def update_state() -> None:
     st.toggle,
     label="Generate playground notebook",
     key="generate_notebook",
-    default_last=True,
+    default_last=False,
 )
 # Private?
 APP_STATE.st_widget(
@@ -231,11 +233,13 @@ def update_state() -> None:
     # Preview generated
     st.subheader("Generated files")
     utils.preview_file(step_path, "python")
-    utils.preview_dag_additions(dag_content, dag_path)
+    utils.preview_dag_additions(dag_content=dag_content, dag_path=dag_path)
 
     # User message
     st.toast("Templates generated. Read the next steps.", icon="✅")
 
+    # Update config
+    utils.update_wizard_config(form=form)
 else:
     st.write(form.errors)
     st.error("Form not submitted! Check errors!")
21 changes: 16 additions & 5 deletions apps/wizard/templating/snapshot.py
@@ -25,7 +25,11 @@
 # Get properties for origin in schema
 schema_origin = SNAPSHOT_SCHEMA["properties"]["meta"]["properties"]["origin"]["properties"]
 # Lists with fields of special types. By default, fields are text inputs.
-FIELD_TYPES_TEXTAREA = ["origin.dataset_description_owid", "origin.dataset_description_producer"]
+FIELD_TYPES_TEXTAREA = [
+    "origin.dataset_description_owid",
+    "origin.dataset_description_producer",
+    "origin.citation_producer",
+]
 FIELD_TYPES_SELECT = ["origin.license.name"]
 # Get current directory
 CURRENT_DIR = Path(__file__).parent
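snapshot.py keeps per-field widget overrides in plain lists keyed by dotted field names, with a text input as the fallback; this PR moves `origin.citation_producer` into the multi-line group. A sketch of how such lists could drive widget choice. `widget_for`, the returned labels and the `origin.title` example name are hypothetical, and the wizard's actual dispatch code may differ:

```python
from typing import List

# Lists as they stand after this PR; any field not listed is a text input.
FIELD_TYPES_TEXTAREA: List[str] = [
    "origin.dataset_description_owid",
    "origin.dataset_description_producer",
    "origin.citation_producer",
]
FIELD_TYPES_SELECT: List[str] = ["origin.license.name"]


def widget_for(field: str) -> str:
    """Return a (hypothetical) widget kind for a dotted field name."""
    if field in FIELD_TYPES_TEXTAREA:
        return "text_area"
    if field in FIELD_TYPES_SELECT:
        return "selectbox"
    return "text_input"


for name in ["origin.citation_producer", "origin.license.name", "origin.title"]:
    print(name, "->", widget_for(name))
```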
@@ -44,6 +48,8 @@
 class SnapshotForm(utils.StepForm):
     """Interface for snapshot form."""
 
+    step_name: str = "snapshot"
+
     # config
     namespace: str
     snapshot_version: str
@@ -114,6 +120,10 @@ def validate(self: "SnapshotForm") -> None:
     @property
     def metadata(self: Self) -> Dict[str, Any]:
         """Define metadata for easy YAML-export."""
+        license_field = {
+            "name": self.license_name,
+            "url": self.license_url,
+        }
         meta = {
             "meta": {
                 "origin": {
@@ -130,11 +140,9 @@ def metadata(self: Self) -> Dict[str, Any]:
                     "dataset_url_download": self.dataset_url_download,
                     "date_published": self.date_published,
                     "date_accessed": self.date_accessed,
-                    "license": {
-                        "name": self.license_name,
-                        "url": self.license_url,
-                    },
+                    "license": license_field,
                 },
+                "license": license_field,
                 "is_public": not self.is_private,
             }
         }
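The `metadata` property now builds one `license_field` dict and places it both inside `origin` and at the root of `meta`. A reduced sketch of that shape; `build_metadata`, the placeholder license values and the trimmed key set are assumptions made for illustration:

```python
import copy
from typing import Any, Dict


def build_metadata(license_name: str, license_url: str, is_private: bool) -> Dict[str, Any]:
    """Reduced version of SnapshotForm.metadata: one license dict used twice."""
    license_field = {"name": license_name, "url": license_url}
    return {
        "meta": {
            "origin": {
                # Other origin fields omitted in this sketch.
                "license": license_field,
            },
            "license": license_field,
            "is_public": not is_private,
        }
    }


meta = build_metadata("CC BY 4.0", "https://example.com/license", is_private=False)

# Both entries are the very same dict object, not two equal copies.
shared = meta["meta"]["license"] is meta["meta"]["origin"]["license"]

# Replacing one side with a deep copy keeps the values but breaks the sharing.
meta["meta"]["license"] = copy.deepcopy(meta["meta"]["license"])
still_shared = meta["meta"]["license"] is meta["meta"]["origin"]["license"]
still_equal = meta["meta"]["license"] == meta["meta"]["origin"]["license"]

print(shared, still_shared, still_equal)  # True False True
```

Sharing one object is harmless for reading, but a serializer that tracks object identity may write the second occurrence as a YAML anchor/alias pair (PyYAML's default dumper does this for repeated dicts) instead of repeating the mapping. Copying one side is a simple way to get both entries written out in full.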
@@ -592,6 +600,9 @@ def update_state() -> None:
 
     # User message
     st.toast("Templates generated. Read the next steps.", icon="✅")
+
+    # Update config
+    utils.update_wizard_config(form=form)
 else:
     st.write(form.errors)
     st.error("Form not submitted! Check errors!")
104 changes: 83 additions & 21 deletions apps/wizard/utils.py
@@ -9,14 +9,14 @@
 import re
 import shutil
 import tempfile
+from io import StringIO
 from pathlib import Path
 from typing import Any, Callable, Dict, List, Literal, Optional, Type, cast
 
 import jsonref
 import jsonschema
 import ruamel.yaml
 import streamlit as st
-import yaml
 from cookiecutter.main import cookiecutter
 from MySQLdb import OperationalError
 from owid import walden
@@ -29,6 +29,7 @@
 from etl.files import apply_black_formatter_to_files
 from etl.paths import (
     APPS_DIR,
+    BASE_DIR,
     DAG_DIR,
     LATEST_POPULATION_VERSION,
     LATEST_REGIONS_VERSION,
@@ -72,6 +73,9 @@
 MD_GARDEN = APPS_DIR / "wizard" / "templating" / "markdown" / "garden.md"
 MD_GRAPHER = APPS_DIR / "wizard" / "templating" / "markdown" / "grapher.md"
 
+# PATH WIZARD CONFIG
+WIZARD_CONFIG = BASE_DIR / ".wizard"
+
 
 if WALKTHROUGH_ORIGINS:
     DUMMY_DATA = {
@@ -107,9 +111,6 @@
         "url": "https://www.url-dummy.com/",
     }
 
-# state shared between steps
-APP_STATE = {}
-
 
 def validate_short_name(short_name: str) -> Optional[str]:
     """Validate short name."""
@@ -128,11 +129,15 @@ def add_to_dag(dag: DAG, dag_path: Path = DAG_WALKTHROUGH_PATH) -> str:
     doc["steps"].update(dag)
 
     with open(dag_path, "w") as f:
+        # Add new step to DAG
         yml = ruamel.yaml.YAML()
         yml.indent(mapping=2, sequence=4, offset=2)
-        yml.dump(doc, f)
-
-    return yaml.dump({"steps": dag})
+        yml.dump(doc, stream=f)
+    # Get subdag as string
+    with StringIO() as string_stream:
+        yml.dump({"steps": dag}, stream=string_stream)
+        output_str = string_stream.getvalue()
+    return output_str
 
 
 def remove_from_dag(step: str, dag_path: Path = DAG_WALKTHROUGH_PATH) -> None:
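`add_to_dag` now renders the sub-DAG by dumping into an in-memory `StringIO` and reading the text back with `getvalue()`, because `ruamel.yaml.YAML().dump` writes to a stream rather than returning a string. The same pattern, sketched with the standard-library `json` module as a stand-in for ruamel so it runs without extra dependencies; the step URIs in `dag` are made up:

```python
import json
from io import StringIO
from typing import Dict, List

# Hypothetical sub-DAG: step URI -> list of dependency URIs.
dag: Dict[str, List[str]] = {
    "data://meadow/demo/2023-09-01/example": ["snapshot://demo/2023-09-01/example.csv"],
}

# Dump into an in-memory text stream, then read the accumulated text back.
# getvalue() must run before the with-block closes the buffer.
with StringIO() as string_stream:
    json.dump({"steps": dag}, string_stream, indent=2)
    output_str = string_stream.getvalue()

print(output_str)
```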
@@ -142,7 +147,10 @@ def remove_from_dag(step: str, dag_path: Path = DAG_WALKTHROUGH_PATH) -> None:
     doc["steps"].pop(step, None)
 
     with open(dag_path, "w") as f:
-        ruamel.yaml.dump(doc, f, Dumper=ruamel.yaml.RoundTripDumper)
+        # Add new step to DAG
+        yml = ruamel.yaml.YAML()
+        yml.indent(mapping=2, sequence=4, offset=2)
+        yml.dump(doc, f)
 
 
 def generate_step(cookiecutter_path: Path, data: Dict[str, Any], target_dir: Path) -> None:
@@ -204,16 +212,25 @@ def _init_steps(self: "AppState") -> None:
         for step in self.steps:
             if step not in st.session_state["steps"]:
                 st.session_state["steps"][step] = {}
+        # Initiate default
+        self.default_steps = {step: {} for step in self.steps}
+
+        # Load config from .wizard
+        config = load_wizard_config()
+        # Add defaults (these are used when no value is found in current or previous step)
+        self.default_steps["snapshot"]["snapshot_version"] = DATE_TODAY
+        self.default_steps["snapshot"]["origin.date_accessed"] = DATE_TODAY
+
+        self.default_steps["meadow"]["version"] = DATE_TODAY
+        self.default_steps["meadow"]["snapshot_version"] = DATE_TODAY
+        self.default_steps["meadow"]["generate_notebook"] = config["template"]["meadow"]["generate_notebook"]
+
+        self.default_steps["garden"]["version"] = DATE_TODAY
+        self.default_steps["garden"]["meadow_version"] = DATE_TODAY
+        self.default_steps["garden"]["generate_notebook"] = config["template"]["garden"]["generate_notebook"]
 
-        # Add defaults
-        st.session_state["steps"]["snapshot"]["snapshot_version"] = DATE_TODAY
-        st.session_state["steps"]["snapshot"]["origin.date_accessed"] = DATE_TODAY
-        st.session_state["steps"]["meadow"]["version"] = DATE_TODAY
-        st.session_state["steps"]["meadow"]["snapshot_version"] = DATE_TODAY
-        st.session_state["steps"]["garden"]["version"] = DATE_TODAY
-        st.session_state["steps"]["garden"]["meadow_version"] = DATE_TODAY
-        st.session_state["steps"]["grapher"]["version"] = DATE_TODAY
-        st.session_state["steps"]["grapher"]["garden_version"] = DATE_TODAY
+        self.default_steps["grapher"]["version"] = DATE_TODAY
+        self.default_steps["grapher"]["garden_version"] = DATE_TODAY
 
     def _check_step(self: "AppState") -> None:
         """Check that the value for step is valid."""
@@ -249,8 +266,8 @@ def state_step(self: "AppState") -> Dict[str, Any]:
         return st.session_state["steps"][self.step]
 
     def default_value(
-        self: "AppState", key: str, previous_step: Optional[str] = None, default_last: Optional[Any] = ""
-    ) -> str:
+        self: "AppState", key: str, previous_step: Optional[str] = None, default_last: Optional[str | bool | int] = ""
+    ) -> str | bool | int:
         """Get the default value of a variable.
 
         This is useful when setting good defaults in widgets (e.g. text_input).
@@ -265,12 +282,28 @@ def default_value(
         if previous_step is None:
             previous_step = self.previous_step
         # (1) Get value stored for this field (in current step)
+        # st.write(f"KEY: {key}")
         value_step = self.state_step.get(key)
-        if value_step:
+        # st.write(f"value_step: {value_step}")
+        if value_step is not None:
             return value_step
         # (2) If none, check if previous step has a value and use that one, otherwise (3) use empty string.
         key = key.replace(f"{self.step}.", f"{self.previous_step}.")
-        return st.session_state["steps"][self.previous_step].get(key, default_last)
+        value_previous_step = st.session_state["steps"][self.previous_step].get(key)
+        # st.write(f"value_previous_step: {value_previous_step}")
+        if value_previous_step is not None:
+            return value_previous_step
+        # (3) If none, use self.default_steps
+        value_defaults = self.default_steps[self.step].get(key)
+        # st.write(f"value_defaults: {value_defaults}")
+        if value_defaults is not None:
+            return value_defaults
+        # (4) Use default_last as last resort
+        if default_last is None:
+            raise ValueError(
+                f"No value found for {key} in current, previous or defaults. Must provide a valid `default_value`!"
+            )
+        return cast(str | bool | int, default_last)
 
     def display_error(self: "AppState", key: str) -> None:
         """Get error message for a given key."""
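After this change `default_value` resolves a widget's initial value in four stages: the value stored for the current step, the previous step's value (with the `<step>.` key prefix swapped), the per-step defaults seeded in `_init_steps`, and finally `default_last`. A standalone sketch of that order using plain dicts in place of `st.session_state` and `self.default_steps`; the function `resolve_default`, its arguments and the `meadow.note` key are hypothetical:

```python
from typing import Any, Dict, Optional, Union

Value = Union[str, bool, int]


def resolve_default(
    key: str,
    step: str,
    previous_step: str,
    state: Dict[str, Dict[str, Any]],
    defaults: Dict[str, Dict[str, Any]],
    default_last: Optional[Value] = "",
) -> Value:
    """Hypothetical standalone version of the lookup order in AppState.default_value."""
    # (1) Value already stored for the current step.
    value = state.get(step, {}).get(key)
    if value is not None:
        return value
    # (2) Value stored for the previous step; "<step>." prefixes are renamed.
    prev_key = key.replace(f"{step}.", f"{previous_step}.")
    value = state.get(previous_step, {}).get(prev_key)
    if value is not None:
        return value
    # (3) Per-step defaults, e.g. today's date or the notebook toggle from .wizard.
    value = defaults.get(step, {}).get(key)
    if value is not None:
        return value
    # (4) Caller-supplied fallback; None means "no fallback, fail loudly".
    if default_last is None:
        raise ValueError(f"No value found for {key!r} in current, previous or default values")
    return default_last


state = {
    "meadow": {"generate_notebook": False, "meadow.note": "from meadow"},
    "garden": {"is_private": False},
}
defaults = {"garden": {"version": "2023-09-05", "generate_notebook": True, "is_private": True}}

kept_false = resolve_default("is_private", "garden", "garden_prev", state, defaults)  # False (current step)
from_previous = resolve_default("generate_notebook", "garden", "meadow", state, defaults)  # False
renamed = resolve_default("garden.note", "garden", "meadow", state, defaults)  # "from meadow"
from_defaults = resolve_default("version", "garden", "meadow", state, defaults)  # "2023-09-05"
fallback = resolve_default("namespace", "garden", "meadow", state, defaults)  # ""

try:
    resolve_default("namespace", "garden", "meadow", state, defaults, default_last=None)
    raised = False
except ValueError:
    raised = True
```

The switch from truthiness (`if value_step:`) to `is not None` matters for toggles: under the old check a `False` already stored for the current step fell through to the next source, whereas now it is returned as-is, which is what `kept_false` shows.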
@@ -335,6 +368,7 @@ class StepForm(BaseModel):
     """Form abstract class."""
 
     errors: Dict[str, Any] = {}
+    step_name: str
 
     def __init__(self: Self, **kwargs: str | int) -> None:
         """Construct parent class."""
@@ -544,3 +578,31 @@ def warning_notion_latest() -> None:
     st.warning(
         "Documentation for new metadata is almost complete, but still being finalised. For latest definitions refer to [Notion](https://www.notion.so/owid/Metadata-guidelines-29ca6e19b6f1409ea6826a88dbb18bcc)."
     )
+
+
+def load_wizard_config() -> Dict[str, Any]:
+    """Load default wizard config."""
+    if os.path.exists(WIZARD_CONFIG):
+        with open(WIZARD_CONFIG, "r") as f:
+            return json.load(f)
+    return {
+        "template": {
+            "meadow": {"generate_notebook": False},
+            "garden": {"generate_notebook": False},
+        }
+    }
+
+
+def update_wizard_config(form: StepForm) -> None:
+    """Update wizard config file."""
+    # Load config
+    config = load_wizard_config()
+
+    # Update config
+    if form.step_name in ["meadow", "garden"]:
+        form_dix = form.dict()
+        config["template"][form.step_name]["generate_notebook"] = form_dix.get("generate_notebook", False)
+
+    # Export config
+    with open(WIZARD_CONFIG, "w") as f:
+        json.dump(config, f)
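`load_wizard_config` and `update_wizard_config` persist the last "Generate playground notebook" choice for meadow and garden in a `.wizard` JSON file at the repository root (ignored by git), so the toggle reopens with that choice. A self-contained sketch of the same round trip. Two assumptions: it writes to a temporary directory instead of `BASE_DIR / ".wizard"`, and `load_config`/`update_config` (hypothetical names) take the step name and form values as plain arguments rather than a `StepForm`:

```python
import copy
import json
import tempfile
from pathlib import Path
from typing import Any, Dict

# Same fallback structure that load_wizard_config() returns when no file exists.
DEFAULT_CONFIG: Dict[str, Any] = {
    "template": {
        "meadow": {"generate_notebook": False},
        "garden": {"generate_notebook": False},
    }
}


def load_config(path: Path) -> Dict[str, Any]:
    """Return the stored config, or a fresh copy of the defaults if no file exists."""
    if path.exists():
        with open(path, "r") as f:
            return json.load(f)
    return copy.deepcopy(DEFAULT_CONFIG)


def update_config(path: Path, step_name: str, form_values: Dict[str, Any]) -> None:
    """Persist the notebook toggle for meadow/garden; other steps rewrite the file unchanged."""
    config = load_config(path)
    if step_name in ("meadow", "garden"):
        config["template"][step_name]["generate_notebook"] = form_values.get("generate_notebook", False)
    with open(path, "w") as f:
        json.dump(config, f)


with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / ".wizard"  # stand-in for BASE_DIR / ".wizard"
    before = load_config(config_path)
    update_config(config_path, "garden", {"generate_notebook": True})
    update_config(config_path, "grapher", {"generate_notebook": True})  # not stored
    after = load_config(config_path)

print(before["template"]["garden"], after["template"]["garden"])
```

Returning a deep copy of the module-level defaults avoids the first caller mutating them; the original sidesteps the same issue by building a new dict literal on every call.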
5 changes: 3 additions & 2 deletions etl/helpers.py
@@ -987,12 +987,13 @@ def isolated_env(
         sys.path.remove(working_dir.as_posix())
 
 
-def read_json_schema(path: Union[Path, str]) -> dict:
+def read_json_schema(path: Union[Path, str]) -> Dict[str, Any]:
     """Read JSON schema with resolved references."""
     path = Path(path)
 
     # pathlib does not append trailing slashes, but jsonref needs that.
     base_dir_url = path.parent.absolute().as_uri() + "/"
     base_file_url = urljoin(base_dir_url, path.name)
     with path.open("r") as f:
-        return jsonref.loads(f.read(), base_uri=base_file_url, lazy_load=False)
+        dix = jsonref.loads(f.read(), base_uri=base_file_url, lazy_load=False)
+        return cast(Dict[str, Any], dix)
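`read_json_schema` appends `/` to the directory URI before calling `urljoin` because `Path.as_uri()` never ends with a slash, and `urljoin` replaces the last path segment of a base that lacks one. A short stdlib check of that behaviour, using a made-up POSIX location for the schema file:

```python
from urllib.parse import urljoin

# What Path("/repo/schemas").as_uri() returns on POSIX: no trailing slash.
dir_uri = "file:///repo/schemas"
schema_name = "snapshot-schema.json"  # hypothetical file name

without_slash = urljoin(dir_uri, schema_name)
with_slash = urljoin(dir_uri + "/", schema_name)

print(without_slash)  # file:///repo/snapshot-schema.json
print(with_slash)     # file:///repo/schemas/snapshot-schema.json
```

Without the slash, `schemas` is treated as a file name and dropped, so any relative `$ref` resolved against that base would point one directory too high.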