Add config obj replace schemas #717
base: main
Conversation
…ig throughout src
Moved config.py to 'src/calliope/schemas'. config.py is too generic and conflicts with config variable names.

* Removed mode redundancy to simplify the configuration
* Simplified config schema extraction
@brynpickering before continuing on this, we need to ensure we have a common vision on where / how stuff is checked, and the pros / cons of certain approaches. Particularly for places where our setup conflicts with

A key assumption, based on previous discussions, is that we are attempting to remove all schema

The final object would look like this, sort of:

```python
class CalliopeModel(BaseModel):
    """Calliope model class."""

    model_config = {"title": "calliope model"}

    config: CalliopeConfig
    techs: dict[str, Tech]
    nodes: dict[str, Node]
    data_tables: dict[str, DataTable] | None = None
    parameters: Parameter | None = None  # this might go in Math later, here for now
    math: Math | None = None  # inserted by us using build.mode???
    overrides: None | dict[str, Self] = None
```

Open question 1: The role of

We can also keep all of the elements in
Thanks @irm-codebase - this is what I had in mind w.r.t. a move to pydantic! One thing I think it will offer us is the ability to remove AttrDict entirely 👀... We still need to be able to load our flavour of YAML, but that could be something we do with less maintenance overhead.

RE techs, we have a few weird parameters, but keep in mind that none of them are really "hard" requirements in terms of YAML definitions. You can define all that info in data tables and then have almost nothing defined in YAML. At that point pydantic's magic is useless to us anyway, hence why I wouldn't put too much focus on maximising the value of pydantic; we'll only ever apply it to a subset of the data when data tables are used. Data validation should wait a bit longer and be applied once YAML and data table info has been merged (i.e. on the xarray arrays). This means that no, parameter settings should not be in the pydantic model (as they shouldn't be in a YAML schema). Instead, they should be in a YAML definition (e.g. as given in #712) whose structure is described by a pydantic model. Still, we have to define some things in pydantic under tech defs, since they are weird parameter structures that we let through (namely,

RE the config options we include. We should consider how best to handle overrides and templates - should they be validated separately, or should the pydantic model only be generated based on the resolved dict (all overrides and template data applied)?
@brynpickering some replies (it's a bit difficult to follow which section you are referring to, but I'll try my best here).
Understood. One important caveat of this is that we cannot easily offer some features that users might want (e.g., setting max/min ranges for a numeric parameter) without having to write this code ourselves. This is fine as long as it is understood. If we ever wish to, we could re-arrange our parsing to happen after templates and data tables are loaded into the dictionary (file -> our yaml loading -> complex pydantic -> backend). In that setup we would be able to rely on

For now, I will assume we'll go with the simplistic
I think we pretty much agree on how this should be done, then.
This is the hard one. To answer the question(?) at the end: yes, but it depends. This comes back to my point on our current flat structure being troublesome. Checking certain things gets really hard when you have them all in a "salad" dictionary, even more so when it can be extended by users. As far as I can tell, this is the problem:

```python
class CalliopeModel(BaseModel):
    """Calliope model class."""

    model_config = {"title": "calliope model"}

    config: CalliopeConfig
    techs: dict[str, Tech]
    ...

class TechFlat(BaseModel):
    model_config = {"extra": "allow"}

    # ANY other value will pass without checks! If we do not define it, we do not check it!
    # User 'extras' are a superset of what we could check, meaning that we cannot check non-specified items.
    # We essentially do not check the **structure** of anything beyond hard-coded stuff (e.g., name, carrier_in).
    name: str
    carrier_in: str | None = None
    carrier_out: str | None = None
    carrier_export: str | None = None

class TechDivided(BaseModel):
    model_config = {"extra": "forbid"}

    # May have anything, as long as it's single values, to keep our IO simple
    extra: dict[str, float | str] | None = None
    # Structure is checked!
    params: dict[str, float]
    # Structure is checked!
    lookups: dict[str, str]
```

I am struggling a bit on how to properly set up a flat structure with passover values that does not end up feeling pointless due to the superset nature of user extras. Basically, any value can be there, even values that would not be used by some combination of our settings, with the extra ambiguity of not knowing if a user 'extra' is accidentally named as something important.
The answer depends on what we want to do.
Both have pros / cons. I think the second is the easiest in our current setup. I'd say second... solved by our yaml reading function?
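The ambiguity with flat extras (a user key silently shadowing, or nearly colliding with, a real field) can be sketched without pydantic. This is a minimal stdlib illustration; the field names are a hypothetical subset of a tech definition, not Calliope's actual schema:

```python
import difflib

# Hypothetical subset of fields the model would actually check.
KNOWN_FIELDS = {"name", "base_tech", "carrier_in", "carrier_out", "carrier_export"}

def suspicious_extras(tech: dict) -> list[str]:
    """Flag unknown keys that look like typos of known fields.

    Under ``extra: allow`` semantics these keys would pass silently;
    this at least surfaces likely accidents.
    """
    flagged = []
    for key in tech:
        if key in KNOWN_FIELDS:
            continue
        close = difflib.get_close_matches(key, KNOWN_FIELDS, n=1, cutoff=0.8)
        if close:
            flagged.append(f"'{key}' looks like a typo of '{close[0]}'")
    return flagged
```

For example, `{"name": "pv", "carier_in": "elec"}` would be flagged, while a genuinely novel user key passes through untouched, which is exactly the superset problem described above.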
That would be dealt with by the parameter config being introduced in #712. We can have min/max or a more general way to define validation rules that will act on the xarray dataset (using the format defined in https://github.com/calliope-project/calliope/blob/959fea7096ce47dfaf6886be1e1fa9445b92bc69/src/calliope/config/model_data_checks.yaml). This is what I mean about data validation that is beyond the scope of pydantic: it is no longer working on a dictionary of data, but on arrays in an xarray dataset.
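As a rough illustration of what such post-merge min/max rules could do, here is a stdlib-only sketch. The rule structure and names are made up for illustration (plain lists stand in for xarray arrays; a dict stands in for the YAML rules file):

```python
def apply_bounds_checks(
    arrays: dict[str, list[float]],
    rules: dict[str, dict[str, float]],
) -> list[str]:
    """Check each named array against optional 'min'/'max' bounds.

    `arrays` stands in for variables in an xarray Dataset; `rules` mimics
    a YAML mapping from parameter names to bound definitions.
    """
    errors = []
    for name, bounds in rules.items():
        for i, value in enumerate(arrays.get(name, [])):
            if "min" in bounds and value < bounds["min"]:
                errors.append(f"{name}[{i}] = {value} below min {bounds['min']}")
            if "max" in bounds and value > bounds["max"]:
                errors.append(f"{name}[{i}] = {value} above max {bounds['max']}")
    return errors
```

The point being made in the comment is that this kind of check runs after YAML and data-table info are merged, so it belongs outside the pydantic layer.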
I don't think it's the flat structure that causes the problem so much as allowing for user-defined keys. One option would be to consider dynamic fields. For instance, you can set type annotations for extra fields such that they have to at least follow a specific structure (as we define with our patternProperties currently). So the flat structure for a tech could be:
To update on the idea of using

```python
from typing import Literal, Self

from pydantic import BaseModel, Field

DATA_T = bool | int | float | str

class ConfigBaseModel(BaseModel):
    """A base class for creating pydantic models for Calliope configuration options."""

    _kwargs: dict = {}

    def update(self, update_dict: dict, deep: bool = False) -> Self:
        """Return a new iteration of the model with updated fields.

        Updates are validated and stored in the parent class in the `_kwargs` key.

        Args:
            update_dict (dict): Dictionary with which to update the base model.
            deep (bool, optional): Set to True to make a deep copy of the model. Defaults to False.

        Returns:
            BaseModel: New model instance.
        """
        new_dict: dict = {}
        # Iterate through the dict to be updated and convert any sub-dicts
        # into their respective pydantic model objects
        for key, val in update_dict.items():
            key_class = getattr(self, key)
            if isinstance(key_class, ConfigBaseModel):
                new_dict[key] = key_class._update(val)
            else:
                new_dict[key] = val
        updated = self.model_copy(update=new_dict, deep=deep)
        updated = self.model_validate(updated.model_dump())
        self._kwargs = update_dict
        return updated

    def _update(self, val):
        updated = self.update(val)
        self._kwargs = val
        return updated

class Param(ConfigBaseModel):
    """Uniform dictionary for parameters."""

    data: DATA_T | list[bool | int | float | str]
    index: list[list[str]] = [[]]
    dims: list[str] = []

    def _update(self, val):
        if isinstance(val, dict):
            updated = self.update(val)
            self._kwargs = val
            return updated
        else:
            return val

class Tech(ConfigBaseModel):
    model_config = {"extra": "allow"}

    __pydantic_extra__: dict[str, Param | DATA_T] = Field(union_mode="left_to_right")

    base_tech: Literal["supply"]  # etc.
    carrier_in: str | list[str] | None = None
    carrier_out: str | list[str] | None = None
    carrier_export: str | list[str] | None = None

class Techs(ConfigBaseModel):
    model_config = {"extra": "allow"}

    __pydantic_extra__: dict[str, Tech]

class Model(ConfigBaseModel):
    techs: Techs
```

Which would work to manage a dict like:

```python
{"techs": {"foo": {"bar": {"data": 1}, "baz": "foobar", "base_tech": "supply", "carrier_in": ["elec", "heat"]}}}
```

And would give you appropriate dot notation afterwards (e.g.
@irm-codebase I've edited my previous comment based on playing around with the
@brynpickering thanks for the update! Is forcing subdicts to update to their pydantic model necessary, though? I thought that was done automatically (or it seemed to be during my tests).
I found that it just generated subdicts in my updated pydantic model, not nested pydantic models. If there's a way to do it without the extra lines I've added, then that's great!
@brynpickering seems like this can be achieved with

I'll see if I can get this to work on my updates.
@brynpickering to keep things simple, I will change focus to merge this into main after PR #704 is complete.
Complements PR #704 by extending pydantic validation to other sections of model.yaml files.

Summary of changes in this pull request

So far:

* data_tables schema to ensure user configurations fit our setup

Reviewer checklist