Add config obj replace schemas #717
Conversation
…ig throughout src
Moved config.py to 'src/calliope/schemas'. config.py is too generic and conflicts with config variable names.
- Removed mode redundancy to simplify the configuration
- Simplified config schema extraction
@brynpickering before continuing on this, we need to ensure we have a common vision of where / how stuff is checked, and the pros / cons of certain approaches, particularly for places where our setup conflicts with …

A key assumption, based on previous discussions, is that we are attempting to remove all schema … The final object would look like this, sort of:

```python
class CalliopeModel(BaseModel):
    """Calliope model class."""

    model_config = {"title": "calliope model"}

    config: CalliopeConfig
    techs: dict[str, Tech]
    nodes: dict[str, Node]
    data_tables: dict[str, DataTable] | None = None
    parameters: Parameter | None = None  # this might go in Math later, here for now
    math: Math | None = None  # inserted by us using build.mode???
    overrides: None | dict[str, Self] = None
```

Open question 1: The role of …
We can also keep all of the elements in …
Thanks @irm-codebase - this is what I had in mind w.r.t. a move to pydantic! One thing I think it will offer us is the ability to remove AttrDict entirely 👀... We still need to be able to load our flavour of YAML, but that could be something we do with less maintenance overhead.

RE techs: we have a few weird parameters, but keep in mind that none of them are really "hard" requirements in terms of YAML definitions. You can define all that info in data tables and then have almost nothing defined in YAML. At that point pydantic's magic is useless to us anyway, hence I wouldn't put too much focus on maximising the value of pydantic; we'll only ever apply it to a subset of the data when data tables are used. Data validation should wait a bit longer and be applied once YAML and data table info has been merged (i.e. on the xarray arrays). This means that no, parameter settings should not be in the pydantic model (as they shouldn't be in a YAML schema). Instead, they should be in a YAML definition (e.g. as given in #712) whose structure is described by a pydantic model. Still, we have to define some things in pydantic under tech defs, since they are weird parameter structures that we let through (namely …).

RE the config options we include: we should consider how best to handle overrides and templates - should they be validated separately, or should the pydantic model only be generated based on the resolved dict (all overrides and template data applied)?
@brynpickering some replies (it's a bit difficult to follow which section you are referring to, but I'll try my best here).
Understood. One important caveat of this is that we cannot easily offer some features that users might want (e.g., setting max/min ranges for a numeric parameter) without having to write this code ourselves. This is fine as long as it is understood. If we ever wish to, we could re-arrange our parsing to happen after templates and data tables are loaded into the dictionary (file -> our YAML loading -> complex pydantic -> backend). In this setup we would be able to rely on …

For now, I will assume we'll go with the simplistic …
I think we pretty much agree on how this should be done, then.
This is the hard one. To answer the question(?) at the end: yes, but it depends.

This comes back to my point about our current flat structure being troublesome. Checking certain things gets really hard when you have them all in a "salad" dictionary, even more so when it can be extended by users. As far as I can tell, this is the problem:

```python
class CalliopeModel(BaseModel):
    """Calliope model class."""

    model_config = {"title": "calliope model"}

    config: CalliopeConfig
    techs: dict[str, Tech]
    ...


class TechFlat(BaseModel):
    model_config = {"extra": "allow"}

    # ANY other value will pass without checks! If we do not define it, we do not check it!
    # User 'extras' are a superset of what we could check, meaning that we cannot check non-specified items.
    # We essentially do not check the **structure** of anything beyond hard-coded stuff (e.g., name, carrier_in).
    name: str
    carrier_in: str | None = None
    carrier_out: str | None = None
    carrier_export: str | None = None


class TechDivided(BaseModel):
    model_config = {"extra": "forbid"}

    # May have anything as long as it's single values, to keep our IO simple
    extra: dict[str, float | str] | None = None
    # Structure is checked!
    params: dict[str, float]
    # Structure is checked!
    lookups: dict[str, str]
```

I am struggling a bit with how to properly set up a flat structure with passover values that does not end up feeling pointless due to the superset nature of user extras. Basically, any value can be there, even values that would not be used by some combination of our settings, with the extra ambiguity of not knowing whether a user 'extra' is accidentally named as something important.
The answer depends on what we want to do.
Both have pros / cons. I think the second is the easiest in our current setup. I'd say second... solved by our yaml reading function?
That would be dealt with by the parameter config being introduced in #712 . We can have min/max or a more general way to define validation rules that will act on the xarray dataset (using the format defined in https://github.com/calliope-project/calliope/blob/959fea7096ce47dfaf6886be1e1fa9445b92bc69/src/calliope/config/model_data_checks.yaml). This is what I mean about data validation that is beyond the scope of pydantic as it is no longer working on a dictionary of data, but arrays in an xarray dataset.
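The comment above describes rules that run on merged arrays rather than on the YAML dict. As a pure-Python illustration of that idea (function and rule names are hypothetical, not the format in model_data_checks.yaml):

```python
# Hypothetical sketch of a post-merge bounds rule: validation runs on the
# merged data values, not on the YAML dictionary, so it sees YAML *and*
# data-table inputs in one place.
def check_bounds(name, values, vmin=None, vmax=None):
    """Return human-readable failure messages for values outside [vmin, vmax]."""
    failures = []
    for i, v in enumerate(values):
        if vmin is not None and v < vmin:
            failures.append(f"{name}[{i}] = {v} < min {vmin}")
        if vmax is not None and v > vmax:
            failures.append(f"{name}[{i}] = {v} > max {vmax}")
    return failures


# Values merged from YAML and data tables are checked together:
merged_efficiency = [0.9, 1.2, 0.85]
print(check_bounds("efficiency", merged_efficiency, vmin=0, vmax=1))
# ['efficiency[1] = 1.2 > max 1']
```

The real implementation operates on xarray arrays, but the principle is the same: this kind of check is out of pydantic's scope because it happens after the dictionary stage.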
I don't think it's the flat structure that causes the problem so much as allowing for user-defined keys. One option would be to consider dynamic fields. For instance, you can set type annotations for extra fields such that they have to at least follow a specific structure (as we define with our patternProperties currently). So the flat structure for a tech could be:
To update on the idea of using …:

```python
from typing import Literal, Self

from pydantic import BaseModel, Field

DATA_T = bool | int | float | str


class ConfigBaseModel(BaseModel):
    """A base class for creating pydantic models for Calliope configuration options."""

    _kwargs: dict = {}

    def update(self, update_dict: dict, deep: bool = False) -> Self:
        """Return a new iteration of the model with updated fields.

        Updates are validated and stored in the parent class in the `_kwargs` key.

        Args:
            update_dict (dict): Dictionary with which to update the base model.
            deep (bool, optional): Set to True to make a deep copy of the model. Defaults to False.

        Returns:
            BaseModel: New model instance.
        """
        new_dict: dict = {}
        # Iterate through the dict to be updated and convert any sub-dicts
        # into their respective pydantic model objects
        for key, val in update_dict.items():
            key_class = getattr(self, key)
            if isinstance(key_class, ConfigBaseModel):
                new_dict[key] = key_class._update(val)
            else:
                new_dict[key] = val
        updated = self.model_copy(update=new_dict, deep=deep)
        updated = self.model_validate(updated.model_dump())
        self._kwargs = update_dict
        return updated

    def _update(self, val):
        updated = self.update(val)
        self._kwargs = val
        return updated


class Param(ConfigBaseModel):
    """Uniform dictionary for parameters."""

    data: DATA_T | list[bool | int | float | str]
    index: list[list[str]] = [[]]
    dims: list[str] = []

    def _update(self, val):
        if isinstance(val, dict):
            updated = self.update(val)
            self._kwargs = val
            return updated
        else:
            return val


class Tech(ConfigBaseModel):
    model_config = {"extra": "allow"}

    __pydantic_extra__: dict[str, Param | DATA_T] = Field(union_mode="left_to_right")
    base_tech: Literal["supply"]  # etc.
    carrier_in: str | list[str] | None = None
    carrier_out: str | list[str] | None = None
    carrier_export: str | list[str] | None = None


class Techs(ConfigBaseModel):
    model_config = {"extra": "allow"}

    __pydantic_extra__: dict[str, Tech]


class Model(ConfigBaseModel):
    techs: Techs
```

Which would work to manage a dict like:

```python
{"techs": {"foo": {"bar": {"data": 1}, "baz": "foobar", "base_tech": "supply", "carrier_in": ["elec", "heat"]}}}
```

And would give you appropriate dot notation afterwards (e.g. …).
@irm-codebase I've edited my previous comment based on playing around with the …
@brynpickering thanks for the update! Is forcing subdicts to update to their pydantic model necessary, though? I thought that was done automatically (or it seemed to be during my tests).
I found that it just generated subdicts in my updated pydantic model, not nested pydantic models. If there's a way to do it without the extra lines I've added, then that's great!
@brynpickering seems like this can be achieved with … I'll see if I can get this to work on my updates.
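The exact mechanism referenced here was lost in this thread's formatting, but one way to obtain re-validated nested models without the manual recursion in `ConfigBaseModel.update` is to deep-merge plain dicts and re-validate the result (a sketch with hypothetical class names, not the PR's final code):

```python
# Sketch: deep-merge an update dict into a dumped model, then re-validate,
# so nested dicts become nested pydantic models again (assumes pydantic v2).
from pydantic import BaseModel


class Solve(BaseModel):
    solver: str = "cbc"


class Config(BaseModel):
    solve: Solve = Solve()


def update(model: BaseModel, update_dict: dict) -> BaseModel:
    """Return a new model with update_dict merged in and fully re-validated."""

    def merge(base: dict, new: dict) -> dict:
        # Recursively overlay `new` onto `base` in place.
        for key, val in new.items():
            if isinstance(val, dict) and isinstance(base.get(key), dict):
                merge(base[key], val)
            else:
                base[key] = val
        return base

    return type(model).model_validate(merge(model.model_dump(), update_dict))


cfg = update(Config(), {"solve": {"solver": "gurobi"}})
print(type(cfg.solve).__name__, cfg.solve.solver)  # Solve gurobi
```

Because `model_validate` runs on the merged plain dict, the `solve` subdict comes back as a `Solve` instance rather than a bare dict, which is the behaviour discussed above.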
@brynpickering to keep things simple, I will change focus to merge this into main after PR #704 is complete.
Regarding the naming of link connections nodes:
@jnnr I've gone with …
@brynpickering I've run into an 'edge case' triggered by mixing YAML and data table definitions. Initially, I added some extra validation to the … I'll unfortunately have to remove these tests (leaving the detection of this stuff to …). Shouldn't we differentiate between 'structural' model definition (i.e., …)?
… when using data tables
@brynpickering ready for review! Although this is a lengthy PR, the line difference number … I've updated the first comment of the PR to reflect the changes and omissions of this PR. In general:
I've been as thorough as I can, but this is still a rather critical change. We need to be careful, so any comments / concerns are welcome.
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:

```diff
@@ Coverage Diff @@
##             main     #717      +/-   ##
==========================================
+ Coverage   96.22%   96.40%   +0.18%
==========================================
  Files          30       35       +5
  Lines        4181     4397     +216
  Branches      591      599       +8
==========================================
+ Hits         4023     4239     +216
  Misses         68       68
  Partials       90       90
```
Pydantic models look good! Our test suite really does need cleaning; we need to work out a better way to have minimal models to load and check. This is a task for #240 though.
Scanned this and it looks good to me, nothing to add on top of what Bryn noted!
I've added most of your suggestions / fixes @brynpickering.
One minor comment on the snakecase regex, otherwise good to go from my perspective!
Complements PR #704 by extending pydantic validation to other sections of model.yaml files.

Summary of changes in this pull request
New schemas for the following:
This PR does not:
- Remove model_def_schema.yaml, because it has parameter defaults. This needs to be removed in a subsequent PR.
- Update preprocess/model_math.py with the new schema. Once again, the parameters / dims PR should take care of this.

Reviewer checklist