-
-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use pydantic for config serialization/deserialization #438
Use pydantic for config serialization/deserialization #438
Conversation
# default values, we can simply check that both dictionaries are identical: | ||
self.assertEqual(self.config.to_dict(), cfg.to_dict()) | ||
|
||
def test_no_default_values_missing(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check if you want this: This forces you to update confignew.json
once you add new config properties in code. I personally think this is a good idea, but you might not...
with open(self.config.get_config_path(), "r") as file2: | ||
self.assertEqual(file1.read(), file2.read()) | ||
|
||
def test_values_merged_with_defaults(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a test for the complex issue of merging partial config json files with the defaults defined in code. That's what pydantic is actually good for to handle automatically.
def to_dict(self): | ||
return self.__delegate.model_dump(by_alias=True) | ||
|
||
# Delegation to the loaded config data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the most complex part dealing with delegation from e.g. AlltalkNewEnginesConfig
to AlltalkNewEnginesConfigModel
. I took most of the code from here: https://www.fast.ai/posts/2019-08-06-delegation.html
def get_engines_matching(self, condition: Callable[[AlltalkAvailableEngine], bool]): | ||
return [x for x in self.engines_available if condition(x)] | ||
|
||
class AlltalkNewEnginesConfig(AbstractJsonConfig, AlltalkNewEnginesConfigFields): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are more classes involved now. Let me explain why:
AlltalkNewEnginesConfig
is the class we already had before. It is what is used in code to access configuration properties. However, the class does not directly contain the properties, but delegates toAlltalkNewEnginesConfigModel
.AlltalkNewEnginesConfigModel
is a pydantic model (BaseModel
), dealing with deserialization and serialization.AlltalkNewEnginesConfigFields
contains the actual properties of a config.
Now you might wonder why AlltalkNewEnginesConfig
and AlltalkNewEnginesConfigModel
both inherit from AlltalkNewEnginesConfigFields
. This is because my IDE did not recognize the delegated properties in AlltalkNewEnginesConfig
, so no tab completion. The inheritances fixes it, even though properties are actually delegated.
|
||
class AlltalkConfigTheme(BaseModel): | ||
# Map 'class' to 'clazz' and vice versa | ||
model_config = ConfigDict( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's how pydantic handles mapping of fields, in this case for clazz
to class
.
Hi @Paladinium Sorry for being so slow with this! First off Happy New Year and I hope you had a good holiday season however you celebrate it. Im back for 6 days at home, and working my way through anything outstanding (as time allows). Ive managed to give this code a test through, try out a few scenarios and it looks great to me! Really appreciate the time and effort you put into both making this and explaining it all (as well as test code). I will pull it in now and I will take another shot at the building DeepSpeed one day soon too. Thanks again! |
There is a lot of complexity in the config classes to deal with default values in the config especially when nested. Even worse, when adding new config classes, extra code might be needed (recent example:
AlltalkConfigProxySettings
here and here ).Hence, pydantic was introduced to handle the serialization and deserialization of config data to and from JSON. pydantic is able to deal with partial JSON data (e.g. missing some properties) as well as mapping of fields (like
clazz
toclass
). A quick tutorial can be found here.New unit tests were added to:
clazz
toclass
)confignew.json
must contain config fields added to codeThe most complex part was the use of delegation (e.g. from
AlltalkNewEnginesConfig
toAlltalkNewEnginesConfigModel
).