Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use pydantic for config serialization/deserialization #438

Merged

Conversation

Paladinium
Copy link

@Paladinium Paladinium commented Dec 4, 2024

There is a lot of complexity in the config classes to deal with default values in the config especially when nested. Even worse, when adding new config classes, extra code might be needed (recent example: AlltalkConfigProxySettings here and here ).

Hence, pydantic was introduced to handle the serialization and deserialization of config data to and from JSON. pydantic is able to deal with partial JSON data (e.g. missing some properties) as well as mapping of fields (like clazz to class). A quick tutorial can be found here.

New unit tests were added to:

  • Validate the correctness of default values
  • Validate the ability to merge partial configurations with default values in code
  • Validate mappings of special fields (like clazz to class)
  • Validate that confignew.json must contain config fields added to code

The most complex part was the use of delegation (e.g. from AlltalkNewEnginesConfig to AlltalkNewEnginesConfigModel).

# default values, we can simply check that both dictionaries are identical:
self.assertEqual(self.config.to_dict(), cfg.to_dict())

def test_no_default_values_missing(self):
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if you want this: This forces you to update confignew.json once you add new config properties in code. I personally think this is a good idea, but you might not...

with open(self.config.get_config_path(), "r") as file2:
self.assertEqual(file1.read(), file2.read())

def test_values_merged_with_defaults(self):
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a test for the complex issue of merging partial config json files with the defaults defined in code. That's what pydantic is actually good for to handle automatically.

def to_dict(self):
return self.__delegate.model_dump(by_alias=True)

# Delegation to the loaded config data
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the most complex part dealing with delegation from e.g. AlltalkNewEnginesConfig to AlltalkNewEnginesConfigModel. I took most of the code from here: https://www.fast.ai/posts/2019-08-06-delegation.html

def get_engines_matching(self, condition: Callable[[AlltalkAvailableEngine], bool]):
return [x for x in self.engines_available if condition(x)]

class AlltalkNewEnginesConfig(AbstractJsonConfig, AlltalkNewEnginesConfigFields):
Copy link
Author

@Paladinium Paladinium Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are more classes involved now. Let me explain why:

  • AlltalkNewEnginesConfig is the class we already had before. It is what is used in code to access configuration properties. However, the class does not directly contain the properties, but delegates to AlltalkNewEnginesConfigModel.
  • AlltalkNewEnginesConfigModel is a pydantic model (BaseModel), dealing with deserialization and serialization.
  • AlltalkNewEnginesConfigFields contains the actual properties of a config.

Now you might wonder why AlltalkNewEnginesConfig and AlltalkNewEnginesConfigModel both inherit from AlltalkNewEnginesConfigFields. This is because my IDE did not recognize the delegated properties in AlltalkNewEnginesConfig, so no tab completion. The inheritances fixes it, even though properties are actually delegated.


class AlltalkConfigTheme(BaseModel):
# Map 'class' to 'clazz' and vice versa
model_config = ConfigDict(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's how pydantic handles mapping of fields, in this case for clazz to class.

@erew123 erew123 merged commit 9c5dc99 into erew123:alltalkbeta Jan 5, 2025
1 check passed
@erew123
Copy link
Owner

erew123 commented Jan 5, 2025

Hi @Paladinium Sorry for being so slow with this! First off Happy New Year and I hope you had a good holiday season however you celebrate it. Im back for 6 days at home, and working my way through anything outstanding (as time allows). Ive managed to give this code a test through, try out a few scenarios and it looks great to me! Really appreciate the time and effort you put into both making this and explaining it all (as well as test code).

I will pull it in now and I will take another shot at the building DeepSpeed one day soon too.

Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants