Merge pull request #204 from antoinejeannot/upgrade-pydantic-v2

bump pydantic
Cornerstone-OnDemand · Dec 4, 2023 · 5d488fc
2 parents 62eb031 + 9857df8
Showing 29 changed files with 411 additions and 306 deletions.
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -31,7 +31,7 @@ repos:
               tests/.*
           )$
         additional_dependencies: [
-          pydantic==1.*,
+          pydantic==2.*,
           types-python-dateutil,
           types-requests,
           types-urllib3,

diff --git a/README.md b/README.md
@@ -7,7 +7,7 @@
 <p align="center">
   <em>Python framework for production ML systems.</em>
 </p>
-    
+
 ---
 
 <p align="center">
@@ -21,7 +21,7 @@
   <a href="https://pepy.tech/project/modelkit"><img src="https://pepy.tech/badge/modelkit" /></a>
 </p>
 
-`modelkit` is a minimalist yet powerful MLOps library for Python, built for people who want to deploy ML models to production. 
+`modelkit` is a minimalist yet powerful MLOps library for Python, built for people who want to deploy ML models to production.
 
 It packs several features which make your go-to-production journey a breeze, and ensures that the same exact code will run in production, on your machine, or on data processing pipelines.
 
@@ -64,14 +64,27 @@ In addition, you will find that `modelkit` is:
 
 ## Installation
 
-Install with `pip`:
+Install the latest stable release with `pip`:
 
 ```
 pip install modelkit
 ```
 
 Optional dependencies are available for remote storage providers ([see documentation](https://cornerstone-ondemand.github.io/modelkit/assets/storage_provider/#using-different-providers))
 
+### 🚧 Beta release
+
+`modelkit 0.1` and onwards will be shipped with `pydantic 2`, bringing significant performance improvements 🎉 ⚡
+
+To try out the beta before it is stable:
+
+```
+pip install --pre modelkit
+```
+
+Also, you can refer to the [modelkit migration note](https://cornerstone-ondemand.github.io/modelkit/migration.md)
+ to ease the migration process!
+
 ## Community
 Join our [community](https://discord.gg/ayj5wdAArV) on Discord to get support and leave feedback
 
@@ -83,6 +96,6 @@ Contributors, if you want to install and test locally:
 # install
 make setup
 
-# lint & test
+# lint & test
 make tests
 ```
diff --git a/docs/migration.md b/docs/migration.md
@@ -0,0 +1,64 @@
+# Modelkit 0.1 migration note
+
+Modelkit relies on `pydantic` as part of its validation process.
+
+Modelkit 0.1 and onwards will be shipped with `pydantic 2`, which comes with __significant__ performance improvements at the cost of breaking changes.
+
+Details on how to migrate to `pydantic 2` are available in the corresponding migration guide: https://docs.pydantic.dev/latest/migration/
+
+### Installation
+To install and try out the `modelkit 0.1.0.bX` beta before its stable release:
+```
+pip install --pre modelkit
+```
+
+### Known breaking changes
+
+Some breaking changes are arising while upgrading to `pydantic 2` and the new `modelkit 0.1 beta`. Here is a brief, rather exhaustive, list of the encountered issues or dropped features.
+
+#### Drop: implicit pydantic model conversion
+
+With `pydantic < 2` and `modelkit < 0.1`, the following pattern was authorized (even though not advised) due to implicit conversions between pydantic models:
+
+```python
+import modelkit
+import pydantic
+import typing
+
+class OutputItem(pydantic.BaseModel):
+    x: int
+
+class AnotherOutputItem(pydantic.BaseModel):
+    x: int
+
+class MyModel(modelkit.Model[int, OutputItem]):
+    def _predict(self, item):
+        return AnotherOutputItem(x=item)
+
+model = MyModel()
+model(1)  # raises!
+
+```
+
+__This pattern is no longer allowed__.
+
+However, here are the fixes:
+- directly build the right output `pydantic` Model (here: `OutputItem`)
+- directly use dicts to benefit from the dict to model conversion from `pydantic` and `modelkit` (or via `.model_dump()`)
+
+### Drop: model validation deactivation
+
+The `MODELKIT_ENABLE_VALIDATION` environment variable (or the `enable_validation` parameter of the `LibrarySettings`) which allowed one to deactivate validation if set to `False` was removed.
+
+This feature has worked for `pydantic < 2` for rather simple `pydantic models` but not complex ones with nested structures (see: https://github.com/Cornerstone-OnDemand/modelkit/pull/8). However, it still is an open question in `pydantic 2`, whether to allow recursive construction of models without validation (see: https://github.com/pydantic/pydantic/issues/8084).
+Due to the fact `pydantic 2` brings heavy performance improvements, this feature has not been re-implemented.
+
+Fixes: None, just prepare to have your inputs / outputs validated :)
+
+### Development Workflows
+
+The beta release, along with subsequent patches, will be pushed to the main branch. Prior to the stable release, tags will adopt the format `0.1.0.bX`
+
+For projects that have not migrated, `modelkit 0.0` will continue to receive maintenance on the `v0.0-maintenance` branch. Releases on PyPI and manual tags will adhere to the usual process.
+
+To prevent your project from automatically upgrading to the new modelkit 0.1 upon its stable release, you can enforce an upper bound constraint in your requirements, e.g.: `modelkit<0.1`
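
The two fixes listed in the migration note for the dropped implicit conversion can be sketched with plain `pydantic` 2. This is only an illustration under assumptions: `Wrapper` is a hypothetical stand-in for whatever model modelkit uses internally to validate outputs, and it is not part of modelkit's API.

```python
import pydantic


class OutputItem(pydantic.BaseModel):
    x: int


class AnotherOutputItem(pydantic.BaseModel):
    x: int


class Wrapper(pydantic.BaseModel):
    # Hypothetical stand-in for an internal validation wrapper.
    data: OutputItem


other = AnotherOutputItem(x=1)

# pydantic 2 no longer converts one model class into another implicitly.
try:
    Wrapper(data=other)
    rejected = False
except pydantic.ValidationError:
    rejected = True

# Fix 1: build the expected output model directly.
fixed_direct = Wrapper(data=OutputItem(x=1)).data

# Fix 2: go through a dict, here via .model_dump().
fixed_dump = Wrapper(data=other.model_dump()).data
```

Both fixes yield an `OutputItem`; the second one costs an extra dict round trip, so building the right model directly is probably the simpler choice when the output type is known.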
diff --git a/modelkit/api.py b/modelkit/api.py
@@ -98,7 +98,7 @@ def __init__(
             logger.info("Adding model", name=model_name)
             item_type = m._item_type or Any
             try:
-                item_type.schema()  # type: ignore
+                item_type.model_json_schema()  # type: ignore
             except (ValueError, AttributeError):
                 logger.info(
                     "Discarding item type info for model", name=model_name, path=path

diff --git a/modelkit/assets/drivers/abc.py b/modelkit/assets/drivers/abc.py
@@ -3,13 +3,18 @@
 
 import pydantic
 
+from modelkit.core.settings import ModelkitSettings
 
-class StorageDriverSettings(pydantic.BaseSettings):
-    bucket: str = pydantic.Field(..., env="MODELKIT_STORAGE_BUCKET")
-    lazy_driver: bool = pydantic.Field(False, env="MODELKIT_LAZY_DRIVER")
 
-    class Config:
-        extra = "allow"
+class StorageDriverSettings(ModelkitSettings):
+    bucket: str = pydantic.Field(
+        ..., validation_alias=pydantic.AliasChoices("bucket", "MODELKIT_STORAGE_BUCKET")
+    )
+    lazy_driver: bool = pydantic.Field(
+        False,
+        validation_alias=pydantic.AliasChoices("lazy_driver", "MODELKIT_LAZY_DRIVER"),
+    )
+    model_config = pydantic.ConfigDict(extra="allow")
 
 
 class StorageDriver(abc.ABC):
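
The `env=` to `validation_alias=pydantic.AliasChoices(...)` change above can be illustrated in isolation. How `ModelkitSettings` reads environment variables is not shown in this diff, so the sketch below assumes a plain `pydantic.BaseModel` and passes the values as keyword arguments; `DriverSettings` is a hypothetical name.

```python
import pydantic


class DriverSettings(pydantic.BaseModel):
    # Hypothetical model: the field can be populated under either name.
    bucket: str = pydantic.Field(
        ...,
        validation_alias=pydantic.AliasChoices("bucket", "MODELKIT_STORAGE_BUCKET"),
    )
    model_config = pydantic.ConfigDict(extra="allow")


# The field name itself is listed in AliasChoices, so it is still accepted.
by_name = DriverSettings(bucket="my-bucket")

# The environment-style name is accepted as well.
by_alias = DriverSettings(MODELKIT_STORAGE_BUCKET="my-other-bucket")

# With neither name present, the required field is reported as missing.
try:
    DriverSettings()
    missing_rejected = False
except pydantic.ValidationError:
    missing_rejected = True
```

Listing the field name as the first choice matters: once a `validation_alias` is set, pydantic 2 stops accepting the bare field name unless it is one of the aliases or the model is configured to populate by name.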

diff --git a/modelkit/assets/drivers/azure.py b/modelkit/assets/drivers/azure.py
@@ -17,11 +17,12 @@
 
 class AzureStorageDriverSettings(StorageDriverSettings):
     connection_string: Optional[str] = pydantic.Field(
-        None, env="AZURE_STORAGE_CONNECTION_STRING"
+        None,
+        validation_alias=pydantic.AliasChoices(
+            "connection_string", "AZURE_STORAGE_CONNECTION_STRING"
+        ),
     )
-
-    class Config:
-        extra = "forbid"
+    model_config = pydantic.ConfigDict(extra="forbid")
 
 
 class AzureStorageDriver(StorageDriver):

diff --git a/modelkit/assets/drivers/gcs.py b/modelkit/assets/drivers/gcs.py
@@ -19,11 +19,12 @@
 
 class GCSStorageDriverSettings(StorageDriverSettings):
     service_account_path: Optional[str] = pydantic.Field(
-        None, env="GOOGLE_APPLICATION_CREDENTIALS"
+        None,
+        validation_alias=pydantic.AliasChoices(
+            "service_account_path", "GOOGLE_APPLICATION_CREDENTIALS"
+        ),
     )
-
-    class Config:
-        extra = "forbid"
+    model_config = pydantic.ConfigDict(extra="forbid")
 
 
 class GCSStorageDriver(StorageDriver):

diff --git a/modelkit/assets/drivers/local.py b/modelkit/assets/drivers/local.py
@@ -3,6 +3,7 @@
 import shutil
 from typing import Dict, Optional, Union
 
+import pydantic
 from structlog import get_logger
 
 from modelkit.assets import errors
@@ -12,8 +13,7 @@
 
 
 class LocalStorageDriverSettings(StorageDriverSettings):
-    class Config:
-        extra = "forbid"
+    model_config = pydantic.ConfigDict(extra="forbid")
 
 
 class LocalStorageDriver(StorageDriver):

diff --git a/modelkit/assets/drivers/s3.py b/modelkit/assets/drivers/s3.py
@@ -17,17 +17,37 @@
 
 
 class S3StorageDriverSettings(StorageDriverSettings):
-    aws_access_key_id: Optional[str] = pydantic.Field(None, env="AWS_ACCESS_KEY_ID")
+    aws_access_key_id: Optional[str] = pydantic.Field(
+        None,
+        validation_alias=pydantic.AliasChoices(
+            "aws_access_key_id", "AWS_ACCESS_KEY_ID"
+        ),
+    )
     aws_secret_access_key: Optional[str] = pydantic.Field(
-        None, env="AWS_SECRET_ACCESS_KEY"
+        None,
+        validation_alias=pydantic.AliasChoices(
+            "aws_secret_access_key", "AWS_SECRET_ACCESS_KEY"
+        ),
     )
-    aws_default_region: Optional[str] = pydantic.Field(None, env="AWS_DEFAULT_REGION")
-    aws_session_token: Optional[str] = pydantic.Field(None, env="AWS_SESSION_TOKEN")
-    s3_endpoint: Optional[str] = pydantic.Field(None, env="S3_ENDPOINT")
-    aws_kms_key_id: Optional[str] = pydantic.Field(None, env="AWS_KMS_KEY_ID")
-
-    class Config:
-        extra = "forbid"
+    aws_default_region: Optional[str] = pydantic.Field(
+        None,
+        validation_alias=pydantic.AliasChoices(
+            "aws_default_region", "AWS_DEFAULT_REGION"
+        ),
+    )
+    aws_session_token: Optional[str] = pydantic.Field(
+        None,
+        validation_alias=pydantic.AliasChoices(
+            "aws_session_token", "AWS_SESSION_TOKEN"
+        ),
+    )
+    s3_endpoint: Optional[str] = pydantic.Field(
+        None, validation_alias=pydantic.AliasChoices("s3_endpoint", "S3_ENDPOINT")
+    )
+    aws_kms_key_id: Optional[str] = pydantic.Field(
+        None, validation_alias=pydantic.AliasChoices("aws_kms_key_id", "AWS_KMS_KEY_ID")
+    )
+    model_config = pydantic.ConfigDict(extra="forbid")
 
 
 class S3StorageDriver(StorageDriver):

diff --git a/modelkit/core/errors.py b/modelkit/core/errors.py
@@ -23,7 +23,7 @@ class ModelkitDataValidationException(Exception):
     def __init__(
         self,
         model_identifier: str,
-        pydantic_exc: Optional[pydantic.error_wrappers.ValidationError] = None,
+        pydantic_exc: Optional[pydantic.ValidationError] = None,
         error_str: str = "Data validation error in model",
     ):
         pydantic_exc_output = ""

diff --git a/modelkit/core/library.py b/modelkit/core/library.py
@@ -53,7 +53,7 @@ class ConfigurationNotFoundException(Exception):
 
 class AssetInfo(pydantic.BaseModel):
     path: str
-    version: Optional[str]
+    version: Optional[str] = None
 
 
 class ModelLibrary:
@@ -68,7 +68,7 @@ def __init__(
         required_models: Optional[Union[List[str], Dict[str, Any]]] = None,
     ):
         """
-        Create a prediction service
+        Create a model library
 
         :param models: a `Model` class, a module, or a list of either in which the
         ModelLibrary will look for configurations.
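
The `version: Optional[str] = None` change in `AssetInfo` above reflects a pydantic 2 rule: an `Optional[...]` annotation without a default is a required field that merely allows `None`. A minimal sketch, using two hypothetical copies of the model to show both behaviours:

```python
from typing import Optional

import pydantic


class AssetInfoNoDefault(pydantic.BaseModel):
    path: str
    version: Optional[str]  # required in pydantic 2, although None is allowed


class AssetInfoWithDefault(pydantic.BaseModel):
    path: str
    version: Optional[str] = None  # optional, as in the updated code


# Omitting `version` fails when no default is declared.
try:
    AssetInfoNoDefault(path="some/asset")
    missing_version_rejected = False
except pydantic.ValidationError:
    missing_version_rejected = True

# Passing None explicitly is still fine.
explicit_none = AssetInfoNoDefault(path="some/asset", version=None)

# With the default, the field can be left out.
info = AssetInfoWithDefault(path="some/asset")
```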

diff --git a/modelkit/core/model.py b/modelkit/core/model.py
@@ -36,7 +36,6 @@
 from modelkit.utils.cache import Cache, CacheItem
 from modelkit.utils.memory import PerformanceTracker
 from modelkit.utils.pretty import describe, pretty_print_type
-from modelkit.utils.pydantic import construct_recursive
 
 logger = get_logger(__name__)
 
@@ -182,11 +181,8 @@ def _load(self) -> None:
 
 
 class InternalDataModel(pydantic.BaseModel):
-    data: Any
-
-    class Config:
-        arbitrary_types_allowed = True
-        extra = "forbid"
+    data: Any = None
+    model_config = pydantic.ConfigDict(arbitrary_types_allowed=True, extra="forbid")
 
 
 PYDANTIC_ERROR_TRUNCATION = 20
@@ -392,11 +388,8 @@ def _validate(
     ):
         if model:
             try:
-                if self.service_settings.enable_validation:
-                    return model(data=item).data
-                else:
-                    return construct_recursive(model, data=item).data
-            except pydantic.error_wrappers.ValidationError as exc:
+                return model(data=item).data
+            except pydantic.ValidationError as exc:
                 raise exception(
                     f"{self.__class__.__name__}[{self.configuration_key}]",
                     pydantic_exc=exc,

diff --git a/modelkit/core/model_configuration.py b/modelkit/core/model_configuration.py
@@ -10,20 +10,24 @@
 from structlog import get_logger
 
 from modelkit.core.model import Asset
+from modelkit.core.settings import ModelkitSettings
 from modelkit.core.types import LibraryModelsType
 
 logger = get_logger(__name__)
 
 
-class ModelConfiguration(pydantic.BaseSettings):
+class ModelConfiguration(ModelkitSettings):
     model_type: Type[Asset]
-    asset: Optional[str]
+    asset: Optional[str] = None
     model_settings: Optional[Dict[str, Any]] = {}
-    model_dependencies: Optional[Dict[str, str]]
+    model_dependencies: Optional[Dict[str, str]] = {}
 
-    @pydantic.validator("model_dependencies", always=True, pre=True)
+    model_config = pydantic.ConfigDict(protected_namespaces=("settings",))
+
+    @pydantic.field_validator("model_dependencies", mode="before")
+    @classmethod
     def validate_dependencies(cls, v):
-        if not v:
+        if v is None:
             return {}
         if isinstance(v, (list, set)):
             return {key: key for key in v}
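
The validator migration above can be tried outside modelkit. The sketch below assumes a plain `pydantic.BaseModel` named `Conf` (hypothetical), and the final `return v` is an assumption, since the hunk cuts off before the end of the function. In pydantic 2, field validators do not run on default values unless `validate_default` is enabled, which is likely why the `always=True` flag was replaced by an explicit `{}` default.

```python
from typing import Dict, Optional

import pydantic


class Conf(pydantic.BaseModel):
    model_dependencies: Optional[Dict[str, str]] = {}

    # Replace the default "model_" protected namespace so that the
    # `model_dependencies` field name does not trigger a warning.
    model_config = pydantic.ConfigDict(protected_namespaces=("settings",))

    @pydantic.field_validator("model_dependencies", mode="before")
    @classmethod
    def validate_dependencies(cls, v):
        if v is None:
            return {}
        if isinstance(v, (list, set)):
            # Map each dependency name to itself.
            return {key: key for key in v}
        return v  # assumed: other values are passed through unchanged


from_default = Conf()
from_none = Conf(model_dependencies=None)
from_list = Conf(model_dependencies=["tokenizer", "encoder"])
from_dict = Conf(model_dependencies={"alias": "tokenizer"})
```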
@@ -55,7 +59,7 @@ def walk_objects(mod):
 def _configurations_from_objects(m) -> Dict[str, ModelConfiguration]:
     if inspect.isclass(m) and issubclass(m, Asset):
         return {
-            key: ModelConfiguration(**{**config, "model_type": m})
+            key: ModelConfiguration(**config, model_type=m)
             for key, config in m.CONFIGURATIONS.items()
         }
     elif isinstance(m, (list, tuple)):
@@ -92,7 +96,9 @@ def configure(
                 if isinstance(conf_value, ModelConfiguration):
                     conf[key] = conf_value
                 elif isinstance(conf_value, dict):
-                    conf[key] = ModelConfiguration(**{**conf[key].dict(), **conf_value})
+                    conf[key] = ModelConfiguration(
+                        **{**conf[key].model_dump(), **conf_value}
+                    )
         for key in set(configuration.keys()) - set(conf.keys()):
             conf_value = configuration[key]
             if isinstance(conf_value, ModelConfiguration):