
Commit

Merge pull request #204 from antoinejeannot/upgrade-pydantic-v2
bump pydantic
antoinejeannot authored Dec 4, 2023
2 parents 62eb031 + 9857df8 commit 5d488fc
Showing 29 changed files with 411 additions and 306 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -31,7 +31,7 @@ repos:
 tests/.*
 )$
 additional_dependencies: [
-pydantic==1.*,
+pydantic==2.*,
 types-python-dateutil,
 types-requests,
 types-urllib3,
21 changes: 17 additions & 4 deletions README.md
@@ -7,7 +7,7 @@
<p align="center">
<em>Python framework for production ML systems.</em>
</p>

---

<p align="center">
@@ -21,7 +21,7 @@
<a href="https://pepy.tech/project/modelkit"><img src="https://pepy.tech/badge/modelkit" /></a>
</p>

`modelkit` is a minimalist yet powerful MLOps library for Python, built for people who want to deploy ML models to production.
`modelkit` is a minimalist yet powerful MLOps library for Python, built for people who want to deploy ML models to production.

It packs several features which make your go-to-production journey a breeze, and ensures that the same exact code will run in production, on your machine, or on data processing pipelines.

@@ -64,14 +64,27 @@ In addition, you will find that `modelkit` is:

## Installation

Install with `pip`:
Install the latest stable release with `pip`:

```
pip install modelkit
```

Optional dependencies are available for remote storage providers ([see documentation](https://cornerstone-ondemand.github.io/modelkit/assets/storage_provider/#using-different-providers))

### 🚧 Beta release

`modelkit 0.1` and onwards will be shipped with `pydantic 2`, bringing significant performance improvements 🎉 ⚡

To try out the beta before it is stable:

```
pip install --pre modelkit
```

Also, you can refer to the [modelkit migration note](https://cornerstone-ondemand.github.io/modelkit/migration.md)
to ease the migration process!

## Community
Join our [community](https://discord.gg/ayj5wdAArV) on Discord to get support and leave feedback

@@ -83,6 +96,6 @@ Contributors, if you want to install and test locally:
# install
make setup
# lint & test
# lint & test
make tests
```
64 changes: 64 additions & 0 deletions docs/migration.md
@@ -0,0 +1,64 @@
# Modelkit 0.1 migration note

Modelkit relies on `pydantic` as part of its validation process.

Modelkit 0.1 and onwards will be shipped with `pydantic 2`, which comes with __significant__ performance improvements at the cost of breaking changes.

Details on how to migrate to `pydantic 2` are available in the corresponding migration guide: https://docs.pydantic.dev/latest/migration/

### Installation
To install and try out the `modelkit 0.1.0.bX` beta before its stable release:
```
pip install --pre modelkit
```

### Known breaking changes

Upgrading to `pydantic 2` and the new `modelkit 0.1 beta` introduces some breaking changes. Here is a short but fairly complete list of the issues encountered and the features dropped.

#### Drop: implicit pydantic model conversion

With `pydantic < 2` and `modelkit < 0.1`, the following pattern was allowed (though not advised) because of implicit conversions between pydantic models:

```python
import modelkit
import pydantic
import typing

class OutputItem(pydantic.BaseModel):
    x: int

class AnotherOutputItem(pydantic.BaseModel):
    x: int

class MyModel(modelkit.Model[int, OutputItem]):
    def _predict(self, item):
        return AnotherOutputItem(x=item)

model = MyModel()
model(1)  # raises with pydantic 2 and modelkit >= 0.1
```

__This pattern is no longer allowed__.

Two fixes are available:
- build the expected output `pydantic` model directly (here: `OutputItem`)
- return a dict (for example via `.model_dump()`) and let `pydantic` and `modelkit` convert it to the expected model
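
The two fixes can be sketched with plain `pydantic` 2 alone. This is only an illustration: `Wrapper` and `is_accepted` are hypothetical stand-ins for the validation applied to a model's output, not modelkit's actual code.

```python
import pydantic


class OutputItem(pydantic.BaseModel):
    x: int


class AnotherOutputItem(pydantic.BaseModel):
    x: int


class Wrapper(pydantic.BaseModel):
    # Hypothetical stand-in for the validation step applied to a model's output.
    data: OutputItem


def is_accepted(value) -> bool:
    """Return True if `value` validates as an `OutputItem` field."""
    try:
        Wrapper(data=value)
        return True
    except pydantic.ValidationError:
        return False


other = AnotherOutputItem(x=1)

# An instance of a different model class is rejected by pydantic 2.
rejected = not is_accepted(other)

# Fix 1: build the expected output model directly.
fix_direct = is_accepted(OutputItem(x=1))

# Fix 2: pass a dict (for example via .model_dump()) and let pydantic convert it.
fix_dict = is_accepted(other.model_dump())
```

Either fix keeps validation explicit: the output is an `OutputItem`, or a dict that validates as one.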

#### Drop: model validation deactivation

The `MODELKIT_ENABLE_VALIDATION` environment variable (or the `enable_validation` parameter of `LibrarySettings`), which disabled validation when set to `False`, has been removed.

This feature worked with `pydantic < 2` for simple `pydantic` models, but not for complex ones with nested structures (see: https://github.com/Cornerstone-OnDemand/modelkit/pull/8). Whether `pydantic 2` should allow recursive construction of models without validation is still an open question (see: https://github.com/pydantic/pydantic/issues/8084).
Because `pydantic 2` brings significant performance improvements, this feature has not been re-implemented.

Fixes: None, just prepare to have your inputs / outputs validated :)
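
To illustrate why skipping validation is not a drop-in option, here is a small sketch using only `pydantic` 2. `Inner` and `Outer` are hypothetical models; the point is that `model_construct()` skips validation but does not build nested models.

```python
import pydantic


class Inner(pydantic.BaseModel):
    x: int


class Outer(pydantic.BaseModel):
    inner: Inner


payload = {"inner": {"x": 1}}

# Regular construction validates the payload and builds the nested model.
validated = Outer(**payload)

# model_construct() skips validation and does not recurse:
# the nested value is stored as the plain dict it was given.
unvalidated = Outer.model_construct(**payload)

nested_is_model = isinstance(validated.inner, Inner)
nested_is_dict = isinstance(unvalidated.inner, dict)
```

Code that expects `unvalidated.inner.x` would fail here, which is the nested-structure limitation described above.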

### Development Workflows

The beta release, along with subsequent patches, will be pushed to the main branch. Prior to the stable release, tags will adopt the format `0.1.0.bX`.

For projects that have not migrated, `modelkit 0.0` will continue to receive maintenance on the `v0.0-maintenance` branch. Releases on PyPI and manual tags will adhere to the usual process.

To prevent your project from automatically upgrading to the new modelkit 0.1 upon its stable release, you can enforce an upper bound constraint in your requirements, e.g.: `modelkit<0.1`
2 changes: 1 addition & 1 deletion modelkit/api.py
@@ -98,7 +98,7 @@ def __init__(
             logger.info("Adding model", name=model_name)
             item_type = m._item_type or Any
             try:
-                item_type.schema()  # type: ignore
+                item_type.model_json_schema()  # type: ignore
             except (ValueError, AttributeError):
                 logger.info(
                     "Discarding item type info for model", name=model_name, path=path
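
The `modelkit/api.py` change swaps pydantic 1's `.schema()` for pydantic 2's `.model_json_schema()`. A minimal sketch of the same probe, assuming plain `pydantic` 2; `Item` and `has_json_schema` are illustrative names that do not appear in modelkit:

```python
from typing import Any

import pydantic


class Item(pydantic.BaseModel):
    x: int


def has_json_schema(item_type: Any) -> bool:
    """Return True if `item_type` can produce a JSON schema the pydantic 2 way."""
    try:
        item_type.model_json_schema()
        return True
    except (ValueError, AttributeError):
        # Types that are not pydantic models (e.g. int) have no such method.
        return False


schema = Item.model_json_schema()
```

Types without the method fall into the `AttributeError` branch, as in the hunk above.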
15 changes: 10 additions & 5 deletions modelkit/assets/drivers/abc.py
@@ -3,13 +3,18 @@

 import pydantic

+from modelkit.core.settings import ModelkitSettings

-class StorageDriverSettings(pydantic.BaseSettings):
-    bucket: str = pydantic.Field(..., env="MODELKIT_STORAGE_BUCKET")
-    lazy_driver: bool = pydantic.Field(False, env="MODELKIT_LAZY_DRIVER")

-    class Config:
-        extra = "allow"
+class StorageDriverSettings(ModelkitSettings):
+    bucket: str = pydantic.Field(
+        ..., validation_alias=pydantic.AliasChoices("bucket", "MODELKIT_STORAGE_BUCKET")
+    )
+    lazy_driver: bool = pydantic.Field(
+        False,
+        validation_alias=pydantic.AliasChoices("lazy_driver", "MODELKIT_LAZY_DRIVER"),
+    )
+    model_config = pydantic.ConfigDict(extra="allow")


 class StorageDriver(abc.ABC):
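
The settings classes replace pydantic 1's `env=` argument with `validation_alias=pydantic.AliasChoices(...)`. The sketch below shows how such a field accepts either name. It uses a plain `BaseModel` so it needs nothing beyond `pydantic` 2; `DriverSettings` is a hypothetical class. Reading the value from the environment is the job of the settings base class, which this diff does not show.

```python
import pydantic


class DriverSettings(pydantic.BaseModel):
    # Hypothetical model mirroring the field declaration in the diff.
    bucket: str = pydantic.Field(
        ..., validation_alias=pydantic.AliasChoices("bucket", "MODELKIT_STORAGE_BUCKET")
    )
    model_config = pydantic.ConfigDict(extra="allow")


# The field can be populated under its own name...
by_name = DriverSettings(bucket="my-bucket")

# ...or under the upper-case alias.
by_alias = DriverSettings.model_validate({"MODELKIT_STORAGE_BUCKET": "my-bucket"})
```

Listing the field's own name first in `AliasChoices` is what keeps `DriverSettings(bucket=...)` working once an alias is declared.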
9 changes: 5 additions & 4 deletions modelkit/assets/drivers/azure.py
@@ -17,11 +17,12 @@

 class AzureStorageDriverSettings(StorageDriverSettings):
     connection_string: Optional[str] = pydantic.Field(
-        None, env="AZURE_STORAGE_CONNECTION_STRING"
+        None,
+        validation_alias=pydantic.AliasChoices(
+            "connection_string", "AZURE_STORAGE_CONNECTION_STRING"
+        ),
     )
-
-    class Config:
-        extra = "forbid"
+    model_config = pydantic.ConfigDict(extra="forbid")


 class AzureStorageDriver(StorageDriver):
9 changes: 5 additions & 4 deletions modelkit/assets/drivers/gcs.py
@@ -19,11 +19,12 @@

 class GCSStorageDriverSettings(StorageDriverSettings):
     service_account_path: Optional[str] = pydantic.Field(
-        None, env="GOOGLE_APPLICATION_CREDENTIALS"
+        None,
+        validation_alias=pydantic.AliasChoices(
+            "service_account_path", "GOOGLE_APPLICATION_CREDENTIALS"
+        ),
     )
-
-    class Config:
-        extra = "forbid"
+    model_config = pydantic.ConfigDict(extra="forbid")


 class GCSStorageDriver(StorageDriver):
4 changes: 2 additions & 2 deletions modelkit/assets/drivers/local.py
@@ -3,6 +3,7 @@
 import shutil
 from typing import Dict, Optional, Union

+import pydantic
 from structlog import get_logger

 from modelkit.assets import errors
@@ -12,8 +13,7 @@


 class LocalStorageDriverSettings(StorageDriverSettings):
-    class Config:
-        extra = "forbid"
+    model_config = pydantic.ConfigDict(extra="forbid")


 class LocalStorageDriver(StorageDriver):
38 changes: 29 additions & 9 deletions modelkit/assets/drivers/s3.py
@@ -17,17 +17,37 @@


 class S3StorageDriverSettings(StorageDriverSettings):
-    aws_access_key_id: Optional[str] = pydantic.Field(None, env="AWS_ACCESS_KEY_ID")
+    aws_access_key_id: Optional[str] = pydantic.Field(
+        None,
+        validation_alias=pydantic.AliasChoices(
+            "aws_access_key_id", "AWS_ACCESS_KEY_ID"
+        ),
+    )
     aws_secret_access_key: Optional[str] = pydantic.Field(
-        None, env="AWS_SECRET_ACCESS_KEY"
+        None,
+        validation_alias=pydantic.AliasChoices(
+            "aws_secret_access_key", "AWS_SECRET_ACCESS_KEY"
+        ),
     )
-    aws_default_region: Optional[str] = pydantic.Field(None, env="AWS_DEFAULT_REGION")
-    aws_session_token: Optional[str] = pydantic.Field(None, env="AWS_SESSION_TOKEN")
-    s3_endpoint: Optional[str] = pydantic.Field(None, env="S3_ENDPOINT")
-    aws_kms_key_id: Optional[str] = pydantic.Field(None, env="AWS_KMS_KEY_ID")
-
-    class Config:
-        extra = "forbid"
+    aws_default_region: Optional[str] = pydantic.Field(
+        None,
+        validation_alias=pydantic.AliasChoices(
+            "aws_default_region", "AWS_DEFAULT_REGION"
+        ),
+    )
+    aws_session_token: Optional[str] = pydantic.Field(
+        None,
+        validation_alias=pydantic.AliasChoices(
+            "aws_session_token", "AWS_SESSION_TOKEN"
+        ),
+    )
+    s3_endpoint: Optional[str] = pydantic.Field(
+        None, validation_alias=pydantic.AliasChoices("s3_endpoint", "S3_ENDPOINT")
+    )
+    aws_kms_key_id: Optional[str] = pydantic.Field(
+        None, validation_alias=pydantic.AliasChoices("aws_kms_key_id", "AWS_KMS_KEY_ID")
+    )
+    model_config = pydantic.ConfigDict(extra="forbid")


 class S3StorageDriver(StorageDriver):
2 changes: 1 addition & 1 deletion modelkit/core/errors.py
@@ -23,7 +23,7 @@ class ModelkitDataValidationException(Exception):
     def __init__(
         self,
         model_identifier: str,
-        pydantic_exc: Optional[pydantic.error_wrappers.ValidationError] = None,
+        pydantic_exc: Optional[pydantic.ValidationError] = None,
         error_str: str = "Data validation error in model",
     ):
         pydantic_exc_output = ""
4 changes: 2 additions & 2 deletions modelkit/core/library.py
@@ -53,7 +53,7 @@ class ConfigurationNotFoundException(Exception):

 class AssetInfo(pydantic.BaseModel):
     path: str
-    version: Optional[str]
+    version: Optional[str] = None


 class ModelLibrary:
@@ -68,7 +68,7 @@ def __init__(
         required_models: Optional[Union[List[str], Dict[str, Any]]] = None,
     ):
         """
-        Create a prediction service
+        Create a model library
         :param models: a `Model` class, a module, or a list of either in which the
         ModelLibrary will look for configurations.
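
The `version: Optional[str] = None` change reflects a pydantic 2 rule: an `Optional[...]` annotation without a default is a required field that merely accepts `None`. A small sketch, assuming plain `pydantic` 2; the class and function names below are illustrative only:

```python
from typing import Optional

import pydantic


class WithoutDefault(pydantic.BaseModel):
    path: str
    version: Optional[str]  # required in pydantic 2, although None is a valid value


class WithDefault(pydantic.BaseModel):
    path: str
    version: Optional[str] = None  # truly optional, defaults to None


def missing_version_fails(model_cls) -> bool:
    """Return True if omitting `version` raises a validation error."""
    try:
        model_cls(path="some/asset")
        return False
    except pydantic.ValidationError:
        return True
```

With pydantic 1, both declarations behaved like `WithDefault`, which is why the explicit `= None` had to be added.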
15 changes: 4 additions & 11 deletions modelkit/core/model.py
@@ -36,7 +36,6 @@
 from modelkit.utils.cache import Cache, CacheItem
 from modelkit.utils.memory import PerformanceTracker
 from modelkit.utils.pretty import describe, pretty_print_type
-from modelkit.utils.pydantic import construct_recursive

 logger = get_logger(__name__)

@@ -182,11 +181,8 @@ def _load(self) -> None:


 class InternalDataModel(pydantic.BaseModel):
-    data: Any
-
-    class Config:
-        arbitrary_types_allowed = True
-        extra = "forbid"
+    data: Any = None
+    model_config = pydantic.ConfigDict(arbitrary_types_allowed=True, extra="forbid")


 PYDANTIC_ERROR_TRUNCATION = 20
@@ -392,11 +388,8 @@ def _validate(
     ):
         if model:
             try:
-                if self.service_settings.enable_validation:
-                    return model(data=item).data
-                else:
-                    return construct_recursive(model, data=item).data
-            except pydantic.error_wrappers.ValidationError as exc:
+                return model(data=item).data
+            except pydantic.ValidationError as exc:
                 raise exception(
                     f"{self.__class__.__name__}[{self.configuration_key}]",
                     pydantic_exc=exc,
20 changes: 13 additions & 7 deletions modelkit/core/model_configuration.py
@@ -10,20 +10,24 @@
 from structlog import get_logger

 from modelkit.core.model import Asset
+from modelkit.core.settings import ModelkitSettings
 from modelkit.core.types import LibraryModelsType

 logger = get_logger(__name__)


-class ModelConfiguration(pydantic.BaseSettings):
+class ModelConfiguration(ModelkitSettings):
     model_type: Type[Asset]
-    asset: Optional[str]
+    asset: Optional[str] = None
     model_settings: Optional[Dict[str, Any]] = {}
-    model_dependencies: Optional[Dict[str, str]]
+    model_dependencies: Optional[Dict[str, str]] = {}

-    @pydantic.validator("model_dependencies", always=True, pre=True)
+    model_config = pydantic.ConfigDict(protected_namespaces=("settings",))
+
+    @pydantic.field_validator("model_dependencies", mode="before")
+    @classmethod
     def validate_dependencies(cls, v):
-        if not v:
+        if v is None:
             return {}
         if isinstance(v, (list, set)):
             return {key: key for key in v}
@@ -55,7 +59,7 @@ def walk_objects(mod):
 def _configurations_from_objects(m) -> Dict[str, ModelConfiguration]:
     if inspect.isclass(m) and issubclass(m, Asset):
         return {
-            key: ModelConfiguration(**{**config, "model_type": m})
+            key: ModelConfiguration(**config, model_type=m)
             for key, config in m.CONFIGURATIONS.items()
         }
     elif isinstance(m, (list, tuple)):
@@ -92,7 +96,9 @@ def configure(
             if isinstance(conf_value, ModelConfiguration):
                 conf[key] = conf_value
             elif isinstance(conf_value, dict):
-                conf[key] = ModelConfiguration(**{**conf[key].dict(), **conf_value})
+                conf[key] = ModelConfiguration(
+                    **{**conf[key].model_dump(), **conf_value}
+                )
         for key in set(configuration.keys()) - set(conf.keys()):
             conf_value = configuration[key]
             if isinstance(conf_value, ModelConfiguration):
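
The validator migration in `ModelConfiguration` (`@pydantic.validator(..., pre=True)` becoming `@pydantic.field_validator(..., mode="before")` plus `@classmethod`) can be tried in isolation. The sketch below assumes plain `pydantic` 2; `DependenciesConfig` is a hypothetical, reduced model that keeps only the relevant field.

```python
from typing import Dict, Optional

import pydantic


class DependenciesConfig(pydantic.BaseModel):
    # Hypothetical, reduced model: only the field relevant to the validator.
    dependencies: Optional[Dict[str, str]] = {}

    @pydantic.field_validator("dependencies", mode="before")
    @classmethod
    def normalize_dependencies(cls, v):
        # Runs before type validation, so it can accept None, lists and sets.
        if v is None:
            return {}
        if isinstance(v, (list, set)):
            return {key: key for key in v}
        return v


from_list = DependenciesConfig(dependencies=["tokenizer", "vocab"])
from_none = DependenciesConfig(dependencies=None)
from_default = DependenciesConfig()
```

In pydantic 2, validators do not run on default values unless `validate_default=True` is set, so the explicit `= {}` default takes over what `always=True` did in pydantic 1 when the field is omitted.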