Follow-up on the run_metadata changes (#3193)
* Initial commit, nuking all metadata responses and seeing what breaks

* Removed last remnant of LazyLoader

* Reintroducing the lazy loaders.

* Add LazyRunMetadataResponse to EntrypointFunctionDefinition

* Test for lazy loaders works now

* Fixed tests, reformatted

* Use updated template

* Auto-update of Starter template

* Updated more templates

* Fixed failing test

* Fixed step run schemas

* Auto-update of E2E template

* Auto-update of NLP template

* Fixed tests, removed additional .value access

* Further fixing

* Fixed linting issues

* Reformatted

* Linted, formatted and tested again

* Typing

* Maybe fix everything

* Apply some feedback

* new operation

* new log_metadata function

* changes to the base filters

* new filters

* adding log_metadata to __all__

* checkpoint with float casting

* adding tests

* final touches and formatting

* formatting

* moved the utils

* modified log metadata function

* checkpoint

* deprecating the old functions

* linting and final fixes

* better error message

* fixing the client method

* better error message

* consistent creation

* adjusting tests

* linting

* changes for step metadata

* more test adjustments

* testing unit tests

* linting

* fixing more tests

* fixing more tests

* more test fixes

* fixing the test

* fixing per comments

* added validation, constant error message

* linting

* new changes

* second checkpoint

* fixing revisions

* adding overlap to remove warnings

* complete docs changes

* adding a parameter to control the related entity behaviour

* fixing the toc

* fixed the description

* docstring

* spellcheck

* metadata creation during artifact version creation

* allowing artifact metadata with name for external artifact

* update the template versions

* Auto-update of LLM Finetuning template

* Auto-update of Starter template

* Auto-update of E2E template

* Auto-update of NLP template

* fixing the migration script

* formatting

* redirects

* minor fixes

* working pipelines again

* small fix

* working checkpoint

* fixes, linting, docstrings

* fixing unit tests

* docs updates 1

* docs update 2

* fixing integration tests

* spellcheck

* formatting

* Auto-update of E2E template

* docs changes

* review comments

* added the batch rbac call

* added a validator to check the name of the keys

* small adjustments

* base schema added

* formatting

* new functionalities

* breaking circular imports

* spellchecker

* other minor fixes

* covering the uncovered case

* adjusting tests

* fixing the quickstart again

* minor change

* going back to publisher step id

* updating github refs

* Auto-update of LLM Finetuning template

* Auto-update of Starter template

* fixing tests

* updated docs

* Auto-update of E2E template

* Auto-update of NLP template

* formatting

* review comments

* adding some tests in

* review comments

* Update src/zenml/zen_stores/migrations/versions/cc269488e5a9_separate_run_metadata.py

Co-authored-by: Michael Schuster <[email protected]>

* Update src/zenml/zen_stores/migrations/versions/cc269488e5a9_separate_run_metadata.py

Co-authored-by: Michael Schuster <[email protected]>

* Update src/zenml/zen_stores/migrations/versions/cc269488e5a9_separate_run_metadata.py

Co-authored-by: Michael Schuster <[email protected]>

* Update src/zenml/zen_stores/migrations/versions/cc269488e5a9_separate_run_metadata.py

Co-authored-by: Michael Schuster <[email protected]>

* Update src/zenml/zen_stores/migrations/versions/cc269488e5a9_separate_run_metadata.py

Co-authored-by: Michael Schuster <[email protected]>

* changed assert to value error

* fixed the alembic head

* changed the interaction with the models

* trimmed down

* small bugfix

* naming recommendations

* linting

* fixing the test

---------

Co-authored-by: AlexejPenner <[email protected]>
Co-authored-by: Andrei Vishniakov <[email protected]>
Co-authored-by: GitHub Actions <[email protected]>
Co-authored-by: Michael Schuster <[email protected]>
Co-authored-by: Michael Schuster <[email protected]>
6 people authored Nov 29, 2024
1 parent 0ccb1fd commit fbbfc29
Showing 57 changed files with 1,482 additions and 566 deletions.
1 change: 1 addition & 0 deletions .gitbook.yaml
@@ -18,6 +18,7 @@ redirects:
how-to/setting-up-a-project-repository/best-practices: how-to/project-setup-and-management/setting-up-a-project-repository/set-up-repository.md
getting-started/zenml-pro/system-architectures: getting-started/system-architectures.md
how-to/build-pipelines/name-your-pipeline-and-runs: how-to/pipeline-development/build-pipelines/name-your-pipeline-runs.md
+how-to/model-management-metrics/track-metrics-metadata/attach-metadata-to-steps: how-to/model-management-metrics/track-metrics-metadata/attach-metadata-to-a-step.md

# ZenML Pro
getting-started/zenml-pro/user-management: getting-started/zenml-pro/core-concepts.md
8 changes: 4 additions & 4 deletions .github/workflows/update-templates-to-examples.yml
@@ -46,7 +46,7 @@ jobs:
python-version: ${{ inputs.python-version }}
stack-name: local
ref-zenml: ${{ github.ref }}
-ref-template: 2024.11.20 # Make sure it is aligned with ZENML_PROJECT_TEMPLATES from src/zenml/cli/base.py
+ref-template: 2024.11.28 # Make sure it is aligned with ZENML_PROJECT_TEMPLATES from src/zenml/cli/base.py
- name: Clean-up
run: |
rm -rf ./local_checkout
@@ -118,7 +118,7 @@ jobs:
python-version: ${{ inputs.python-version }}
stack-name: local
ref-zenml: ${{ github.ref }}
-ref-template: 2024.10.30 # Make sure it is aligned with ZENML_PROJECT_TEMPLATES from src/zenml/cli/base.py
+ref-template: 2024.11.28 # Make sure it is aligned with ZENML_PROJECT_TEMPLATES from src/zenml/cli/base.py
- name: Clean-up
run: |
rm -rf ./local_checkout
@@ -189,7 +189,7 @@ jobs:
python-version: ${{ inputs.python-version }}
stack-name: local
ref-zenml: ${{ github.ref }}
-ref-template: 2024.10.30 # Make sure it is aligned with ZENML_PROJECT_TEMPLATES from src/zenml/cli/base.py
+ref-template: 2024.11.28 # Make sure it is aligned with ZENML_PROJECT_TEMPLATES from src/zenml/cli/base.py
- name: Clean-up
run: |
rm -rf ./local_checkout
@@ -261,7 +261,7 @@ jobs:
with:
python-version: ${{ inputs.python-version }}
ref-zenml: ${{ github.ref }}
-ref-template: 2024.11.08 # Make sure it is aligned with ZENML_PROJECT_TEMPLATES from src/zenml/cli/base.py
+ref-template: 2024.11.28 # Make sure it is aligned with ZENML_PROJECT_TEMPLATES from src/zenml/cli/base.py
- name: Clean-up
run: |
rm -rf ./local_checkout
@@ -5,10 +5,44 @@ description: Tracking metrics and metadata

# Track metrics and metadata

-Logging metrics and metadata is standardized in ZenML. The most common pattern is to use the `log_xxx` methods, e.g.:
+ZenML provides a unified way to log and manage metrics and metadata through
+the `log_metadata` function. This versatile function allows you to log
+metadata across various entities like models, artifacts, steps, and runs
+through a single interface. Additionally, you can control whether the same
+metadata is automatically logged for the related entities.

-* Log metadata to a [model](attach-metadata-to-a-model.md): `log_model_metadata`
-* Log metadata to an [artifact](attach-metadata-to-an-artifact.md): `log_artifact_metadata`
-* Log metadata to a [step](attach-metadata-to-steps.md): `log_step_metadata`
+### The most basic use-case

+You can use the `log_metadata` function within a step:

+```python
+from zenml import step, log_metadata

+@step
+def my_step() -> ...:
+    log_metadata(metadata={"accuracy": 0.91})
+    ...
+```

+This will log the `accuracy` for the step, its pipeline run, and, if provided,
+its model version.

+### Additional use-cases

+The `log_metadata` function also supports various use-cases by allowing you to
+specify the target entity (e.g., model, artifact, step, or run) with flexible
+parameters. You can learn more about these use-cases in the following pages:

+- [Log metadata to a step](attach-metadata-to-a-step.md)
+- [Log metadata to a run](attach-metadata-to-a-run.md)
+- [Log metadata to an artifact](attach-metadata-to-an-artifact.md)
+- [Log metadata to a model](attach-metadata-to-a-model.md)

+{% hint style="warning" %}
+The older methods for logging metadata to specific entities, such as
+`log_model_metadata`, `log_artifact_metadata`, and `log_step_metadata`, are
+now deprecated. It is recommended to use `log_metadata` for all future
+implementations.
+{% endhint %}

<figure><img src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" alt="ZenML Scarf"><figcaption></figcaption></figure>
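The behaviour documented in this hunk (metadata logged inside a step lands on the step, its pipeline run and, if provided, its model version) can be modelled with a small, self-contained toy. This is a sketch under stated assumptions, not ZenML's implementation: the function `toy_log_metadata_in_step`, the plain-dict store and the string IDs are all hypothetical. Run-level keys follow the `step_name::metadata_key` pattern that this commit documents for runs; leaving the model-version keys unprefixed is our own assumption.

```python
from typing import Any, Dict, Optional

# Hypothetical toy store: entity id -> {metadata key -> value}.
# It only mirrors the documented behaviour; it is NOT ZenML's storage layer.
Store = Dict[str, Dict[str, Any]]


def toy_log_metadata_in_step(
    store: Store,
    metadata: Dict[str, Any],
    step_name: str,
    step_id: str,
    run_id: str,
    model_version_id: Optional[str] = None,
) -> None:
    """Attach `metadata` to the step, its run and, if given, its model version."""
    # The step itself receives the keys unchanged.
    store.setdefault(step_id, {}).update(metadata)
    # The run receives keys namespaced as `step_name::metadata_key`, so two
    # steps can log the same key without overwriting each other.
    store.setdefault(run_id, {}).update(
        {f"{step_name}::{key}": value for key, value in metadata.items()}
    )
    # Assumption: the model version (only when present) gets unprefixed keys.
    if model_version_id is not None:
        store.setdefault(model_version_id, {}).update(metadata)


store: Store = {}
toy_log_metadata_in_step(store, {"accuracy": 0.91}, "trainer", "step-1", "run-1")
toy_log_metadata_in_step(store, {"accuracy": 0.88}, "evaluator", "step-2", "run-1", "mv-1")
print(store["run-1"])  # {'trainer::accuracy': 0.91, 'evaluator::accuracy': 0.88}
```

Namespacing on the run side is what lets both steps report `accuracy` to the same run without one value silently replacing the other.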
@@ -1,62 +1,93 @@
---
-description: >-
-Attach any metadata as key-value pairs to your models for future reference and
-auditability.
+description: Learn how to attach metadata to a model.
---

# Attach metadata to a model

+ZenML allows you to log metadata for models, which provides additional context
+that goes beyond individual artifact details. Model metadata can represent
+high-level insights, such as evaluation results, deployment information,
+or customer-specific details, making it easier to manage and interpret
+the model's usage and performance across different versions.

## Logging Metadata for Models

-While artifact metadata is specific to individual outputs of steps, model metadata encapsulates broader and more general information that spans across multiple artifacts. For example, evaluation results or the name of a customer for whom the model is intended could be logged with the model.
+To log metadata for a model, use the `log_metadata` function. This function
+lets you attach key-value metadata to a model, which can include metrics and
+other JSON-serializable values, such as custom ZenML types like `Uri`,
+`Path`, and `StorageSize`.

Here's an example of logging metadata for a model:

 ```python
-from zenml import step, log_model_metadata, ArtifactConfig, get_step_context
 from typing import Annotated
+
 import pandas as pd
-from sklearn.ensemble import RandomForestClassifier
 from sklearn.base import ClassifierMixin
+from sklearn.ensemble import RandomForestClassifier

+from zenml import step, log_metadata, ArtifactConfig, get_step_context
+
+
 @step
-def train_model(dataset: pd.DataFrame) -> Annotated[ClassifierMixin, ArtifactConfig(name="sklearn_classifier")]:
-    """Train a model"""
-    # Fit the model and compute metrics
+def train_model(dataset: pd.DataFrame) -> Annotated[
+    ClassifierMixin, ArtifactConfig(name="sklearn_classifier")
+]:
+    """Train a model and log model metadata."""
     classifier = RandomForestClassifier().fit(dataset)
     accuracy, precision, recall = ...

-    # Log metadata for the model
-    # This associates the metadata with the ZenML model, not the artifact
-    log_model_metadata(
+
+    log_metadata(
         metadata={
             "evaluation_metrics": {
                 "accuracy": accuracy,
                 "precision": precision,
                 "recall": recall
             }
         },
-        # Omitted model_name will use the model in the current context
-        model_name="zenml_model_name",
-        # Omitted model_version will default to 'latest'
-        model_version="zenml_model_version",
+        infer_model=True,
     )

     return classifier
 ```

-In this example, the metadata is associated with the model rather than the specific classifier artifact. This is particularly useful when the metadata reflects an aggregation or summary of various steps and artifacts in the pipeline.
+In this example, the metadata is associated with the model rather than the
+specific classifier artifact. This is particularly useful when the metadata
+reflects an aggregation or summary of various steps and artifacts in the
+pipeline.
+
+
+### Selecting Models with `log_metadata`
+
+When using `log_metadata`, ZenML provides flexible options for attaching
+metadata to model versions:
+
+1. **Using `infer_model`**: If used within a step, ZenML will use the step
+context to infer the model it is using and attach the metadata to it.
+2. **Model Name and Version Provided**: If both a model name and version are
+provided, ZenML will use these to identify and attach metadata to the
+specific model version.
+3. **Model Version ID Provided**: If a model version ID is directly provided,
+ZenML will use it to fetch and attach the metadata to that specific model
+version.

## Fetching logged metadata

-Once metadata has been logged in an [artifact](attach-metadata-to-an-artifact.md), model, or [step](attach-metadata-to-steps.md), we can easily fetch the metadata with the ZenML Client:
+Once metadata has been attached to a model, it can be retrieved for inspection
+or analysis using the ZenML Client.

```python
from zenml.client import Client

client = Client()
model = client.get_model_version("my_model", "my_version")

-print(model.run_metadata["metadata_key"].value)
+print(model.run_metadata["metadata_key"])
```

+{% hint style="info" %}
+When you are fetching metadata using a specific key, the returned value will
+always reflect the latest entry.
+{% endhint %}

<figure><img src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" alt="ZenML Scarf"><figcaption></figcaption></figure>
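The info hint in this file states that fetching metadata by key always returns the latest entry. A minimal sketch of that lookup rule, assuming each logged entry carries a creation timestamp, could look like the following; the `ToyMetadataEntry` record and the `latest_value` helper are hypothetical and are not part of the ZenML Client API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Any, List


@dataclass
class ToyMetadataEntry:
    """Hypothetical record of one logged value for a single key."""

    key: str
    value: Any
    created: datetime


def latest_value(entries: List[ToyMetadataEntry], metadata_key: str) -> Any:
    """Return the value of the most recently created entry for `metadata_key`."""
    matching = [entry for entry in entries if entry.key == metadata_key]
    if not matching:
        raise KeyError(metadata_key)
    # The newest timestamp wins, mirroring "latest entry" semantics.
    return max(matching, key=lambda entry: entry.created).value


t0 = datetime(2024, 11, 29, 12, 0, 0)
entries = [
    ToyMetadataEntry("evaluation_metrics", {"accuracy": 0.90}, t0),
    ToyMetadataEntry("evaluation_metrics", {"accuracy": 0.93}, t0 + timedelta(minutes=5)),
    ToyMetadataEntry("owner", "team-a", t0),
]
print(latest_value(entries, "evaluation_metrics"))  # {'accuracy': 0.93}
```

Older entries are not deleted in this toy; they are simply shadowed by newer ones when looked up by key.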
@@ -0,0 +1,87 @@
---
description: Learn how to attach metadata to a run.
---

# Attach Metadata to a Run

In ZenML, you can log metadata directly to a pipeline run, either during or
after execution, using the `log_metadata` function. This function allows you
to attach a dictionary of key-value pairs as metadata to a pipeline run,
with values that can be any JSON-serializable data type, including ZenML
custom types like `Uri`, `Path`, `DType`, and `StorageSize`.

## Logging Metadata Within a Run

If you are logging metadata from within a step that’s part of a pipeline run,
calling `log_metadata` will attach the specified metadata to the current
pipeline run where the metadata key will have the `step_name::metadata_key`
pattern. This allows you to use the same metadata key from different steps
while the run's still executing.

```python
from typing import Annotated

import pandas as pd
from sklearn.base import ClassifierMixin
from sklearn.ensemble import RandomForestClassifier

from zenml import step, log_metadata, ArtifactConfig


@step
def train_model(dataset: pd.DataFrame) -> Annotated[
    ClassifierMixin,
    ArtifactConfig(name="sklearn_classifier", is_model_artifact=True)
]:
    """Train a model and log run-level metadata."""
    classifier = RandomForestClassifier().fit(dataset)
    accuracy, precision, recall = ...

    # Log metadata at the run level
    log_metadata(
        metadata={
            "run_metrics": {
                "accuracy": accuracy,
                "precision": precision,
                "recall": recall
            }
        }
    )
    return classifier
```

## Manually Logging Metadata to a Pipeline Run

You can also attach metadata to a specific pipeline run without needing a step,
using identifiers like the run ID. This is useful when logging information or
metrics that were calculated post-execution.

```python
from zenml import log_metadata

log_metadata(
    metadata={"post_run_info": {"some_metric": 5.0}},
    run_id_name_or_prefix="run_id_name_or_prefix"
)
```

## Fetching Logged Metadata

Once metadata has been logged in a pipeline run, you can retrieve it using
the ZenML Client:

```python
from zenml.client import Client

client = Client()
run = client.get_pipeline_run("run_id_name_or_prefix")

print(run.run_metadata["metadata_key"])
```

{% hint style="info" %}
When you are fetching metadata using a specific key, the returned value will
always reflect the latest entry.
{% endhint %}

<figure><img src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" alt="ZenML Scarf"><figcaption></figcaption></figure>
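Since run metadata values must be JSON-serializable, it can help to check a metadata dictionary before logging it. The helper below, `non_serializable_keys`, is a hypothetical pre-flight check built only on the standard `json` module; it is not part of ZenML, does not reproduce ZenML's own validation, and does not model how ZenML treats its custom types such as `Uri` or `StorageSize`.

```python
import json
from typing import Any, Dict, List


def non_serializable_keys(metadata: Dict[str, Any]) -> List[str]:
    """Return the keys whose values `json.dumps` cannot encode (hypothetical helper)."""
    bad_keys = []
    for key, value in metadata.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            # TypeError: unsupported type (e.g. set); ValueError: circular reference.
            bad_keys.append(key)
    return bad_keys


metadata = {
    "post_run_info": {"some_metric": 5.0},
    "tags": {"a", "b"},  # a set is not JSON-serializable
}
print(non_serializable_keys(metadata))  # ['tags']
```

A non-empty result suggests converting those values (for example, a set to a sorted list) before passing the dictionary to `log_metadata`.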