Follow-up on the run_metadata changes #3193

Merged
merged 162 commits into from
Nov 29, 2024
Changes from 160 commits
697a93f
Initial commit, nuking all metadata responses and seeing what breaks
AlexejPenner Oct 17, 2024
733a6c8
Removed last remnant of LazyLoader
AlexejPenner Oct 17, 2024
3ce54ef
Merge branch 'develop' into feature/better-metadata
AlexejPenner Oct 18, 2024
ae71757
Reintroducing the lazy loaders.
AlexejPenner Oct 18, 2024
01b5179
Merge branch 'feature/better-metadata' of github.com:zenml-io/zenml i…
AlexejPenner Oct 18, 2024
7d0ff82
Add LazyRunMetadataResponse to EntrypointFunctionDefinition
avishniakov Oct 18, 2024
d7a9f83
Test for lazy loaders works now
AlexejPenner Oct 18, 2024
bccb4d2
Merge branch 'develop' into feature/better-metadata
AlexejPenner Oct 18, 2024
9a0e0b2
Fixed tests, reformatted
AlexejPenner Oct 21, 2024
145b90b
Use updated template
AlexejPenner Oct 21, 2024
1e1991a
Auto-update of Starter template
actions-user Oct 21, 2024
adab934
Merge branch 'develop' into feature/better-metadata
AlexejPenner Oct 21, 2024
d83628a
Updated more templates
AlexejPenner Oct 21, 2024
6d13071
Merge branch 'feature/better-metadata' of github.com:zenml-io/zenml i…
AlexejPenner Oct 21, 2024
c4febf3
Fixed failing test
AlexejPenner Oct 21, 2024
5aef8ab
Fixed step run schemas
AlexejPenner Oct 21, 2024
0b66f07
Auto-update of E2E template
actions-user Oct 21, 2024
4b2434a
Auto-update of NLP template
actions-user Oct 21, 2024
8f4af6e
Fixed tests, removed additional .value access
AlexejPenner Oct 21, 2024
cc6902b
Merge branch 'feature/better-metadata' of github.com:zenml-io/zenml i…
AlexejPenner Oct 21, 2024
edba625
Further fixing
AlexejPenner Oct 21, 2024
7d5cfb7
Merge branch 'develop' into feature/better-metadata
AlexejPenner Oct 21, 2024
c2b6955
Fixed linting issues
AlexejPenner Oct 21, 2024
e2bd53a
Merge branch 'develop' into feature/better-metadata
AlexejPenner Oct 21, 2024
a582836
Merge branch 'feature/better-metadata' of github.com:zenml-io/zenml i…
AlexejPenner Oct 22, 2024
4f82ade
Merge branch 'develop' into feature/better-metadata
AlexejPenner Oct 22, 2024
58293bb
Merge branch 'feature/better-metadata' of github.com:zenml-io/zenml i…
AlexejPenner Oct 22, 2024
8f6d305
Reformatted
AlexejPenner Oct 22, 2024
6b18322
Linted, formatted and tested again
AlexejPenner Oct 22, 2024
8b3a1bd
Typing
AlexejPenner Oct 22, 2024
b34f18b
Merge branch 'develop' into feature/better-metadata
AlexejPenner Oct 28, 2024
5cc7b44
Maybe fix everything
schustmi Oct 28, 2024
c368dec
Apply some feedback
schustmi Oct 28, 2024
62e8d6e
merged develop
bcdurak Oct 29, 2024
050f5b5
resolved conflicts
bcdurak Nov 4, 2024
74c1a42
new operation
bcdurak Nov 5, 2024
53dc8e8
new log_metadata function
bcdurak Nov 5, 2024
68a455c
changes to the base filters
bcdurak Nov 5, 2024
4af4165
new filters
bcdurak Nov 6, 2024
fdf8945
adding log_metadata to __all__
bcdurak Nov 6, 2024
39f5bf8
checkpoint with float casting
bcdurak Nov 6, 2024
1c051ec
adding tests
bcdurak Nov 6, 2024
e284808
final touches and formatting
bcdurak Nov 6, 2024
d5bbf72
formatting
bcdurak Nov 6, 2024
3a0d4c8
moved the utils
bcdurak Nov 6, 2024
5b3b217
modified log metadata function
bcdurak Nov 6, 2024
3d5a9f0
checkpoint
bcdurak Nov 6, 2024
e3079a3
deprecating the old functions
bcdurak Nov 6, 2024
c3e69c2
linting and final fixes
bcdurak Nov 6, 2024
2d4c723
better error message
bcdurak Nov 6, 2024
206340c
merged develop
bcdurak Nov 7, 2024
2debd9e
merged develop
bcdurak Nov 8, 2024
7e20409
merged develop
bcdurak Nov 8, 2024
fbd0200
fixing the client method
bcdurak Nov 8, 2024
ec7dc02
better error message
bcdurak Nov 8, 2024
1fafb7e
consistent creation\
bcdurak Nov 8, 2024
ad4a4f7
merged develop
bcdurak Nov 8, 2024
d90f55d
adjusting tests
bcdurak Nov 8, 2024
e0db418
linting
bcdurak Nov 8, 2024
14dfdea
changes for step metadata
bcdurak Nov 8, 2024
d89358d
more test adjustments
bcdurak Nov 8, 2024
7d90305
testing unit tests
bcdurak Nov 8, 2024
b060987
linting
bcdurak Nov 8, 2024
43a7034
fixing more tests
bcdurak Nov 8, 2024
28ecdc1
fixing more tests
bcdurak Nov 8, 2024
e0c5e4f
more test fixes
bcdurak Nov 8, 2024
6edc16e
fixing the test
bcdurak Nov 11, 2024
030d530
fixing per comments
bcdurak Nov 11, 2024
929fba4
added validation, constant error message
bcdurak Nov 11, 2024
be79553
merged develop
bcdurak Nov 11, 2024
e07b777
Merge branch 'develop' into feature/best-metadata
bcdurak Nov 11, 2024
c1bcb00
linting
bcdurak Nov 12, 2024
57ba4f9
new changes
bcdurak Nov 12, 2024
a68f3c2
second checkpoint
bcdurak Nov 12, 2024
0b5eb12
fixes and merged develop
bcdurak Nov 13, 2024
64fd8b8
Merge branch 'develop' into feature/followup-run-metadata
bcdurak Nov 13, 2024
07ece0c
fixing revisions
bcdurak Nov 13, 2024
4b6f84a
adding overlap to remove warnings
bcdurak Nov 13, 2024
dca5913
complete docs changes
bcdurak Nov 13, 2024
b767269
adding a parameter to control the related entity behaviour
bcdurak Nov 13, 2024
3b1ee3a
fixing the toc
bcdurak Nov 13, 2024
6f2e224
fixed the description
bcdurak Nov 13, 2024
bb21a07
docstring
bcdurak Nov 13, 2024
791ddc0
spellcheck
bcdurak Nov 13, 2024
bf771f7
metadata creation during artifact version creation
bcdurak Nov 13, 2024
9ceedde
allowing artifact metadata with name for external artifact
bcdurak Nov 13, 2024
52fdba4
update the template versions
bcdurak Nov 13, 2024
1855edf
Auto-update of LLM Finetuning template
actions-user Nov 13, 2024
a2462b5
Auto-update of Starter template
actions-user Nov 13, 2024
4283966
Auto-update of E2E template
actions-user Nov 13, 2024
a915e62
Auto-update of NLP template
actions-user Nov 13, 2024
f679fea
fixing the migration script
bcdurak Nov 13, 2024
64b0a0b
formatting
bcdurak Nov 13, 2024
3969ede
merged template changes
bcdurak Nov 13, 2024
6fef781
merged develop
bcdurak Nov 18, 2024
df0fbda
redirects
bcdurak Nov 18, 2024
324e67f
minor fixes
bcdurak Nov 18, 2024
20f4e38
Merge branch 'develop' into feature/followup-run-metadata
bcdurak Nov 20, 2024
f966b0f
working pipelines again
bcdurak Nov 20, 2024
18e5f1e
small fix
bcdurak Nov 20, 2024
060ec23
merged develop
bcdurak Nov 20, 2024
45fa324
merged develop
bcdurak Nov 25, 2024
5832f80
merged develop
bcdurak Nov 25, 2024
05f8ab5
working checkpoint
bcdurak Nov 26, 2024
8a353b8
fixes, linting, docstrings
bcdurak Nov 26, 2024
af009b0
fixing unit tests
bcdurak Nov 26, 2024
fac74c9
docs updates 1
bcdurak Nov 26, 2024
4370bdc
docs update 2
bcdurak Nov 26, 2024
6b06a90
fixing integration tests
bcdurak Nov 26, 2024
59d9867
spellcheck
bcdurak Nov 26, 2024
e482c61
formatting
bcdurak Nov 26, 2024
ffc5787
Auto-update of E2E template
actions-user Nov 26, 2024
dd69dd7
docs changes
bcdurak Nov 27, 2024
eab5ba5
review comments
bcdurak Nov 27, 2024
424dc57
merged develop
bcdurak Nov 27, 2024
4ea0740
Merge branch 'develop' into feature/followup-run-metadata
bcdurak Nov 27, 2024
81e5920
added the batch rbac call
bcdurak Nov 27, 2024
bd53796
merged develop and resolved conflicts
bcdurak Nov 27, 2024
0f785ff
added a validator to check the name of the keys
bcdurak Nov 27, 2024
0988141
small adjustments
bcdurak Nov 27, 2024
cc297f2
base schema added
bcdurak Nov 28, 2024
b411aca
merge develop
bcdurak Nov 28, 2024
4512ac7
formatting
bcdurak Nov 28, 2024
5824195
Merge branch 'develop' into feature/followup-run-metadata
bcdurak Nov 28, 2024
ff695c4
new functionalities
bcdurak Nov 28, 2024
5e34213
breaking circular imports
bcdurak Nov 28, 2024
be6ea07
spellchecker
bcdurak Nov 28, 2024
ee68112
other minor fixes
bcdurak Nov 28, 2024
1b8a6f1
covering the uncovered case
bcdurak Nov 28, 2024
f343acd
adjusting tests
bcdurak Nov 28, 2024
a36ab49
fixing the quickstart again
bcdurak Nov 28, 2024
77b2310
minor change
bcdurak Nov 28, 2024
94c26fd
going back to publisher step id
bcdurak Nov 28, 2024
aabb2ba
updating github refs
bcdurak Nov 28, 2024
dc1df13
Auto-update of LLM Finetuning template
actions-user Nov 28, 2024
0a7e26a
Auto-update of Starter template
actions-user Nov 28, 2024
d6de5cf
fixing tests
bcdurak Nov 28, 2024
e04b03b
merged develop
bcdurak Nov 28, 2024
2a5feb2
updated docs
bcdurak Nov 28, 2024
94bb86a
Auto-update of E2E template
actions-user Nov 28, 2024
4997e3a
Auto-update of NLP template
actions-user Nov 28, 2024
2953f06
formatting
bcdurak Nov 28, 2024
759b98c
merge develop
bcdurak Nov 28, 2024
88e247c
review comments
bcdurak Nov 29, 2024
4cd155a
merged develop
bcdurak Nov 29, 2024
755c36f
adding some tests in
bcdurak Nov 29, 2024
c1b5a4c
review comments
bcdurak Nov 29, 2024
93dccd8
Update src/zenml/zen_stores/migrations/versions/cc269488e5a9_separate…
bcdurak Nov 29, 2024
6bc0002
Update src/zenml/zen_stores/migrations/versions/cc269488e5a9_separate…
bcdurak Nov 29, 2024
852d99d
Update src/zenml/zen_stores/migrations/versions/cc269488e5a9_separate…
bcdurak Nov 29, 2024
e142836
Update src/zenml/zen_stores/migrations/versions/cc269488e5a9_separate…
bcdurak Nov 29, 2024
ded49db
Update src/zenml/zen_stores/migrations/versions/cc269488e5a9_separate…
bcdurak Nov 29, 2024
bedb364
changed assert to value error
bcdurak Nov 29, 2024
6cdc3a7
merged develop
bcdurak Nov 29, 2024
dcc8aa9
fixed the alembic head
bcdurak Nov 29, 2024
772f639
changed the interaction with the models
bcdurak Nov 29, 2024
d13a909
trimmed down
bcdurak Nov 29, 2024
169b4c6
small bugfix
bcdurak Nov 29, 2024
ba131c0
naming recommendations
bcdurak Nov 29, 2024
e05fb98
linting
bcdurak Nov 29, 2024
d662ac3
fixing the test
bcdurak Nov 29, 2024
ac18b53
Merge branch 'develop' into feature/followup-run-metadata
bcdurak Nov 29, 2024
1 change: 1 addition & 0 deletions .gitbook.yaml
@@ -18,6 +18,7 @@ redirects:
how-to/setting-up-a-project-repository/best-practices: how-to/project-setup-and-management/setting-up-a-project-repository/set-up-repository.md
getting-started/zenml-pro/system-architectures: getting-started/system-architectures.md
how-to/build-pipelines/name-your-pipeline-and-runs: how-to/pipeline-development/build-pipelines/name-your-pipeline-runs.md
how-to/model-management-metrics/track-metrics-metadata/attach-metadata-to-steps: how-to/model-management-metrics/track-metrics-metadata/attach-metadata-to-a-step.md

# ZenML Pro
getting-started/zenml-pro/user-management: getting-started/zenml-pro/core-concepts.md
8 changes: 4 additions & 4 deletions .github/workflows/update-templates-to-examples.yml
@@ -46,7 +46,7 @@ jobs:
python-version: ${{ inputs.python-version }}
stack-name: local
ref-zenml: ${{ github.ref }}
ref-template: 2024.11.20 # Make sure it is aligned with ZENML_PROJECT_TEMPLATES from src/zenml/cli/base.py
ref-template: 2024.11.28 # Make sure it is aligned with ZENML_PROJECT_TEMPLATES from src/zenml/cli/base.py
- name: Clean-up
run: |
rm -rf ./local_checkout
@@ -118,7 +118,7 @@ jobs:
python-version: ${{ inputs.python-version }}
stack-name: local
ref-zenml: ${{ github.ref }}
ref-template: 2024.10.30 # Make sure it is aligned with ZENML_PROJECT_TEMPLATES from src/zenml/cli/base.py
ref-template: 2024.11.28 # Make sure it is aligned with ZENML_PROJECT_TEMPLATES from src/zenml/cli/base.py
- name: Clean-up
run: |
rm -rf ./local_checkout
@@ -189,7 +189,7 @@ jobs:
python-version: ${{ inputs.python-version }}
stack-name: local
ref-zenml: ${{ github.ref }}
ref-template: 2024.10.30 # Make sure it is aligned with ZENML_PROJECT_TEMPLATES from src/zenml/cli/base.py
ref-template: 2024.11.28 # Make sure it is aligned with ZENML_PROJECT_TEMPLATES from src/zenml/cli/base.py
- name: Clean-up
run: |
rm -rf ./local_checkout
@@ -261,7 +261,7 @@ jobs:
with:
python-version: ${{ inputs.python-version }}
ref-zenml: ${{ github.ref }}
ref-template: 2024.11.08 # Make sure it is aligned with ZENML_PROJECT_TEMPLATES from src/zenml/cli/base.py
ref-template: 2024.11.28 # Make sure it is aligned with ZENML_PROJECT_TEMPLATES from src/zenml/cli/base.py
- name: Clean-up
run: |
rm -rf ./local_checkout
@@ -5,10 +5,44 @@ description: Tracking metrics and metadata

# Track metrics and metadata

Logging metrics and metadata is standardized in ZenML. The most common pattern is to use the `log_xxx` methods, e.g.:
ZenML provides a unified way to log and manage metrics and metadata through
the `log_metadata` function. This versatile function allows you to log
metadata across various entities like models, artifacts, steps, and runs
through a single interface. Additionally, you can control whether the same
metadata is automatically logged for the related entities.

* Log metadata to a [model](attach-metadata-to-a-model.md): `log_model_metadata`
* Log metadata to an [artifact](attach-metadata-to-an-artifact.md): `log_artifact_metadata`
* Log metadata to a [step](attach-metadata-to-steps.md): `log_step_metadata`
### The most basic use-case

You can use the `log_metadata` function within a step:

```python
from zenml import step, log_metadata

@step
def my_step() -> ...:
    log_metadata(metadata={"accuracy": 0.91})
    ...
```

This will log the `accuracy` for the step, its pipeline run, and, if one is
provided, its model version.

### Additional use-cases

The `log_metadata` function also supports various use-cases by allowing you to
specify the target entity (e.g., model, artifact, step, or run) with flexible
parameters. You can learn more about these use-cases in the following pages:

- [Log metadata to a step](attach-metadata-to-a-step.md)
- [Log metadata to a run](attach-metadata-to-a-run.md)
- [Log metadata to an artifact](attach-metadata-to-an-artifact.md)
- [Log metadata to a model](attach-metadata-to-a-model.md)

{% hint style="warning" %}
The older methods for logging metadata to specific entities, such as
`log_model_metadata`, `log_artifact_metadata`, and `log_step_metadata`, are
now deprecated. It is recommended to use `log_metadata` for all future
implementations.
{% endhint %}

<figure><img src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" alt="ZenML Scarf"><figcaption></figcaption></figure>
@@ -1,62 +1,93 @@
---
description: >-
Attach any metadata as key-value pairs to your models for future reference and
auditability.
description: Learn how to attach metadata to a model.
---

# Attach metadata to a model

ZenML allows you to log metadata for models, which provides additional context
that goes beyond individual artifact details. Model metadata can represent
high-level insights, such as evaluation results, deployment information,
or customer-specific details, making it easier to manage and interpret
the model's usage and performance across different versions.

## Logging Metadata for Models

While artifact metadata is specific to individual outputs of steps, model metadata encapsulates broader and more general information that spans across multiple artifacts. For example, evaluation results or the name of a customer for whom the model is intended could be logged with the model.
To log metadata for a model, use the `log_metadata` function. This function
lets you attach key-value metadata to a model, which can include metrics and
other JSON-serializable values, such as custom ZenML types like `Uri`,
`Path`, and `StorageSize`.

Here's an example of logging metadata for a model:

```python
from zenml import step, log_model_metadata, ArtifactConfig, get_step_context
from typing import Annotated

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.base import ClassifierMixin
from sklearn.ensemble import RandomForestClassifier

from zenml import step, log_metadata, ArtifactConfig, get_step_context


@step
def train_model(dataset: pd.DataFrame) -> Annotated[ClassifierMixin, ArtifactConfig(name="sklearn_classifier")]:
    """Train a model"""
    # Fit the model and compute metrics
def train_model(dataset: pd.DataFrame) -> Annotated[
    ClassifierMixin, ArtifactConfig(name="sklearn_classifier")
]:
    """Train a model and log model metadata."""
    classifier = RandomForestClassifier().fit(dataset)
    accuracy, precision, recall = ...

    # Log metadata for the model
    # This associates the metadata with the ZenML model, not the artifact
    log_model_metadata(

    log_metadata(
        metadata={
            "evaluation_metrics": {
                "accuracy": accuracy,
                "precision": precision,
                "recall": recall
            }
        },
        # Omitted model_name will use the model in the current context
        model_name="zenml_model_name",
        # Omitted model_version will default to 'latest'
        model_version="zenml_model_version",
        infer_model=True,
    )

    return classifier
```

In this example, the metadata is associated with the model rather than the specific classifier artifact. This is particularly useful when the metadata reflects an aggregation or summary of various steps and artifacts in the pipeline.
In this example, the metadata is associated with the model rather than the
specific classifier artifact. This is particularly useful when the metadata
reflects an aggregation or summary of various steps and artifacts in the
pipeline.


### Selecting Models with `log_metadata`

When using `log_metadata`, ZenML provides flexible options for attaching
metadata to model versions:

1. **Using `infer_model`**: If used within a step, ZenML will use the step
context to infer the model it is using and attach the metadata to it.
2. **Model Name and Version Provided**: If both a model name and version are
provided, ZenML will use these to identify and attach metadata to the
specific model version.
3. **Model Version ID Provided**: If a model version ID is directly provided,
ZenML will use it to fetch and attach the metadata to that specific model
version.
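
The three options above can be pictured with a small stand-in function. This is only a sketch, not ZenML's implementation: `describe_model_target` is a hypothetical helper, the parameter name `model_version_id` is an assumption (the text above only speaks of a "model version ID"), and checking explicit identifiers before `infer_model` is an ordering chosen for the illustration.

```python
from typing import Optional


def describe_model_target(
    infer_model: bool = False,
    model_name: Optional[str] = None,
    model_version: Optional[str] = None,
    model_version_id: Optional[str] = None,  # assumed parameter name
) -> str:
    """Describe which model version a call would target (illustrative only)."""
    # Option 3: a model version ID identifies the target directly.
    if model_version_id is not None:
        return f"model version with ID {model_version_id}"
    # Option 2: a name and a version together identify the target.
    if model_name is not None and model_version is not None:
        return f"version '{model_version}' of model '{model_name}'"
    # Option 1: fall back to the model configured in the step context.
    if infer_model:
        return "model inferred from the step context"
    raise ValueError(
        "No model selected: pass infer_model=True, a model name and "
        "version, or a model version ID."
    )


by_context = describe_model_target(infer_model=True)
by_name = describe_model_target(model_name="my_model", model_version="3")
by_id = describe_model_target(model_version_id="1234-abcd")
```

Raising an error when no selector is given is a design choice of this sketch; the behaviour of `log_metadata` in that situation is not described here.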

## Fetching logged metadata

Once metadata has been logged in an [artifact](attach-metadata-to-an-artifact.md), model, or [step](attach-metadata-to-steps.md), we can easily fetch the metadata with the ZenML Client:
Once metadata has been attached to a model, it can be retrieved for inspection
or analysis using the ZenML Client.

```python
from zenml.client import Client

client = Client()
model = client.get_model_version("my_model", "my_version")

print(model.run_metadata["metadata_key"].value)
print(model.run_metadata["metadata_key"])
```

{% hint style="info" %}
When you are fetching metadata using a specific key, the returned value will
always reflect the latest entry.
{% endhint %}

<figure><img src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" alt="ZenML Scarf"><figcaption></figcaption></figure>
@@ -0,0 +1,87 @@
---
description: Learn how to attach metadata to a run.
---

# Attach Metadata to a Run

In ZenML, you can log metadata directly to a pipeline run, either during or
after execution, using the `log_metadata` function. This function allows you
to attach a dictionary of key-value pairs as metadata to a pipeline run,
with values that can be any JSON-serializable data type, including ZenML
custom types like `Uri`, `Path`, `DType`, and `StorageSize`.
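
Because values must be JSON-serializable, it can help to check a metadata dictionary before logging it. The following is a minimal sketch that uses only the Python standard library; `find_unserializable_keys` is a hypothetical helper, not part of ZenML, and ZenML's own validation (including its handling of custom types) may differ.

```python
import json
from typing import Any, Dict, List


def find_unserializable_keys(metadata: Dict[str, Any]) -> List[str]:
    """Return the top-level keys whose values json.dumps cannot encode."""
    bad_keys = []
    for key, value in metadata.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            bad_keys.append(key)
    return bad_keys


candidate = {
    "accuracy": 0.91,
    "labels": ["cat", "dog"],
    "raw_bytes": b"\x00\x01",  # bytes are not JSON-serializable
}
problem_keys = find_unserializable_keys(candidate)
```

Here `problem_keys` contains only `"raw_bytes"`, so that entry would need converting (for example to a string) before being logged.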

## Logging Metadata Within a Run

If you call `log_metadata` from within a step that is part of a pipeline run,
the specified metadata is attached to the current pipeline run, and each
metadata key follows the `step_name::metadata_key` pattern. This allows
different steps to use the same metadata key while the run is still executing.

```python
from typing import Annotated

import pandas as pd
from sklearn.base import ClassifierMixin
from sklearn.ensemble import RandomForestClassifier

from zenml import step, log_metadata, ArtifactConfig


@step
def train_model(dataset: pd.DataFrame) -> Annotated[
    ClassifierMixin,
    ArtifactConfig(name="sklearn_classifier", is_model_artifact=True)
]:
    """Train a model and log run-level metadata."""
    classifier = RandomForestClassifier().fit(dataset)
    accuracy, precision, recall = ...

    # Log metadata at the run level
    log_metadata(
        metadata={
            "run_metrics": {
                "accuracy": accuracy,
                "precision": precision,
                "recall": recall
            }
        }
    )
    return classifier
```
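
To see why the `step_name::metadata_key` pattern keeps steps from overwriting each other, here is a plain-Python sketch. `namespace_run_metadata` and the step name `evaluate_model` are hypothetical and exist only for this illustration; ZenML applies the pattern itself, so you would not write this in a pipeline.

```python
from typing import Any, Dict


def namespace_run_metadata(
    step_name: str, metadata: Dict[str, Any]
) -> Dict[str, Any]:
    """Prefix every key with the step name, mimicking the documented pattern."""
    return {f"{step_name}::{key}": value for key, value in metadata.items()}


# Two steps log the same key; the prefixes keep both values at the run level.
run_level: Dict[str, Any] = {}
run_level.update(namespace_run_metadata("train_model", {"accuracy": 0.91}))
run_level.update(namespace_run_metadata("evaluate_model", {"accuracy": 0.88}))
```

After both updates, `run_level` holds two separate entries, `train_model::accuracy` and `evaluate_model::accuracy`.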

## Manually Logging Metadata to a Pipeline Run

You can also attach metadata to a specific pipeline run without needing a step,
using identifiers like the run ID. This is useful when logging information or
metrics that were calculated post-execution.

```python
from zenml import log_metadata

log_metadata(
    metadata={"post_run_info": {"some_metric": 5.0}},
    run_id_name_or_prefix="run_id_name_or_prefix"
)
```

## Fetching Logged Metadata

Once metadata has been logged in a pipeline run, you can retrieve it using
the ZenML Client:

```python
from zenml.client import Client

client = Client()
run = client.get_pipeline_run("run_id_name_or_prefix")

print(run.run_metadata["metadata_key"])
```

{% hint style="info" %}
When you are fetching metadata using a specific key, the returned value will
always reflect the latest entry.
{% endhint %}
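
The "latest entry" behaviour described in the hint can be modelled with a toy container. This is a sketch under stated assumptions: `MetadataHistory` is a hypothetical class, and how ZenML actually stores repeated entries for a key is not shown in this page.

```python
from typing import Any, Dict, List


class MetadataHistory:
    """Toy store that keeps every logged value but returns the newest one."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[Any]] = {}

    def log(self, key: str, value: Any) -> None:
        # Append rather than overwrite, so earlier entries are preserved.
        self._entries.setdefault(key, []).append(value)

    def __getitem__(self, key: str) -> Any:
        # Lookup by key yields the most recently logged value.
        return self._entries[key][-1]


history = MetadataHistory()
history.log("accuracy", 0.85)
history.log("accuracy", 0.91)
latest = history["accuracy"]
```

In this model, logging the same key twice does not discard the first value; a keyed lookup simply resolves to the second one.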

<figure><img src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" alt="ZenML Scarf"><figcaption></figcaption></figure>