Follow-up on the `run_metadata` changes #3193

bcdurak · 2024-11-13T09:15:52Z

Describe changes

This PR optimizes how we store run metadata that is attached to different entities (it addresses the review comment here).

In our old implementation, a single log_metadata call could attach the same metadata to several entities, such as pipeline runs, step runs, and model versions. Each entity got its own entry, so the metadata table ended up with X separate entries holding the same key-value pair.

To avoid this duplication, this PR splits the previous run metadata table into two tables:

One that holds the actual key-value pair.
One that links this pair to different resources.
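As a rough illustration of the split, here is a minimal sketch using an in-memory SQLite database. The table names, column names, and the `log_pair` helper are assumptions made for this example and are not the actual ZenML schema. It shows that one key-value row can be linked to three resources without being duplicated:

```python
import sqlite3
import uuid

# Hypothetical table and column names, chosen only for this sketch.
SCHEMA = """
CREATE TABLE run_metadata (
    id TEXT PRIMARY KEY,
    key TEXT NOT NULL,
    value TEXT NOT NULL
);
CREATE TABLE run_metadata_resource (
    id TEXT PRIMARY KEY,
    run_metadata_id TEXT NOT NULL REFERENCES run_metadata(id),
    resource_id TEXT NOT NULL,
    resource_type TEXT NOT NULL
);
"""


def log_pair(conn, key, value, resources):
    """Store one key-value row and link it to every (id, type) resource."""
    metadata_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO run_metadata (id, key, value) VALUES (?, ?, ?)",
        (metadata_id, key, value),
    )
    conn.executemany(
        "INSERT INTO run_metadata_resource VALUES (?, ?, ?, ?)",
        [
            (str(uuid.uuid4()), metadata_id, resource_id, resource_type)
            for resource_id, resource_type in resources
        ],
    )
    return metadata_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
log_pair(
    conn,
    "accuracy",
    "0.93",
    [
        ("run-1", "pipeline_run"),
        ("step-1", "step_run"),
        ("mv-1", "model_version"),
    ],
)
# One stored pair, three links to it.
pair_count = conn.execute("SELECT COUNT(*) FROM run_metadata").fetchone()[0]
link_count = conn.execute(
    "SELECT COUNT(*) FROM run_metadata_resource"
).fetchone()[0]
```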

For this to work, I also implemented a new RunMetadataRequest model that can hold more than one entity per key-value pair. All previous calls that used this request model, along with the affected models and schemas, were adjusted accordingly.
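To make the shape of such a request concrete, here is a small sketch with plain dataclasses. `MetadataRequest`, `Resource`, `ResourceType`, and their fields are hypothetical stand-ins for this example; the real RunMetadataRequest may be structured differently.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple, Union
from uuid import UUID, uuid4

MetadataValue = Union[str, int, float, bool]


class ResourceType(str, Enum):
    # Hypothetical enum; the entity kinds come from the PR description.
    PIPELINE_RUN = "pipeline_run"
    STEP_RUN = "step_run"
    MODEL_VERSION = "model_version"


@dataclass(frozen=True)
class Resource:
    id: UUID
    type: ResourceType


@dataclass
class MetadataRequest:
    """Stand-in for a request that targets several entities at once."""

    resources: List[Resource]
    values: Dict[str, MetadataValue]

    def rows(self) -> Tuple[int, int]:
        """Return (key-value rows, link rows) that would be written."""
        return len(self.values), len(self.values) * len(self.resources)


request = MetadataRequest(
    resources=[
        Resource(uuid4(), ResourceType.PIPELINE_RUN),
        Resource(uuid4(), ResourceType.STEP_RUN),
    ],
    values={"accuracy": 0.93, "epochs": 10},
)
```

With two values and two resources, this request would write two key-value rows and four link rows, instead of four full copies of the pairs.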

Other related changes

Remaining TODOs

Add more tests (model_version_id logging)

Pre-requisites

Please ensure you have done the following:

I have read the CONTRIBUTING.md document.
If my change requires a change to docs, I have updated the documentation accordingly.
I have added tests to cover my changes.
I have based my new branch on develop and the open PR is targeting develop. If your branch wasn't based on develop, read the Contribution guide on rebasing a branch to develop.
If my changes require changes to the dashboard, these changes are communicated/requested.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Other (add details above)

github-actions · 2024-11-28T15:24:27Z

NLP template updates in examples/e2e_nlp have been pushed.

avishniakov

I left some comments, nothing major, mostly about the new flag for artifacts. If you consider them non-critical or not relevant, I am approving upfront so that the PR is not blocked from merging.

src/zenml/model/utils.py

avishniakov · 2024-11-28T15:21:13Z

src/zenml/utils/metadata_utils.py

+    infer_artifact: bool = False,
+    artifact_name: Optional[str] = None,


Why do those 2 come together here?

Suggested change

-    infer_artifact: bool = False,
-    artifact_name: Optional[str] = None,
+    infer_artifact: bool = False,

A step can have multiple outputs: you name one of them through artifact_name and expect infer_artifact to look for it in the step execution.
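A minimal sketch of that lookup, assuming the step outputs are available as a name-to-artifact mapping. `resolve_output_name` is a hypothetical helper, and the single-output fallback is an assumption of this sketch rather than something stated in this thread:

```python
from typing import Dict, Optional


def resolve_output_name(
    step_outputs: Dict[str, str], artifact_name: Optional[str] = None
) -> str:
    """Pick the step output to attach metadata to; raise if ambiguous."""
    if artifact_name is not None:
        if artifact_name not in step_outputs:
            raise ValueError(f"Step has no output named '{artifact_name}'.")
        return artifact_name
    if len(step_outputs) == 1:
        # Assumed fallback: a single output needs no name.
        return next(iter(step_outputs))
    raise ValueError(
        "Step has several outputs; pass artifact_name to choose one."
    )


outputs = {
    "int_output": "artifact-version-1",
    "str_output": "artifact-version-2",
}
chosen = resolve_output_name(outputs, artifact_name="int_output")
```

With two outputs, calling the helper without a name raises, which is the case where both parameters are needed together.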

src/zenml/utils/metadata_utils.py

src/zenml/zen_stores/schemas/utils.py

avishniakov · 2024-11-28T15:24:37Z

tests/integration/functional/artifacts/test_utils.py

+    log_metadata(
+        metadata=output_metadata,
+        artifact_name="int_output",
+        infer_artifact=True,


Why infer_artifact=True? I might be misreading this, but you gave artifact name already, no?

I also discussed this with @schustmi. Going forward, we do not want people to pass just a name to log_metadata and have it resolve to the latest version (of a model or an artifact). So if you provide only a name, it must be paired with infer_artifact=True, so that we look for it in the output space of the step.
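The rule can be sketched as a small argument check. `artifact_lookup_mode` and the `artifact_version` parameter are hypothetical and only illustrate the combinations discussed here; they are not the actual log_metadata signature.

```python
from typing import Optional


def artifact_lookup_mode(
    artifact_name: Optional[str] = None,
    artifact_version: Optional[str] = None,  # hypothetical parameter
    infer_artifact: bool = False,
) -> str:
    """Return how the target artifact would be located, or raise."""
    if infer_artifact:
        # With or without a name, the search is limited to step outputs.
        return "step_outputs"
    if artifact_name is not None and artifact_version is not None:
        return "name_and_version"
    if artifact_name is not None:
        raise ValueError(
            "A name alone is not accepted: pass a version as well or set "
            "infer_artifact=True."
        )
    raise ValueError("No artifact was specified.")


mode = artifact_lookup_mode(artifact_name="int_output", infer_artifact=True)
```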

src/zenml/zen_stores/schemas/step_run_schemas.py

src/zenml/zen_stores/schemas/run_metadata_schemas.py

src/zenml/zen_stores/migrations/versions/cc269488e5a9_separate_run_metadata.py

src/zenml/utils/metadata_utils.py

schustmi · 2024-11-29T11:43:41Z

src/zenml/utils/metadata_utils.py

+) -> None: ...
+
+
+def log_metadata(


I think the whole function could be a lot shorter if we just store the RunMetadataResource.id and RunMetadataResource.type and do one client call in the end?

Makes sense. I trimmed it a bit already, but I kept the resources parameter as a list and would rather not remove it right now. I am still undecided on whether we will ever tag multiple resources at the same time, so if it is alright with you, I would keep it like this.
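For reference, the shape suggested above (collect the resource id and type first, then make one client call at the end) could look roughly like this. `ResourceRef`, `RecordingClient`, and `log_metadata_sketch` are stand-ins written for this sketch, not the actual ZenML client or function.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ResourceRef:
    id: str
    type: str


class RecordingClient:
    """Fake client that only records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[Tuple[Dict[str, str], List[ResourceRef]]] = []

    def create_run_metadata(
        self, metadata: Dict[str, str], resources: List[ResourceRef]
    ) -> None:
        self.calls.append((metadata, resources))


def log_metadata_sketch(
    client: RecordingClient,
    metadata: Dict[str, str],
    step_id: Optional[str] = None,
    run_id: Optional[str] = None,
    model_version_id: Optional[str] = None,
) -> None:
    """Resolve every target first, then issue a single client call."""
    resources: List[ResourceRef] = []
    if step_id is not None:
        resources.append(ResourceRef(step_id, "step_run"))
    if run_id is not None:
        resources.append(ResourceRef(run_id, "pipeline_run"))
    if model_version_id is not None:
        resources.append(ResourceRef(model_version_id, "model_version"))
    if not resources:
        raise ValueError("No target resource was given.")
    client.create_run_metadata(metadata=metadata, resources=resources)


client = RecordingClient()
log_metadata_sketch(
    client, {"accuracy": "0.93"}, step_id="step-1", run_id="run-1"
)
```

Keeping `resources` as a list means the same single call works whether one or several entities are tagged.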

hyperlint-ai

The style guide flagged several spelling errors that seemed like false positives. We skipped posting inline suggestions for the following words:

Hyperparameter

AlexejPenner and others added 30 commits October 17, 2024 17:14

Initial commit, nuking all metadata responses and seeing what breaks

697a93f

Removed last remnant of LazyLoader

733a6c8

Merge branch 'develop' into feature/better-metadata

3ce54ef

Reintroducing the lazy loaders.

ae71757

Merge branch 'feature/better-metadata' of github.com:zenml-io/zenml into feature/better-metadata

01b5179

Add LazyRunMetadataResponse to EntrypointFunctionDefinition

7d0ff82

Test for lazy loaders works now

d7a9f83

Merge branch 'develop' into feature/better-metadata

bccb4d2

Fixed tests, reformatted

9a0e0b2

Use updated template

145b90b

Auto-update of Starter template

1e1991a

Merge branch 'develop' into feature/better-metadata

adab934

Updated more templates

d83628a

Merge branch 'feature/better-metadata' of github.com:zenml-io/zenml into feature/better-metadata

6d13071

Fixed failing test

c4febf3

Fixed step run schemas

5aef8ab

Auto-update of E2E template

0b66f07

Auto-update of NLP template

4b2434a

Fixed tests, removed additional .value access

8f4af6e

Merge branch 'feature/better-metadata' of github.com:zenml-io/zenml into feature/better-metadata

cc6902b

Further fixing

edba625

Merge branch 'develop' into feature/better-metadata

7d5cfb7

Fixed linting issues

c2b6955

Merge branch 'develop' into feature/better-metadata

e2bd53a

Merge branch 'feature/better-metadata' of github.com:zenml-io/zenml into feature/better-metadata

a582836

Merge branch 'develop' into feature/better-metadata

4f82ade

Merge branch 'feature/better-metadata' of github.com:zenml-io/zenml into feature/better-metadata

58293bb

Reformatted

8f6d305

Linted, formatted and tested again

6b18322

Typing

8b3a1bd

Auto-update of NLP template

4997e3a

avishniakov approved these changes Nov 28, 2024

bcdurak added 5 commits November 28, 2024 16:34

formatting

2953f06

merge develop

759b98c

review comments

88e247c

merged develop

4cd155a

adding some tests in

755c36f

schustmi requested changes Nov 29, 2024

bcdurak and others added 6 commits November 29, 2024 13:03

review comments

c1b5a4c

Update src/zenml/zen_stores/migrations/versions/cc269488e5a9_separate_run_metadata.py Co-authored-by: Michael Schuster

93dccd8

Update src/zenml/zen_stores/migrations/versions/cc269488e5a9_separate_run_metadata.py Co-authored-by: Michael Schuster

6bc0002

Update src/zenml/zen_stores/migrations/versions/cc269488e5a9_separate_run_metadata.py Co-authored-by: Michael Schuster

852d99d

Update src/zenml/zen_stores/migrations/versions/cc269488e5a9_separate_run_metadata.py Co-authored-by: Michael Schuster

e142836

Update src/zenml/zen_stores/migrations/versions/cc269488e5a9_separate_run_metadata.py Co-authored-by: Michael Schuster

ded49db

hyperlint-ai bot reviewed Nov 29, 2024

bcdurak added 8 commits November 29, 2024 13:09

changed assert to value error

bedb364

merged develop

6cdc3a7

fixed the alembic head

dcc8aa9

changed the interaction with the models

772f639

trimmed down

d13a909

small bugfix

169b4c6

naming recommendations

ba131c0

linting

e05fb98

bcdurak requested a review from schustmi November 29, 2024 12:41

schustmi approved these changes Nov 29, 2024

bcdurak and others added 2 commits November 29, 2024 15:18

fixing the test

d662ac3

Merge branch 'develop' into feature/followup-run-metadata

ac18b53

bcdurak merged commit fbbfc29 into develop Nov 29, 2024
65 of 67 checks passed

bcdurak deleted the feature/followup-run-metadata branch November 29, 2024 16:45
