Follow-up on the run_metadata changes #3193

Merged on Nov 29, 2024 (162 commits)

Conversation

Contributor

@bcdurak bcdurak commented Nov 13, 2024

Describe changes

This PR optimizes the way that we store run metadata related to different entities. (addresses the review comment here)

In the old implementation, a single log_metadata call could attach the same metadata to several entities, such as pipeline runs, step runs, and model versions. Each entity got its own row, so one key-value pair attached to X entities produced X duplicate entries in the metadata table.

To avoid this duplication, this PR splits the previous run metadata table into two tables:

  • One that holds the actual key-value pair.
  • One that links this pair to different resources.
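As a rough illustration of the split, here is a minimal sketch using an in-memory SQLite database. The table names, column names, and the log_pair helper are hypothetical and chosen only for this example; the actual ZenML schemas may look different.

```python
import sqlite3

# Hypothetical table and column names; the PR's real schema may differ.
SCHEMA = """
CREATE TABLE run_metadata (
    id INTEGER PRIMARY KEY,
    key TEXT NOT NULL,
    value TEXT NOT NULL
);
CREATE TABLE run_metadata_resource (
    id INTEGER PRIMARY KEY,
    run_metadata_id INTEGER NOT NULL REFERENCES run_metadata(id),
    resource_id TEXT NOT NULL,
    resource_type TEXT NOT NULL
);
"""


def log_pair(conn, key, value, resources):
    """Store one key-value pair once and link it to every given resource."""
    cur = conn.execute(
        "INSERT INTO run_metadata (key, value) VALUES (?, ?)", (key, value)
    )
    metadata_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO run_metadata_resource "
        "(run_metadata_id, resource_id, resource_type) VALUES (?, ?, ?)",
        [(metadata_id, rid, rtype) for rid, rtype in resources],
    )
    return metadata_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
log_pair(
    conn,
    "accuracy",
    "0.93",
    [
        ("run-1", "pipeline_run"),
        ("step-1", "step_run"),
        ("mv-1", "model_version"),
    ],
)
pair_rows = conn.execute("SELECT COUNT(*) FROM run_metadata").fetchone()[0]
link_rows = conn.execute(
    "SELECT COUNT(*) FROM run_metadata_resource"
).fetchone()[0]
```

With this layout the key-value pair is stored once, and attaching it to three entities only adds three small link rows.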

For this to work, I also implemented a new RunMetadataRequest model that can hold more than one entity per key-value pair. All models and schemas affected by calls that used the previous request model were adjusted accordingly.
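The shape of such a request could look roughly like the sketch below. It uses plain dataclasses rather than the project's real model classes, and the ResourceType enum, the field names resources and values, and the allowed value types are all assumptions made for illustration. Only the names RunMetadataRequest and RunMetadataResource (with an id and a type) come from this PR.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Union
from uuid import UUID, uuid4


class ResourceType(str, Enum):
    """Hypothetical enum; names and values are assumed for this sketch."""

    PIPELINE_RUN = "pipeline_run"
    STEP_RUN = "step_run"
    MODEL_VERSION = "model_version"


@dataclass(frozen=True)
class RunMetadataResource:
    """Identifies one entity that a key-value pair is linked to."""

    id: UUID
    type: ResourceType


@dataclass
class RunMetadataRequest:
    """One request carries the values once, plus every target entity."""

    resources: List[RunMetadataResource]  # assumed field name
    values: Dict[str, Union[str, int, float, bool]]  # assumed field name


request = RunMetadataRequest(
    resources=[
        RunMetadataResource(id=uuid4(), type=ResourceType.PIPELINE_RUN),
        RunMetadataResource(id=uuid4(), type=ResourceType.MODEL_VERSION),
    ],
    values={"accuracy": 0.93},
)
```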

Other related changes

Remaining TODOs

  • Add more tests (model_version_id logging)

Pre-requisites

Please ensure you have done the following:

  • I have read the CONTRIBUTING.md document.
  • If my change requires a change to docs, I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • I have based my new branch on develop and the open PR is targeting develop. If your branch wasn't based on develop read Contribution guide on rebasing branch to develop.
  • If my changes require changes to the dashboard, these changes are communicated/requested.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Other (add details above)

AlexejPenner and others added 30 commits October 17, 2024 17:14
Contributor

NLP template updates in examples/e2e_nlp have been pushed.

Contributor

@avishniakov avishniakov left a comment

I left some comments, nothing major, mostly around the new flag for artifacts. In case you consider them not critical or not relevant, I am approving upfront so that the PR is not blocked from merging.

src/zenml/model/utils.py (resolved)
Comment on lines +81 to +82
infer_artifact: bool = False,
artifact_name: Optional[str] = None,
Contributor

Why do those 2 come together here?

Suggested change
- infer_artifact: bool = False,
- artifact_name: Optional[str] = None,
+ infer_artifact: bool = False,

Contributor Author

A step can have multiple outputs. In that case you name one of them through artifact_name and expect infer_artifact to look it up in the step execution.

src/zenml/utils/metadata_utils.py (outdated, resolved)
src/zenml/utils/metadata_utils.py (outdated, resolved)
src/zenml/zen_stores/schemas/utils.py (resolved)
log_metadata(
    metadata=output_metadata,
    artifact_name="int_output",
    infer_artifact=True,
Contributor

Why infer_artifact=True? I might be misreading this, but you gave artifact name already, no?

Contributor Author

I also discussed this with @schustmi. Ultimately, we do not want to let people pass just a name to log_metadata and have it resolve to the latest version (of a model or an artifact). So if you provide only a name, it needs to be paired with infer_artifact=True, so that we look for it among the outputs of the step.
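The rule discussed in this thread can be sketched as a small standalone function. This is not the code from the PR: resolve_output_artifact, the step_outputs mapping, and the single-output fallback are assumptions made to illustrate how a name and infer_artifact=True could work together.

```python
from typing import Dict, Optional


def resolve_output_artifact(
    step_outputs: Dict[str, str],
    artifact_name: Optional[str] = None,
    infer_artifact: bool = False,
) -> str:
    """Return the id of the step output that metadata should attach to.

    step_outputs maps output names to artifact version ids (hypothetical).
    """
    if not infer_artifact:
        # A bare name is never resolved to "the latest version".
        raise ValueError(
            "Pass infer_artifact=True to look the artifact up in the "
            "outputs of the current step."
        )
    if artifact_name is not None:
        if artifact_name not in step_outputs:
            raise KeyError(f"The step has no output named {artifact_name!r}.")
        return step_outputs[artifact_name]
    if len(step_outputs) == 1:
        # Assumed behavior: a single output needs no name.
        return next(iter(step_outputs.values()))
    raise ValueError("The step has several outputs; pass artifact_name.")


outputs = {"int_output": "artifact-version-1", "str_output": "artifact-version-2"}
resolved = resolve_output_artifact(
    outputs, artifact_name="int_output", infer_artifact=True
)
```

In this sketch, passing only artifact_name without infer_artifact=True raises an error instead of silently picking a version.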

src/zenml/zen_stores/schemas/step_run_schemas.py (outdated, resolved)
src/zenml/zen_stores/schemas/run_metadata_schemas.py (outdated, resolved)
src/zenml/utils/metadata_utils.py (outdated, resolved)
src/zenml/utils/metadata_utils.py (outdated, resolved)
) -> None: ...


def log_metadata(
Contributor

I think the whole function could be a lot shorter if we just store the RunMetadataResource.id and RunMetadataResource.type and do one client call at the end?

Contributor Author

Makes sense. I trimmed it a bit already, but kept the resources parameter as a list and would rather not remove it right now. I am still undecided on whether we will ever tag multiple resources at the same time, so if it is alright with you, I would keep it like this.
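The pattern suggested in this thread, collecting the targets first and making a single call at the end, might look like the sketch below. The function names, the string-based resource tuples, and the create_run_metadata callback are hypothetical stand-ins, not the actual client API.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (resource id, resource type); plain strings stand in for the real types.
Resource = Tuple[str, str]


def collect_resources(
    step_run_id: Optional[str] = None,
    pipeline_run_id: Optional[str] = None,
    model_version_id: Optional[str] = None,
) -> List[Resource]:
    """Build the list of targets from whichever ids were provided."""
    candidates = [
        (step_run_id, "step_run"),
        (pipeline_run_id, "pipeline_run"),
        (model_version_id, "model_version"),
    ]
    return [(rid, rtype) for rid, rtype in candidates if rid is not None]


def log_metadata_sketch(
    metadata: Dict[str, object],
    create_run_metadata: Callable[[Dict[str, object], List[Resource]], None],
    **targets: Optional[str],
) -> None:
    """Resolve all targets, then issue exactly one client call."""
    resources = collect_resources(**targets)
    if not resources:
        raise ValueError("No target resource was given.")
    create_run_metadata(metadata, resources)


# A fake client that only records the calls it receives.
calls: List[Tuple[Dict[str, object], List[Resource]]] = []
log_metadata_sketch(
    {"accuracy": 0.93},
    lambda metadata, resources: calls.append((metadata, resources)),
    step_run_id="step-1",
    model_version_id="mv-1",
)
```

Keeping resources as a list, as the author prefers, means the same single call also covers the case where several entities are tagged at once.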

Contributor

@hyperlint-ai hyperlint-ai bot left a comment

The style guide flagged several spelling errors that seemed like false positives. We skipped posting inline suggestions for the following words:

  • Hyperparameter

@bcdurak bcdurak requested a review from schustmi November 29, 2024 12:41
@bcdurak bcdurak merged commit fbbfc29 into develop Nov 29, 2024
65 of 67 checks passed
@bcdurak bcdurak deleted the feature/followup-run-metadata branch November 29, 2024 16:45
Labels
enhancement (New feature or request), internal (To filter out internal PRs and issues), run-slow-ci

6 participants