-
Notifications
You must be signed in to change notification settings - Fork 457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Follow-up on the run_metadata
changes
#3193
Conversation
…nto feature/better-metadata
…nto feature/better-metadata
…nto feature/better-metadata
…nto feature/better-metadata
…nto feature/better-metadata
NLP template updates in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments, nothing major, mostly around new flag for artifacts. If you consider this as not critical or not relevant, I will put my approval upfront to prevent PR being blocked from merging.
infer_artifact: bool = False, | ||
artifact_name: Optional[str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do those 2 come together here?
infer_artifact: bool = False, | |
artifact_name: Optional[str] = None, | |
infer_artifact: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can potentially have a step with multiple outputs, you name one of them through the artifact_name
and expect the infer_artifact
to look for it in the step execution.
log_metadata( | ||
metadata=output_metadata, | ||
artifact_name="int_output", | ||
infer_artifact=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why infer_artifact=True
? I might be misreading this, but you gave artifact name already, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also discussed with @schustmi. Eventually, we do not want to enable people to provide just a name to log_metadata
for the latest version (with models or artifacts). So, if you provide just a name, it needs to be paired with infer_artifact=True
, so we look for it in the output space.
src/zenml/zen_stores/migrations/versions/cc269488e5a9_separate_run_metadata.py
Outdated
Show resolved
Hide resolved
src/zenml/zen_stores/migrations/versions/cc269488e5a9_separate_run_metadata.py
Outdated
Show resolved
Hide resolved
src/zenml/zen_stores/migrations/versions/cc269488e5a9_separate_run_metadata.py
Outdated
Show resolved
Hide resolved
src/zenml/zen_stores/migrations/versions/cc269488e5a9_separate_run_metadata.py
Outdated
Show resolved
Hide resolved
src/zenml/zen_stores/migrations/versions/cc269488e5a9_separate_run_metadata.py
Outdated
Show resolved
Hide resolved
) -> None: ... | ||
|
||
|
||
def log_metadata( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the whole function could be a lot shorter if we just store the RunMetadataResource.id
and RunMetadataResource.type
and do one client call in the end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I trimmed it a bit already but kept the resources
parameter as a list and would rather not remove it right now. I am still a bit in between if we will ever tag multiple resources at the same time and if it is alright with you, I would keep it like this.
…_run_metadata.py Co-authored-by: Michael Schuster <[email protected]>
…_run_metadata.py Co-authored-by: Michael Schuster <[email protected]>
…_run_metadata.py Co-authored-by: Michael Schuster <[email protected]>
…_run_metadata.py Co-authored-by: Michael Schuster <[email protected]>
…_run_metadata.py Co-authored-by: Michael Schuster <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The style guide flagged several spelling errors that seemed like false positives. We skipped posting inline suggestions for the following words:
- Hyperparameter
Describe changes
This PR optimizes the way that we store run metadata related to different entities. (addresses the review comment here)
In our old implementation, when someone calls
log_metadata
, it is possible that they attach the same metadata to different entities such as pipeline runs, step runs, and model versions. This process used to create X different entries with the same key-value pair in the metadata table.In order to optimize this process, this PR separates the previous run metadata table into two tables:
For this to work, I also implemented a new
RunMetadataRequest
model, that can hold more than one entity per key-value pair. In all the previous calls that used this request model, all affected models and schemas were adjusted accordingly.Other related changes
Remaining TODOs
Pre-requisites
Please ensure you have done the following:
develop
and the open PR is targetingdevelop
. If your branch wasn't based on develop read Contribution guide on rebasing branch to develop.Types of changes