-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Flow.metadata
attribute and Flow.update_metadata
method
#679
Conversation
…in update_metadata method
to use equivalent code as Flow
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #679 +/- ##
==========================================
- Coverage 99.23% 99.18% -0.06%
==========================================
Files 21 21
Lines 1573 1590 +17
Branches 427 339 -88
==========================================
+ Hits 1561 1577 +16
Misses 10 10
- Partials 2 3 +1
|
Flow.metadata
attribute now used in Flow.update_metadata
methodFlow.metadata
attribute used in Flow.update_metadata
method
@utf i'm sure you're strapped for time but even partial feedback here would be much appreciated! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @janosh. My only concern is about breaking the API of update_metadata
between Flow
s and Job
s. If we add the target
option to Job.update_metadata
that should make this function easier to use and the implementation cleaner.
src/jobflow/core/flow.py
Outdated
self.metadata.update(update) | ||
|
||
if target in ["jobs", "both"]: | ||
for job in self: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here job
could actually be a nested flow. So you should still iterate over these if target = "flow"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps the naming could be improved. target = "flow"
isn't meant to update the metadata of all nested flows (to the exclusion of jobs) but rather only the flow itself, not any of its jobs or nested flows.
how about we rename from target: Literal["flow", "jobs", "both"] = "both"
to target: Literal["self", "nested", "both"] = "both"
? unless you think it's important to have more control, i.e. be able to only update nested Flow
or Job
metadata. there might be use cases for that. in which case maybe we want target: Literal["self", "nested", "jobs", "flows", "both"] = "both"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to clarify this could be the doc string:
target
Specifies where to apply the metadata update. Options are:
- "self": Update only the Flow's own metadata
- "nested": Update only the metadata of Jobs/Flows within the Flow
- "jobs": Update only the metadata of Jobs within the Flow
- "flows": Update only the metadata of Flows within the Flow
- "all": Update both the Flow's metadata and nested Job+Flow metadata (default)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is overcomplicating it. If you want to select specific flows or jobs then you can pass in a name or class filter. So no need for the extra options. The main thing is the API should be consistent between jobs/flows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense. then the easiest thing would be to get rid of target
altogether and use the "all"
behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an interesting point. I think the benefit of target is that it would be a shortcut for class=Flow
vs class=Job
. I would be happy either having the target option or specifying the shortcut in the docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
different issue but while we're on the topic of API design, what's your take on adding a callback_filter: Callable[[Flow | Job], bool]
? users would pass in a function which takes the Flow
or Job
instance on which you invoke update_metadata
(or update_config
, ...) and returns True
if updates should be applied. perhaps more prone to user error but also very versatile. usage example:
Flow().update_metadata(
{"material_id": 42},
callback_filter=lambda flow: SomeMaker in map(type, flow)
and flow.name == "flow name"
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the benefit of target is that it would be a shortcut for
class=Flow
vsclass=Job
. I would be happy either having the target option or specifying the shortcut in the docstring.
i don't follow. by class=Job|Flow
, did you mean filter_function=Job|Flow
? the job variant filter_function=Job
doesn't seem to work the way you're suggesting so maybe you meant sth else?
EDIT: i guess you meant class_filter
on Maker.update_kwargs
but that isn't implemented for Flow/Job.update_metadata
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was thinking of class_filter
. I think the callback_filter
you proposed sounds very flexible and would be useful!
@janosh I lost track of this. Is it good to go? |
Not quite but Orion just pinged me about it so will try to pick it up again |
@utf i think this is ready to go now. a future PR could add the |
updates test_flow_update_metadata_dynamic
hmmm... added a lot of test cases. not sure what else to add. could it be the coverage calculation is off? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@janosh just one comment. Happy for you to merge once that's done. I can then push a new version.
@@ -128,6 +128,8 @@ def __init__( | |||
order: JobOrder = JobOrder.AUTO, | |||
uuid: str = None, | |||
hosts: list[str] = None, | |||
metadata: dict[str, Any] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please can you add docstrings for these options in the class docstring? E.g. see the corresponding docs for Job.
Flow.metadata
attribute used in Flow.update_metadata
methodFlow.metadata
attribute and Flow.update_metadata
method
@@ -927,6 +926,7 @@ def update_metadata( | |||
function_filter: Callable = None, | |||
dict_mod: bool = False, | |||
dynamic: bool = True, | |||
callback_filter: Callable[[jobflow.Flow | Job], bool] = lambda _: True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@janosh: the Callable
callback_filter
s seem to break some functionality, getting 'dict' object is not callable
when these are decoded. Double checked manually that MontyDecoder
is not rehydrating these into callables, just as dict
s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I didn't consider serialization. Sounds like something we need to add unit tests for. I'm guessing you can't literal_eval a callable safely? in which case not sure there's a solution other than removing the keyword.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or you issue an error when someone tries to serialize a callback_filter to explain that this feature is only compatible with create-and-run kind of workflows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Serialisation shouldn't be a problem if the function is importable, e.g if it is defined in atomate2 or another package. Alternatively we could pickle it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default callback_filter
isn't serializing correctly - I'll look into it a bit more and update you both
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Serialisation shouldn't be a problem if the function is importable, e.g if it is defined in atomate2 or another package.
the intended use case is for callback_filter
to be a simple lambda
function. if you need complex filters, that's a sign that perhaps you should create a new Flow
subclass instead.
Alternatively we could pickle it?
that might work. though we'd have to test cross-platform and cross-Python round-tripping. I've had cases where something pickled on Mac couldn't be unpickled on Linux (and likewise with Python 3.8 vs, say, 3.12) but pickle
seems to have gotten less dicey lately
Flow.update_metadata
is now much more similar to and has reached feature parity withJob.update_metadata
:jobflow/src/jobflow/core/job.py
Lines 922 to 930 in 367836b