Add Flow.metadata attribute and Flow.update_metadata method #679

Merged
merged 11 commits into from
Nov 22, 2024

@janosh janosh commented Sep 18, 2024

Flow.update_metadata now has feature parity with Job.update_metadata and a near-identical signature:

def update_metadata(
    self,
    update: dict[str, Any],
    name_filter: str = None,
    function_filter: Callable = None,
    dict_mod: bool = False,
    dynamic: bool = True,
):
    """
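
To make the intended behavior concrete, here is a minimal sketch of a flow that carries its own metadata and forwards updates to everything nested inside it. ToyJob and ToyFlow are hypothetical stand-ins, not jobflow classes, and the substring rule for name_filter is an assumption made for illustration.

```python
from __future__ import annotations

from typing import Any


class ToyJob:
    """Hypothetical stand-in for a job; not jobflow's Job class."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.metadata: dict[str, Any] = {}

    def update_metadata(
        self, update: dict[str, Any], name_filter: str | None = None
    ) -> None:
        # Assumed rule: a filter matches when it is a substring of the name.
        if name_filter is None or name_filter in self.name:
            self.metadata.update(update)


class ToyFlow:
    """Hypothetical stand-in for a flow that carries its own metadata."""

    def __init__(self, name: str, jobs: list[ToyJob | ToyFlow]) -> None:
        self.name = name
        self.jobs = jobs
        self.metadata: dict[str, Any] = {}

    def update_metadata(
        self, update: dict[str, Any], name_filter: str | None = None
    ) -> None:
        # The flow applies the same filter to itself as to its children.
        if name_filter is None or name_filter in self.name:
            self.metadata.update(update)
        # Recurse so nested flows and jobs receive the same update and filter.
        for child in self.jobs:
            child.update_metadata(update, name_filter=name_filter)


relax = ToyJob("relax")
static = ToyJob("static")
flow = ToyFlow("band structure", [relax, ToyFlow("inner", [static])])

flow.update_metadata({"material_id": 42}, name_filter="relax")
print(relax.metadata, static.metadata, flow.metadata)
# prints: {'material_id': 42} {} {}
```

Only the job whose name contains "relax" is updated; calling update_metadata without a filter would update the flow and every nested object.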

codecov bot commented Sep 18, 2024

Codecov Report

Attention: Patch coverage is 93.75000% with 1 line in your changes missing coverage. Please review.

Project coverage is 99.18%. Comparing base (a740c6c) to head (b6786ca).
Report is 52 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| src/jobflow/core/flow.py | 92.30% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #679      +/-   ##
==========================================
- Coverage   99.23%   99.18%   -0.06%     
==========================================
  Files          21       21              
  Lines        1573     1590      +17     
  Branches      427      339      -88     
==========================================
+ Hits         1561     1577      +16     
  Misses         10       10              
- Partials        2        3       +1     
| Files with missing lines | Coverage Δ |
| src/jobflow/core/job.py | 99.15% <100.00%> (+<0.01%) ⬆️ |
| src/jobflow/managers/local.py | 100.00% <ø> (ø) |
| src/jobflow/core/flow.py | 99.55% <92.30%> (-0.45%) ⬇️ |

... and 2 files with indirect coverage changes


@janosh janosh requested a review from utf September 18, 2024 21:52
@janosh janosh changed the title Add Flow.metadata attribute now used in Flow.update_metadata method Add Flow.metadata attribute used in Flow.update_metadata method Sep 18, 2024
janosh commented Sep 27, 2024

@utf i'm sure you're strapped for time but even partial feedback here would be much appreciated!

@utf utf left a comment

Thanks @janosh. My only concern is about breaking the API of update_metadata between Flows and Jobs. If we add the target option to Job.update_metadata that should make this function easier to use and the implementation cleaner.

self.metadata.update(update)

if target in ["jobs", "both"]:
    for job in self:

Member

Here job could actually be a nested flow. So you should still iterate over these if target = "flow".

Member Author

perhaps the naming could be improved. target = "flow" isn't meant to update the metadata of all nested flows (to the exclusion of jobs) but rather only the flow itself, not any of its jobs or nested flows.

how about we rename from target: Literal["flow", "jobs", "both"] = "both" to target: Literal["self", "nested", "both"] = "both"? unless you think it's important to have more control, i.e. be able to only update nested Flow or Job metadata. there might be use cases for that. in which case maybe we want target: Literal["self", "nested", "jobs", "flows", "both"] = "both"?

Member Author

to clarify this could be the doc string:

        target
            Specifies where to apply the metadata update. Options are:

            - "self": Update only the Flow's own metadata
            - "nested": Update only the metadata of Jobs/Flows within the Flow
            - "jobs": Update only the metadata of Jobs within the Flow
            - "flows": Update only the metadata of Flows within the Flow
            - "all": Update both the Flow's metadata and nested Job+Flow metadata (default)
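
The proposed option names can be sketched on a toy tree as follows. This is an illustration of the semantics under discussion only: Node is a hypothetical stand-in for either a Job or a Flow, the sketch covers just "self", "nested" and "all", and nothing here is taken from the merged implementation.

```python
from __future__ import annotations

from typing import Any, Literal

# Hypothetical subset of the proposed option names.
Target = Literal["self", "nested", "all"]


class Node:
    """Hypothetical stand-in for a Job (no children) or a Flow (with children)."""

    def __init__(self, name: str, children: list[Node] | None = None) -> None:
        self.name = name
        self.children = children or []
        self.metadata: dict[str, Any] = {}


def update_metadata(node: Node, update: dict[str, Any], target: Target = "all") -> None:
    if target in ("self", "all"):
        # Update the object the method was called on.
        node.metadata.update(update)
    if target in ("nested", "all"):
        for child in node.children:
            # Everything below the top level is updated in full.
            update_metadata(child, update, target="all")


deep = Node("deep job")
inner = Node("inner flow", [deep])
job = Node("job")
outer = Node("outer flow", [job, inner])

update_metadata(outer, {"tag": "x"}, target="nested")
update_metadata(outer, {"owner": "me"}, target="self")
print(outer.metadata, inner.metadata, deep.metadata)
# prints: {'owner': 'me'} {'tag': 'x'} {'tag': 'x'}
```

With "self" only the top-level object changes, which is the reading of target = "flow" described above; "nested" reaches both jobs and nested flows.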

Member

I think this is overcomplicating it. If you want to select specific flows or jobs then you can pass in a name or class filter. So no need for the extra options. The main thing is the API should be consistent between jobs/flows.

Member Author

makes sense. then the easiest thing would be to get rid of target altogether and use the "all" behavior?

Member

That's an interesting point. I think the benefit of target is that it would be a shortcut for class=Flow vs class=Job. I would be happy either having the target option or specifying the shortcut in the docstring.

Member Author

different issue but while we're on the topic of API design, what's your take on adding a callback_filter: Callable[[Flow | Job], bool]? users would pass in a function which takes the Flow or Job instance on which you invoke update_metadata (or update_config, ...) and returns True if updates should be applied. perhaps more prone to user error but also very versatile. usage example:

Flow().update_metadata(
    {"material_id": 42},
    callback_filter=lambda flow: SomeMaker in map(type, flow)
    and flow.name == "flow name"
)
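
A rough sketch of how such a predicate could gate updates, assuming it is evaluated once per object and passed down unchanged to nested objects. Item is a hypothetical stand-in for a Flow or Job, and this is not the merged jobflow code.

```python
from __future__ import annotations

from typing import Any, Callable


class Item:
    """Hypothetical stand-in for a Flow (has children) or a Job (has none)."""

    def __init__(self, name: str, children: list[Item] | None = None) -> None:
        self.name = name
        self.children = children or []
        self.metadata: dict[str, Any] = {}


def update_metadata(
    item: Item,
    update: dict[str, Any],
    callback_filter: Callable[[Item], bool] = lambda _: True,
) -> None:
    # The predicate sees the object itself, so it can test any attribute.
    if callback_filter(item):
        item.metadata.update(update)
    # Assumed behavior: the same predicate is applied to every nested object.
    for child in item.children:
        update_metadata(child, update, callback_filter)


relax = Item("relax 1")
static = Item("static")
flow = Item("relax flow", [relax, static])

# Update only leaf objects whose name starts with "relax".
update_metadata(
    flow,
    {"material_id": 42},
    callback_filter=lambda item: item.name.startswith("relax") and not item.children,
)
print(flow.metadata, relax.metadata, static.metadata)
# prints: {} {'material_id': 42} {}
```

The default predicate accepts everything, so omitting callback_filter updates every object in the tree.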

Member Author

@janosh janosh Oct 2, 2024

> I think the benefit of target is that it would be a shortcut for class=Flow vs class=Job. I would be happy either having the target option or specifying the shortcut in the docstring.

i don't follow. by class=Job|Flow, did you mean filter_function=Job|Flow? the job variant filter_function=Job doesn't seem to work the way you're suggesting so maybe you meant sth else?

EDIT: i guess you meant class_filter on Maker.update_kwargs but that isn't implemented for Flow/Job.update_metadata

Member

Yes, I was thinking of class_filter. I think the callback_filter you proposed sounds very flexible and would be useful!

utf commented Nov 18, 2024

@janosh I lost track of this. Is it good to go?

janosh commented Nov 18, 2024

Not quite but Orion just pinged me about it so will try to pick it up again

janosh commented Nov 18, 2024

@utf i think this is ready to go now. a future PR could add the callback_filter keyword to other update methods like update_kwargs as well

janosh commented Nov 18, 2024

hmmm... added a lot of test cases. not sure what else to add. could it be the coverage calculation is off?

@utf utf left a comment

@janosh just one comment. Happy for you to merge once that's done. I can then push a new version.

@@ -128,6 +128,8 @@ def __init__(
        order: JobOrder = JobOrder.AUTO,
        uuid: str = None,
        hosts: list[str] = None,
        metadata: dict[str, Any] = None,

Member

Please can you add docstrings for these options in the class docstring? E.g. see the corresponding docs for Job.

@janosh janosh added the ux User experience label Nov 22, 2024
@janosh janosh changed the title Add Flow.metadata attribute used in Flow.update_metadata method Add Flow.metadata attribute and Flow.update_metadata method Nov 22, 2024
@janosh janosh merged commit b4b7dc9 into main Nov 22, 2024
7 of 9 checks passed
@janosh janosh deleted the flow-metadata branch November 22, 2024 14:20
@@ -927,6 +926,7 @@ def update_metadata(
        function_filter: Callable = None,
        dict_mod: bool = False,
        dynamic: bool = True,
        callback_filter: Callable[[jobflow.Flow | Job], bool] = lambda _: True,

Contributor

@janosh: the Callable callback_filter values seem to break some functionality: I get 'dict' object is not callable when they are decoded. I double-checked manually that MontyDecoder does not rehydrate them into callables; they come back as plain dicts.

Member Author

Oh, I didn't consider serialization. Sounds like something we need to add unit tests for. I'm guessing you can't literal_eval a callable safely? in which case not sure there's a solution other than removing the keyword.

Member Author

Or you issue an error when someone tries to serialize a callback_filter to explain that this feature is only compatible with create-and-run kind of workflows
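
One way to picture that idea is to store a callable by its import path and refuse anything that cannot be found again by name. The sketch below is an assumption-laden illustration: encode_callable, decode_callable and the "module"/"qualname" keys are hypothetical, not monty's or jobflow's serialization format.

```python
from __future__ import annotations

import importlib
import operator
from typing import Any, Callable


def encode_callable(fn: Callable[..., Any]) -> dict[str, str]:
    """Hypothetical encoder: store a callable as its import path."""
    module = getattr(fn, "__module__", None)
    qualname = getattr(fn, "__qualname__", "")
    # Lambdas and functions defined inside other functions cannot be re-imported.
    if not module or "<lambda>" in qualname or "<locals>" in qualname:
        raise TypeError(
            "callback_filter must be an importable function to be serialized; "
            "lambdas only work in create-and-run workflows"
        )
    return {"module": module, "qualname": qualname}


def decode_callable(spec: dict[str, str]) -> Callable[..., Any]:
    """Hypothetical decoder: import the module and walk the qualified name."""
    obj: Any = importlib.import_module(spec["module"])
    for part in spec["qualname"].split("."):
        obj = getattr(obj, part)
    return obj


# An importable stdlib function survives the round trip.
spec = encode_callable(operator.truth)
restored = decode_callable(spec)

# A lambda is rejected with an explanatory error instead of decoding to a dict.
try:
    encode_callable(lambda _: True)
except TypeError as exc:
    lambda_error = str(exc)
else:
    lambda_error = None
```

Failing at encode time surfaces the problem where the workflow is built rather than when a decoded dict is later called.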

Member

@utf utf Dec 2, 2024

Serialisation shouldn't be a problem if the function is importable, e.g. if it is defined in atomate2 or another package. Alternatively we could pickle it?

Contributor

The default callback_filter isn't serializing correctly - I'll look into it a bit more and update you both

Member Author

> Serialisation shouldn't be a problem if the function is importable, e.g if it is defined in atomate2 or another package.

the intended use case is for callback_filter to be a simple lambda function. if you need complex filters, that's a sign that perhaps you should create a new Flow subclass instead.

> Alternatively we could pickle it?

that might work. though we'd have to test cross-platform and cross-Python round-tripping. I've had cases where something pickled on Mac couldn't be unpickled on Linux (and likewise with Python 3.8 vs, say, 3.12) but pickle seems to have gotten less dicey lately
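
For context on the pickle option, the standard library pickles functions by reference (module plus qualified name), so a plain lambda is rejected while an importable function round-trips. The small check below uses only the standard library; can_pickle is a hypothetical helper name, and the sketch says nothing about cross-platform or cross-version behavior.

```python
import operator
import pickle
from typing import Any, Callable


def can_pickle(fn: Callable[..., Any]) -> bool:
    """Return True if the standard pickle module accepts the callable."""
    try:
        pickle.dumps(fn)
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exception type depends on the Python version and on where
        # the function was defined.
        return False
    return True


# A lambda has no importable name, so pickling it by reference fails.
lambda_ok = can_pickle(lambda _: True)

# An importable function is stored as a reference and restored by import.
restored = pickle.loads(pickle.dumps(operator.truth))

print(lambda_ok, restored is operator.truth)
# prints: False True
```

So plain pickle would not rescue the simple-lambda use case on its own; a by-value serializer would be needed, which is outside what this sketch covers.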

Labels
enhancement New feature or request ux User experience