-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reconstruct flow from outputs in JobStore [WIP] #425
base: main
Are you sure you want to change the base?
Reconstruct flow from outputs in JobStore [WIP] #425
Conversation
…re/flow-rehydration
…re/flow-rehydration
I think there are a lot of possible interaction patterns people will use for retrieving and navigating flow outputs, so I just sketched some out here. I'd like to get some feed back from @utf and @arosen93 to see if any of this looks good. I'm happy to rework/remove any of this per consensus here, but just wanted to throw something out there for a first pass. |
The interface of |
Hi @mcgalcode , Very nice that you start this WIP PR and great start! I did not get into all the details, but from what I understand of the implementation, the Flow can be reconstructed if its jobs have finished (as job output documents are inserted into the database only when the complete). Do I understand this correctly? I guess it would be nice to reconstruct a Flow even when it hasn't started or while some of its jobs have completed and some others are running (or waiting), or be able to reconstruct a flow with a job that has failed (the failed job won't appear in the reconstructed Flow I think, as there won't be any job document to it). Any ideas ? |
Hi @davidwaroquiers ! Sorry for the delayed response here, I think I missed turning on notifications for this one. You understand correctly. This code reconstructs a flow from output objects that are present in the main output document store. I like the idea of being able to reconstruct flows that have yet to be started, or flows that are incomplete or partially failed, but I'm a little hazy on how that would work. For instance, if a flow hasn't started yet (i.e., it hasn't been run by some type of I can imagine doing this in the case of a particular manager, i.e. the fireworks manager. In that case, I think utilities for reconstructing a Flow from it's representation in fireworks would be particularly useful, and I actually have some sketchy helper functions that I use for this in my own work. Is that what you mean? I think my jobflow understanding may be a bit insufficient here :) |
No you are perfectly right. I am actually always thinking in terms of an ongoing development you are not yet aware of (normal it's in private repos). I will add you to the repos (it's a remote execution mode of jobflow, which ultimately might be included directly into jobflow itself). If you have questions about it, feel free to contact me and I can give you a few more details. |
src/jobflow/schemas/job_store.py
Outdated
None, | ||
description="The index of the job (number of times the job has been replaced.", | ||
) | ||
output: typing.Any = Field( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of typing.Any
, this could be made a Generic:
T = TypeVar("T")
class JobStoreDoc(BaseModel, Generic[T]):
...
output: T
This would allow people to easily sub-class when they're working with a JobStore document with a specific output type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On this note, I had a similar document model implemented that I had intended to upstream. One addition I found helpful was:
@validator("output", pre=True)
def reserialize_output(cls, v):
if isinstance(v, dict) and "@module" in v and "@class" in v:
v = MontyDecoder().process_decoded(v)
return v
Note this is in pydantic v1 syntax, I think it will have to be modified slightly for pydantic v2.
This is useful in the scenario that you're loading a dict directly, e.g. JobFlowDocument(**dict)
, rather than using MontyDecoder which I think would take care of it automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Matt! These are good suggestions. We have another PR that this one depends on that isolates the implementation of that model right here: #424. This one is a clunky use of git.
@hrushikesh-s Do you want to update your PR with these suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mkhorton thanks for the suggestions.
@mcgalcode , yes I'll update my PR accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plan to migrate Jobflow to pydantic2 very very soon. I am waiting on Jason to patch emmet and maggma first, which he said should be done in a day or two.
So, perhaps on Friday? If not, I'm taking PTO for two weeks so it'd have to wait until after.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Andrew-S-Rosen thanks for the update. For the time being, I've implemented the validator
as per pydantic v1
syntax in #424, but changing it to pydantic v2
should be straightforward. I'll modify it once your migration from pydantic v1
to v2
is complete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! Thanks. Will be good to keep tabs on what needs switching and where.
@davidwaroquiers Aha! Thanks for this clarification, I'll take a look at the repos you added me to. Sounds like it will inspire your suggestion :) I do think that the functionality you're talking about would be very useful. It can be a little confusing to interact with job outputs since there is no formal record of whatever flow they belong to. |
Summary
This is a WIP PR addressing #374 . I implemented the storage of input references in the JobStoreDocument (from Hrushikesh's PR (here).
Checklist
Work-in-progress pull requests are encouraged, but please put [WIP] in the pull request
title.
Before a pull request can be merged, the following items must be checked:
The easiest way to handle this is to run the following in the correct sequence on
your local machine. Start with running black on your new code. This will
automatically reformat your code to PEP8 conventions and removes most issues. Then run
pycodestyle, followed by flake8.
Run pydocstyle on your code.
type check your code.
Note that the CI system will run all the above checks. But it will be much more
efficient if you already fix most errors prior to submitting the PR. It is highly
recommended that you use the pre-commit hook provided in the repository. Simply
cp pre-commit .git/hooks
and a check will be run prior to allowing commits.