Reconstruct flow from outputs in JobStore [WIP] #425

mcgalcode · 2023-09-09T00:12:51Z

Summary

This is a WIP PR addressing #374. I implemented the storage of input references in the JobStoreDocument (from Hrushikesh's PR, here).

Checklist

Work-in-progress pull requests are encouraged, but please put [WIP] in the pull request
title.

Before a pull request can be merged, the following items must be checked:

Code is in the standard Python style.
The easiest way to handle this is to run the following in the correct sequence on
your local machine. Start with running black on your new code. This will
automatically reformat your code to PEP8 conventions and remove most issues. Then run
pycodestyle, followed by flake8.
Docstrings have been added in the NumPy docstring format.
Run pydocstyle on your code.
Type annotations are highly encouraged. Run mypy to
type check your code.
Tests have been added for any new functionality or bug fixes.
All linting and tests pass.

Note that the CI system will run all the above checks. But it will be much more
efficient if you already fix most errors prior to submitting the PR. It is highly
recommended that you use the pre-commit hook provided in the repository. Simply
cp pre-commit .git/hooks and a check will be run prior to allowing commits.


mcgalcode · 2023-09-09T00:13:28Z

I think there are a lot of possible interaction patterns people will use for retrieving and navigating flow outputs, so I just sketched some out here. I'd like to get some feedback from @utf and @arosen93 to see if any of this looks good.

I'm happy to rework/remove any of this per consensus here, but just wanted to throw something out there for a first pass.

mcgalcode · 2023-09-09T00:14:38Z

The interface of FlowOutput (here) sort of sketches out the interaction patterns I was imagining.

davidwaroquiers · 2023-09-14T11:59:20Z

Hi @mcgalcode,

Very nice that you started this WIP PR, and great start! I did not get into all the details, but from what I understand of the implementation, the Flow can be reconstructed only if its jobs have finished (as job output documents are inserted into the database only when they complete). Do I understand this correctly? I guess it would be nice to reconstruct a Flow even when it hasn't started, or while some of its jobs have completed and others are running (or waiting), or to be able to reconstruct a Flow with a job that has failed (I think the failed job won't appear in the reconstructed Flow, as there won't be any job document for it).

Any ideas?

mcgalcode · 2023-09-19T20:53:38Z

Hi @davidwaroquiers! Sorry for the delayed response here. I think I missed turning on notifications for this one.

You understand correctly. This code reconstructs a flow from output objects that are present in the main output document store.
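As a rough illustration of the idea, the sketch below rebuilds a dependency graph from output documents that each record the UUIDs of the jobs they took input from. The document layout (`uuid`, `name`, and `input_references` as a list of UUIDs) and the function name `rebuild_graph` are assumptions made for this example, not the schema or API in this PR. Jobs that are referenced but have no output document (for example, failed or unfinished jobs) show up only as missing parents.

```python
from graphlib import TopologicalSorter

# Hypothetical output documents; field names are assumed for illustration.
docs = [
    {"uuid": "c", "name": "summarize", "input_references": ["a", "b"]},
    {"uuid": "a", "name": "relax", "input_references": []},
    {"uuid": "b", "name": "static", "input_references": ["a"]},
    {"uuid": "e", "name": "report", "input_references": ["d"]},  # "d" has no document
]


def rebuild_graph(docs):
    """Return (execution order of stored jobs, UUIDs referenced but not stored)."""
    stored = {doc["uuid"] for doc in docs}
    graph = {}
    missing = set()
    for doc in docs:
        parents = set(doc["input_references"])
        missing |= parents - stored
        # Only keep edges to jobs whose output document is actually present.
        graph[doc["uuid"]] = parents & stored
    order = list(TopologicalSorter(graph).static_order())
    return order, missing


order, missing = rebuild_graph(docs)
```

Here `order` lists stored jobs with parents before children, and `missing` collects the referenced jobs that could not be recovered from the store alone.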

I like the idea of being able to reconstruct flows that have yet to be started, or flows that are incomplete or partially failed, but I'm a little hazy on how that would work. For instance, if a flow hasn't started yet (i.e., it hasn't been run by some type of manager), I thought it doesn't exist anywhere aside from the memory of the program that instantiated it. Is that not the case?

I can imagine doing this in the case of a particular manager, e.g. the fireworks manager. In that case, I think utilities for reconstructing a Flow from its representation in fireworks would be particularly useful, and I actually have some sketchy helper functions that I use for this in my own work. Is that what you mean?

I think my jobflow understanding may be a bit insufficient here :)

davidwaroquiers · 2023-09-20T10:14:26Z

No, you are perfectly right. I am actually always thinking in terms of an ongoing development you are not yet aware of (which is normal, as it's in private repos). I will add you to the repos (it's a remote execution mode of jobflow, which ultimately might be included directly into jobflow itself). If you have questions about it, feel free to contact me and I can give you a few more details.

mkhorton · 2023-09-20T20:45:31Z

src/jobflow/schemas/job_store.py

+        None,
+        description="The index of the job (number of times the job has been replaced.",
+    )
+    output: typing.Any = Field(


Instead of typing.Any, this could be made a Generic:

    T = TypeVar("T")

    class JobStoreDoc(BaseModel, Generic[T]):
        ...
        output: T

This would allow people to easily sub-class when they're working with a JobStore document with a specific output type.
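A minimal sketch of that subclassing pattern follows. To keep it runnable without third-party packages, it uses a standard-library dataclass as a stand-in for the pydantic model; `RelaxOutput` and `RelaxDoc` are hypothetical names. With pydantic itself, note that v1 requires inheriting from `pydantic.generics.GenericModel` for generic models, whereas v2 accepts `BaseModel, Generic[T]` directly.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass
class JobStoreDoc(Generic[T]):
    # Stand-in for the pydantic model; only two fields are shown.
    uuid: str
    output: T


@dataclass
class RelaxOutput:
    # Hypothetical output type for a specific job.
    energy: float


class RelaxDoc(JobStoreDoc[RelaxOutput]):
    """Document whose output is known to be a RelaxOutput."""


doc = RelaxDoc(uuid="abc", output=RelaxOutput(energy=-1.5))
# A type checker now treats doc.output as RelaxOutput, so .energy is known.
energy = doc.output.energy
```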

On this note, I had a similar document model implemented that I had intended to upstream. One addition I found helpful was:

    @validator("output", pre=True)
    def reserialize_output(cls, v):
        if isinstance(v, dict) and "@module" in v and "@class" in v:
            v = MontyDecoder().process_decoded(v)
        return v

Note this is in pydantic v1 syntax, I think it will have to be modified slightly for pydantic v2.

This is useful in the scenario that you're loading a dict directly, e.g. JobFlowDocument(**dict), rather than using MontyDecoder which I think would take care of it automatically.
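To make the check in the validator above concrete, the sketch below re-creates an object from a dict carrying `@module` and `@class` keys using only the standard library. It is a simplified stand-in for what `MontyDecoder().process_decoded` does, not monty's implementation; the function name `decode_if_serialized` and the use of the remaining keys as constructor keyword arguments are assumptions for illustration.

```python
import importlib


def decode_if_serialized(value):
    """Rebuild an object from an "@module"/"@class" dict; pass anything else through."""
    if isinstance(value, dict) and "@module" in value and "@class" in value:
        module = importlib.import_module(value["@module"])
        cls = getattr(module, value["@class"])
        # Assumption: the remaining non-"@" keys map onto constructor keyword arguments.
        kwargs = {k: v for k, v in value.items() if not k.startswith("@")}
        return cls(**kwargs)
    return value


raw = {"@module": "datetime", "@class": "timedelta", "days": 2}
decoded = decode_if_serialized(raw)  # becomes a datetime.timedelta
plain = decode_if_serialized({"energy": -1.5})  # returned unchanged
```

Importing modules named in stored data is only reasonable when the store is trusted.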

mcgalcode · 2023-09-21

Thanks Matt! These are good suggestions. We have another PR that this one depends on that isolates the implementation of that model right here: #424. This one is a clunky use of git.

@hrushikesh-s Do you want to update your PR with these suggestions?

hrushikesh-s · 2023-09-21

@mkhorton thanks for the suggestions.
@mcgalcode, yes I'll update my PR accordingly.

hrushikesh-s · 2023-09-21

@mkhorton: Jobflow uses Pydantic v1 (see here). Is there any plan to upgrade it to v2 anytime soon? If not, then your implementation of @validator("output", pre=True) should work as it is, and we don't need to change the syntax.
Maybe I'm missing something there?

Andrew-S-Rosen · 2023-09-21

I plan to migrate Jobflow to pydantic v2 very soon. I am waiting on Jason to patch emmet and maggma first, which he said should be done in a day or two.

So, perhaps on Friday? If not, I'm taking PTO for two weeks so it'd have to wait until after.

hrushikesh-s · 2023-09-21

@Andrew-S-Rosen thanks for the update. For the time being, I've implemented the validator as per pydantic v1 syntax in #424, but changing it to pydantic v2 should be straightforward. I'll modify it once your migration from pydantic v1 to v2 is complete.

Andrew-S-Rosen · 2023-09-21

Sounds good! Thanks. Will be good to keep tabs on what needs switching and where.

mcgalcode · 2023-09-21T00:26:22Z

@davidwaroquiers Aha! Thanks for this clarification, I'll take a look at the repos you added me to. Sounds like that work inspired your suggestion :)

I do think that the functionality you're talking about would be very useful. It can be a little confusing to interact with job outputs since there is no formal record of whatever flow they belong to.

mcgalcode added 8 commits August 31, 2023 15:34

Store input references in job output document

086692a

Add OutputManager class for reconstructing flows

b0dc3d0

Add store autoloading

25d7df1

Merge branch 'pydantic' of github.com:hrushikesh-s/jobflow into feature/flow-rehydration

7f50f28

Add input_references to JobStoreDoc

e6cb239

Merge branch 'pydantic' of github.com:hrushikesh-s/jobflow into feature/flow-rehydration

a3f4002

Move import to fix test

3f48677

Add output retrieval functions

ace157c

mkhorton reviewed Sep 20, 2023


hrushikesh-s mentioned this pull request Sep 21, 2023

Formalizing the JobStore document format as a pydantic model #424

Merged

5 tasks

utf mentioned this pull request Aug 6, 2024

where to find inputs of a job #663

Open
