
Enable context serialization in workflows #16250

Merged
merged 13 commits on Oct 2, 2024

Conversation

logan-markewich
Collaborator

@logan-markewich logan-markewich commented Sep 27, 2024

This is an initial stab at serializing the context in workflows:

wf = Workflow(allow_pickle=True)

handler = wf.run()
_ = await handler

# capture the context state as a plain dict...
state_dict = get_state_dict(handler)
# ...and rebuild a handler from that dict later
new_handler = get_handler_from_state_dict(wf, state_dict)

This would allow for a few use cases (both of which are prime candidates for usage in llama-deploy and other deployment scenarios):

  • storing the context between runs
  • stopping and resuming a workflow mid-run (probably during stepwise execution)

Some considerations on this

  • the serialization logic could be its own module, to allow users to customize it. I'm not sure whether we want to do that or not, though
  • I'm not entirely convinced where this API should live -- right now it's a util function hiding the actual operations on the context
  • maybe the pickling fallback should be opt-in?
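To make the last point concrete, an opt-in pickle fallback could look roughly like the sketch below: try JSON first, and only fall back to pickle when the user explicitly asked for it. The class and parameter names here (`JsonPickleSerializer`, `allow_pickle`) are illustrative assumptions, not the PR's actual API.

```python
# Hypothetical sketch of an opt-in pickle fallback serializer.
# JSON is attempted first; pickle is used only when allow_pickle=True.
import base64
import json
import pickle
from typing import Any


class JsonPickleSerializer:
    def __init__(self, allow_pickle: bool = False) -> None:
        self.allow_pickle = allow_pickle

    def serialize(self, value: Any) -> str:
        try:
            return json.dumps(value)
        except (TypeError, ValueError):
            if not self.allow_pickle:
                raise
            # fall back to pickle, base64-encoded so the result is still a str
            return base64.b64encode(pickle.dumps(value)).decode("utf-8")

    def deserialize(self, data: str) -> Any:
        try:
            return json.loads(data)
        except ValueError:
            if not self.allow_pickle:
                raise
            return pickle.loads(base64.b64decode(data))
```

With `allow_pickle=False` (the default), non-JSON-serializable values fail loudly instead of silently round-tripping through pickle, which keeps the unsafe path opt-in.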

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Sep 27, 2024
@logan-markewich logan-markewich changed the title [WIP] Enable context serialization in workflows Enable context serialization in workflows Sep 30, 2024
llama-index-core/llama_index/core/workflow/handler.py (outdated review thread, resolved)
llama-index-core/llama_index/core/workflow/context.py (outdated review thread, resolved)
llama-index-core/llama_index/core/workflow/context.py (outdated review thread, resolved)
@@ -40,6 +83,83 @@ def __init__(self, workflow: "Workflow", stepwise: bool = False) -> None:
# Step-specific instance
self._events_buffer: Dict[Type[Event], List[Event]] = defaultdict(list)

# keep track of all the event classes that are accepted by the workflow
self._event_classes: Dict[str, Type[Event]] = {}
for step_func in self._workflow._get_steps().values():
Member

Instead of keeping a list of class objects, we could serialize a class with its qualified name (see https://github.com/deepset-ai/haystack/blob/main/haystack/core/serialization.py#L74) that later we can pass to importlib with something like https://github.com/deepset-ai/haystack/blob/main/haystack/core/serialization.py#L195
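The qualified-name approach suggested here (modeled loosely on haystack's serialization helpers) can be sketched in a few lines: store `"module.ClassName"` alongside the payload, and re-import the class at deserialization time via importlib. The helper names below are illustrative, not necessarily what the PR ended up calling them.

```python
# Sketch: serialize a class as its qualified name, re-import it later.
import importlib
from typing import Type


def qualified_name(cls: Type) -> str:
    # e.g. "llama_index.core.workflow.events.StopEvent"
    return f"{cls.__module__}.{cls.__qualname__}"


def import_from_qualified_name(name: str) -> Type:
    # split off the final attribute and import the containing module
    module_name, _, class_name = name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

This avoids keeping a registry of class objects in the context, at the cost of requiring the class to be importable in the deserializing process.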

Collaborator Author

interesting, will take a look!

Collaborator Author

I have an approach working with this... slightly concerned about security issues, but I suppose if you are importing something already on your machine, and that's an issue for you, you have other problems 😅

llama-index-core/llama_index/core/workflow/context.py Outdated Show resolved Hide resolved
@logan-markewich
Collaborator Author

logan-markewich commented Oct 1, 2024

Ok, the latest pushes address Messi's suggestions:

  • make serializers multiple classes
  • don't attach serializers to class instances
  • don't hide the serialization machinery too much

I also added one unit test after merging in the latest changes to the stepwise API, but it's broken (spooky, CI/CD is caching stuff, I guess?) -- I really want to serialize a run midway through, but so far it's not working 🤔 I haven't been able to track down why yet.

@nerdai
Contributor

nerdai commented Oct 2, 2024

Looks like all checks passed? Guess it wasn't any of the checked-in unit tests that were failing? @logan-markewich

Contributor

@nerdai nerdai left a comment


Looks good to me! A couple minor nits.

module_class = import_module_from_qualified_name(data["qualified_name"])
return module_class.from_dict(data["value"])
except Exception as e:
breakpoint()
Contributor

Do we need a breakpoint here? Do we want to raise instead?

Collaborator Author

oh lol leftover debugging, whoops

raise ValueError(f"Failed to deserialize value for key {key}: {e}")
return deserialized_globals
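Put together, the corrected shape of this loop (stray `breakpoint()` removed, `ValueError` raised on failure, as settled in the exchange above) might look like the following minimal sketch. Here `json.loads` stands in for the PR's actual per-value deserialization and qualified-name import machinery:

```python
# Sketch of the deserialization loop after the review fix: failures
# raise a ValueError naming the offending key instead of dropping
# into a debugger. json.loads is a stand-in for the real deserializer.
import json
from typing import Any, Dict


def deserialize_globals(globals_data: Dict[str, str]) -> Dict[str, Any]:
    deserialized_globals: Dict[str, Any] = {}
    for key, raw in globals_data.items():
        try:
            deserialized_globals[key] = json.loads(raw)
        except Exception as e:
            raise ValueError(f"Failed to deserialize value for key {key}: {e}")
    return deserialized_globals
```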

def to_dict(self, serializer: Optional[BaseSerializer] = None) -> Dict[str, Any]:
Contributor

minor nit: wondering if it would be useful to make this into at least a TypedDict?
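The TypedDict the reviewer has in mind could look something like the sketch below. The field names are illustrative guesses at what a serialized context might carry, not the keys the PR actually emits.

```python
# Hypothetical TypedDict giving the serialized-context dict a typed shape,
# so callers of to_dict() get key checking from static type checkers.
from typing import Dict, List, TypedDict


class SerializedContext(TypedDict):
    globals: Dict[str, str]       # serialized shared state, keyed by name
    streaming_queue: List[str]    # serialized events awaiting delivery
    queues: Dict[str, List[str]]  # per-step serialized event queues
    stepwise: bool                # whether the run was in stepwise mode
```

At runtime a TypedDict is still a plain dict (no validation happens), but `to_dict() -> SerializedContext` documents the contract and lets mypy flag typo'd keys.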


@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Oct 2, 2024
@logan-markewich logan-markewich merged commit ba2cc90 into main Oct 2, 2024
10 checks passed
@logan-markewich logan-markewich deleted the logan/workflow_serialize branch October 2, 2024 17:10
raspawar pushed a commit to raspawar/llama_index that referenced this pull request Oct 7, 2024