MultiAgentWorkflow #17237

Open · wants to merge 27 commits into base: main

Commits (27):
- ff9afa6 · initial working e2e (logan-markewich, Dec 10, 2024)
- 3f2574f · e2e (logan-markewich, Dec 18, 2024)
- c992dd9 · move file (logan-markewich, Dec 19, 2024)
- 969b64a · remove human confirmation concept, add wait_for_event (logan-markewich, Dec 24, 2024)
- ae072b8 · add tests (logan-markewich, Dec 26, 2024)
- be035d0 · wip (logan-markewich, Dec 27, 2024)
- 6974a9d · refactor (logan-markewich, Dec 29, 2024)
- d91f93e · refactor (logan-markewich, Dec 29, 2024)
- 85b89af · finish refactor (logan-markewich, Dec 29, 2024)
- faabe0d · add docs (logan-markewich, Dec 29, 2024)
- e9fde68 · nit (logan-markewich, Dec 29, 2024)
- 81efcec · add build file (logan-markewich, Dec 29, 2024)
- 2d47771 · remove unused event (logan-markewich, Dec 30, 2024)
- ce758fb · Merge branch 'main' into logan/multi_agent (logan-markewich, Dec 31, 2024)
- 4718dce · refactor (logan-markewich, Jan 2, 2025)
- f2f2a1e · fix tests (logan-markewich, Jan 2, 2025)
- 9151673 · clean up types (logan-markewich, Jan 2, 2025)
- 935e4ae · add more tests (logan-markewich, Jan 2, 2025)
- f527a05 · fix small bug in finalize for react agent (logan-markewich, Jan 2, 2025)
- d94d22f · make react components configurable (logan-markewich, Jan 2, 2025)
- 347bda1 · make function agent use scratchpad (logan-markewich, Jan 2, 2025)
- aa555a2 · update docs (logan-markewich, Jan 2, 2025)
- ee70508 · add to nav (logan-markewich, Jan 2, 2025)
- 0027c3a · Add a timeout when waiting for event (logan-markewich, Jan 5, 2025)
- 9a89262 · Merge branch 'main' into logan/multi_agent (logan-markewich, Jan 10, 2025)
- c63849f · avoid duplicate tools (logan-markewich, Jan 10, 2025)
- 96e458c · remove breakpoint (logan-markewich, Jan 11, 2025)
12 changes: 12 additions & 0 deletions docs/docs/api_reference/agent/workflow.md
@@ -0,0 +1,12 @@
::: llama_index.core.agent.workflow
    options:
      members:
        - MultiAgentWorkflow
        - BaseWorkflowAgent
        - FunctionAgent
        - ReactAgent
        - AgentInput
        - AgentStream
        - AgentOutput
        - ToolCall
        - ToolCallResult
257 changes: 257 additions & 0 deletions docs/docs/understanding/agent/multi_agents.md
@@ -0,0 +1,257 @@
# Multi-Agent Workflows

The `MultiAgentWorkflow` uses workflow agents to let you create a system of multiple agents that collaborate and hand off tasks to one another based on their specialized capabilities. This enables building more complex agent systems where different agents handle different aspects of a task.
> **Collaborator:** Should we mention that this is built upon our core workflows classes? The user will at least know that the syntax for running workflows and handling the output event stream is the same.
>
> **Collaborator (Author):** Yeah, I should at least link to it somewhere. Good point.

## Quick Start

Here's a simple example of setting up a multi-agent workflow with a calculator agent and a retriever agent:

```python
from llama_index.core.agent.workflow import (
    MultiAgentWorkflow,
    FunctionAgent,
    ReactAgent,
)
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI


# Define some tools
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b


def subtract(a: int, b: int) -> int:
    """Subtract two numbers."""
    return a - b


# Create agent configs
# NOTE: we can use FunctionAgent or ReactAgent here.
# FunctionAgent works for LLMs with a function calling API.
# ReactAgent works for any LLM.
calculator_agent = FunctionAgent(
    name="calculator",
    description="Performs basic arithmetic operations",
    system_prompt="You are a calculator assistant.",
    tools=[
        FunctionTool.from_defaults(fn=add),
        FunctionTool.from_defaults(fn=subtract),
    ],
```
Comment on lines +37 to +40

> **Contributor:** Is it possible in Python to wrap `FunctionTool` automatically? e.g.
>
> Suggested change:
>
> ```diff
> -    tools=[
> -        FunctionTool.from_defaults(fn=add),
> -        FunctionTool.from_defaults(fn=subtract),
> -    ],
> +    tools=[
> +        add, subtract,
> +    ],
> ```
>
> **Collaborator (Author):** Could be possible, although now with `FunctionTool` and `FunctionToolWithContext`, I'll need to think a little harder about how to detect when each one is needed.
```python
    llm=OpenAI(model="gpt-4"),
)

retriever_agent = FunctionAgent(
    name="retriever",
    description="Manages data retrieval",
    system_prompt="You are a retrieval assistant.",
    is_entrypoint_agent=True,
    llm=OpenAI(model="gpt-4"),
)

# Create and run the workflow
workflow = MultiAgentWorkflow(
    agent_configs=[calculator_agent, retriever_agent]
```
Comment on lines +53 to +54

> **Contributor:** How about defining the entrypoint here?
>
> Suggested change:
>
> ```diff
>  workflow = MultiAgentWorkflow(
> +    entrypoint=[retriever_agent],
>      agent_configs=[calculator_agent, retriever_agent]
> ```
>
> **Collaborator (Author):** Ah yeah, that is actually a better UX (although I think there can only be one entry point 🤔).
```python
)

# Run the system
response = await workflow.run(user_msg="Can you add 5 and 3?")

# Or stream the events
handler = workflow.run(user_msg="Can you add 5 and 3?")
async for event in handler.stream_events():
    if hasattr(event, "delta"):
        print(event.delta, end="", flush=True)
```
Comment on lines +62 to +64

> **Contributor:** Are these agents streaming dedicated events that can be shown in a UI? We're using this in create-llama. There we have the `AgentRunEvent`, see
> https://github.com/run-llama/create-llama/blob/main/templates/components/multiagent/python/app/workflows/events.py
> (ignore `to_response`; this is for conversion to Vercel data streams, and that concern can be handled outside of this PR).
>
> Here is an example using it to send the progress of tool calls:
> https://github.com/run-llama/create-llama/blob/eec237c5feea1af9cdd5b276d34ebe3b8d0fd185/templates/components/multiagent/python/app/workflows/tools.py#L141
>
> **Collaborator (Author):** Yes! That is the main intention for these events, to show progress in some UI.
>
> I didn't capture the concept of "in progress" or "completed" with this; it's mostly all just events at points in time (here's the agent input, here's the agent stream, here's the agent output, here's a tool I'm about to call, here's the tool output). I could refactor, but I'm not sure whether it's needed.

Comment on lines +62 to +64

> **Contributor:** You could then also add a helper that prints the events in a nice way for examples:
>
> Suggested change:
>
> ```diff
>  async for event in handler.stream_events():
> -    if hasattr(event, "delta"):
> -        print(event.delta, end="", flush=True)
> +    print_event(event)
> ```

## How It Works

The MultiAgentWorkflow manages a collection of agents, each with its own specialized capabilities. Exactly one agent must be designated as the entry point agent (`is_entrypoint_agent=True`).

When a user message comes in, it's first routed to the entry point agent. Each agent can then:

1. Handle the request directly using its tools
2. Hand off to another agent better suited for the task (see the sketch below)
3. Return a response to the user
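
For example, handoff routing can be constrained per agent with `can_handoff_to`. A minimal sketch, reusing the Quick Start agents above (defaults are described under Configuration Options below):

```python
# Sketch: the retriever may delegate arithmetic to the calculator,
# while the calculator always answers directly (empty handoff list).
calculator_agent = FunctionAgent(
    name="calculator",
    description="Performs basic arithmetic operations",
    system_prompt="You are a calculator assistant.",
    tools=[
        FunctionTool.from_defaults(fn=add),
        FunctionTool.from_defaults(fn=subtract),
    ],
    can_handoff_to=[],  # never hands off
    llm=OpenAI(model="gpt-4"),
)

retriever_agent = FunctionAgent(
    name="retriever",
    description="Manages data retrieval",
    system_prompt="You are a retrieval assistant.",
    can_handoff_to=["calculator"],  # may hand off arithmetic questions
    is_entrypoint_agent=True,
    llm=OpenAI(model="gpt-4"),
)
```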

## Configuration Options

### Agent Config

Each agent holds a common set of configuration options. Whether you use `FunctionAgent` or `ReactAgent`, the core options are the same.

```python
FunctionAgent(
    # Unique name for the agent (str)
    name="name",
    # Description of the agent's capabilities (str)
    description="description",
    # System prompt for the agent (str)
    system_prompt="system_prompt",
    # Tools available to this agent (List[BaseTool])
    tools=[...],
    # LLM to use for this agent (LLM)
    llm=OpenAI(model="gpt-4"),
    # Whether this is the entry point agent (bool)
    is_entrypoint_agent=True,
    # List of agents this one can hand off to. Defaults to all agents. (List[str])
    can_handoff_to=[...],
)
```
```
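
Since the core options are shared, switching agent types is just a matter of swapping the class. A quick sketch:

```python
# Sketch: the same calculator config as a ReactAgent, for LLMs
# without a function calling API
ReactAgent(
    name="calculator",
    description="Performs basic arithmetic operations",
    system_prompt="You are a calculator assistant.",
    tools=[FunctionTool.from_defaults(fn=add)],
    llm=OpenAI(model="gpt-4"),
)
```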

### Workflow Options

The MultiAgentWorkflow constructor accepts:

```python
MultiAgentWorkflow(
    # List of agent configs. (List[BaseWorkflowAgent])
    agents=[...],
    # Initial state dict. (Optional[dict])
    initial_state=None,
    # Custom prompt for handoffs. Should contain the `agent_info` string variable. (Optional[str])
    handoff_prompt=None,
    # Custom prompt for state. Should contain the `state` and `msg` string variables. (Optional[str])
    state_prompt=None,
)
```
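
For example, a sketch of overriding both prompts, reusing the Quick Start agents (the template variables named above must appear in the strings):

```python
workflow = MultiAgentWorkflow(
    agents=[calculator_agent, retriever_agent],
    handoff_prompt=(
        "If another agent is better suited, hand off to one of:\n{agent_info}"
    ),
    state_prompt="Current state: {state}. User message: {msg}",
)
```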

### State Management

#### Initial Global State

You can provide an initial state dict that will be available to all agents:

```python
workflow = MultiAgentWorkflow(
    agents=[...],
    initial_state={"counter": 0},
```
> **Collaborator:** Out of curiosity, how is the state modified? By the user or by the agent? Are there constraints on 1) the number of keys and 2) the types of the values?
>
> **Collaborator (Author):** The state is thrown into the workflow context, so `FunctionToolWithContext` can access and modify it. The constraints are the same as for a normal workflow context, imo: if it's not serializable, you might have issues in certain runtimes.
>
> **Collaborator:** Oh, I see.
```python
    state_prompt="Current state: {state}. User message: {msg}",
)
```

The state is stored in the `state` key of the workflow context.
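
That means you can read it back after a run. A sketch, assuming `handler` comes from `workflow.run(...)` as above:

```python
# Sketch: inspect the shared state after a run
state = await handler.ctx.get("state")
print(state["counter"])
```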

#### Persisting State Between Runs

To persist state between runs, pass in the context from the previous run:

```python
workflow = MultiAgentWorkflow(...)

# Run the workflow
handler = workflow.run(user_msg="Can you add 5 and 3?")
response = await handler

# Pass in the context from the previous run
handler = workflow.run(ctx=handler.ctx, user_msg="Can you add 5 and 3?")
response = await handler
```

#### Serializing Context / State

As with normal workflows, the context is serializable:

```python
from llama_index.core.workflow import (
    Context,
    JsonSerializer,
    JsonPickleSerializer,
)

# the default serializer is JsonSerializer for safety
ctx_dict = handler.ctx.to_dict(serializer=JsonSerializer())

# then you can rehydrate the context
ctx = Context.from_dict(ctx_dict, serializer=JsonSerializer())
```
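
Since the serialized context is a plain dict, you could, for example, persist it to disk between processes. A sketch built on the calls above (the filename is arbitrary):

```python
import json

# Sketch: write the serialized context out, then restore it later
with open("workflow_ctx.json", "w") as f:
    json.dump(ctx_dict, f)

with open("workflow_ctx.json") as f:
    ctx = Context.from_dict(json.load(f), serializer=JsonSerializer())
```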

## Streaming Events

The workflow emits various events during execution that you can stream:

```python
async for event in workflow.run(...).stream_events():
    if isinstance(event, AgentInput):
        print(event.input)
        print(event.current_agent_name)
    elif isinstance(event, AgentStream):
        # Agent thinking/tool calling response stream
        print(event.delta)
        print(event.current_agent_name)
    elif isinstance(event, AgentOutput):
        print(event.response)
        print(event.tool_calls)
        print(event.raw)
        print(event.current_agent_name)
    elif isinstance(event, ToolCall):
        # Tool being called
        print(event.tool_name)
        print(event.tool_kwargs)
    elif isinstance(event, ToolCallResult):
        # Result of tool call
        print(event.tool_output)
```
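
Following the reviewer suggestion in the Quick Start above, you could wrap this dispatch in a small helper (hypothetical, not part of this PR) and reuse it across examples:

```python
from llama_index.core.agent.workflow import (
    AgentOutput,
    AgentStream,
    ToolCall,
    ToolCallResult,
)


def print_event(event) -> None:
    """Pretty-print workflow events (a sketch using the event types above)."""
    if isinstance(event, AgentStream):
        print(event.delta, end="", flush=True)
    elif isinstance(event, AgentOutput):
        print(f"\n[{event.current_agent_name}] {event.response}")
    elif isinstance(event, ToolCall):
        print(f"\nCalling tool {event.tool_name} with {event.tool_kwargs}")
    elif isinstance(event, ToolCallResult):
        print(f"\nTool result: {event.tool_output}")


async for event in workflow.run(...).stream_events():
    print_event(event)
```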

## Accessing Context in Tools

The `FunctionToolWithContext` allows tools to access the workflow context:

```python
from llama_index.core.workflow import Context, FunctionToolWithContext


async def get_counter(ctx: Context) -> int:
    """Get the current counter value."""
    return await ctx.get("counter", default=0)


counter_tool = FunctionToolWithContext.from_defaults(
    async_fn=get_counter, description="Get the current counter value"
)
```
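
Per the review discussion above, context-aware tools are also the intended way to modify the shared state. A sketch, assuming the same `FunctionToolWithContext` API and the `state` key described earlier:

```python
async def increment_counter(ctx: Context) -> int:
    """Increment the shared counter in the workflow state."""
    state = await ctx.get("state")
    state["counter"] = state.get("counter", 0) + 1
    await ctx.set("state", state)
    return state["counter"]


increment_tool = FunctionToolWithContext.from_defaults(
    async_fn=increment_counter,
    description="Increment the shared counter",
)
```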

## Human in the Loop

Using the context, you can implement a human in the loop pattern in your tools:

```python
from llama_index.core.workflow import Context, Event


class AskForConfirmationEvent(Event):
    """Ask for confirmation event."""

    confirmation_id: str


class ConfirmationEvent(Event):
    """Confirmation event."""

    confirmation: bool
    confirmation_id: str


async def ask_for_confirmation(ctx: Context) -> bool:
    """Ask the user for confirmation."""
    ctx.write_event_to_stream(AskForConfirmationEvent(confirmation_id="1234"))

    result = await ctx.wait_for_event(
        ConfirmationEvent, requirements={"confirmation_id": "1234"}
    )
    return result.confirmation
```

When this function is called, it will block the workflow execution until the user sends the required confirmation event.
> **Collaborator:** When is this function called?
>
> **Collaborator (Author):** By an agent; it's meant to be an agent tool. I'll make this clearer. You could also subclass this or use it in your own workflows, though.

```python
handler = workflow.run(user_msg="Can you add 5 and 3?")

async for event in handler.stream_events():
    if isinstance(event, AskForConfirmationEvent):
        print(event.confirmation_id)
        handler.ctx.send_event(
            ConfirmationEvent(confirmation=True, confirmation_id="1234")
        )
    ...
```
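
Tying this back to the author's note above, a sketch of exposing the helper as an agent tool (assuming `FunctionToolWithContext` from the earlier section):

```python
confirmation_tool = FunctionToolWithContext.from_defaults(
    async_fn=ask_for_confirmation,
    description="Ask the user to confirm before proceeding",
)

agent = FunctionAgent(
    name="careful_agent",
    description="Asks the user for confirmation before acting",
    system_prompt="Always confirm with the user before taking an action.",
    tools=[confirmation_tool],
    is_entrypoint_agent=True,
    llm=OpenAI(model="gpt-4"),
)
```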
2 changes: 2 additions & 0 deletions docs/mkdocs.yml
@@ -50,6 +50,7 @@ nav:
- Enhancing with LlamaParse: ./understanding/agent/llamaparse.md
- Memory: ./understanding/agent/memory.md
- Adding other tools: ./understanding/agent/tools.md
- Multi-agent workflows: ./understanding/agent/multi_agents.md
- Building Workflows:
- Introduction to workflows: ./understanding/workflows/index.md
- A basic workflow: ./understanding/workflows/basic_flow.md
@@ -852,6 +853,7 @@ nav:
- ./api_reference/agent/openai.md
- ./api_reference/agent/openai_legacy.md
- ./api_reference/agent/react.md
- ./api_reference/agent/workflow.md
- Callbacks:
- ./api_reference/callbacks/agentops.md
- ./api_reference/callbacks/aim.md
1 change: 1 addition & 0 deletions llama-index-core/llama_index/core/agent/workflow/BUILD
@@ -0,0 +1 @@
python_sources()
26 changes: 26 additions & 0 deletions llama-index-core/llama_index/core/agent/workflow/__init__.py
@@ -0,0 +1,26 @@
from llama_index.core.agent.workflow.multi_agent_workflow import MultiAgentWorkflow
from llama_index.core.agent.workflow.base_agent import BaseWorkflowAgent
from llama_index.core.agent.workflow.function_agent import FunctionAgent
from llama_index.core.agent.workflow.react_agent import ReactAgent
from llama_index.core.agent.workflow.workflow_events import (
    AgentInput,
    AgentSetup,
    AgentStream,
    AgentOutput,
    ToolCall,
    ToolCallResult,
)


__all__ = [
    "AgentInput",
    "AgentSetup",
    "AgentStream",
    "AgentOutput",
    "BaseWorkflowAgent",
    "FunctionAgent",
    "MultiAgentWorkflow",
    "ReactAgent",
    "ToolCall",
    "ToolCallResult",
]
71 changes: 71 additions & 0 deletions llama-index-core/llama_index/core/agent/workflow/base_agent.py
@@ -0,0 +1,71 @@
from abc import ABC, abstractmethod
from typing import List, Optional

from llama_index.core.agent.workflow.workflow_events import (
    AgentOutput,
    ToolCallResult,
)
from llama_index.core.bridge.pydantic import BaseModel, Field, ConfigDict
from llama_index.core.llms import ChatMessage, LLM
from llama_index.core.memory import BaseMemory
from llama_index.core.tools import BaseTool, AsyncBaseTool
from llama_index.core.workflow import Context
from llama_index.core.objects import ObjectRetriever
from llama_index.core.settings import Settings


def get_default_llm() -> LLM:
    return Settings.llm


class BaseWorkflowAgent(BaseModel, ABC):
"""Base class for all agents, combining config and logic."""

model_config = ConfigDict(arbitrary_types_allowed=True)

name: str = Field(description="The name of the agent")
description: str = Field(
description="The description of what the agent does and is responsible for"
)
system_prompt: Optional[str] = Field(
default=None, description="The system prompt for the agent"
)
tools: Optional[List[BaseTool]] = Field(
default=None, description="The tools that the agent can use"
)
tool_retriever: Optional[ObjectRetriever] = Field(
default=None,
description="The tool retriever for the agent, can be provided instead of tools",
)
can_handoff_to: Optional[List[str]] = Field(
default=None, description="The agent names that this agent can hand off to"
)
llm: LLM = Field(
default_factory=get_default_llm, description="The LLM that the agent uses"
)
is_entrypoint_agent: bool = Field(
default=False,
description="Whether the agent is the entrypoint agent in a multi-agent workflow",
)

@abstractmethod
async def take_step(
self,
ctx: Context,
llm_input: List[ChatMessage],
tools: List[AsyncBaseTool],
memory: BaseMemory,
) -> AgentOutput:
"""Take a single step with the agent."""

@abstractmethod
async def handle_tool_call_results(
self, ctx: Context, results: List[ToolCallResult], memory: BaseMemory
) -> None:
"""Handle tool call results."""

@abstractmethod
async def finalize(
self, ctx: Context, output: AgentOutput, memory: BaseMemory
) -> AgentOutput:
"""Finalize the agent's execution."""