Merge branch 'upstash-chat-store-integration' of https://github.com/fahreddinozcan/llama_index into upstash-chat-store-integration
fahreddinozcan committed Sep 26, 2024
2 parents 62fe496 + 4156e1f commit c5b1864
Showing 18 changed files with 348 additions and 87 deletions.
3 changes: 1 addition & 2 deletions docs/docs/examples/agent/nvidia_agent.ipynb
@@ -378,8 +378,7 @@
"outputs": [],
"source": [
"response = agent.chat(\n",
" \"Tell me both the risk factors and tailwinds for Uber? Do two parallel tool calls.\",\n",
" allow_parallel_tool_calls=True,\n",
" \"Tell me both the risk factors and tailwinds for Uber? Do two parallel tool calls.\"\n",
")\n",
"print(str(response))"
]
2 changes: 1 addition & 1 deletion docs/docs/examples/workflow/corrective_rag_pack.ipynb
@@ -11,7 +11,7 @@
"A brief understanding of the paper:\n",
"\n",
"\n",
"Corrective Retrieval Augmented Generation (CRAG) is a method designed to enhance the robustness of language model generation by evaluating and augmenting the relevance of retrieved documents through a an evaluator and large-scale web searches, ensuring more accurate and reliable information is used in generation.\n",
"Corrective Retrieval Augmented Generation (CRAG) is a method designed to enhance the robustness of language model generation by evaluating and augmenting the relevance of retrieved documents through an evaluator and large-scale web searches, ensuring more accurate and reliable information is used in generation.\n",
"\n",
"We use `GPT-4` as a relevancy evaluator and `Tavily AI` for web searches. So, we recommend getting `OPENAI_API_KEY` and `tavily_ai_api_key` before proceeding further."
]
95 changes: 85 additions & 10 deletions docs/docs/module_guides/workflow/index.md
@@ -176,6 +176,7 @@ await w.run(topic="Pirates")
draw_most_recent_execution(w, filename="joke_flow_recent.html")
```

<div id="working-with-global-context-state"></div>
## Working with Global Context/State

Optionally, you can choose to use global context between steps. For example, maybe multiple steps access the original `query` input from the user. You can store this in global context so that every step has access.
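
For example, a step can accept the shared `Context` object as an argument and read or write values on it. The sketch below is illustrative rather than part of this commit: the workflow and event names are made up, and it assumes the async `ctx.set(...)` / `ctx.get(...)` accessors that the workflow `Context` exposes for shared state.

```python
from llama_index.core.workflow import (
    Context,
    Event,
    StartEvent,
    StopEvent,
    Workflow,
    step,
)


class QueryEvent(Event):
    query: str


class QueryFlow(Workflow):
    @step
    async def ingest(self, ctx: Context, ev: StartEvent) -> QueryEvent:
        # store the original user query so that any later step can read it
        await ctx.set("query", ev.query)
        return QueryEvent(query=ev.query)

    @step
    async def respond(self, ctx: Context, ev: QueryEvent) -> StopEvent:
        # read the shared value back out of the global context
        original_query = await ctx.get("query")
        return StopEvent(result=f"Answering: {original_query}")
```

Running `await QueryFlow(timeout=10).run(query="What are workflows?")` would give both steps access to the same stored query.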
@@ -352,6 +353,59 @@ class RetryOnFridayPolicy:
return None
```

## Human-in-the-loop

Since workflows are so flexible, there are many possible ways to implement human-in-the-loop patterns.

The easiest way to implement a human-in-the-loop pattern is to use the `InputRequiredEvent` and `HumanResponseEvent` events during event streaming.

```python
class HumanInTheLoopWorkflow(Workflow):
@step
async def step1(self, ev: StartEvent) -> InputRequiredEvent:
return InputRequiredEvent(prefix="Enter a number: ")

@step
async def step2(self, ev: HumanResponseEvent) -> StopEvent:
return StopEvent(result=ev.response)


# workflow should work with streaming
workflow = HumanInTheLoopWorkflow()

handler = workflow.run()
async for event in handler.stream_events():
if isinstance(event, InputRequiredEvent):
        # here, we can handle human input however we want:
        # input(), websockets, accessing async state, etc.
        # in this example, we just use input()
response = input(event.prefix)
handler.ctx.send_event(HumanResponseEvent(response=response))

final_result = await handler
```

Here, the workflow will wait until the `HumanResponseEvent` is emitted.

Also note that you can break out of the loop and resume it later. This is useful if you want to pause the workflow while waiting for a human response and pick it up again afterwards.

```python
handler = workflow.run()
async for event in handler.stream_events():
if isinstance(event, InputRequiredEvent):
break

# now we handle the human response
response = input(event.prefix)
handler.ctx.send_event(HumanResponseEvent(response=response))

# now we resume the workflow streaming
async for event in handler.stream_events():
continue

final_result = await handler
```

## Stepwise Execution

Workflows have built-in utilities for stepwise execution, allowing you to control execution and debug state as things progress.
@@ -439,22 +493,43 @@ You can deploy a workflow as a multi-agent service with [llama_deploy](../../mod

## Examples

You can find many useful examples of using workflows in the notebooks below:
To help you become more familiar with the workflow concept and its features, the LlamaIndex documentation offers example
notebooks that you can run for hands-on learning:

- [Common Workflow Patterns](../../examples/workflow/workflows_cookbook.ipynb) walks you through common usage patterns
like looping and state management using simple workflows. It's usually a great place to start.
- [RAG + Reranking](../../examples/workflow/rag.ipynb) shows how to implement a real-world use case with a fairly
simple workflow that performs both ingestion and querying.
- [Citation Query Engine](../../examples/workflow/citation_query_engine.ipynb) is similar to RAG + Reranking, but the
notebook focuses on how to implement intermediate steps between retrieval and generation. A good example of how to
use the [`Context`](#working-with-global-context-state) object in a workflow.
- [Corrective RAG](../../examples/workflow/corrective_rag_pack.ipynb) adds some more complexity on top of a RAG
workflow, showcasing how to query a web search engine after an evaluation step.
- [Utilizing Concurrency](../../examples/workflow/parallel_execution.ipynb) explains how to manage the parallel
execution of steps in a workflow, something that's important to know as your workflows grow in complexity.

RAG applications are easy to understand and offer a great opportunity to learn the basics of workflows. However, more complex agentic scenarios involving tool calling, memory, and routing are where workflows excel.

The examples below highlight some of these use cases.

- [ReAct Agent](../../examples/workflow/react_agent.ipynb) is the perfect example of how to implement
tools in a workflow.
- [Function Calling Agent](../../examples/workflow/function_calling_agent.ipynb) is a great example of how to use the
LlamaIndex framework primitives in a workflow, keeping it small and tidy even in complex scenarios like function
calling.
- [Human In The Loop: Story Crafting](../../examples/workflow/human_in_the_loop_story_crafting.ipynb) is a powerful
example showing how workflow runs can be interactive and stateful, in this case collecting input from a human.
- [Reliable Structured Generation](../../examples/workflow/reflection.ipynb) shows how to implement loops in a
workflow, in this case to improve structured output through reflection.

Last but not least, a few more advanced use cases demonstrate how workflows can be extremely handy when you need to
quickly implement prototypes, for example of ideas from the research literature:

- [Advanced Text-to-SQL](../../examples/workflow/advanced_text_to_sql.ipynb)
- [Citation Query Engine](../../examples/workflow/citation_query_engine.ipynb)
- [Common Workflow Patterns](../../examples/workflow/workflows_cookbook.ipynb)
- [Corrective RAG](../../examples/workflow/corrective_rag_pack.ipynb)
- [Function Calling Agent](../../examples/workflow/function_calling_agent.ipynb)
- [Human In The Loop: Story Crafting](../../examples/workflow/human_in_the_loop_story_crafting.ipynb)
- [JSON Query Engine](../../examples/workflow/JSONalyze_query_engine.ipynb)
- [Long RAG](../../examples/workflow/long_rag_pack.ipynb)
- [Multi-Step Query Engine](../../examples/workflow/multi_step_query_engine.ipynb)
- [Multi-Strategy Workflow](../../examples/workflow/multi_strategy_workflow.ipynb)
- [RAG + Reranking](../../examples/workflow/rag.ipynb)
- [ReAct Agent](../../examples/workflow/react_agent.ipynb)
- [Reliable Structured Generation](../../examples/workflow/reflection.ipynb)
- [Router Query Engine](../../examples/workflow/router_query_engine.ipynb)
- [Self Discover Workflow](../../examples/workflow/self_discover_workflow.ipynb)
- [Sub-Question Query Engine](../../examples/workflow/sub_question_query_engine.ipynb)
- [Utilizing Concurrency](../../examples/workflow/parallel_execution.ipynb)
8 changes: 0 additions & 8 deletions docs/mkdocs.yml
@@ -122,7 +122,6 @@ nav:
- ./examples/agent/react_agent_with_query_engine.ipynb
- ./examples/agent/return_direct_agent.ipynb
- ./examples/agent/structured_planner.ipynb
- ./examples/agents/nvidia_agent.ipynb
- Chat Engines:
- ./examples/chat_engine/chat_engine_best.ipynb
- ./examples/chat_engine/chat_engine_condense_plus_context.ipynb
@@ -226,7 +225,6 @@ nav:
- ./examples/embeddings/gemini.ipynb
- ./examples/embeddings/gigachat.ipynb
- ./examples/embeddings/google_palm.ipynb
- ./examples/embeddings/gradient.ipynb
- ./examples/embeddings/huggingface.ipynb
- ./examples/embeddings/ibm_watsonx.ipynb
- ./examples/embeddings/ipex_llm.ipynb
@@ -280,9 +278,6 @@ nav:
- ./examples/finetuning/embeddings/finetune_corpus_embedding.ipynb
- ./examples/finetuning/embeddings/finetune_embedding.ipynb
- ./examples/finetuning/embeddings/finetune_embedding_adapter.ipynb
- ./examples/finetuning/gradient/gradient_fine_tuning.ipynb
- ./examples/finetuning/gradient/gradient_structured.ipynb
- ./examples/finetuning/gradient/gradient_text2sql.ipynb
- ./examples/finetuning/llm_judge/correctness/finetune_llm_judge_single_grading_correctness.ipynb
- ./examples/finetuning/llm_judge/pairwise/finetune_llm_judge.ipynb
- ./examples/finetuning/mistralai_fine_tuning.ipynb
@@ -319,8 +314,6 @@ nav:
- ./examples/llm/fireworks_cookbook.ipynb
- ./examples/llm/friendli.ipynb
- ./examples/llm/gemini.ipynb
- ./examples/llm/gradient_base_model.ipynb
- ./examples/llm/gradient_model_adapter.ipynb
- ./examples/llm/groq.ipynb
- ./examples/llm/huggingface.ipynb
- ./examples/llm/ibm_watsonx.ipynb
@@ -340,7 +333,6 @@ nav:
- ./examples/llm/maritalk.ipynb
- ./examples/llm/mistral_rs.ipynb
- ./examples/llm/mistralai.ipynb
- ./examples/llm/mlx.ipynb
- ./examples/llm/modelscope.ipynb
- ./examples/llm/monsterapi.ipynb
- ./examples/llm/mymagic.ipynb
29 changes: 5 additions & 24 deletions llama-index-core/llama_index/core/indices/prompt_helper.py
@@ -10,7 +10,6 @@

import logging
from copy import deepcopy
from string import Formatter
from typing import TYPE_CHECKING, Callable, List, Optional, Sequence

if TYPE_CHECKING:
@@ -29,6 +28,7 @@
SelectorPromptTemplate,
)
from llama_index.core.prompts.prompt_utils import get_empty_prompt_txt
from llama_index.core.prompts.utils import format_string
from llama_index.core.schema import BaseComponent
from llama_index.core.utilities.token_counting import TokenCounter

@@ -198,29 +198,10 @@ def _get_available_chunk_size(
for message in messages:
partial_message = deepcopy(message)

# get string variables (if any)
template_vars = [
var
for _, var, _, _ in Formatter().parse(str(message))
if var is not None
]

# figure out which variables are partially formatted
# if a variable is not formatted, it will be replaced with
# the template variable itself
used_vars = {
template_var: f"{{{template_var}}}"
for template_var in template_vars
}
for var_name, val in prompt.kwargs.items():
if var_name in template_vars:
used_vars[var_name] = val

# format partial message
if partial_message.content is not None:
partial_message.content = partial_message.content.format(
**used_vars
)
prompt_kwargs = prompt.kwargs or {}
partial_message.content = format_string(
partial_message.content or "", **prompt_kwargs
)

# add to list of partial messages
partial_messages.append(partial_message)
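
The refactor above replaces the hand-rolled `Formatter`-based partial formatting with the shared `format_string` helper. A minimal sketch of the behaviour this relies on, assuming `format_string` acts as a "safe" `str.format` that substitutes only the variables it is given and leaves any other `{placeholders}` intact (which is what the removed code did explicitly):

```python
from llama_index.core.prompts.utils import format_string

template = "Answer {query_str} using only {context_str}"

# only query_str is available at this point in the pipeline;
# context_str should survive as a literal placeholder
partially_formatted = format_string(template, query_str="What is CRAG?")
print(partially_formatted)
# expected: "Answer What is CRAG? using only {context_str}"
```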
13 changes: 7 additions & 6 deletions llama-index-core/llama_index/core/memory/chat_memory_buffer.py
@@ -123,15 +123,16 @@ def get(

while token_count > self.token_limit and message_count > 1:
message_count -= 1
if chat_history[-message_count].role == MessageRole.TOOL:
# all tool messages should be preceded by an assistant message
# if we remove a tool message, we need to remove the assistant message too
message_count -= 1

if chat_history[-message_count].role == MessageRole.ASSISTANT:
while chat_history[-message_count].role in (
MessageRole.TOOL,
MessageRole.ASSISTANT,
):
# we cannot have an assistant message at the start of the chat history
# if after removal of the first, we have an assistant message,
# we need to remove the assistant message too
#
# all tool messages should be preceded by an assistant message
# if we remove a tool message, we need to remove the assistant message too
message_count -= 1

cur_messages = chat_history[-message_count:]
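
To illustrate the intent of this change with a small, hypothetical history (the messages and token limit below are made up for the example): once truncation removes the leading user message, the new inner `while` keeps dropping the orphaned assistant and tool messages, so the returned window never starts with a tool or assistant turn.

```python
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.memory import ChatMemoryBuffer

chat_history = [
    ChatMessage(role=MessageRole.USER, content="What's the weather in Paris?"),
    ChatMessage(role=MessageRole.ASSISTANT, content="Let me call the weather tool."),
    ChatMessage(role=MessageRole.TOOL, content="18C and sunny"),
    ChatMessage(role=MessageRole.ASSISTANT, content="It is 18C and sunny in Paris."),
    ChatMessage(role=MessageRole.USER, content="And in Berlin?"),
]

# with a deliberately tiny token limit the buffer must truncate from the front;
# the fix ensures the assistant/tool messages that followed the dropped user
# message are dropped as well, instead of leaking into the returned window
memory = ChatMemoryBuffer.from_defaults(chat_history=chat_history, token_limit=20)
print(memory.get())
```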
14 changes: 14 additions & 0 deletions llama-index-core/llama_index/core/workflow/events.py
@@ -132,4 +132,18 @@ def __init__(self, result: Any = None) -> None:
super().__init__(result=result)


class InputRequiredEvent(Event):
"""InputRequiredEvent is sent when an input is required for a step."""

prefix: str = Field(
description="The prefix and description of the input that is required."
)


class HumanResponseEvent(Event):
"""HumanResponseEvent is sent when a human response is required for a step."""

response: str = Field(description="The response from the human.")


EventType = Type[Event]
