Merge branch 'upstash-chat-store-integration' of https://github.com/fahreddinozcan/llama_index into upstash-chat-store-integration
fahreddinozcan committed Sep 26, 2024
2 parents 62fe496 + 4156e1f commit c5b1864
Showing 18 changed files with 348 additions and 87 deletions.
3 changes: 1 addition & 2 deletions docs/docs/examples/agent/nvidia_agent.ipynb
@@ -378,8 +378,7 @@
"outputs": [],
"source": [
"response = agent.chat(\n",
" \"Tell me both the risk factors and tailwinds for Uber? Do two parallel tool calls.\",\n",
" allow_parallel_tool_calls=True,\n",
" \"Tell me both the risk factors and tailwinds for Uber? Do two parallel tool calls.\"\n",
")\n",
"print(str(response))"
]
2 changes: 1 addition & 1 deletion docs/docs/examples/workflow/corrective_rag_pack.ipynb
@@ -11,7 +11,7 @@
"A brief understanding of the paper:\n",
"\n",
"\n",
"Corrective Retrieval Augmented Generation (CRAG) is a method designed to enhance the robustness of language model generation by evaluating and augmenting the relevance of retrieved documents through a an evaluator and large-scale web searches, ensuring more accurate and reliable information is used in generation.\n",
"Corrective Retrieval Augmented Generation (CRAG) is a method designed to enhance the robustness of language model generation by evaluating and augmenting the relevance of retrieved documents through an evaluator and large-scale web searches, ensuring more accurate and reliable information is used in generation.\n",
"\n",
"We use `GPT-4` as a relevancy evaluator and `Tavily AI` for web searches. So, we recommend getting `OPENAI_API_KEY` and `tavily_ai_api_key` before proceeding further."
]
95 changes: 85 additions & 10 deletions docs/docs/module_guides/workflow/index.md
@@ -176,6 +176,7 @@ await w.run(topic="Pirates")
draw_most_recent_execution(w, filename="joke_flow_recent.html")
```

<div id="working-with-global-context-state"></div>
## Working with Global Context/State

Optionally, you can choose to use global context between steps. For example, maybe multiple steps access the original `query` input from the user. You can store this in global context so that every step has access.
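
For example, a step can accept the shared `Context` object as an argument and read or write values on it. The sketch below is illustrative rather than part of this commit: the workflow and event names are made up, and it assumes the async `ctx.set(...)` / `ctx.get(...)` accessors that the workflow `Context` exposes for shared state.

```python
from llama_index.core.workflow import (
    Context,
    Event,
    StartEvent,
    StopEvent,
    Workflow,
    step,
)


class QueryEvent(Event):
    query: str


class QueryFlow(Workflow):
    @step
    async def ingest(self, ctx: Context, ev: StartEvent) -> QueryEvent:
        # store the original user query so that any later step can read it
        await ctx.set("query", ev.query)
        return QueryEvent(query=ev.query)

    @step
    async def respond(self, ctx: Context, ev: QueryEvent) -> StopEvent:
        # read the shared value back out of the global context
        original_query = await ctx.get("query")
        return StopEvent(result=f"Answering: {original_query}")
```

Running `await QueryFlow(timeout=10).run(query="What are workflows?")` would give both steps access to the same stored query.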
@@ -352,6 +353,59 @@ class RetryOnFridayPolicy:
return None
```

## Human-in-the-loop

Since workflows are so flexible, there are many possible ways to implement human-in-the-loop patterns.

The easiest way to implement a human-in-the-loop pattern is to use the `InputRequiredEvent` and `HumanResponseEvent` events during event streaming.

```python
class HumanInTheLoopWorkflow(Workflow):
@step
async def step1(self, ev: StartEvent) -> InputRequiredEvent:
return InputRequiredEvent(prefix="Enter a number: ")

@step
async def step2(self, ev: HumanResponseEvent) -> StopEvent:
return StopEvent(result=ev.response)


# workflow should work with streaming
workflow = HumanInTheLoopWorkflow()

handler = workflow.run()
async for event in handler.stream_events():
if isinstance(event, InputRequiredEvent):
        # here, we can handle human input however we want:
        # input(), websockets, accessing async state, etc.
        # in this example, we just use input()
response = input(event.prefix)
handler.ctx.send_event(HumanResponseEvent(response=response))

final_result = await handler
```

Here, the workflow will wait until the `HumanResponseEvent` is emitted.

Also note that you can break out of the loop and resume it later. This is useful if you want to pause the workflow while waiting for a human response and pick it up again afterwards.

```python
handler = workflow.run()
async for event in handler.stream_events():
if isinstance(event, InputRequiredEvent):
break

# now we handle the human response
response = input(event.prefix)
handler.ctx.send_event(HumanResponseEvent(response=response))

# now we resume the workflow streaming
async for event in handler.stream_events():
continue

final_result = await handler
```

## Stepwise Execution

Workflows have built-in utilities for stepwise execution, allowing you to control execution and debug state as things progress.
@@ -439,22 +493,43 @@ You can deploy a workflow as a multi-agent service with [llama_deploy](../../mod

## Examples

You can find many useful examples of using workflows in the notebooks below:
To help you become more familiar with the workflow concept and its features, the LlamaIndex documentation offers example
notebooks that you can run for hands-on learning:

- [Common Workflow Patterns](../../examples/workflow/workflows_cookbook.ipynb) walks you through common usage patterns
like looping and state management using simple workflows. It's usually a great place to start.
- [RAG + Reranking](../../examples/workflow/rag.ipynb) shows how to implement a real-world use case with a fairly
simple workflow that performs both ingestion and querying.
- [Citation Query Engine](../../examples/workflow/citation_query_engine.ipynb) is similar to RAG + Reranking, but the
notebook focuses on how to implement intermediate steps between retrieval and generation. A good example of how to
use the [`Context`](#working-with-global-context-state) object in a workflow.
- [Corrective RAG](../../examples/workflow/corrective_rag_pack.ipynb) adds some more complexity on top of a RAG
workflow, showcasing how to query a web search engine after an evaluation step.
- [Utilizing Concurrency](../../examples/workflow/parallel_execution.ipynb) explains how to manage the parallel
execution of steps in a workflow, something that's important to know as your workflows grow in complexity.

RAG applications are easy to understand and offer a great opportunity to learn the basics of workflows. However, more complex agentic scenarios involving tool calling, memory, and routing are where workflows excel.

The examples below highlight some of these use cases.

- [ReAct Agent](../../examples/workflow/react_agent.ipynb) is the perfect example of how to implement
tools in a workflow.
- [Function Calling Agent](../../examples/workflow/function_calling_agent.ipynb) is a great example of how to use the
LlamaIndex framework primitives in a workflow, keeping it small and tidy even in complex scenarios like function
calling.
- [Human In The Loop: Story Crafting](../../examples/workflow/human_in_the_loop_story_crafting.ipynb) is a powerful
example showing how workflow runs can be interactive and stateful, in this case collecting input from a human.
- [Reliable Structured Generation](../../examples/workflow/reflection.ipynb) shows how to implement loops in a
workflow, in this case to improve structured output through reflection.

Last but not least, a few more advanced use cases demonstrate how workflows can be extremely handy when you need to
quickly implement prototypes, for example of ideas from the research literature:

- [Advanced Text-to-SQL](../../examples/workflow/advanced_text_to_sql.ipynb)
- [Citation Query Engine](../../examples/workflow/citation_query_engine.ipynb)
- [Common Workflow Patterns](../../examples/workflow/workflows_cookbook.ipynb)
- [Corrective RAG](../../examples/workflow/corrective_rag_pack.ipynb)
- [Function Calling Agent](../../examples/workflow/function_calling_agent.ipynb)
- [Human In The Loop: Story Crafting](../../examples/workflow/human_in_the_loop_story_crafting.ipynb)
- [JSON Query Engine](../../examples/workflow/JSONalyze_query_engine.ipynb)
- [Long RAG](../../examples/workflow/long_rag_pack.ipynb)
- [Multi-Step Query Engine](../../examples/workflow/multi_step_query_engine.ipynb)
- [Multi-Strategy Workflow](../../examples/workflow/multi_strategy_workflow.ipynb)
- [RAG + Reranking](../../examples/workflow/rag.ipynb)
- [ReAct Agent](../../examples/workflow/react_agent.ipynb)
- [Reliable Structured Generation](../../examples/workflow/reflection.ipynb)
- [Router Query Engine](../../examples/workflow/router_query_engine.ipynb)
- [Self Discover Workflow](../../examples/workflow/self_discover_workflow.ipynb)
- [Sub-Question Query Engine](../../examples/workflow/sub_question_query_engine.ipynb)
- [Utilizing Concurrency](../../examples/workflow/parallel_execution.ipynb)
8 changes: 0 additions & 8 deletions docs/mkdocs.yml
@@ -122,7 +122,6 @@ nav:
- ./examples/agent/react_agent_with_query_engine.ipynb
- ./examples/agent/return_direct_agent.ipynb
- ./examples/agent/structured_planner.ipynb
- ./examples/agents/nvidia_agent.ipynb
- Chat Engines:
- ./examples/chat_engine/chat_engine_best.ipynb
- ./examples/chat_engine/chat_engine_condense_plus_context.ipynb
@@ -226,7 +225,6 @@ nav:
- ./examples/embeddings/gemini.ipynb
- ./examples/embeddings/gigachat.ipynb
- ./examples/embeddings/google_palm.ipynb
- ./examples/embeddings/gradient.ipynb
- ./examples/embeddings/huggingface.ipynb
- ./examples/embeddings/ibm_watsonx.ipynb
- ./examples/embeddings/ipex_llm.ipynb
@@ -280,9 +278,6 @@ nav:
- ./examples/finetuning/embeddings/finetune_corpus_embedding.ipynb
- ./examples/finetuning/embeddings/finetune_embedding.ipynb
- ./examples/finetuning/embeddings/finetune_embedding_adapter.ipynb
- ./examples/finetuning/gradient/gradient_fine_tuning.ipynb
- ./examples/finetuning/gradient/gradient_structured.ipynb
- ./examples/finetuning/gradient/gradient_text2sql.ipynb
- ./examples/finetuning/llm_judge/correctness/finetune_llm_judge_single_grading_correctness.ipynb
- ./examples/finetuning/llm_judge/pairwise/finetune_llm_judge.ipynb
- ./examples/finetuning/mistralai_fine_tuning.ipynb
@@ -319,8 +314,6 @@ nav:
- ./examples/llm/fireworks_cookbook.ipynb
- ./examples/llm/friendli.ipynb
- ./examples/llm/gemini.ipynb
- ./examples/llm/gradient_base_model.ipynb
- ./examples/llm/gradient_model_adapter.ipynb
- ./examples/llm/groq.ipynb
- ./examples/llm/huggingface.ipynb
- ./examples/llm/ibm_watsonx.ipynb
@@ -340,7 +333,6 @@ nav:
- ./examples/llm/maritalk.ipynb
- ./examples/llm/mistral_rs.ipynb
- ./examples/llm/mistralai.ipynb
- ./examples/llm/mlx.ipynb
- ./examples/llm/modelscope.ipynb
- ./examples/llm/monsterapi.ipynb
- ./examples/llm/mymagic.ipynb
29 changes: 5 additions & 24 deletions llama-index-core/llama_index/core/indices/prompt_helper.py
@@ -10,7 +10,6 @@

import logging
from copy import deepcopy
from string import Formatter
from typing import TYPE_CHECKING, Callable, List, Optional, Sequence

if TYPE_CHECKING:
@@ -29,6 +28,7 @@
SelectorPromptTemplate,
)
from llama_index.core.prompts.prompt_utils import get_empty_prompt_txt
from llama_index.core.prompts.utils import format_string
from llama_index.core.schema import BaseComponent
from llama_index.core.utilities.token_counting import TokenCounter

@@ -198,29 +198,10 @@ def _get_available_chunk_size(
for message in messages:
partial_message = deepcopy(message)

# get string variables (if any)
template_vars = [
var
for _, var, _, _ in Formatter().parse(str(message))
if var is not None
]

# figure out which variables are partially formatted
# if a variable is not formatted, it will be replaced with
# the template variable itself
used_vars = {
template_var: f"{{{template_var}}}"
for template_var in template_vars
}
for var_name, val in prompt.kwargs.items():
if var_name in template_vars:
used_vars[var_name] = val

# format partial message
if partial_message.content is not None:
partial_message.content = partial_message.content.format(
**used_vars
)
prompt_kwargs = prompt.kwargs or {}
partial_message.content = format_string(
partial_message.content or "", **prompt_kwargs
)

# add to list of partial messages
partial_messages.append(partial_message)
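
The refactor above replaces the hand-rolled `Formatter`-based partial formatting with the shared `format_string` helper. A minimal sketch of the behaviour this relies on, assuming `format_string` acts as a "safe" `str.format` that substitutes only the variables it is given and leaves any other `{placeholders}` intact (which is what the removed code did explicitly):

```python
from llama_index.core.prompts.utils import format_string

template = "Answer {query_str} using only {context_str}"

# only query_str is available at this point in the pipeline;
# context_str should survive as a literal placeholder
partially_formatted = format_string(template, query_str="What is CRAG?")
print(partially_formatted)
# expected: "Answer What is CRAG? using only {context_str}"
```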
13 changes: 7 additions & 6 deletions llama-index-core/llama_index/core/memory/chat_memory_buffer.py
@@ -123,15 +123,16 @@ def get(

while token_count > self.token_limit and message_count > 1:
message_count -= 1
if chat_history[-message_count].role == MessageRole.TOOL:
# all tool messages should be preceded by an assistant message
# if we remove a tool message, we need to remove the assistant message too
message_count -= 1

if chat_history[-message_count].role == MessageRole.ASSISTANT:
while chat_history[-message_count].role in (
MessageRole.TOOL,
MessageRole.ASSISTANT,
):
# we cannot have an assistant message at the start of the chat history
# if after removal of the first, we have an assistant message,
# we need to remove the assistant message too
#
# all tool messages should be preceded by an assistant message
# if we remove a tool message, we need to remove the assistant message too
message_count -= 1

cur_messages = chat_history[-message_count:]
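
To illustrate the intent of this change with a small, hypothetical history (the messages and token limit below are made up for the example): once truncation removes the leading user message, the new inner `while` keeps dropping the orphaned assistant and tool messages, so the returned window never starts with a tool or assistant turn.

```python
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.memory import ChatMemoryBuffer

chat_history = [
    ChatMessage(role=MessageRole.USER, content="What's the weather in Paris?"),
    ChatMessage(role=MessageRole.ASSISTANT, content="Let me call the weather tool."),
    ChatMessage(role=MessageRole.TOOL, content="18C and sunny"),
    ChatMessage(role=MessageRole.ASSISTANT, content="It is 18C and sunny in Paris."),
    ChatMessage(role=MessageRole.USER, content="And in Berlin?"),
]

# with a deliberately tiny token limit the buffer must truncate from the front;
# the fix ensures the assistant/tool messages that followed the dropped user
# message are dropped as well, instead of leaking into the returned window
memory = ChatMemoryBuffer.from_defaults(chat_history=chat_history, token_limit=20)
print(memory.get())
```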
14 changes: 14 additions & 0 deletions llama-index-core/llama_index/core/workflow/events.py
@@ -132,4 +132,18 @@ def __init__(self, result: Any = None) -> None:
super().__init__(result=result)


class InputRequiredEvent(Event):
"""InputRequiredEvent is sent when an input is required for a step."""

prefix: str = Field(
description="The prefix and description of the input that is required."
)


class HumanResponseEvent(Event):
"""HumanResponseEvent is sent when a human response is required for a step."""

response: str = Field(description="The response from the human.")


EventType = Type[Event]
