Knowledge #1567

Merged
merged 49 commits into from
Nov 20, 2024
Changes from 19 commits
75322b2
initial knowledge
joaomdmoura Nov 4, 2024
dc314c1
Merge branch 'main' into knowledge
bhancockio Nov 4, 2024
a8a2f80
WIP
bhancockio Nov 5, 2024
1a35114
Adding core knowledge sources
bhancockio Nov 6, 2024
6131dba
Improve types and better support for file paths
bhancockio Nov 6, 2024
617ee98
added additional sources
bhancockio Nov 6, 2024
4af263c
Merge branch 'main' into knowledge
bhancockio Nov 7, 2024
59165cb
fix linting
bhancockio Nov 7, 2024
86ede83
update yaml to include optional deps
bhancockio Nov 7, 2024
7b59c5b
adding in lorenze feedback
bhancockio Nov 7, 2024
98a708c
Merge branch 'main' of github.com:crewAIInc/crewAI into knowledge
lorenzejay Nov 14, 2024
10f445e
ensure embeddings are persisted
lorenzejay Nov 15, 2024
cb03ee6
improvements all around Knowledge class
lorenzejay Nov 15, 2024
cdf5233
Merge branch 'main' of github.com:crewAIInc/crewAI into knowledge
lorenzejay Nov 15, 2024
b907938
return this
lorenzejay Nov 15, 2024
352d053
properly reset memory
lorenzejay Nov 18, 2024
b2c06d5
properly reset memory+knowledge
lorenzejay Nov 18, 2024
cbfcde7
consolodation and improvements
lorenzejay Nov 18, 2024
4831dcb
Merge branch 'main' of github.com:crewAIInc/crewAI into knowledge
lorenzejay Nov 18, 2024
d579c5a
linted
lorenzejay Nov 18, 2024
b104404
cleanup rm unused embedder
lorenzejay Nov 19, 2024
70910dd
fix test
lorenzejay Nov 19, 2024
c8bf242
fix duplicate
lorenzejay Nov 19, 2024
cbfdbe3
generating cassettes for knowledge test
lorenzejay Nov 19, 2024
e882725
updated default embedder
lorenzejay Nov 19, 2024
efa8a37
None embedder to use default on pipeline cloning
lorenzejay Nov 19, 2024
de742c8
improvements
lorenzejay Nov 19, 2024
914067d
fixed text_file_knowledge
lorenzejay Nov 19, 2024
0c5b6f2
mypysrc fixes
lorenzejay Nov 19, 2024
705ee16
type check fixes
lorenzejay Nov 19, 2024
58bf2d5
added extra cassette
lorenzejay Nov 19, 2024
ec2fe6f
just mocks
lorenzejay Nov 19, 2024
8373c9b
linted
lorenzejay Nov 19, 2024
e7d816f
Merge branch 'main' of github.com:crewAIInc/crewAI into knowledge
lorenzejay Nov 19, 2024
787f2ea
mock knowledge query to not spin up db
lorenzejay Nov 20, 2024
b185b9e
linted
lorenzejay Nov 20, 2024
4663997
verbose run
lorenzejay Nov 20, 2024
76da972
put a flag
lorenzejay Nov 20, 2024
fe18da5
fix
lorenzejay Nov 20, 2024
23276cb
adding docs
lorenzejay Nov 20, 2024
3c4504b
better docs
lorenzejay Nov 20, 2024
44ab749
improvements from review
lorenzejay Nov 20, 2024
52189a4
more docs
lorenzejay Nov 20, 2024
8a54042
linted
lorenzejay Nov 20, 2024
8564f55
rm print
lorenzejay Nov 20, 2024
38c0d61
more fixes
lorenzejay Nov 20, 2024
9329119
clearer docs
lorenzejay Nov 20, 2024
6359b64
added docstrings and type hints for cli
lorenzejay Nov 20, 2024
c0ad457
Merge branch 'main' of github.com:crewAIInc/crewAI into knowledge
lorenzejay Nov 20, 2024
2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
@@ -26,7 +26,7 @@ jobs:
         run: uv python install 3.11.9

       - name: Install the project
-        run: uv sync --dev
+        run: uv sync --dev --all-extras

Review comment (Collaborator):
Question: Do you think we need the --all-extras option here? It seems we would have to install every optional dependency to be able to run our tests. What do you think?

Reply (Collaborator):
Yes, this PR brings in a number of optional dependencies, such as pdfplumber for our PdfKnowledgeSource.

       - name: Run tests
         run: uv run pytest tests
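The exchange above turns on optional dependencies that only some knowledge sources need. A common way to keep such imports from breaking a base install is a guarded import that fails with an install hint. The sketch below is illustrative only: `require_optional` and its message text are hypothetical names chosen for this example, not part of crewAI.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency or fail with an install hint.

    `extra` names the optional-dependency group; the hint wording is an
    assumption made for this sketch, not a message emitted by crewAI.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with the '{extra}' extra."
        ) from exc


# A standard-library module always resolves, so this call succeeds.
json_module = require_optional("json", extra="none")
print(json_module.__name__)
```

With a guard like this, a test suite that exercises every source still needs all extras installed, which matches the reasoning in the reply.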
32 changes: 32 additions & 0 deletions path/to/src/crewai/knowledge/source/base_knowledge_source.py
@@ -0,0 +1,32 @@
+from abc import ABC, abstractmethod

Review comment (Collaborator):
Question: I suspect the path of this file is not correct:
path/to/src/crewai/knowledge/

Reply (Collaborator):
Looks right? The abstract class could live inside the source dir.

+from typing import List
+
+from crewai.knowledge.embedder.base_embedder import BaseEmbedder
+
+
+class BaseKnowledgeSource(ABC):
+    """Abstract base class for different types of knowledge sources."""
+
+    def __init__(
+        self,
+        chunk_size: int = 1000,
+        chunk_overlap: int = 200,
+    ):
+        self.chunk_size = chunk_size
+        self.chunk_overlap = chunk_overlap
+        self.chunks: List[str] = []
+
+    @abstractmethod
+    def load_content(self):

[lorenzejay marked this conversation as resolved.]

+        """Load and preprocess content from the source."""
+        pass
+
+    @abstractmethod
+    def add(self, embedder: BaseEmbedder) -> None:
+        """Add content to the knowledge base, chunk it, and compute embeddings."""
+        pass
+
+    @abstractmethod
+    def query(self, embedder: BaseEmbedder, query: str, top_k: int = 3) -> str:
+        """Query the knowledge base using semantic search."""
+        pass
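The base class stores `chunk_size` and `chunk_overlap` but leaves the chunking itself to subclasses. As a rough illustration of what those two parameters usually mean, here is a minimal character-window splitter. `chunk_text` is a hypothetical helper written for this example; the diff does not show how the PR's concrete sources actually chunk their content.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical sketch: each window starts `chunk_size - chunk_overlap`
    characters after the previous one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]


chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)
```

The overlap means the last character of one window reappears as the first character of the next, so text that straddles a boundary is not lost to either chunk.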
10 changes: 10 additions & 0 deletions pyproject.toml
@@ -39,6 +39,16 @@ Repository = "https://github.com/crewAIInc/crewAI"
 [project.optional-dependencies]
 tools = ["crewai-tools>=0.14.0"]
 agentops = ["agentops>=0.3.0"]
+fastembed = ["fastembed>=0.4.1"]
+pdfplumber = [
+    "pdfplumber>=0.11.4",
+]
+pandas = [

Review comment (Collaborator):
Suggestion: I'm wondering whether we need to keep "pandas" as an optional dependency. Looking at the code, we only use it to read Excel files and save them as CSVs. Maybe we could find a lighter library to handle that? Just a thought!

If the library is still required, maybe we should go with "polars":

Size:
Polars: ~8.5MB
Pandas: ~12MB

Timing:
Polars: ~70ms
NumPy: ~104ms
Pandas: ~520ms

Reply (Collaborator):
These are optional deps; maybe this can be a fast follow?

+    "pandas>=2.2.3",
+]
+openpyxl = [
+    "openpyxl>=3.1.5",
+]
 mem0 = ["mem0ai>=0.1.29"]

 [tool.uv]
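The suggestion above asks whether the Excel-to-CSV step needs pandas at all. One possible lighter route, sketched here under assumptions, is to read cells with openpyxl (already listed as an extra in this diff) and write CSV with the standard library. `rows_to_csv` and `excel_to_csv` are hypothetical names, only the active sheet is handled, and this is not the PR's implementation.

```python
import csv
import io
from typing import Any, Iterable, Sequence


def rows_to_csv(rows: Iterable[Sequence[Any]]) -> str:
    """Serialize rows of cell values to CSV text using only the stdlib."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        # Empty spreadsheet cells arrive as None; write them as empty fields.
        writer.writerow(["" if cell is None else cell for cell in row])
    return buffer.getvalue()


def excel_to_csv(path: str) -> str:
    """Convert the active sheet of an .xlsx file to CSV text (assumed scope)."""
    # Imported lazily so this module loads without the optional dependency.
    from openpyxl import load_workbook

    workbook = load_workbook(path, read_only=True, data_only=True)
    try:
        return rows_to_csv(workbook.active.iter_rows(values_only=True))
    finally:
        workbook.close()


print(rows_to_csv([("name", "qty"), ("widget", 3), ("gap", None)]))
```

Whether this covers everything the pandas path does (multiple sheets, dtype handling) would need checking against the actual source before swapping it in.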
14 changes: 13 additions & 1 deletion src/crewai/__init__.py
@@ -1,7 +1,9 @@
 import warnings
+
 from crewai.agent import Agent
 from crewai.crew import Crew
 from crewai.flow.flow import Flow
+from crewai.knowledge.knowledge import Knowledge
 from crewai.llm import LLM
 from crewai.pipeline import Pipeline
 from crewai.process import Process
@@ -15,4 +17,14 @@
     module="pydantic.main",
 )
 __version__ = "0.80.0"
-__all__ = ["Agent", "Crew", "Process", "Task", "Pipeline", "Router", "LLM", "Flow"]
+__all__ = [
+    "Agent",
+    "Crew",
+    "Process",
+    "Task",
+    "Pipeline",
+    "Router",
+    "LLM",
+    "Flow",
+    "Knowledge",
+]
28 changes: 26 additions & 2 deletions src/crewai/agent.py
@@ -1,18 +1,19 @@
 import os
 import shutil
 import subprocess
-from typing import Any, List, Literal, Optional, Union
+from typing import Any, List, Literal, Optional, Union, Dict, Any

 from pydantic import Field, InstanceOf, PrivateAttr, model_validator

 from crewai.agents import CacheHandler
 from crewai.agents.agent_builder.base_agent import BaseAgent
 from crewai.agents.crew_agent_executor import CrewAgentExecutor
 from crewai.cli.constants import ENV_VARS
+from crewai.knowledge.knowledge import Knowledge
 from crewai.llm import LLM
 from crewai.memory.contextual.contextual_memory import ContextualMemory
-from crewai.tools.agent_tools.agent_tools import AgentTools
 from crewai.tools import BaseTool
+from crewai.tools.agent_tools.agent_tools import AgentTools
 from crewai.utilities import Converter, Prompts
 from crewai.utilities.constants import TRAINED_AGENTS_DATA_FILE, TRAINING_DATA_FILE
 from crewai.utilities.token_counter_callback import TokenCalcHandler
@@ -52,6 +53,7 @@ class Agent(BaseAgent):
             role: The role of the agent.
             goal: The objective of the agent.
             backstory: The backstory of the agent.
+            knowledge: The knowledge base of the agent.
             config: Dict representation of agent configuration.
             llm: The language model that will run the agent.
             function_calling_llm: The language model that will handle the tool calling for this agent, it overrides the crew function_calling_llm.
@@ -119,6 +121,7 @@ class Agent(BaseAgent):
         default="safe",
         description="Mode for code execution: 'safe' (using Docker) or 'unsafe' (direct execution).",
     )
+    _knowledge: Optional[Knowledge] = PrivateAttr(default=None)

     @model_validator(mode="after")
     def post_init_setup(self):
@@ -227,6 +230,12 @@ def post_init_setup(self):
         if self.allow_code_execution:
             self._validate_docker_installation()

+        # Initialize the Knowledge object if knowledge_sources are provided

Review comment (Collaborator):
Nitpick: Here you could write

self._knowledge = None
if self.crew and self.crew.knowledge_store:
    self._knowledge = self.crew.knowledge_store

Or even drop the "= None" assignment, since the default from the model is already None.

+        if self.crew and self.crew.knowledge:
+            self._knowledge = self.crew.knowledge
+        else:
+            self._knowledge = None
+
         return self

     def _setup_agent_executor(self):
@@ -272,6 +281,21 @@ def execute_task(
             if memory.strip() != "":
                 task_prompt += self.i18n.slice("memory").format(memory=memory)

+        # Integrate the knowledge base
+        if self.crew and self.crew.knowledge:
+            knowledge_snippets: List[Dict[str, Any]] = self.crew.knowledge.query(
+                [task.prompt()]
+            )
+            if knowledge_snippets:
+                valid_snippets = [
+                    result["context"]
+                    for result in knowledge_snippets
+                    if result and result.get("context")
+                ]
+                if valid_snippets:
+                    formatted_knowledge = "\n".join(valid_snippets)
+                    task_prompt += f"\n\nAdditional Information:\n{formatted_knowledge}"
+
         tools = tools or self.tools or []
         self.create_agent_executor(tools=tools, task=task)
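The hunk above filters query results down to non-empty "context" strings and appends them to the task prompt. The same logic can be exercised in isolation with plain dictionaries. `append_knowledge` is a hypothetical helper written for this illustration; in the PR the logic sits inline in `Agent.execute_task`, and the shape of the result dictionaries is assumed from the diff.

```python
from typing import Any, Dict, List, Optional


def append_knowledge(
    task_prompt: str,
    snippets: Optional[List[Optional[Dict[str, Any]]]],
) -> str:
    """Append non-empty "context" values from query results to the prompt."""
    # Skip missing results and results whose "context" is absent or empty.
    valid = [s["context"] for s in (snippets or []) if s and s.get("context")]
    if not valid:
        return task_prompt
    return task_prompt + "\n\nAdditional Information:\n" + "\n".join(valid)


results = [{"context": "Paris is the capital of France."}, {"context": ""}, None]
print(append_knowledge("Answer the question.", results))
```

Pulling the step into a pure function like this is one way to unit-test the prompt formatting without spinning up a vector store.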
7 changes: 4 additions & 3 deletions src/crewai/cli/cli.py
@@ -136,24 +136,25 @@ def log_tasks_outputs() -> None:
 @click.option("-l", "--long", is_flag=True, help="Reset LONG TERM memory")
 @click.option("-s", "--short", is_flag=True, help="Reset SHORT TERM memory")
 @click.option("-e", "--entities", is_flag=True, help="Reset ENTITIES memory")
+@click.option("-kn", "--knowledge", is_flag=True, help="Reset KNOWLEDGE storage")
 @click.option(
     "-k",
     "--kickoff-outputs",
     is_flag=True,
     help="Reset LATEST KICKOFF TASK OUTPUTS",
 )
 @click.option("-a", "--all", is_flag=True, help="Reset ALL memories")
-def reset_memories(long, short, entities, kickoff_outputs, all):
+def reset_memories(long, short, entities, knowledge, kickoff_outputs, all):

Review comment (Collaborator):
Nitpick: Maybe add type-hints 😅

     """
     Reset the crew memories (long, short, entity, latest_crew_kickoff_ouputs). This will delete all the data saved.
     """
     try:
-        if not all and not (long or short or entities or kickoff_outputs):
+        if not all and not (long or short or entities or knowledge or kickoff_outputs):
             click.echo(
                 "Please specify at least one memory type to reset using the appropriate flags."
             )
             return
-        reset_memories_command(long, short, entities, kickoff_outputs, all)
+        reset_memories_command(long, short, entities, knowledge, kickoff_outputs, all)
     except Exception as e:
         click.echo(f"An error occurred while resetting memories: {e}", err=True)
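To make the type-hint nitpick concrete, here is what a hinted signature for the flag check might look like. `has_reset_target` is a hypothetical helper that mirrors the guard in the hunk above; it is not code from the PR, and the parameter named `all` shadows the builtin only because it follows the CLI flag name.

```python
def has_reset_target(
    long: bool,
    short: bool,
    entities: bool,
    knowledge: bool,
    kickoff_outputs: bool,
    all: bool,
) -> bool:
    """Return True when at least one reset flag (or --all) was passed."""
    return all or long or short or entities or knowledge or kickoff_outputs


# Only --knowledge is set, so there is something to reset.
print(has_reset_target(False, False, False, True, False, False))
```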
15 changes: 14 additions & 1 deletion src/crewai/cli/reset_memories_command.py
@@ -5,9 +5,17 @@
 from crewai.memory.long_term.long_term_memory import LongTermMemory
 from crewai.memory.short_term.short_term_memory import ShortTermMemory
 from crewai.utilities.task_output_storage_handler import TaskOutputStorageHandler
+from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage


-def reset_memories_command(long, short, entity, kickoff_outputs, all) -> None:
+def reset_memories_command(
+    long,
+    short,
+    entity,
+    knowledge,
+    kickoff_outputs,
+    all,
+) -> None:
     """
     Reset the crew memories.
@@ -17,6 +25,7 @@ def reset_memories_command(long, short, entity, kickoff_outputs, all) -> None:
       entity (bool): Whether to reset the entity memory.
       kickoff_outputs (bool): Whether to reset the latest kickoff task outputs.
       all (bool): Whether to reset all memories.
+      knowledge (bool): Whether to reset the knowledge.
     """

     try:
@@ -25,6 +34,7 @@ def reset_memories_command(long, short, entity, kickoff_outputs, all) -> None:
             EntityMemory().reset()
             LongTermMemory().reset()
             TaskOutputStorageHandler().reset()
+            KnowledgeStorage().reset()
             click.echo("All memories have been reset.")
         else:
             if long:
@@ -40,6 +50,9 @@ def reset_memories_command(long, short, entity, kickoff_outputs, all) -> None:
             if kickoff_outputs:
                 TaskOutputStorageHandler().reset()
                 click.echo("Latest Kickoff outputs stored has been reset.")
+            if knowledge:
+                KnowledgeStorage().reset()
+                click.echo("Knowledge has been reset.")

     except subprocess.CalledProcessError as e:
         click.echo(f"An error occurred while resetting the memories: {e}", err=True)
16 changes: 16 additions & 0 deletions src/crewai/crew.py
@@ -27,6 +27,8 @@
 from crewai.memory.entity.entity_memory import EntityMemory
 from crewai.memory.long_term.long_term_memory import LongTermMemory
 from crewai.memory.short_term.short_term_memory import ShortTermMemory
+from crewai.knowledge.knowledge import Knowledge
+from crewai.knowledge.source.base_knowledge_source import BaseKnowledgeSource
 from crewai.memory.user.user_memory import UserMemory
 from crewai.process import Process
 from crewai.task import Task
@@ -193,6 +195,13 @@ class Crew(BaseModel):
         default=[],
         description="List of execution logs for tasks",
     )
+    knowledge_sources: Optional[List[BaseKnowledgeSource]] = Field(
+        default=None,
+        description="Knowledge sources for the agent.",
+    )
+    knowledge: Optional[Knowledge] = Field(
+        default=None, description="Knowledge Source for the crew."
+    )

     @field_validator("id", mode="before")
     @classmethod
@@ -267,6 +276,13 @@ def create_crew_memory(self) -> "Crew":
                 self._user_memory = None
         return self

+    @model_validator(mode="after")
+    def create_crew_knowledge(self) -> "Crew":
+        self.knowledge = Knowledge(
+            sources=self.knowledge_sources or [], embedder_config=self.embedder
+        )
+        return self
+
     @model_validator(mode="after")
     def check_manager_llm(self):
         """Validates that the language model is set when using hierarchical process."""
Empty file.
Empty file.
55 changes: 55 additions & 0 deletions src/crewai/knowledge/embedder/base_embedder.py
@@ -0,0 +1,55 @@
+from abc import ABC, abstractmethod
+from typing import List
+
+import numpy as np
+
+
+class BaseEmbedder(ABC):
+    """
+    Abstract base class for text embedding models
+    """
+
+    @abstractmethod
+    def embed_chunks(self, chunks: List[str]) -> np.ndarray:
+        """
+        Generate embeddings for a list of text chunks
+
+        Args:
+            chunks: List of text chunks to embed
+
+        Returns:
+            Array of embeddings
+        """
+        pass
+
+    @abstractmethod
+    def embed_texts(self, texts: List[str]) -> np.ndarray:
+        """
+        Generate embeddings for a list of texts
+
+        Args:
+            texts: List of texts to embed
+
+        Returns:
+            Array of embeddings
+        """
+        pass
+
+    @abstractmethod
+    def embed_text(self, text: str) -> np.ndarray:
+        """
+        Generate embedding for a single text
+
+        Args:
+            text: Text to embed
+
+        Returns:
+            Embedding array
+        """
+        pass
+
+    @property
+    @abstractmethod
+    def dimension(self) -> int:
+        """Get the dimension of the embeddings"""
+        pass
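To show what satisfying this interface involves, here is a toy concrete embedder. Everything in it is an assumption made for illustration: `HashingEmbedder` is a hypothetical class using a hashed bag-of-words, the abstract base is re-declared in trimmed form so the sketch runs without crewAI installed, and the PR's real embedders (such as the fastembed-backed one implied by the new extra) are not shown in this diff.

```python
import zlib
from abc import ABC, abstractmethod
from typing import List

import numpy as np


class BaseEmbedder(ABC):
    """Trimmed copy of the interface from the diff, so the sketch runs alone."""

    @abstractmethod
    def embed_chunks(self, chunks: List[str]) -> np.ndarray: ...

    @abstractmethod
    def embed_texts(self, texts: List[str]) -> np.ndarray: ...

    @abstractmethod
    def embed_text(self, text: str) -> np.ndarray: ...

    @property
    @abstractmethod
    def dimension(self) -> int: ...


class HashingEmbedder(BaseEmbedder):
    """Hypothetical toy embedder: hashed bag-of-words, L2-normalized."""

    def __init__(self, dimension: int = 16) -> None:
        self._dimension = dimension

    @property
    def dimension(self) -> int:
        return self._dimension

    def embed_text(self, text: str) -> np.ndarray:
        vector = np.zeros(self._dimension, dtype=np.float32)
        for token in text.lower().split():
            # crc32 is stable across runs, unlike the built-in hash() for str.
            vector[zlib.crc32(token.encode("utf-8")) % self._dimension] += 1.0
        norm = float(np.linalg.norm(vector))
        return vector / norm if norm > 0 else vector

    def embed_texts(self, texts: List[str]) -> np.ndarray:
        if not texts:
            return np.zeros((0, self._dimension), dtype=np.float32)
        return np.stack([self.embed_text(t) for t in texts])

    def embed_chunks(self, chunks: List[str]) -> np.ndarray:
        # Chunks are plain strings here, so they embed exactly like texts.
        return self.embed_texts(chunks)


embedder = HashingEmbedder(dimension=8)
matrix = embedder.embed_chunks(["hello world", "hello again"])
print(matrix.shape)
```

A deterministic embedder like this can also stand in for a real model in tests, which fits the later commits that mock the knowledge query instead of spinning up a database.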