
Knowledge #1567

Merged
merged 49 commits into from
Nov 20, 2024
Conversation

bhancockio
Collaborator

No description provided.

@lorenzejay lorenzejay self-requested a review November 14, 2024 20:22
from crewai.knowledge.source.base_knowledge_source import BaseKnowledgeSource


class Knowledge(BaseModel):
Let's write some docs about:

  1. how to use this
  2. setting your own custom embedder for this
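A docs example for the custom-embedder point could look roughly like this. This is a minimal self-contained sketch: `BaseEmbedder` here is a stand-in for the interface discussed in this PR, and the hashing "embedder" is a toy placeholder for a real embedding model, used only so the example runs without external services.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import List


class BaseEmbedder(ABC):
    """Stand-in for the embedder interface discussed in this PR (sketch)."""

    @abstractmethod
    def embed_chunks(self, chunks: List[str]) -> List[List[float]]:
        """Return one embedding vector per chunk."""


class ToyHashEmbedder(BaseEmbedder):
    """A deterministic toy embedder: NOT semantic, purely for illustration."""

    def __init__(self, dim: int = 8) -> None:
        self.dim = dim

    def embed_chunks(self, chunks: List[str]) -> List[List[float]]:
        vectors = []
        for chunk in chunks:
            digest = hashlib.sha256(chunk.encode()).digest()
            # Map the first `dim` digest bytes to floats in [0, 1).
            vectors.append([b / 256 for b in digest[: self.dim]])
        return vectors


embedder = ToyHashEmbedder(dim=8)
vectors = embedder.embed_chunks(["hello", "world"])
```

The point of the docs would be that any object satisfying the `embed_chunks` contract can be plugged in, whether it wraps OpenAI, Ollama, or something custom.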

@@ -0,0 +1,82 @@
import os

I'd drop the Ollama version. Support OpenAI, then let anyone bring their own embedder function (super easy), then have the knowledge_config setup mirror the embedder_config setup for our RAG storage.

Comment on lines 25 to 31
@abstractmethod
def add(self, embedder: BaseEmbedder) -> None:
"""Process content, chunk it, compute embeddings, and save them."""
pass

def get_embeddings(self) -> List[np.ndarray]:
"""Return the list of embeddings for the chunks."""

Let's make this save to the project directory instead of the root.

# Compute embeddings for the new chunks
new_embeddings = embedder.embed_chunks(new_chunks)
# Save the embeddings
self.chunk_embeddings.extend(new_embeddings)

We should also be saving this to a DB and persisting it, like our RAG storage.


We should do this: we can generate the embeddings for files once, then just query whether they already exist. Otherwise, users will spend tokens generating embeddings that already exist.
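The generate-once idea can be sketched as a content-hash cache in front of the embedder. Everything here (the cache shape, the fake embedder) is hypothetical scaffolding to show the control flow, not the PR's implementation:

```python
import hashlib
from typing import Callable, Dict, List


def embed_with_cache(
    chunks: List[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    cache: Dict[str, List[float]],
) -> List[List[float]]:
    """Embed only chunks whose content hash is not already cached."""
    key = lambda c: hashlib.sha256(c.encode()).hexdigest()
    missing = [c for c in chunks if key(c) not in cache]
    if missing:
        # Only the uncached chunks hit the (token-costing) embedder.
        for chunk, vec in zip(missing, embed_fn(missing)):
            cache[key(chunk)] = vec
    return [cache[key(c)] for c in chunks]


calls = []

def fake_embedder(batch: List[str]) -> List[List[float]]:
    calls.append(len(batch))  # track how many chunks hit the "API"
    return [[float(len(c))] for c in batch]

cache: Dict[str, List[float]] = {}
embed_with_cache(["a", "bb"], fake_embedder, cache)
embed_with_cache(["a", "bb", "ccc"], fake_embedder, cache)  # only "ccc" is new
```

In a real setup the cache would live in the persisted storage rather than an in-memory dict, so restarts also skip re-embedding.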


I'll help with this.

from pydantic import BaseModel, ConfigDict, Field

from crewai.knowledge.embedder.base_embedder import BaseEmbedder


Extending this to save embeddings to the DB, then using the Knowledge class to query from there.
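The save-then-query flow could be pictured with this self-contained sketch. `InMemoryKnowledgeStorage` is a toy stand-in, not the PR's KnowledgeStorage; a real version would back this with a vector DB:

```python
import math
from typing import Dict, List, Tuple


class InMemoryKnowledgeStorage:
    """Toy stand-in for a persistent vector store (sketch only)."""

    def __init__(self) -> None:
        self._records: List[Tuple[str, List[float], Dict[str, str]]] = []

    def save(self, chunk: str, embedding: List[float], metadata: Dict[str, str]) -> None:
        self._records.append((chunk, embedding, metadata))

    def query(self, embedding: List[float], top_k: int = 1) -> List[str]:
        def cosine(a: List[float], b: List[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        # Rank stored chunks by cosine similarity to the query embedding.
        ranked = sorted(self._records, key=lambda r: cosine(r[1], embedding), reverse=True)
        return [chunk for chunk, _, _ in ranked[:top_k]]


storage = InMemoryKnowledgeStorage()
storage.save("cats purr", [1.0, 0.0], {"source": "demo"})
storage.save("dogs bark", [0.0, 1.0], {"source": "demo"})
```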

@@ -85,6 +88,10 @@ class Agent(BaseAgent):
llm: Union[str, InstanceOf[LLM], Any] = Field(
description="Language model that will run the agent.", default=None
)
knowledge_sources: Optional[List[BaseKnowledgeSource]] = Field(
default=None,
description="Knowledge sources for the agent.",

It would be better to declare this on the Crew class: the task prompt can query from the relevant knowledge, trickling down to the agent level, rather than defining it here on the agent level.

@lorenzejay

Tests passing locally:

[Screenshot: passing test run, 2024-11-19]

@pythonbyte left a comment
Hey @lorenzejay, I left some comments; let me know if you have any questions on those 😄

)

# Create a knowledge store with a list of sources and metadata
knowledge = Knowledge(sources=[string_source], metadata={"preference": "personal"})

Question: Should knowledge be used explicitly in some of the code, or does it work behind the curtains?


Crew(knowledge_sources=knowledge, ...)?


```python
from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource
```

Question: I saw that StringKnowledgeSource was imported but not used here. Is that the intention?


You're right, that was not intended.

@@ -26,7 +26,7 @@ jobs:
run: uv python install 3.11.9

- name: Install the project
run: uv sync --dev
run: uv sync --dev --all-extras

Question: Do you think we need the --all-extras option in this case? It seems like we'll have to install all the optional dependencies to be able to run our tests. What do you think?


Yes, there are a bunch of optional dependencies that were brought in, like pdfplumber for our PDFKnowledgeSource.

@@ -0,0 +1,30 @@
from abc import ABC, abstractmethod

Question: I imagine that the path of this file is not correct.
path/to/src/crewai/knowledge/


Looks right? The abstract class could live inside the source dir.

src/crewai/knowledge/knowledge.py (outdated, resolved)

Question: Are you planning to upload this .pdf to the repository?


Yes, as we are using it for the PDFKnowledgeSource.

storage: KnowledgeStorage = Field(default_factory=KnowledgeStorage)
metadata: Dict[str, Any] = Field(default_factory=dict)

def model_post_init(self, context):

Question: I noticed that context is declared as a parameter but not used in the method; is that expected?


Resolved: this is a Pydantic-specific feature: https://docs.pydantic.dev/latest/api/base_model/#pydantic.BaseModel.model_post_init

So I used _ instead.


def save_documents(self, metadata: Dict[str, Any]):
"""Save the documents to the storage."""
chunk_metadatas = [metadata.copy() for _ in self.chunks]

Question: Could you help me understand what this line of code does? Thanks!


Before adding documents to the vector DB, each one needs a metadata object. The metadata can be the same for every chunk, as it is a representation of the source itself.

When something big (say, a large PDF) gets chunked, it produces some amount of content, so the defined metadata should be cloned and passed along with each chunk.

Overall, we need one metadata dict per chunk to go with it.
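The clone-per-chunk behavior can be seen in a tiny standalone example (using a plain dict for metadata, as in the quoted snippet):

```python
chunks = ["chunk one", "chunk two", "chunk three"]
metadata = {"preference": "personal"}

# One independent metadata dict per chunk, mirroring the quoted line.
chunk_metadatas = [metadata.copy() for _ in chunks]

# Tagging one chunk's metadata does not leak into the others,
# which is why copies are made instead of reusing one dict.
chunk_metadatas[0]["chunk_index"] = 0
```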

@pythonbyte left a comment
LGTM ✅
Amazing work!

@lorenzejay lorenzejay merged commit 14a36d3 into main Nov 20, 2024
4 checks passed