Merge branch 'main' into Zochory-patch-1

Qredence · Sep 2, 2024 · 6202de2
2 parents 9fa158f + ca8a0a2
commit 6202de2

Showing 17 changed files with 349 additions and 286 deletions.
diff --git a/.github/FUNDING.yml b/.github/FUNDING.yml
@@ -4,5 +4,5 @@ github: qredence # Replace with up to 4 GitHub Sponsors-enabled usernames e.g.,
 patreon: # Replace with a single Patreon username
 open_collective: # Replace with a single Open Collective username
 
-buy_me_a_coffee: zacharyq # Replace with a single Buy Me a Coffee username
+buy_me_a_coffee:  # Replace with a single Buy Me a Coffee username
 custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']
diff --git a/.github/workflows/graphfleet-openapi.json b/.github/workflows/graphfleet-openapi.json
diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml
@@ -16,7 +16,7 @@ jobs:
     - name: Set up Python
       uses: actions/setup-python@v5
       with:
-        python-version: '3.11.5'
+        python-version: '3.11.9'
 
     - name: Install dependencies
       run: |

diff --git a/.github/workflows/python-publish.yml b/.github/workflows/python-publish.yml
@@ -16,7 +16,7 @@ on:
 
 env:
   POETRY_VERSION: "1.8.3"
-  PYTHON_VERSION: "3.11.5"
+  PYTHON_VERSION: "3.11.9"
 
 jobs:
   publish:

diff --git a/.vscode/settings.json b/.vscode/settings.json
@@ -25,5 +25,11 @@
         ".cache": true,
         "retool.config.json": true
     },
-    "python.analysis.autoImportCompletions": true
+    "python.analysis.autoImportCompletions": true,
+    "docwriter.progress.trackFunctions": true,
+    "docwriter.progress.trackMethods": true,
+    "docwriter.style": "NumPy",
+    "docwriter.hotkey.mac": "⌘ + .",
+    "docwriter.progress.trackClasses": true,
+    "docwriter.progress.trackTypes": true
 }
diff --git a/app/main.py b/app/main.py
@@ -12,6 +12,8 @@
 from app.api import _reformat_context_data
 import sentry_sdk
 
+app = FastAPI()
+
 
 sentry_sdk.init(
     dsn="https://741dc950f3465d2db0b8f869832dabc0@o4507875835314176.ingest.de.sentry.io/4507875863429200",

diff --git a/app/services/search_engine.py b/app/services/search_engine.py
@@ -1,4 +1,6 @@
 import os
+import asyncio
+import logging
 import pandas as pd
 import tiktoken
 from fastapi import HTTPException
@@ -22,8 +24,7 @@
     read_indexer_entities,
     read_indexer_reports
 )
-import asyncio
-import logging
+
 
 logger = logging.getLogger(__name__)
 

diff --git a/docs/CODEBASE.md b/docs/CODEBASE.md
@@ -0,0 +1 @@
+
diff --git a/docs/CONFIGURATION.md b/docs/CONFIGURATION.md
@@ -0,0 +1,49 @@
+# Configuration
+
+This document explains the configuration mechanisms within the GraphFleet project, encompassing both application-level and GraphRAG-specific settings. Think of configuration as setting up the rules and parameters for how GraphFleet operates.
+
+## Configuration Files
+
+- **app/config.py:** Defines application-level settings using Pydantic's `BaseSettings`. This file is like the general settings panel for the entire application.
+- **graphfleet/settings.yaml:** Configures GraphRAG parameters, including LLM settings, embedding models, and data paths. This file is like the specialized settings panel for the GraphRAG engine itself.
+
+## Application-Level Settings (app/config.py)
+
+The `app/config.py` file manages settings related to the FastAPI application, such as:
+
+- **API_KEY:** API key for accessing external services like OpenAI. This is like your login credential to use OpenAI's powerful language models.
+- **LLM_MODEL:** Identifier for the LLM used for response generation. This is like choosing which "brain" you want GraphFleet to use: a more capable one for complex tasks or a smaller one for faster responses.
+- **EMBEDDING_MODEL:** Name of the embedding model used to turn text into vector representations for similarity search. This is like choosing the language GraphFleet uses to compare pieces of text by meaning.
+- **API_BASE:** Base URL for external API endpoints. This is like the address of the OpenAI service that GraphFleet communicates with.
+- **API_VERSION:** Version of the external API being used. This ensures that GraphFleet is speaking the same language as the OpenAI service.
+- **INPUT_DIR:** Directory containing input data for GraphRAG. This is like the library where GraphFleet gets all its information from.
+- **LANCEDB_URI:** URI for connecting to the LanceDB instance. This is like the address of the vector database where GraphFleet stores the embeddings it searches over.
+- **COMMUNITY_LEVEL:** Level in the hierarchy of detected communities from which community reports are read; higher values select smaller, more fine-grained communities. This is like setting the granularity for how GraphFleet groups related information together.
+- **MAX_TOKENS:** Maximum number of tokens allowed for LLM responses. This is like setting a limit on how long GraphFleet's answers can be.
+
+## GraphRAG Settings (graphfleet/settings.yaml)
+
+The `graphfleet/settings.yaml` file configures the behavior of the GraphRAG implementation, including:
+
+- **LLM Settings:** API keys, model names, temperature, top_p, and other parameters for LLM interaction. These settings fine-tune how GraphRAG interacts with the chosen language model.
+- **Embedding Settings:** API keys, model names, and other parameters for embedding generation. These settings control how GraphFleet converts text into the vectors it uses for search.
+- **Data Input and Output:** Paths for input data sources, cache directories, and output artifacts. These settings tell GraphFleet where to find data, where to store temporary files, and where to save results.
+- **Chunking:** Chunk size, overlap, and grouping criteria for dividing text data. These settings control how GraphFleet breaks down large pieces of text into smaller, more manageable chunks.
+- **Entity and Relationship Extraction:** Prompt templates, entity types, and the maximum number of gleanings (additional extraction passes over each chunk to catch missed entities) for entity and relationship extraction. These settings fine-tune how GraphFleet identifies and extracts key information from text.
+- **Knowledge Graph Storage:** Storage type (file or blob) and related parameters. These settings determine how and where GraphFleet stores its knowledge graph.
+- **Local and Global Search:** Parameters controlling the behavior of local and global search operations. These settings fine-tune how GraphFleet searches for information within its knowledge graph.
+
+## Environment Variables
+
+Configuration values can be overridden using environment variables. The `app/config.py` file uses `python-dotenv` to load environment variables from a `.env` file. This allows you to easily switch between different configurations without modifying the code directly.
+
+## Configuration Hierarchy
+
+The configuration follows a hierarchical structure, with environment variables overriding values specified in `app/config.py`, which in turn override defaults defined in `graphfleet/settings.yaml`. This ensures that the most specific settings take precedence.
+
+## Best Practices
+
+- Store sensitive information, such as API keys, in environment variables. This keeps your sensitive data separate from your codebase.
+- Use descriptive names for configuration parameters. This makes your configuration files easier to understand and maintain.
+- Document configuration options clearly in the respective files. This helps you and others understand the purpose of each setting.
+- Test configuration changes thoroughly to confirm they behave as intended. Don't assume a change works as expected; always test it.
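The precedence rule described under "Configuration Hierarchy" in the new `docs/CONFIGURATION.md` can be illustrated with a small, stdlib-only Python sketch. It is not GraphFleet's `app/config.py`, which the docs say uses Pydantic's `BaseSettings`. The `AppSettings` dataclass and `resolve_settings` helper are hypothetical, the placeholder defaults are invented, and the three layers are passed in as plain mappings instead of being read from `settings.yaml`, `app/config.py`, and the process environment.

```python
import os
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass
class AppSettings:
    # Hypothetical: field names come from docs/CONFIGURATION.md,
    # but the defaults are placeholders, not GraphFleet's real values.
    API_KEY: str = ""
    LLM_MODEL: str = ""
    EMBEDDING_MODEL: str = ""
    API_BASE: str = ""
    API_VERSION: str = ""
    INPUT_DIR: str = ""
    LANCEDB_URI: str = ""
    COMMUNITY_LEVEL: int = 0
    MAX_TOKENS: int = 0


def resolve_settings(
    yaml_layer: Mapping[str, Any],
    app_layer: Mapping[str, Any],
    env: Mapping[str, str],
) -> AppSettings:
    """Merge layers from lowest to highest precedence:
    settings.yaml defaults, then app/config.py values, then environment variables."""
    values: dict[str, Any] = {}
    for f in fields(AppSettings):
        # Later layers overwrite earlier ones, so the environment wins.
        for layer in (yaml_layer, app_layer, env):
            if layer.get(f.name) is not None:
                values[f.name] = layer[f.name]
        # Environment values arrive as strings; coerce integer fields.
        if f.name in values and f.type is int:
            values[f.name] = int(values[f.name])
    # Fields set by no layer keep the dataclass default.
    return AppSettings(**values)


if __name__ == "__main__":
    # In a running app, the real environment would be the top layer.
    print(resolve_settings({}, {}, os.environ))
```

For example, `resolve_settings({"LLM_MODEL": "m-yaml", "MAX_TOKENS": 1500}, {"MAX_TOKENS": 2000}, {"MAX_TOKENS": "4000"})` keeps `LLM_MODEL` from the YAML layer and takes `MAX_TOKENS` from the environment as the integer `4000`. The integer coercion here is a rough stand-in for the type validation `BaseSettings` performs. When a `.env` file is involved, python-dotenv's `load_dotenv()` does not overwrite variables that are already set unless it is called with `override=True`, so a value exported in the shell takes precedence over the same key in `.env`.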