Commit

Merge branch 'main' into Zochory-patch-1
Zochory authored Sep 2, 2024
2 parents 9fa158f + ca8a0a2 commit 6202de2
Showing 17 changed files with 349 additions and 286 deletions.
2 changes: 1 addition & 1 deletion .github/FUNDING.yml
Original file line number Diff line number Diff line change
@@ -4,5 +4,5 @@ github: qredence # Replace with up to 4 GitHub Sponsors-enabled usernames e.g.,
patreon: # Replace with a single Patreon username
open_collective: # Replace with a single Open Collective username

buy_me_a_coffee: zacharyq # Replace with a single Buy Me a Coffee username
buy_me_a_coffee: # Replace with a single Buy Me a Coffee username
custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']
266 changes: 0 additions & 266 deletions .github/workflows/graphfleet-openapi.json

This file was deleted.

2 changes: 1 addition & 1 deletion .github/workflows/publish.yml
@@ -16,7 +16,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.11.5'
python-version: '3.11.9'

- name: Install dependencies
run: |
2 changes: 1 addition & 1 deletion .github/workflows/python-publish.yml
@@ -16,7 +16,7 @@ on:

env:
POETRY_VERSION: "1.8.3"
PYTHON_VERSION: "3.11.5"
PYTHON_VERSION: "3.11.9"

jobs:
publish:
8 changes: 7 additions & 1 deletion .vscode/settings.json
@@ -25,5 +25,11 @@
".cache": true,
"retool.config.json": true
},
"python.analysis.autoImportCompletions": true
"python.analysis.autoImportCompletions": true,
"docwriter.progress.trackFunctions": true,
"docwriter.progress.trackMethods": true,
"docwriter.style": "NumPy",
"docwriter.hotkey.mac": "⌘ + .",
"docwriter.progress.trackClasses": true,
"docwriter.progress.trackTypes": true
}
2 changes: 2 additions & 0 deletions app/main.py
@@ -12,6 +12,8 @@
from app.api import _reformat_context_data
import sentry_sdk

app = FastAPI()


sentry_sdk.init(
dsn="https://741dc950f3465d2db0b8f869832dabc0@o4507875835314176.ingest.de.sentry.io/4507875863429200",
5 changes: 3 additions & 2 deletions app/services/search_engine.py
@@ -1,4 +1,6 @@
import os
import asyncio
import logging
import pandas as pd
import tiktoken
from fastapi import HTTPException
@@ -22,8 +24,7 @@
read_indexer_entities,
read_indexer_reports
)
import asyncio
import logging


logger = logging.getLogger(__name__)

1 change: 1 addition & 0 deletions docs/CODEBASE.md
@@ -0,0 +1 @@

49 changes: 49 additions & 0 deletions docs/CONFIGURATION.md
@@ -0,0 +1,49 @@
# Configuration

This document explains the configuration mechanisms within the GraphFleet project, encompassing both application-level and GraphRAG-specific settings. Think of configuration as setting up the rules and parameters for how GraphFleet operates.

## Configuration Files

- **app/config.py:** Defines application-level settings using Pydantic's `BaseSettings`. This file is like the general settings panel for the entire application.
- **graphfleet/settings.yaml:** Configures GraphRAG parameters, including LLM settings, embedding models, and data paths. This file is like the specialized settings panel for the GraphRAG engine itself.

## Application-Level Settings (app/config.py)

The `app/config.py` file manages settings related to the FastAPI application, such as:

- **API_KEY:** API key for accessing external services like OpenAI. This is like your login credential to use OpenAI's powerful language models.
- **LLM_MODEL:** Identifier for the LLM used for response generation. This is like choosing which "brain" you want GraphFleet to use - a more powerful one for complex tasks, or a smaller one for quicker responses.
- **EMBEDDING_MODEL:** Name of the embedding model used for text representation. The embedding model converts text into numeric vectors, which lets GraphFleet compare passages by meaning rather than by exact wording.
- **API_BASE:** Base URL for external API endpoints. This is like the address of the OpenAI service that GraphFleet communicates with.
- **API_VERSION:** Version of the external API being used. Pinning the version keeps GraphFleet's requests compatible with the interface the service expects.
- **INPUT_DIR:** Directory containing input data for GraphRAG. This is like the library where GraphFleet gets all its information from.
- **LANCEDB_URI:** URI for connecting to the LanceDB instance. This is like the address of the database GraphFleet queries at search time; in GraphRAG, LanceDB typically acts as the vector store for embeddings rather than holding the graph itself.
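- **COMMUNITY_LEVEL:** Level of the community hierarchy in the knowledge graph from which community reports are loaded. This sets the granularity at which GraphFleet groups related information: in GraphRAG, higher levels generally correspond to smaller, more fine-grained communities.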
- **MAX_TOKENS:** Maximum number of tokens allowed for LLM responses. This is like setting a limit on how long GraphFleet's answers can be.
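
The settings listed above could be read from the environment roughly as follows. This is a minimal stdlib-only sketch, not the project's actual `app/config.py`, which is described as using Pydantic's `BaseSettings`. The `AppSettings` class, the `load_settings` helper, and every default value here are assumptions made up for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AppSettings:
    """Stdlib stand-in for the project's Pydantic settings class (illustrative)."""

    api_key: str
    llm_model: str
    embedding_model: str
    api_base: str
    api_version: str
    input_dir: str
    lancedb_uri: str
    community_level: int
    max_tokens: int


# Setting name -> type used to parse the raw string value.
FIELD_TYPES = {
    "API_KEY": str,
    "LLM_MODEL": str,
    "EMBEDDING_MODEL": str,
    "API_BASE": str,
    "API_VERSION": str,
    "INPUT_DIR": str,
    "LANCEDB_URI": str,
    "COMMUNITY_LEVEL": int,
    "MAX_TOKENS": int,
}

# Placeholder defaults invented for this example. API_KEY has no default on
# purpose, so a missing key raises an error instead of passing silently.
DEFAULTS = {
    "LLM_MODEL": "example-llm",
    "EMBEDDING_MODEL": "example-embedding",
    "API_BASE": "https://api.example.invalid",
    "API_VERSION": "v1",
    "INPUT_DIR": "./input",
    "LANCEDB_URI": "./lancedb",
    "COMMUNITY_LEVEL": "2",
    "MAX_TOKENS": "4000",
}


def load_settings(env=None):
    """Build AppSettings from a mapping of environment variables."""
    env = os.environ if env is None else env
    values = {}
    for name, cast in FIELD_TYPES.items():
        raw = env.get(name, DEFAULTS.get(name))
        if raw is None:
            raise KeyError(f"Missing required setting: {name}")
        values[name.lower()] = cast(raw)
    return AppSettings(**values)
```

Calling `load_settings({"API_KEY": "test-key", "MAX_TOKENS": "512"})` yields an object whose `max_tokens` is the integer `512`, with the remaining fields taken from the placeholder defaults. Passing a plain dictionary instead of `os.environ` keeps the function easy to test.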

## GraphRAG Settings (graphfleet/settings.yaml)

The `graphfleet/settings.yaml` file configures the behavior of the GraphRAG implementation, including:

- **LLM Settings:** API keys, model names, temperature, top_p, and other parameters for LLM interaction. These settings fine-tune how GraphRAG interacts with the chosen language model.
- **Embedding Settings:** API keys, model names, and other parameters for embedding generation. These settings control how GraphFleet converts text into the numeric vectors it uses to compare and retrieve content.
- **Data Input and Output:** Paths for input data sources, cache directories, and output artifacts. These settings tell GraphFleet where to find data, where to store temporary files, and where to save results.
- **Chunking:** Chunk size, overlap, and grouping criteria for dividing text data. These settings control how GraphFleet breaks down large pieces of text into smaller, more manageable chunks.
- **Entity and Relationship Extraction:** Prompt templates, entity types, and maximum gleanings (the number of additional extraction passes made over each chunk to catch entities missed the first time). These settings fine-tune how GraphFleet identifies and extracts key information from text.
- **Knowledge Graph Storage:** Storage type (file or blob) and related parameters. These settings determine how and where GraphFleet stores its knowledge graph.
- **Local and Global Search:** Parameters controlling the behavior of local and global search operations. These settings fine-tune how GraphFleet searches for information within its knowledge graph.
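
To make the chunking settings above concrete, here is a small sketch of how a chunk size and an overlap interact. It is not GraphRAG's implementation: the `chunk_tokens` function is hypothetical, and it works on any list of tokens (whitespace-split words in the usage example) rather than on output from a real tokenizer.

```python
def chunk_tokens(tokens, size, overlap):
    """Split a token list into windows of `size` tokens that share `overlap` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


words = "a b c d e f g".split()
print(chunk_tokens(words, size=3, overlap=1))
# [['a', 'b', 'c'], ['c', 'd', 'e'], ['e', 'f', 'g']]
```

A larger overlap preserves more context across chunk boundaries but produces more chunks, and therefore more LLM and embedding calls during indexing.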

## Environment Variables

Configuration values can be overridden using environment variables. The `app/config.py` file uses `python-dotenv` to load environment variables from a `.env` file. This allows you to easily switch between different configurations without modifying the code directly.

## Configuration Hierarchy

The configuration follows a hierarchical structure, with environment variables overriding values specified in `app/config.py`, which in turn override defaults defined in `graphfleet/settings.yaml`. This ensures that the most specific settings take precedence.
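
The precedence order described above can be sketched with `collections.ChainMap`, which searches its mappings from left to right and returns the first match. The three dictionaries are hypothetical stand-ins for the environment, `app/config.py`, and `graphfleet/settings.yaml`; the keys and values are made up for illustration, and the project's real loading code may differ.

```python
from collections import ChainMap


def resolve_config(environment, app_config, yaml_defaults):
    """Return a view in which earlier layers override later ones."""
    return ChainMap(environment, app_config, yaml_defaults)


# Hypothetical layers, ordered from most specific to least specific.
environment = {"MAX_TOKENS": "1000"}
app_config = {"MAX_TOKENS": "2000", "LLM_MODEL": "example-llm"}
yaml_defaults = {"MAX_TOKENS": "4000", "COMMUNITY_LEVEL": "2"}

config = resolve_config(environment, app_config, yaml_defaults)
print(config["MAX_TOKENS"])       # 1000, from the environment layer
print(config["LLM_MODEL"])        # example-llm, from the app config layer
print(config["COMMUNITY_LEVEL"])  # 2, from the YAML defaults layer
```

Because `ChainMap` keeps references to the underlying dictionaries rather than copying them, a change to any layer is visible through `config` immediately.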

## Best Practices

- Store sensitive information, such as API keys, in environment variables. This keeps your sensitive data separate from your codebase.
- Use descriptive names for configuration parameters. This makes your configuration files easier to understand and maintain.
- Document configuration options clearly in the respective files. This helps you and others understand the purpose of each setting.
- Test configuration changes thoroughly to ensure desired behavior. Don't assume your changes will work as expected - always test them!