The library provides three main components:

- KoboldAPI — easy access to the core KoboldCpp API endpoints, including streaming generation, image input, and sampler settings.
- InstructTemplate — finds the appropriate instruct template for the running model and wraps content in it to build a prompt.
- ChunkingProcessor — reads most document types and splits them into chunks of any size up to the maximum context length, stopping at natural break points and returning the chunks as a list.
KoboldCpp is a powerful and portable solution for running large language models (LLMs). Its standout features include:
- Zero-installation deployment via a single executable
- Support for any GGUF model compatible with llama.cpp
- Cross-platform support (Linux, Windows, macOS)
- Hardware acceleration via CUDA and Vulkan
- Built-in GUI with extensive features
- Multimodal capabilities (image generation, speech, etc.)
- API compatibility with OpenAI and Ollama (see the sketch below)
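Because KoboldCpp exposes an OpenAI-compatible endpoint, the server can also be driven by standard OpenAI clients. A minimal sketch using the official openai Python package (the /v1 base URL is KoboldCpp's OpenAI-compatible route; the model name is a placeholder, since the server answers with whichever model is loaded):

from openai import OpenAI

# KoboldCpp ignores the API key, but the client requires a non-empty value
client = OpenAI(base_url="http://localhost:5001/v1", api_key="none")

completion = client.chat.completions.create(
    model="koboldcpp",  # placeholder; the loaded model is used regardless
    messages=[{"role": "user", "content": "Say hello."}],
)
print(completion.choices[0].message.content)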
- Download the KoboldCpp executable for your platform
- Place your GGUF model file in the same directory
- Install the Python client, either directly from GitHub:

pip install git+https://github.com/jabberjabberjabber/koboldapi-python.git

or from a local clone:

git clone https://github.com/jabberjabberjabber/koboldapi-python
cd koboldapi-python
pip install .
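Before using the client, start KoboldCpp with your model. A typical invocation looks roughly like this (flag names can vary between versions and platforms, so check --help for your build):

./koboldcpp --model model.gguf --port 5001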
Here's a minimal example to get started:
from koboldapi import KoboldAPI

# Initialize the client
api = KoboldAPI("http://localhost:5001")

# Basic text generation
response = api.generate(
    prompt="Write a haiku about programming:",
    max_length=50,
    temperature=0.7
)
print(response)
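Streaming is also supported via stream_generate, the async generator used again in the batch-processing section below. A minimal sketch that prints tokens as they arrive:

import asyncio

from koboldapi import KoboldAPI

api = KoboldAPI("http://localhost:5001")

async def main():
    # Tokens are yielded incrementally instead of waiting for the full response
    async for token in api.stream_generate("Write a haiku about programming:"):
        print(token, end="", flush=True)

asyncio.run(main())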
The KoboldAPIConfig class manages configuration settings for the API client. You can either create a config programmatically or load it from a JSON file:
from koboldapi import KoboldAPIConfig

# Create config programmatically
config = KoboldAPIConfig(
    api_url="http://localhost:5001",
    api_password="",
    templates_directory="./templates",
    translation_language="English",
    temp=0.7,
    top_k=40,
    top_p=0.9,
    rep_pen=1.1
)

# Or load from a JSON file
config = KoboldAPIConfig.from_json("config.json")

# Save config to a file
config.to_json("new_config.json")
Example config.json:
{
    "api_url": "http://localhost:5001",
    "api_password": "",
    "templates_directory": "./templates",
    "translation_language": "English",
    "temp": 0.7,
    "top_k": 40,
    "top_p": 0.9,
    "rep_pen": 1.1
}
KoboldAPI supports various instruction formats through templates. The InstructTemplate class handles this automatically:
from koboldapi.templates import InstructTemplate

template = InstructTemplate("./templates", "http://localhost:5001")

# Wrap a prompt with the appropriate template
wrapped_prompt = template.wrap_prompt(
    instruction="Explain quantum computing",
    content="Focus on qubits and superposition",
    system_instruction="You are a quantum physics expert"
)
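As an illustration, if the running model matched a Vicuna-style template like the one shown under custom templates below, the wrapped prompt would look roughly like this (the exact ordering of instruction and content is an assumption):

### System:
You are a quantum physics expert

### Human: Explain quantum computing
Focus on qubits and superposition

### Assistant: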
The library includes example scripts for various text processing tasks:
from koboldapi import KoboldAPICore
from koboldapi.chunking.processor import ChunkingProcessor

# Initialize core with config
config = {
    "api_url": "http://localhost:5001",
    "templates_directory": "./templates"
}
core = KoboldAPICore(config)

# Process a text file
processor = ChunkingProcessor(core.api_client, max_chunk_length=2048)
chunks, metadata = processor.chunk_file("document.txt")

# Generate a summary for each chunk
for chunk, _ in chunks:
    summary = core.api_client.generate(
        prompt=core.template_wrapper.wrap_prompt(
            instruction="Summarize this text",
            content=chunk
        )[0],
        max_length=200
    )
    print(summary)
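For documents that produce many chunks, the per-chunk summaries can themselves be concatenated and summarized in a second pass to yield a single overall summary.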
Process images:
import base64
from pathlib import Path

from koboldapi import KoboldAPICore

# Initialize core
config = {
    "api_url": "http://localhost:5001",
    "templates_directory": "./templates"
}
core = KoboldAPICore(config)

# Read and base64-encode the image
image_path = Path("image.png")
with open(image_path, "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

result = core.api_client.generate(
    prompt=core.template_wrapper.wrap_prompt(
        instruction="Extract text from this image",
        system_instruction="You are an OCR system"
    )[0],
    images=[image_data],
    temperature=0.1
)
print(result)
Create custom instruction templates for different models:
{
    "name": ["vicuna-7b", "vicuna-13b"],
    "system_start": "### System:\n",
    "system_end": "\n\n",
    "user_start": "### Human: ",
    "user_end": "\n\n",
    "assistant_start": "### Assistant: ",
    "assistant_end": "\n\n"
}
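Save the file as JSON in your templates_directory; given the template-detection behavior described above, the name field is presumably matched against the running model's name to pick the right template.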
Fine-tune generation settings:
response = api.generate(
    prompt="Write a story:",
    max_length=500,
    temperature=0.8,    # Higher = more creative
    top_p=0.9,          # Nucleus sampling threshold
    top_k=40,           # Top-k sampling threshold
    rep_pen=1.1,        # Repetition penalty
    rep_pen_range=256,  # How far back to apply the penalty
    min_p=0.05          # Minimum probability threshold
)
Implement robust error handling:
from koboldapi import KoboldAPIError

try:
    response = api.generate(prompt="Test prompt")
except KoboldAPIError as e:
    print(f"API Error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
Optimize token usage:
# Get the model's maximum context length
max_length = api.get_max_context_length()

# Count tokens in the prompt
prompt = "Write a detailed essay about compilers:"
token_count = api.count_tokens(prompt)["count"]

# Ensure the response stays within the remaining context
desired_length = 500
available_tokens = max_length - token_count
response_length = min(desired_length, available_tokens)
Handle multiple inputs efficiently:
async def process_batch(prompts):
    """Stream a completion for each prompt and collect the full results."""
    results = []
    for prompt in prompts:
        tokens = []
        async for token in api.stream_generate(prompt):
            tokens.append(token)
        results.append("".join(tokens))
    return results
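Note that the prompts are processed sequentially; KoboldCpp generally handles one generation at a time, so concurrent requests would simply queue on the server.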
- Connection Errors

# Test the connection
if not api.validate_connection():
    print("Cannot connect to API")

- Template Errors

# Check whether a template exists for the model
if not template.get_template():
    print("No matching template found for model")

- Generation Errors

# Monitor generation status
status = api.check_generation()
if status is None:
    print("Generation failed or was interrupted")
Contributions to improve these tools are welcome. Please submit issues and pull requests on GitHub.
- Clone the repository
- Install development dependencies:
pip install -e ".[dev]"
- Run tests:
pytest tests/
This project is licensed under GPLv3.