This repository provides a robust pipeline script to manage the lifecycle of a machine learning container on RunPod, perform LLM (Large Language Model) inference, and store the results in an ArangoDB database. It is designed for efficient model deployment, request handling, and integration with caching mechanisms.
- Model Lifecycle Management: Automatically start, monitor, and stop RunPod containers for inference tasks.
- Database Integration: Connects to an ArangoDB instance for storing LLM requests and responses with support for caching.
- Request Deduplication: Generates unique hashes for LLM requests to prevent redundant computations.
- Error Handling: Incorporates retry mechanisms, logging, and robust cleanup steps for resource management.
- Customizable Models: Easily configurable to use different models, including `SGLang-Qwen` and other Hugging Face-hosted models.
The main script, `runpod_llm_pipeline.py`, orchestrates the pipeline (a skeleton sketch follows the list), including:
- Model Initialization: Loads configurations for the model and RunPod container.
- RunPod Management: Starts or reuses a container, checks readiness, and handles cleanup.
- Request Handling: Prepares LLM requests with unique hashes.
- LLM Inference: Sends requests to the model container and processes responses.
- Database Storage: Upserts results into ArangoDB for persistence and traceability.
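At a high level, the control flow might look like the following sketch; the helper names (`start_pod`, `wait_until_ready`, `add_hash`, `run_inference`, `upsert_results`, `stop_pod`) are illustrative stand-ins, not the script's actual API:

```python
# Hypothetical skeleton of runpod_llm_pipeline.py's control flow.
def run_pipeline(requests, model_name, pipeline_config):
    pod = start_pod(model_name)                    # RunPod management
    try:
        wait_until_ready(pod)                      # readiness check
        hashed = [add_hash(r) for r in requests]   # request handling
        results = run_inference(pod, hashed)       # LLM inference
        upsert_results(results, pipeline_config)   # database storage
        return results
    finally:
        stop_pod(pod)                              # cleanup, on success or error
```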
`config.py` holds configuration details (an illustrative shape follows) for:
- RunPod API and Hugging Face tokens.
- Model-specific settings such as GPU preferences, container disk size, and environment variables.
- Pipeline configurations like ArangoDB connection details.
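For orientation, `config.py` might be shaped roughly like this; apart from `MODEL_CONFIGS`, `DEFAULT_POD_SETTINGS`, and `pipeline_config['arango_config']` (all referenced in the customization notes below), every key and value here is an illustrative assumption:

```python
import os

HF_TOKEN = os.environ["HF_TOKEN"]
RUNPOD_API_KEY = os.environ["RUNPOD_API_KEY"]

DEFAULT_POD_SETTINGS = {
    "gpu_type": "NVIDIA RTX A5000",  # preferred GPU (assumed key name)
    "container_disk_in_gb": 50,      # container disk size
}

MODEL_CONFIGS = {
    "sglang-qwen": {
        "image": "...",              # serving image for the model
        "env": {"HF_TOKEN": HF_TOKEN},
        **DEFAULT_POD_SETTINGS,
    },
}

pipeline_config = {
    "arango_config": {
        "host": "http://localhost:8529",
        "database": "llm_results",
        "collection": "responses",
    },
}
```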
Utility functions for managing RunPod containers (a sketch follows):
- Starting, stopping, and monitoring container status.
- Checking API readiness for deployed containers.
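A minimal sketch of what these utilities could look like, assuming the official `runpod` Python SDK (`create_pod`/`stop_pod`) and an HTTP health endpoint on the serving container; the `/health` path and the proxy URL pattern are assumptions to adapt to your image:

```python
import time
import requests
import runpod

def start_container(name, image, gpu_type, api_key):
    runpod.api_key = api_key
    return runpod.create_pod(name=name, image_name=image, gpu_type_id=gpu_type)

def wait_until_ready(pod_id, port=8000, timeout=600, interval=10):
    url = f"https://{pod_id}-{port}.proxy.runpod.net/health"
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if requests.get(url, timeout=5).status_code == 200:
                return  # API is ready to accept requests
        except requests.RequestException:
            pass  # container still booting or model still downloading
        time.sleep(interval)
    raise TimeoutError(f"pod {pod_id} not ready after {timeout}s")

def stop_container(pod_id):
    runpod.stop_pod(pod_id)
```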
Helper functions (example below) for:
- Adding hashes to requests to enable deduplication.
- Sending requests to the LLM and merging responses with the original requests.
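For example, the deduplication hash can be derived from a canonical JSON serialization of the request, so logically identical requests map to the same key; the `request_hash` field name and the `merge_response` helper are assumptions for illustration:

```python
import hashlib
import json

def add_hash(request: dict) -> dict:
    # Sorted keys + compact separators make the serialization canonical.
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {**request, "request_hash": digest}

def merge_response(request: dict, response: dict) -> dict:
    # Keep the original request alongside the model output for traceability.
    return {**request, "response": response}
```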
- Python Environment: Ensure Python 3.8+ is installed.
- Dependencies: Install the required packages with `pip install -r requirements.txt`.
- ArangoDB: Set up an instance of ArangoDB and ensure it is accessible.
- RunPod Account: Obtain an API key from RunPod to manage containers.
- Hugging Face Token: Required for downloading models during container initialization.
- Configure Environment Variables: Create a `.env` file with the following variables (see the loading sketch after these steps):
  - `HF_TOKEN=<your_huggingface_token>`
  - `RUNPOD_API_KEY=<your_runpod_api_key>`
- Run the Pipeline: Execute the script to start the pipeline: `python runpod_llm_pipeline.py`
- Monitor Logs: Logs are generated at each step to support transparency and debugging.
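The script presumably loads the `.env` values at startup; one common pattern uses `python-dotenv` (an assumption here, and plain shell exports work just as well):

```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
hf_token = os.environ["HF_TOKEN"]
runpod_api_key = os.environ["RUNPOD_API_KEY"]
```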
- Start RunPod Container: Automatically starts a container with the specified model and preferred GPU.
- Prepare Requests: Each request is hashed for efficient deduplication.
- Send Requests to LLM: Requests are processed by the deployed model using the specified API parameters.
- Store Results in ArangoDB: Responses are upserted into the specified collection for persistence (see the sketch after this list).
- Cleanup: Containers are stopped and resources are released, even in case of errors.
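The upsert step could be implemented with `python-arango` and an AQL `UPSERT` keyed on the request hash, so reruns update existing documents instead of duplicating them; the credentials and config shape are placeholder assumptions:

```python
from arango import ArangoClient

def upsert_results(results, arango_config):
    client = ArangoClient(hosts=arango_config["host"])
    db = client.db(
        arango_config["database"],
        username=arango_config.get("username", "root"),
        password=arango_config.get("password", ""),
    )
    for doc in results:
        # Match on the deduplication hash; insert if new, update otherwise.
        db.aql.execute(
            "UPSERT { request_hash: @doc.request_hash } "
            "INSERT @doc UPDATE @doc IN @@col",
            bind_vars={"doc": doc, "@col": arango_config["collection"]},
        )
```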
- Model Configuration: Modify `MODEL_CONFIGS` in `config.py` to add or update model-specific settings (example after this list).
- Database Settings: Update `pipeline_config['arango_config']` to customize ArangoDB settings such as host, database name, and collection.
- RunPod Parameters: Adjust `DEFAULT_POD_SETTINGS` to optimize container resources or specify preferred GPUs.
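For instance, registering an additional model could be as simple as adding an entry; the keys below follow the assumed `MODEL_CONFIGS` shape sketched earlier and are illustrative only:

```python
MODEL_CONFIGS["my-new-model"] = {
    "image": "...",                       # serving image for the new model
    "env": {"HF_TOKEN": HF_TOKEN},
    "gpu_type": "NVIDIA A100 80GB PCIe",  # override the default GPU
    "container_disk_in_gb": 100,          # more disk for larger weights
}
```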
The pipeline includes robust mechanisms to handle errors:
- Retry Logic: Reattempts API requests and container operations using exponential backoff.
- Graceful Cleanup: Ensures that containers are stopped properly in the event of errors.
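A generic retry-with-exponential-backoff helper in the spirit of this logic might look as follows (the repository's actual implementation may differ):

```python
import time

def with_retries(fn, attempts=5, base_delay=1.0, exceptions=(Exception,)):
    for attempt in range(attempts):
        try:
            return fn()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error so cleanup runs
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, 8s, ...
```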
Planned enhancements include:
- Support for multiple model types in parallel.
- Advanced analytics for token usage and request trends.
- Integration with monitoring dashboards for RunPod container status.
This pipeline is a scalable and modular solution for LLM deployment and inference. Contributions and suggestions are welcome!