[Summer-OSPP] LLM Inference Auto-config & LLM Service Version Tracing #52

ZHANGWENTAI opened this issue Jul 6, 2024

Description

We aim to improve the deployment and management of large language models by addressing the following key areas:

Prerequisites

  • Fix documentation errors.
  • Adjust configuration files to adapt to the domestic (mainland China) network environment.

Large Model Auto-Configuration Tuning

  1. Create Multiple Inference Framework Images: Develop Docker images that support deploying models with frameworks such as PyTorch and vLLM, covering the supported LLMs and quantization types.
  2. Develop New Stress Test Clients: Implement clients that can sweep over varying input_length values to ensure robust testing (a minimal client sketch follows this list).
  3. Controller Support: Enhance the controller to manage inference framework, input_length, and dtype configurations effectively.
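
For item 2, here is a minimal sketch of such a stress-test client, assuming the deployed model exposes an OpenAI-compatible completions endpoint (as vLLM's server does). The endpoint URL, model name, and prompt construction are illustrative placeholders, not the project's final design:

```python
import time
import requests
from concurrent.futures import ThreadPoolExecutor

# Assumed endpoint: an OpenAI-compatible server such as the one vLLM provides.
ENDPOINT = "http://localhost:8000/v1/completions"  # hypothetical address
MODEL = "facebook/opt-125m"                        # hypothetical model name

def make_prompt(input_length: int) -> str:
    # Crude approximation: repeat a short word to reach the target token count.
    return " ".join(["hello"] * input_length)

def send_request(input_length: int) -> float:
    """Send one completion request and return its end-to-end latency in seconds."""
    payload = {
        "model": MODEL,
        "prompt": make_prompt(input_length),
        "max_tokens": 64,
    }
    start = time.perf_counter()
    resp = requests.post(ENDPOINT, json=payload, timeout=120)
    resp.raise_for_status()
    return time.perf_counter() - start

def run_stress_test(input_lengths, concurrency=8, requests_per_length=16):
    """Sweep over input lengths, reporting mean latency per configuration."""
    results = {}
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for length in input_lengths:
            latencies = list(pool.map(send_request, [length] * requests_per_length))
            results[length] = sum(latencies) / len(latencies)
    return results

if __name__ == "__main__":
    for length, latency in run_stress_test([128, 512, 1024, 2048]).items():
        print(f"input_length={length}: mean latency {latency:.2f}s")
```

Sweeping input_length this way also makes it straightforward to later hand the same (framework, input_length, dtype) grid to the controller described in item 3.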

Large Model Service Version Tracking

  1. Define LLM Service Version CRD: Establish a Custom Resource Definition (CRD) for LLM service versions to streamline version management (a CRD sketch follows this list).
  2. Support for LLM Service Versions: Integrate support for LLM service versions in both frontend and backend systems.
  3. Add ArgoCD GitOps Feature: Incorporate GitOps capabilities using ArgoCD for continuous delivery and synchronization of application configurations (see the Application example after this list).
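
As a starting point for item 1, a minimal CRD sketch; the API group (`example.ai`), kind name, and spec fields are hypothetical and only illustrate the kind of version metadata the resource could carry:

```yaml
# Hypothetical CRD sketch; group, kind, and schema fields are assumptions.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: llmserviceversions.example.ai
spec:
  group: example.ai
  names:
    kind: LLMServiceVersion
    plural: llmserviceversions
    singular: llmserviceversion
  scope: Namespaced
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                modelName:
                  type: string
                image:
                  type: string        # inference framework image, e.g. a vLLM build
                inferenceFramework:
                  type: string        # e.g. "pytorch" or "vllm"
                dtype:
                  type: string        # e.g. "float16", "int8"
                version:
                  type: string        # the service version being traced
```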
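And for item 3, an illustrative ArgoCD Application manifest; the repository URL, path, and namespaces are placeholders. With automated sync enabled, ArgoCD keeps the deployed LLM service in step with the configuration committed to Git:

```yaml
# Illustrative ArgoCD Application; repoURL, path, and namespaces are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: llm-service
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/llm-service-config.git
    targetRevision: main
    path: deploy/overlays/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: llm-service
  syncPolicy:
    automated:
      prune: true       # remove resources no longer declared in Git
      selfHeal: true    # revert out-of-band changes to match Git
```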

Tasks

  • Adjust configuration files for domestic network compatibility; develop Docker images for PyTorch and vLLM model deployment; implement stress-test clients with variable input_length support.
  • Establish the LLM service version CRD; add support for LLM service versions in the application layers.
  • Integrate ArgoCD for GitOps functionality.

Acceptance Criteria

  • All documentation should be updated and free of errors.
  • Configuration files should be tested and confirmed to work with domestic network settings.
  • Docker images should be built and tested for compatibility with specified frameworks and models.
  • Stress test clients should be able to simulate different input lengths effectively.
  • The controller should manage the specified inference framework, input_length, and dtype configurations reliably.
  • LLM service version CRD should be defined and implemented.
  • Frontend and backend should be updated to recognize and support LLM service versions.
  • ArgoCD should be set up and tested for continuous deployment and synchronization.

Comments

  • Please provide any additional information, questions, or concerns related to this issue.