[Summer-OSPP] LLM Inference Auto-config & LLM Service Version Tracing #52

ZHANGWENTAI opened this issue Jul 6, 2024

Description

We aim to improve the deployment and management of large language models by addressing the following key areas:

Prerequisites

  • Fix documentation errors.
  • Adjust configuration files to adapt to the domestic (mainland China) network environment.

Large Model Auto-Configuration Tuning

  1. Create Multiple Inference Framework Images: Develop Docker images that support deploying models with frameworks such as PyTorch and vLLM, covering the supported LLMs and quantization types.
  2. Develop New Stress Test Clients: Implement clients that can sweep over varying input_length values to ensure robust testing (a minimal client sketch follows this list).
  3. Controller Support: Enhance the controller to manage inference framework, input_length, and dtype configurations effectively.
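
For item 2, here is a minimal sketch of such a stress-test client, assuming the deployed model exposes an OpenAI-compatible completions endpoint (as vLLM's server does). The endpoint URL, model name, and prompt construction are illustrative placeholders, not the project's final design:

```python
import time
import requests
from concurrent.futures import ThreadPoolExecutor

# Assumed endpoint: an OpenAI-compatible server such as the one vLLM provides.
ENDPOINT = "http://localhost:8000/v1/completions"  # hypothetical address
MODEL = "facebook/opt-125m"                        # hypothetical model name

def make_prompt(input_length: int) -> str:
    # Crude approximation: repeat a short word to reach the target token count.
    return " ".join(["hello"] * input_length)

def send_request(input_length: int) -> float:
    """Send one completion request and return its end-to-end latency in seconds."""
    payload = {
        "model": MODEL,
        "prompt": make_prompt(input_length),
        "max_tokens": 64,
    }
    start = time.perf_counter()
    resp = requests.post(ENDPOINT, json=payload, timeout=120)
    resp.raise_for_status()
    return time.perf_counter() - start

def run_stress_test(input_lengths, concurrency=8, requests_per_length=16):
    """Sweep over input lengths, reporting mean latency per configuration."""
    results = {}
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for length in input_lengths:
            latencies = list(pool.map(send_request, [length] * requests_per_length))
            results[length] = sum(latencies) / len(latencies)
    return results

if __name__ == "__main__":
    for length, latency in run_stress_test([128, 512, 1024, 2048]).items():
        print(f"input_length={length}: mean latency {latency:.2f}s")
```

Sweeping input_length this way also makes it straightforward to later hand the same (framework, input_length, dtype) grid to the controller described in item 3.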

Large Model Service Version Tracking

  1. Define LLM Service Version CRD: Establish a Custom Resource Definition (CRD) for LLM service versions to streamline version management (a CRD sketch follows this list).
  2. Support for LLM Service Versions: Integrate support for LLM service versions in both frontend and backend systems.
  3. Add ArgoCD GitOps Feature: Incorporate GitOps capabilities using ArgoCD for continuous delivery and synchronization of application configurations (see the Application example after this list).
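
As a starting point for item 1, a minimal CRD sketch; the API group (`example.ai`), kind name, and spec fields are hypothetical and only illustrate the kind of version metadata the resource could carry:

```yaml
# Hypothetical CRD sketch; group, kind, and schema fields are assumptions.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: llmserviceversions.example.ai
spec:
  group: example.ai
  names:
    kind: LLMServiceVersion
    plural: llmserviceversions
    singular: llmserviceversion
  scope: Namespaced
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                modelName:
                  type: string
                image:
                  type: string        # inference framework image, e.g. a vLLM build
                inferenceFramework:
                  type: string        # e.g. "pytorch" or "vllm"
                dtype:
                  type: string        # e.g. "float16", "int8"
                version:
                  type: string        # the service version being traced
```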
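And for item 3, an illustrative ArgoCD Application manifest; the repository URL, path, and namespaces are placeholders. With automated sync enabled, ArgoCD keeps the deployed LLM service in step with the configuration committed to Git:

```yaml
# Illustrative ArgoCD Application; repoURL, path, and namespaces are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: llm-service
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/llm-service-config.git
    targetRevision: main
    path: deploy/overlays/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: llm-service
  syncPolicy:
    automated:
      prune: true       # remove resources no longer declared in Git
      selfHeal: true    # revert out-of-band changes to match Git
```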

Tasks

  • Adjust configuration files for domestic network compatibility; develop Docker images for PyTorch and vLLM model deployment; implement stress-test clients with variable input_length support.
  • Establish the LLM service version CRD; add support for LLM service versions in the application layers.
  • Integrate ArgoCD for GitOps functionality.

Acceptance Criteria

  • All documentation should be updated and free of errors.
  • Configuration files should be tested and confirmed to work with domestic network settings.
  • Docker images should be built and tested for compatibility with specified frameworks and models.
  • Stress test clients should be able to simulate different input lengths effectively.
  • The controller should manage the specified inference framework, input_length, and dtype configurations reliably.
  • LLM service version CRD should be defined and implemented.
  • Frontend and backend should be updated to recognize and support LLM service versions.
  • ArgoCD should be set up and tested for continuous deployment and synchronization.

Comments

  • Please provide any additional information, questions, or concerns related to this issue.