[Summer-OSPP] LLM Inference Auto-config & LLM Service Version Tracing
Description
We aim to improve the deployment and management of large models by addressing the following key areas:
Prerequisites
Fix documentation errors.
Adjust configuration files to work in the domestic network environment.
Large Model Auto-Configuration Tuning
Create Multiple Inference Framework Images: Develop Docker images that support deploying models with frameworks such as PyTorch and vLLM, accommodating a range of LLMs and quantization types.
Develop New Stress Test Clients: Implement clients that can generate requests with varying input_length values to ensure robust testing (see the sketch after this list).
Controller Support: Enhance the controller to manage inference frameworks, input_length, and dtype configurations effectively.
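To make the stress-test item concrete, here is a minimal sketch of such a client in Python. The endpoint URL and the request schema (prompt, max_tokens) are assumptions for illustration, not this project's actual serving API:

```python
import argparse
import random
import string
import time

import requests


def make_prompt(input_length: int) -> str:
    """Build a synthetic prompt of roughly `input_length` whitespace-separated words."""
    words = ["".join(random.choices(string.ascii_lowercase, k=5)) for _ in range(input_length)]
    return " ".join(words)


def run(endpoint: str, input_lengths: list, requests_per_length: int) -> None:
    for n in input_lengths:
        latencies = []
        for _ in range(requests_per_length):
            # Request schema is an assumption; adjust to the serving framework's API.
            payload = {"prompt": make_prompt(n), "max_tokens": 64}
            start = time.perf_counter()
            resp = requests.post(endpoint, json=payload, timeout=120)
            resp.raise_for_status()
            latencies.append(time.perf_counter() - start)
        mean = sum(latencies) / len(latencies)
        print(f"input_length={n}: mean latency {mean:.3f}s over {len(latencies)} requests")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # Endpoint path is a placeholder, not the project's actual serving route.
    parser.add_argument("--endpoint", default="http://localhost:8000/generate")
    args = parser.parse_args()
    run(args.endpoint, input_lengths=[128, 512, 2048], requests_per_length=10)
```

A real client would also add concurrency and measure token-accurate input lengths with the model's tokenizer; this sketch only sweeps approximate input_length values and reports mean latency.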
Large Model Service Version Tracking
Define LLM Service Version CRD: Establish a Custom Resource Definition (CRD) for LLM service versions to streamline version management (a sketch follows this list).
Support for LLM Service Versions: Integrate support for LLM service versions in both frontend and backend systems.
Add ArgoCD GitOps Feature: Incorporate GitOps capabilities using ArgoCD for continuous delivery and synchronization of application configurations.
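As a rough illustration of the CRD item above, the following Python snippet registers a hypothetical LLMServiceVersion CRD with the official Kubernetes client. The API group example.io, the kind name, and the spec fields are all placeholder assumptions, not the final design:

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when run inside the cluster

crd = client.V1CustomResourceDefinition(
    api_version="apiextensions.k8s.io/v1",
    kind="CustomResourceDefinition",
    metadata=client.V1ObjectMeta(name="llmserviceversions.example.io"),
    spec=client.V1CustomResourceDefinitionSpec(
        group="example.io",  # placeholder API group
        scope="Namespaced",
        names=client.V1CustomResourceDefinitionNames(
            plural="llmserviceversions",
            singular="llmserviceversion",
            kind="LLMServiceVersion",
        ),
        versions=[
            client.V1CustomResourceDefinitionVersion(
                name="v1alpha1",
                served=True,
                storage=True,
                schema=client.V1CustomResourceValidation(
                    open_api_v3_schema=client.V1JSONSchemaProps(
                        type="object",
                        properties={
                            "spec": client.V1JSONSchemaProps(
                                type="object",
                                properties={
                                    # Illustrative fields a service version might pin down.
                                    "model": client.V1JSONSchemaProps(type="string"),
                                    "framework": client.V1JSONSchemaProps(type="string"),
                                    "dtype": client.V1JSONSchemaProps(type="string"),
                                    "image": client.V1JSONSchemaProps(type="string"),
                                },
                            )
                        },
                    )
                ),
            )
        ],
    ),
)

client.ApiextensionsV1Api().create_custom_resource_definition(crd)
```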
Tasks
Adjust configuration files for domestic network compatibility.
Develop Docker images for PyTorch and vLLM model deployment.
Implement stress test clients with variable input_length support.
Establish the LLM service version CRD.
Add support for LLM service versions in the application layers.
Integrate ArgoCD for GitOps functionality (see the sketch below).
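For the ArgoCD task, the core of the GitOps setup is an Argo CD Application resource pointing at the Git repository that holds the deployment manifests. A minimal sketch using the Kubernetes Python client, with placeholder repository, app, and namespace names:

```python
from kubernetes import client, config

config.load_kube_config()

application = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Application",
    "metadata": {"name": "llm-service", "namespace": "argocd"},  # placeholder app name
    "spec": {
        "project": "default",
        "source": {
            "repoURL": "https://example.com/org/llm-configs.git",  # placeholder repo
            "targetRevision": "HEAD",
            "path": "deploy",
        },
        "destination": {"server": "https://kubernetes.default.svc", "namespace": "llm"},
        # Automated sync keeps the cluster converged on what Git declares.
        "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="argoproj.io",
    version="v1alpha1",
    namespace="argocd",
    plural="applications",
    body=application,
)
```

With automated sync enabled (prune and selfHeal), Argo CD continuously reconciles the cluster state with what the repository declares.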
Acceptance Criteria
All documentation should be updated and free of errors.
Configuration files should be tested and confirmed to work with domestic network settings.
Docker images should be built and tested for compatibility with specified frameworks and models.
Stress test clients should be able to simulate different input lengths effectively.
The controller should manage inference framework, input_length, and dtype configurations without issues.
LLM service version CRD should be defined and implemented.
Frontend and backend should be updated to recognize and support LLM service versions.
ArgoCD should be set up and tested for continuous deployment and synchronization.
Comments
Please provide any additional information, questions, or concerns related to this issue.