new grpc server and client for new knobs. #54

Merged 1 commit into kubedl-io:main on Oct 31, 2024

Conversation

ZHANGWENTAI
Contributor

New gRPC server and client for tuning the inference framework, dtype, input length, and batch size for LLMs from Hugging Face.

I. What this PR does / why we need it

  • specify and download LLMs from Hugging Face
  • new gRPC server supporting multiple inference frameworks: native torch and vLLM
  • int8 quantization with bitsandbytes
  • new gRPC client for stress testing under different input lengths and batch sizes
  • example for tuning the new knobs: inference framework, dtype, input length, and batch size (a rough sketch follows this list)

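A rough sketch of how these knobs map onto the underlying libraries, for readers who want a concrete picture. It does not reproduce this PR's actual gRPC protos or module layout: load_model, stress_sweep, infer_fn, and the parameter names are illustrative placeholders; only the library calls (transformers' AutoModelForCausalLM and BitsAndBytesConfig, vLLM's LLM) are real APIs.

```python
# Server side: pick the inference framework and dtype, optionally int8-quantize.
# load_model() and its signature are illustrative, not this PR's actual API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from vllm import LLM


def load_model(model_id: str, framework: str = "torch",
               dtype: str = "float16", int8: bool = False):
    if framework == "vllm":
        # vLLM manages weights and KV cache itself; dtype is passed straight through.
        return LLM(model=model_id, dtype=dtype)
    # Native torch path via transformers, optionally int8-quantized with bitsandbytes.
    quant_cfg = BitsAndBytesConfig(load_in_8bit=True) if int8 else None
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=getattr(torch, dtype),
        quantization_config=quant_cfg,
        device_map="auto",
    )
    return tokenizer, model
```

On the client side, the stress test boils down to sweeping a grid of input lengths and batch sizes and timing each call; infer_fn below stands in for the gRPC request issued by the new client.

```python
# Client side: sweep (input length, batch size) pairs and record latency.
import itertools
import time


def stress_sweep(infer_fn, input_lengths=(128, 512, 1024), batch_sizes=(1, 4, 16)):
    """Time infer_fn(prompts) for every (input length, batch size) combination."""
    latencies = {}
    for n_tokens, batch in itertools.product(input_lengths, batch_sizes):
        prompts = ["hello " * n_tokens] * batch  # crude prompts of roughly n_tokens tokens
        start = time.perf_counter()
        infer_fn(prompts)
        latencies[(n_tokens, batch)] = time.perf_counter() - start
    return latencies
```
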
II. Does this pull request fix one issue?

Implements the first part of #52.

@ZHANGWENTAI changed the title from "New grpc server and client for new knobs." to "new grpc server and client for new knobs." on Oct 31, 2024
…ut length and batch size on LLMs from huggingface

Signed-off-by: ZHANGWENTAI <[email protected]>
@SimonCqk merged commit 1fb8e3b into kubedl-io:main on Oct 31, 2024
2 of 3 checks passed