new grpc server and client for new knobs. #54

Merged 1 commit into kubedl-io:main on Oct 31, 2024

Conversation

ZHANGWENTAI
Contributor

New gRPC server and client for tuning the inference framework, dtype, input length, and batch size for LLMs from Hugging Face.

I. What this PR does / why we need it

  • specify and download LLMs from Hugging Face
  • new gRPC server supporting multiple inference frameworks: native torch and vLLM
  • int8 quantization with bitsandbytes
  • new gRPC client for stress testing under different input lengths and batch sizes
  • example for tuning the new knobs: inference framework, dtype, input length, and batch size (a rough sketch follows this list)

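A rough sketch of how these knobs map onto the underlying libraries, for readers who want a concrete picture. It does not reproduce this PR's actual gRPC protos or module layout: load_model, stress_sweep, infer_fn, and the parameter names are illustrative placeholders; only the library calls (transformers' AutoModelForCausalLM and BitsAndBytesConfig, vLLM's LLM) are real APIs.

```python
# Server side: pick the inference framework and dtype, optionally int8-quantize.
# load_model() and its signature are illustrative, not this PR's actual API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from vllm import LLM


def load_model(model_id: str, framework: str = "torch",
               dtype: str = "float16", int8: bool = False):
    if framework == "vllm":
        # vLLM manages weights and KV cache itself; dtype is passed straight through.
        return LLM(model=model_id, dtype=dtype)
    # Native torch path via transformers, optionally int8-quantized with bitsandbytes.
    quant_cfg = BitsAndBytesConfig(load_in_8bit=True) if int8 else None
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=getattr(torch, dtype),
        quantization_config=quant_cfg,
        device_map="auto",
    )
    return tokenizer, model
```

On the client side, the stress test boils down to sweeping a grid of input lengths and batch sizes and timing each call; infer_fn below stands in for the gRPC request issued by the new client.

```python
# Client side: sweep (input length, batch size) pairs and record latency.
import itertools
import time


def stress_sweep(infer_fn, input_lengths=(128, 512, 1024), batch_sizes=(1, 4, 16)):
    """Time infer_fn(prompts) for every (input length, batch size) combination."""
    latencies = {}
    for n_tokens, batch in itertools.product(input_lengths, batch_sizes):
        prompts = ["hello " * n_tokens] * batch  # crude prompts of roughly n_tokens tokens
        start = time.perf_counter()
        infer_fn(prompts)
        latencies[(n_tokens, batch)] = time.perf_counter() - start
    return latencies
```
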
II. Does this pull request fix one issue?

Implements the first part of #52.

@ZHANGWENTAI changed the title from "New grpc server and client for new knobs." to "new grpc server and client for new knobs." on Oct 31, 2024
…ut length and batch size on LLMs from huggingface

Signed-off-by: ZHANGWENTAI <[email protected]>
@SimonCqk merged commit 1fb8e3b into kubedl-io:main on Oct 31, 2024
2 of 3 checks passed