Does aphrodite-engine support models quantized with vllm's llm-compressor? #563

Answered by AlpinDale
BlairSadewitz asked this question in Q&A
Not at the moment, no. But I'm currently backporting all the missing vllm features in the rc_054 branch. I'm about 2 months behind, so it'll take a week or so. In the meantime, I've added support for 4, 6, 8, and 12 bits from deepspeedfp in that branch. A later update will add support for 5, 6, 7, and a better 8-bit, with good performance at higher batch sizes. You can use deepspeedfp by specifying -q deepspeedfp --num-deepspeedfp-bits {4,6,8,12} when launching a 16-bit model. Disclaimer: it's rather slow.
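A launch using those flags might look like the sketch below. The entry point and model name are illustrative assumptions (not from this answer); only the `-q deepspeedfp` and `--num-deepspeedfp-bits` flags are taken from the reply above:

```shell
# Sketch: serve a 16-bit model with runtime DeepSpeedFP quantization.
# Entry point and model name are placeholders; adjust to your setup.
# --num-deepspeedfp-bits accepts 4, 6, 8, or 12 per the answer above.
python -m aphrodite.endpoints.openai.api_server \
  --model meta-llama/Meta-Llama-3-8B-Instruct \
  -q deepspeedfp \
  --num-deepspeedfp-bits 8
```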

Answer selected by BlairSadewitz