Does aphrodite-engine support models quantized with vllm's llm-compressor? #563
-
For example, could I use it to quantize a model with SmoothQuant to do 8-bit lossless inference later?
Replies: 2 comments 1 reply
-
Not at the moment, no. But I'm currently backporting all the missing vllm features in the rc_054 branch. I'm about 2 months behind, so it'll take a week or so. In the meantime, I've added support for 4, 6, 8, and 12 bits from deepspeedfp in that branch. A later update will add support for 5, 6, 7, and a better 8-bit, with good performance at higher batch sizes. You can use deepspeedfp by specifying

```
-q deepspeedfp --num-deepspeedfp-bits {4,6,8,12}
```

when launching a 16-bit model. Disclaimer: it's rather slow.
-
There's some support for llm-compressor models in the
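Putting the flags from the reply above together, a full launch might look like the sketch below. The entry point and the model name are assumptions for illustration (aphrodite-engine exposes a vllm-style OpenAI-compatible server); only the `-q deepspeedfp --num-deepspeedfp-bits` flags come from the reply itself.

```shell
# Hypothetical launch command -- entry point and model are assumptions,
# not confirmed by the thread. Serves a 16-bit checkpoint with
# deepspeedfp quantization at 8 bits (valid choices: 4, 6, 8, 12).
python -m aphrodite.endpoints.openai.api_server \
  --model meta-llama/Meta-Llama-3-8B-Instruct \
  -q deepspeedfp \
  --num-deepspeedfp-bits 8
```

Note that per the disclaimer in the reply, deepspeedfp inference is rather slow in this branch.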