When will the vLLM framework be supported? #144

Open
xxm1668 opened this issue Dec 6, 2023 · 3 comments

Comments

xxm1668 commented Dec 6, 2023

No description provided.

i4never (Contributor) commented Dec 7, 2023

The TigerBot models are based on the llama-2 architecture, and vLLM already supports the meta-llama/Llama-2-70b-hf architecture, so you can follow vLLM's quickstart (a minimal sketch follows).
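
A minimal quickstart-style sketch of running TigerBot under vLLM; the `TigerResearch/tigerbot-13b-chat` checkpoint name is an assumption, so substitute whichever TigerBot weights you actually want to run:

```python
from vllm import LLM, SamplingParams

# The checkpoint name below is an assumption; point it at whichever
# TigerBot weights you intend to serve.
llm = LLM(model="TigerResearch/tigerbot-13b-chat")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=256)

# vLLM batches the prompts and returns one RequestOutput per prompt.
outputs = llm.generate(["Hello, my name is"], params)
for output in outputs:
    print(output.outputs[0].text)
```
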
Also, wanting vLLM support usually means you need to serve the model, in which case consider using TGI.
TGI's flash_llama_modeling integrates both flash_attn and vllm: the first token is generated with flash_attn, and subsequent tokens with vllm (a client-side sketch follows the link).
https://github.com/huggingface/text-generation-inference/blob/3238c49121b02432bf2938c6ebfd44f06c5adc2f/server/text_generation_server/models/custom_modeling/flash_llama_modeling.py#L291-L313
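
A sketch of querying such a TGI server from Python, assuming one is already running; the docker invocation and model id in the comments are assumptions, not a tested setup:

```python
# Assumes a TGI server is already up, launched roughly as:
#   docker run --gpus all -p 8080:80 \
#       ghcr.io/huggingface/text-generation-inference:latest \
#       --model-id TigerResearch/tigerbot-13b-chat
# (model id and port are assumptions; use the weights you actually serve)
from text_generation import Client

client = Client("http://127.0.0.1:8080")

# Plain (non-streaming) generation against the running server.
response = client.generate("Hello, TigerBot!", max_new_tokens=64)
print(response.generated_text)
```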

i4never closed this as completed Dec 7, 2023
i4never reopened this Dec 7, 2023
AlphaINF commented

Hello, a few days ago I got the TigerBot model working with vLLM; the main change was adapting the prefix in the jinja script (a sketch of the prefix is included below). You can read my blog post: https://www.cnblogs.com/alphainf/p/17884055.html
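
A minimal sketch of that prompt prefix; the marker strings below follow TigerBot's published inference code and are worth verifying against the blog post and the checkpoint you use:

```python
# Marker strings as used in TigerBot's published inference code; verify
# them against the blog post and the checkpoint you are serving.
TOK_INS = "\n\n### Instruction:\n"
TOK_RES = "\n\n### Response:\n"

def build_prompt(query: str) -> str:
    """Wrap a raw user query in the prefix TigerBot's chat models expect."""
    return f"{TOK_INS}{query}{TOK_RES}"

# Hand the wrapped string (not the raw query) to the engine,
# e.g. llm.generate([build_prompt("...")], params) under vLLM.
print(repr(build_prompt("Hello")))
```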

xxm1668 (Author) commented Dec 18, 2023

Thanks a lot!
