When will the vLLM framework be supported? #144

Open
xxm1668 opened this issue Dec 6, 2023 · 3 comments

Comments

xxm1668 commented Dec 6, 2023

No description provided.

i4never (Contributor) commented Dec 7, 2023

The TigerBot models are based on the llama-2 architecture, and vLLM already supports the meta-llama/Llama-2-70b-hf architecture, so you can follow vLLM's quickstart (a minimal sketch follows).
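
A minimal quickstart-style sketch of running TigerBot under vLLM; the `TigerResearch/tigerbot-13b-chat` checkpoint name is an assumption, so substitute whichever TigerBot weights you actually want to run:

```python
from vllm import LLM, SamplingParams

# The checkpoint name below is an assumption; point it at whichever
# TigerBot weights you intend to serve.
llm = LLM(model="TigerResearch/tigerbot-13b-chat")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=256)

# vLLM batches the prompts and returns one RequestOutput per prompt.
outputs = llm.generate(["Hello, my name is"], params)
for output in outputs:
    print(output.outputs[0].text)
```
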
Also, wanting vLLM support usually means you need to serve the model, in which case consider using TGI.
TGI's flash_llama_modeling integrates both flash_attn and vllm: the first token is generated with flash_attn, and subsequent tokens with vllm (a client-side sketch follows the link).
https://github.com/huggingface/text-generation-inference/blob/3238c49121b02432bf2938c6ebfd44f06c5adc2f/server/text_generation_server/models/custom_modeling/flash_llama_modeling.py#L291-L313
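
A sketch of querying such a TGI server from Python, assuming one is already running; the docker invocation and model id in the comments are assumptions, not a tested setup:

```python
# Assumes a TGI server is already up, launched roughly as:
#   docker run --gpus all -p 8080:80 \
#       ghcr.io/huggingface/text-generation-inference:latest \
#       --model-id TigerResearch/tigerbot-13b-chat
# (model id and port are assumptions; use the weights you actually serve)
from text_generation import Client

client = Client("http://127.0.0.1:8080")

# Plain (non-streaming) generation against the running server.
response = client.generate("Hello, TigerBot!", max_new_tokens=64)
print(response.generated_text)
```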

i4never closed this as completed Dec 7, 2023
i4never reopened this Dec 7, 2023
AlphaINF commented

Hello, a few days ago I got the TigerBot model working with vLLM; the main change was adapting the prefix in the jinja script (a sketch of the prefix is included below). You can read my blog post: https://www.cnblogs.com/alphainf/p/17884055.html
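
A minimal sketch of that prompt prefix; the marker strings below follow TigerBot's published inference code and are worth verifying against the blog post and the checkpoint you use:

```python
# Marker strings as used in TigerBot's published inference code; verify
# them against the blog post and the checkpoint you are serving.
TOK_INS = "\n\n### Instruction:\n"
TOK_RES = "\n\n### Response:\n"

def build_prompt(query: str) -> str:
    """Wrap a raw user query in the prefix TigerBot's chat models expect."""
    return f"{TOK_INS}{query}{TOK_RES}"

# Hand the wrapped string (not the raw query) to the engine,
# e.g. llm.generate([build_prompt("...")], params) under vLLM.
print(repr(build_prompt("Hello")))
```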

xxm1668 (Author) commented Dec 18, 2023

Thanks a lot!
