Which of the open-source models supports a 100K context? #178

Which of the open-source models supports a 100K context? Looking at the latest v6 release, the config seems to show only 8k.

Comments
Both v6 models support 100k; the value in the config is just a placeholder. In our tests, 8x 40G cards handled a total length of 100k with no problems; with 8x 80G cards, 200k is possible. You can test with the command below, adjusting max_input_length/max_generate_length to your actual hardware (the two values sum to the 100k total):

```
export PYTHONPATH='./' ; export CUDA_VISIBLE_DEVICES=0 ; streamlit run apps/web_demo.py -- --model_path tigerbot-70b-chat-v6 --rope_scaling yarn --rope_factor 8 --max_input_length 37888 --max_generate_length 62112
```
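As a hypothetical illustration of that adjustment (the helper below is not part of the TigerBot repo), the two flags simply split one total context budget:

```python
# Hypothetical helper (not from the TigerBot repo): split a total context
# budget between --max_input_length and --max_generate_length.
def split_budget(total: int, generate_fraction: float = 0.62) -> tuple[int, int]:
    """Return (max_input_length, max_generate_length) summing to `total`."""
    max_generate = int(total * generate_fraction)
    return total - max_generate, max_generate

# 100k is the budget the maintainer tested on 8x 40G cards; the command
# above uses the nearby split 37888/62112.
print(split_budget(100_000))   # -> (38000, 62000)
print(split_budget(200_000))   # -> (76000, 124000), for 8x 80G cards
```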
I only have nine 3090 cards here. Is there a quantized build of v6? And does the quantized build also support 100K?
The quantized build of 70b chat v6 is here: https://huggingface.co/TigerResearch/tigerbot-70b-chat-v6-4bit-exl2
Thanks. The 4-bit quantized model can also run inference over a 100K context, right?
Yes, it can.
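A minimal sketch of loading that 4-bit exl2 checkpoint with an extended context window, based on exllamav2's documented Python API; the model path and the 100k sequence length come from this thread, and exact call names may vary between exllamav2 versions:

```python
# Sketch: load the exl2 checkpoint with an extended context window and
# split it across all visible GPUs. Verify against your exllamav2 version.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "tigerbot-70b-chat-v6-4bit-exl2"  # local download of the HF repo
config.prepare()
config.max_seq_len = 100_000  # total budget (input + generation) discussed above

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # allocated while the model loads
model.load_autosplit(cache)               # spread layers across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7
print(generator.generate_simple("Hello", settings, num_tokens=128))
```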
Which inference framework can serve an exllama2-quantized model behind an API?
Starting the quantized model with the parameters below, the following error is reported during long-text inference:
Starting the 13B model with the parameters below, the following error is reported during long-text inference: Namespace(model_path='/data/model/tigerbot-13b-chat-v6', rope_scaling='yarn', rope_factor=8.0, max_input_length=10240, max_generate_length=10240)