help — has no one else run into a similar problem? 0.0
Is there any way to fix this?
System Info / 系統信息
OS: openEuler 20.03 (LTS-SP3)
CUDA: V12.5.40
Python 3.11.9 (conda virtual environment)
transformers 4.46.3
vllm 0.6.4.post1
Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?
Version info / 版本信息
Release: v0.15.3+1.g26b5097
The command used to start Xinference / 用以启动 xinference 的命令
HF_ENDPOINT=https://hf-mirror.com XINFERENCE_HOME=/data/inference/.xinference GRADIO_DEFAULT_CONCURRENCY_LIMIT=10 XINFERENCE_MODEL_SRC=modelscope nohup xinference-local --host 0.0.0.0 --port 30002 --log-level debug > output.log 2>&1 &
Reproduction / 复现过程
The service was launched as shown above, with no extra parameters.
Right after the model is loaded, GPUs 2 and 3 each use roughly 32 GB of VRAM.
After running for a few days with calls coming in through Dify, VRAM usage gradually climbed from 32 GB to 37 GB, and has now reached 39 GB with no sign of dropping.
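To pin down whether the growth is steady or just fluctuating with load, one option is to poll `nvidia-smi` periodically and log per-GPU memory. A minimal sketch (the `--query-gpu` flags are standard `nvidia-smi` options; the 60-second polling loop is illustrative):

```python
import subprocess
import time

def parse_gpu_memory(csv_text: str) -> dict[int, int]:
    """Parse `nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits`
    output into {gpu_index: used_MiB}."""
    usage = {}
    for line in csv_text.strip().splitlines():
        index, used = (field.strip() for field in line.split(","))
        usage[int(index)] = int(used)
    return usage

def poll_gpu_memory() -> dict[int, int]:
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=index,memory.used",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_gpu_memory(out)

if __name__ == "__main__":
    # Log usage once a minute; diffing the numbers over days shows the trend.
    while True:
        print(time.strftime("%F %T"), poll_gpu_memory())
        time.sleep(60)
```

Attaching a log like this to the issue would make it easier for the maintainers to distinguish a leak from normal KV-cache allocation.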
A few questions:
1. After the model is loaded, is the initial VRAM usage just a baseline that fluctuates as concurrency increases, or will it keep rising indefinitely?
2. Can the range of this growth be capped? If not, will it eventually overflow and crash, and how can the VRAM be brought back down?
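On the second question: vLLM pre-allocates a fraction of each GPU for its KV cache via the `gpu_memory_utilization` engine argument (its vLLM default is 0.9), which bounds how far usage can grow for cache purposes. A hedged sketch, assuming your Xinference release forwards extra launch options to the vLLM engine (the model name and size are placeholders; verify flag forwarding against `xinference launch --help`):

```shell
# Launch a model with the vLLM KV-cache budget capped at 85% of each GPU.
xinference launch \
  --endpoint http://0.0.0.0:30002 \
  --model-name qwen2-instruct \
  --model-engine vllm \
  --size-in-billions 72 \
  --gpu_memory_utilization 0.85
```

Note that vLLM reserves this budget up front and does not return it while the model stays loaded, so usage settling near the configured fraction is expected; growth well beyond it would be worth reporting with logs attached.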
Expected behavior / 期待表现
Once call volume drops, VRAM usage should return to its normal level.