Large model (vLLM engine inference): GPU memory keeps growing over long runtimes and never comes back down #2639

Open
1 of 3 tasks
turndown opened this issue Dec 9, 2024 · 2 comments

turndown commented Dec 9, 2024

System Info

OS: openEuler 20.03 (LTS-SP3)
CUDA V12.5.40
Python 3.11.9 (conda virtual environment)
transformers 4.46.3
vllm 0.6.4.post1

Running Xinference with Docker?

  • docker
  • pip install
  • installation from source

Version info

Release: v0.15.3+1.g26b5097

The command used to start Xinference

HF_ENDPOINT=https://hf-mirror.com XINFERENCE_HOME=/data/inference/.xinference GRADIO_DEFAULT_CONCURRENCY_LIMIT=10 XINFERENCE_MODEL_SRC=modelscope nohup xinference-local --host 0.0.0.0 --port 30002 --log-level debug > output.log 2>&1 &

Reproduction

The model was launched as shown below, with no extra parameters:
[screenshot: launch configuration]
Right after startup, the model used roughly 32 GB each on GPU 2 and GPU 3:
[screenshot]
But after running for a few days and being called through Dify, memory usage gradually climbed from 32 GB to 37 GB and is now at 39 GB, with no sign of coming back down:
[screenshot]
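For reference, a minimal monitoring sketch that logs the usage of GPU 2 and GPU 3 over time, so the growth can be tracked without watching nvidia-smi by hand (assumes the NVML Python bindings are installed, e.g. `pip install nvidia-ml-py`; the 60-second interval is arbitrary):

```python
# Minimal sketch: log used memory on GPUs 2 and 3 once a minute.
# Assumption: the NVML Python bindings ("pynvml" module) are available.
import time
import pynvml

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i) for i in (2, 3)]

try:
    while True:
        used_gib = [pynvml.nvmlDeviceGetMemoryInfo(h).used / 1024**3 for h in handles]
        print(time.strftime("%F %T"), [f"{u:.1f} GiB" for u in used_gib], flush=True)
        time.sleep(60)
finally:
    pynvml.nvmlShutdown()
```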

A few questions:
1. After the model is loaded, is the initial allocation just a baseline, with usage then fluctuating as concurrency increases, or will it keep rising indefinitely?
2. Can the range of this growth be capped (see the sketch below)? If there is no limit, will it eventually overflow and crash, and how can the memory be brought back down?
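On question 2, a hedged sketch only (not verified against this exact Xinference release): vLLM exposes a `gpu_memory_utilization` engine argument that caps the share of each GPU the engine may reserve, and the Xinference client passes extra keyword arguments through to the engine when a model is launched. The model name below is a placeholder:

```python
# Sketch under the assumption that extra kwargs are forwarded to the vLLM engine.
from xinference.client import Client

client = Client("http://127.0.0.1:30002")   # matches --port 30002 in the launch command above

model_uid = client.launch_model(
    model_name="my-llm",           # placeholder: substitute the actual model name
    model_engine="vllm",
    n_gpu=2,                       # GPUs 2 and 3 in the screenshots
    gpu_memory_utilization=0.85,   # vLLM arg: upper bound on the share of GPU memory used
    max_model_len=8192,            # a smaller context length also shrinks the KV-cache pool
)
print(model_uid)
```

Lowering `gpu_memory_utilization` limits what vLLM reserves up front; whether it also bounds the slow growth observed here would still need to be confirmed.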

Expected behavior

Once the call volume drops, GPU memory should return to its normal level.

@XprobeBot XprobeBot added the gpu label Dec 9, 2024
@XprobeBot XprobeBot added this to the v1.x milestone Dec 9, 2024
@turndown (Author)

help, has no one else run into a problem like this? 0.0

@948024326

> help, has no one else run into a problem like this? 0.0

Is there any way to solve this?
