System Info
4 × 32 GB V100 GPUs
Running Xinference with Docker?
Version info
0.16.2
The command used to start Xinference
Started with nohup.
Reproduction
The model was served with vLLM across 4 GPUs and had been running fine. Then, after one chat call, the GPU process on the first card died outright while the model remained loaded on the other three cards; subsequent chat calls hung with no response, and the only remedy was to restart the service and reload the model.
The error log is as follows:
ERROR 11-27 11:41:33 async_llm_engine.py:64] Engine background task failed
ERROR 11-27 11:41:33 async_llm_engine.py:64] Traceback (most recent call last):
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 54, in _log_task_completion
ERROR 11-27 11:41:33 async_llm_engine.py:64] return_value = task.result()
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 851, in run_engine_loop
ERROR 11-27 11:41:33 async_llm_engine.py:64] result = task.result()
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 774, in engine_step
ERROR 11-27 11:41:33 async_llm_engine.py:64] request_outputs = await self.engine.step_async(virtual_engine)
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 346, in step_async
ERROR 11-27 11:41:33 async_llm_engine.py:64] outputs = await self.model_executor.execute_model_async(
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 181, in execute_model_async
ERROR 11-27 11:41:33 async_llm_engine.py:64] return await self._driver_execute_model_async(execute_model_req)
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 224, in _driver_execute_model_async
ERROR 11-27 11:41:33 async_llm_engine.py:64] return await self.driver_exec_model(execute_model_req)
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/concurrent/futures/thread.py", line 52, in run
ERROR 11-27 11:41:33 async_llm_engine.py:64] result = self.fn(*self.args, **self.kwargs)
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 303, in execute_model
ERROR 11-27 11:41:33 async_llm_engine.py:64] inputs = self.prepare_input(execute_model_req)
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 291, in prepare_input
ERROR 11-27 11:41:33 async_llm_engine.py:64] return self._get_driver_input_and_broadcast(execute_model_req)
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 253, in _get_driver_input_and_broadcast
ERROR 11-27 11:41:33 async_llm_engine.py:64] self.model_runner.prepare_model_input(
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1586, in prepare_model_input
ERROR 11-27 11:41:33 async_llm_engine.py:64] model_input = self._prepare_model_input_tensors(
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1196, in _prepare_model_input_tensors
ERROR 11-27 11:41:33 async_llm_engine.py:64] return builder.build() # type: ignore
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 867, in build
ERROR 11-27 11:41:33 async_llm_engine.py:64] attn_metadata = self.attn_metadata_builder.build(
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/attention/backends/utils.py", line 215, in build
ERROR 11-27 11:41:33 async_llm_engine.py:64] input_block_tables[i, :len(block_table)] = block_table
ERROR 11-27 11:41:33 async_llm_engine.py:64] ValueError: could not broadcast input array from shape (516,) into shape (512,)
2024-11-27 11:41:33,903 xinference.core.model 931290 ERROR [request 7b8ea0e6-ac71-11ef-9b19-fa163ea8cbc1] Leave chat, error: could not broadcast input array from shape (516,) into shape (512,), elapsed time: 6 s
Traceback (most recent call last):
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/utils.py", line 78, in wrapped
ret = await func(*args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/model.py", line 709, in chat
response = await self._call_wrapper_json(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/model.py", line 517, in _call_wrapper_json
return await self._call_wrapper("json", fn, *args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/model.py", line 122, in _async_wrapper
return await fn(*args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/model.py", line 526, in _call_wrapper
ret = await fn(*args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/model/llm/vllm/utils.py", line 30, in _async_wrapper
return await fn(self, *args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 706, in async_chat
c = await self.async_generate(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/model/llm/vllm/utils.py", line 30, in _async_wrapper
return await fn(self, *args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 597, in async_generate
async for request_output in results_generator:
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 1029, in generate
async for output in await self.add_request(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 112, in generator
raise result
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 54, in _log_task_completion
return_value = task.result()
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 851, in run_engine_loop
result = task.result()
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 774, in engine_step
request_outputs = await self.engine.step_async(virtual_engine)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 346, in step_async
outputs = await self.model_executor.execute_model_async(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 181, in execute_model_async
return await self._driver_execute_model_async(execute_model_req)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 224, in _driver_execute_model_async
return await self.driver_exec_model(execute_model_req)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/concurrent/futures/thread.py", line 52, in run
result = self.fn(*self.args, **self.kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 303, in execute_model
inputs = self.prepare_input(execute_model_req)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 291, in prepare_input
return self._get_driver_input_and_broadcast(execute_model_req)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 253, in _get_driver_input_and_broadcast
self.model_runner.prepare_model_input(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1586, in prepare_model_input
model_input = self._prepare_model_input_tensors(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1196, in _prepare_model_input_tensors
return builder.build() # type: ignore
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 867, in build
attn_metadata = self.attn_metadata_builder.build(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/attention/backends/utils.py", line 215, in build
input_block_tables[i, :len(block_table)] = block_table
ValueError: could not broadcast input array from shape (516,) into shape (512,)
2024-11-27 11:41:33,909 xinference.api.restful_api 929546 ERROR [address=0.0.0.0:36473, pid=931290] could not broadcast input array from shape (516,) into shape (512,)
Traceback (most recent call last):
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1998, in create_chat_completion
data = await model.chat(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/backends/context.py", line 231, in send
return self._process_result_message(result)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
result = await self._run_coro(message.message_id, coro)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
return await coro
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/api.py", line 384, in on_receive
return await super().on_receive(message) # type: ignore
File "xoscar/core.pyx", line 558, in on_receive
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.on_receive
async with self._lock:
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.on_receive
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
result = await result
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/model.py", line 98, in wrapped_func
ret = await fn(self, *args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/api.py", line 462, in _wrapper
r = await func(self, *args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/utils.py", line 78, in wrapped
ret = await func(*args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/model.py", line 709, in chat
response = await self._call_wrapper_json(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/model.py", line 517, in _call_wrapper_json
return await self._call_wrapper("json", fn, *args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/model.py", line 122, in _async_wrapper
return await fn(*args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/model.py", line 526, in _call_wrapper
ret = await fn(*args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/model/llm/vllm/utils.py", line 30, in _async_wrapper
return await fn(self, *args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 706, in async_chat
c = await self.async_generate(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/model/llm/vllm/utils.py", line 30, in _async_wrapper
return await fn(self, *args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 597, in async_generate
async for request_output in results_generator:
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 1029, in generate
async for output in await self.add_request(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 112, in generator
raise result
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 54, in _log_task_completion
return_value = task.result()
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 851, in run_engine_loop
result = task.result()
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 774, in engine_step
request_outputs = await self.engine.step_async(virtual_engine)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 346, in step_async
outputs = await self.model_executor.execute_model_async(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 181, in execute_model_async
return await self._driver_execute_model_async(execute_model_req)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 224, in _driver_execute_model_async
return await self.driver_exec_model(execute_model_req)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/concurrent/futures/thread.py", line 52, in run
result = self.fn(*self.args, **self.kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 303, in execute_model
inputs = self.prepare_input(execute_model_req)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 291, in prepare_input
return self._get_driver_input_and_broadcast(execute_model_req)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 253, in _get_driver_input_and_broadcast
self.model_runner.prepare_model_input(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1586, in prepare_model_input
model_input = self._prepare_model_input_tensors(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1196, in _prepare_model_input_tensors
return builder.build() # type: ignore
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 867, in build
attn_metadata = self.attn_metadata_builder.build(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/attention/backends/utils.py", line 215, in build
input_block_tables[i, :len(block_table)] = block_table
ValueError: [address=0.0.0.0:36473, pid=931290] could not broadcast input array from shape (516,) into shape (512,)
2024-11-27 11:41:34,736 xinference.model.llm.vllm.utils 931290 INFO Detecting vLLM is not health, prepare to quit the process
2024-11-27 11:41:34,736 xinference.model.llm.vllm.core 931290 INFO Stopping vLLM engine
INFO 11-27 11:41:34 multiproc_worker_utils.py:133] Terminating local vLLM worker processes
2024-11-27 11:41:35,202 xinference.api.restful_api 929546 ERROR Remote server 0.0.0.0:36473 closed
Traceback (most recent call last):
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1998, in create_chat_completion
data = await model.chat(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/backends/context.py", line 230, in send
result = await self._wait(future, actor_ref.address, send_message) # type: ignore
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/backends/context.py", line 115, in _wait
return await future
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/backends/context.py", line 106, in _wait
await asyncio.shield(future)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/backends/core.py", line 84, in _listen
raise ServerClosed(
xoscar.errors.ServerClosed: Remote server 0.0.0.0:36473 closed
2024-11-27 11:41:36,476 xinference.core.worker 929804 WARNING Process 0.0.0.0:36473 is down.
(VllmWorkerProcess pid=931684) WARNING 11-27 11:42:27 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=931682) WARNING 11-27 11:42:27 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=931683) WARNING 11-27 11:42:27 shm_broadcast.py:396] No available block found in 60 second.
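For context, the ValueError at the root of the trace is a plain NumPy shape mismatch: the attention metadata builder preallocates `input_block_tables` with a fixed number of block slots per sequence, and here one request's KV-cache block table had grown to 516 entries while the array only had room for 512. A minimal standalone sketch of the failing pattern (hypothetical sizes mirroring the log, not vLLM's actual allocation logic):

```python
import numpy as np

# Preallocated capacity per sequence (hypothetical; mirrors the 512 in the log).
max_blocks_per_seq = 512
# One sequence whose KV-cache block table grew past that capacity.
block_table = list(range(516))

input_block_tables = np.zeros((1, max_blocks_per_seq), dtype=np.int32)

# NumPy clamps the slice [:516] to the array's 512 columns, so the destination
# has shape (512,) while the right-hand side has shape (516,), reproducing:
#   ValueError: could not broadcast input array from shape (516,) into shape (512,)
input_block_tables[0, : len(block_table)] = block_table
```

In other words, the crash fires whenever a scheduled sequence needs more KV-cache blocks than the builder sized `input_block_tables` for. The exception kills the engine background loop, the driver process exits, and the remaining worker processes then stall waiting on shared-memory broadcast blocks, which matches the hang described above.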
Expected behavior
Fix the bug so that a chat request does not crash the vLLM engine.