System Info
4 × 32 GB V100 GPUs
Running Xinference with Docker?
Version info
0.16.2
The command used to start Xinference
Started with nohup.
Reproduction
The model was served with vLLM across 4 GPUs and had been running fine. Then, after one chat call, the GPU process on the first card died outright while the model remained loaded on the other three cards; subsequent chat calls hung with no response, and the only remedy was to restart the service and reload the model.
The error log is as follows:
ERROR 11-27 11:41:33 async_llm_engine.py:64] Engine background task failed
ERROR 11-27 11:41:33 async_llm_engine.py:64] Traceback (most recent call last):
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 54, in _log_task_completion
ERROR 11-27 11:41:33 async_llm_engine.py:64] return_value = task.result()
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 851, in run_engine_loop
ERROR 11-27 11:41:33 async_llm_engine.py:64] result = task.result()
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 774, in engine_step
ERROR 11-27 11:41:33 async_llm_engine.py:64] request_outputs = await self.engine.step_async(virtual_engine)
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 346, in step_async
ERROR 11-27 11:41:33 async_llm_engine.py:64] outputs = await self.model_executor.execute_model_async(
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 181, in execute_model_async
ERROR 11-27 11:41:33 async_llm_engine.py:64] return await self._driver_execute_model_async(execute_model_req)
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 224, in _driver_execute_model_async
ERROR 11-27 11:41:33 async_llm_engine.py:64] return await self.driver_exec_model(execute_model_req)
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/concurrent/futures/thread.py", line 52, in run
ERROR 11-27 11:41:33 async_llm_engine.py:64] result = self.fn(*self.args, **self.kwargs)
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 303, in execute_model
ERROR 11-27 11:41:33 async_llm_engine.py:64] inputs = self.prepare_input(execute_model_req)
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 291, in prepare_input
ERROR 11-27 11:41:33 async_llm_engine.py:64] return self._get_driver_input_and_broadcast(execute_model_req)
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 253, in _get_driver_input_and_broadcast
ERROR 11-27 11:41:33 async_llm_engine.py:64] self.model_runner.prepare_model_input(
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1586, in prepare_model_input
ERROR 11-27 11:41:33 async_llm_engine.py:64] model_input = self._prepare_model_input_tensors(
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1196, in _prepare_model_input_tensors
ERROR 11-27 11:41:33 async_llm_engine.py:64] return builder.build() # type: ignore
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 867, in build
ERROR 11-27 11:41:33 async_llm_engine.py:64] attn_metadata = self.attn_metadata_builder.build(
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/attention/backends/utils.py", line 215, in build
ERROR 11-27 11:41:33 async_llm_engine.py:64] input_block_tables[i, :len(block_table)] = block_table
ERROR 11-27 11:41:33 async_llm_engine.py:64] ValueError: could not broadcast input array from shape (516,) into shape (512,)
2024-11-27 11:41:33,903 xinference.core.model 931290 ERROR [request 7b8ea0e6-ac71-11ef-9b19-fa163ea8cbc1] Leave chat, error: could not broadcast input array from shape (516,) into shape (512,), elapsed time: 6 s
Traceback (most recent call last):
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/utils.py", line 78, in wrapped
ret = await func(*args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/model.py", line 709, in chat
response = await self._call_wrapper_json(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/model.py", line 517, in _call_wrapper_json
return await self._call_wrapper("json", fn, *args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/model.py", line 122, in _async_wrapper
return await fn(*args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/model.py", line 526, in _call_wrapper
ret = await fn(*args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/model/llm/vllm/utils.py", line 30, in _async_wrapper
return await fn(self, *args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 706, in async_chat
c = await self.async_generate(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/model/llm/vllm/utils.py", line 30, in _async_wrapper
return await fn(self, *args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 597, in async_generate
async for request_output in results_generator:
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 1029, in generate
async for output in await self.add_request(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 112, in generator
raise result
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 54, in _log_task_completion
return_value = task.result()
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 851, in run_engine_loop
result = task.result()
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 774, in engine_step
request_outputs = await self.engine.step_async(virtual_engine)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 346, in step_async
outputs = await self.model_executor.execute_model_async(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 181, in execute_model_async
return await self._driver_execute_model_async(execute_model_req)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 224, in _driver_execute_model_async
return await self.driver_exec_model(execute_model_req)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/concurrent/futures/thread.py", line 52, in run
result = self.fn(*self.args, **self.kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 303, in execute_model
inputs = self.prepare_input(execute_model_req)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 291, in prepare_input
return self._get_driver_input_and_broadcast(execute_model_req)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 253, in _get_driver_input_and_broadcast
self.model_runner.prepare_model_input(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1586, in prepare_model_input
model_input = self._prepare_model_input_tensors(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1196, in _prepare_model_input_tensors
return builder.build() # type: ignore
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 867, in build
attn_metadata = self.attn_metadata_builder.build(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/attention/backends/utils.py", line 215, in build
input_block_tables[i, :len(block_table)] = block_table
ValueError: could not broadcast input array from shape (516,) into shape (512,)
2024-11-27 11:41:33,909 xinference.api.restful_api 929546 ERROR [address=0.0.0.0:36473, pid=931290] could not broadcast input array from shape (516,) into shape (512,)
Traceback (most recent call last):
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1998, in create_chat_completion
data = await model.chat(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/backends/context.py", line 231, in send
return self._process_result_message(result)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
result = await self._run_coro(message.message_id, coro)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
return await coro
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/api.py", line 384, in on_receive
return await super().on_receive(message) # type: ignore
File "xoscar/core.pyx", line 558, in on_receive
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.on_receive
async with self._lock:
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.on_receive
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
result = await result
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/model.py", line 98, in wrapped_func
ret = await fn(self, *args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/api.py", line 462, in _wrapper
r = await func(self, *args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/utils.py", line 78, in wrapped
ret = await func(*args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/model.py", line 709, in chat
response = await self._call_wrapper_json(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/model.py", line 517, in _call_wrapper_json
return await self._call_wrapper("json", fn, *args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/model.py", line 122, in _async_wrapper
return await fn(*args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/model.py", line 526, in _call_wrapper
ret = await fn(*args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/model/llm/vllm/utils.py", line 30, in _async_wrapper
return await fn(self, *args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 706, in async_chat
c = await self.async_generate(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/model/llm/vllm/utils.py", line 30, in _async_wrapper
return await fn(self, *args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 597, in async_generate
async for request_output in results_generator:
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 1029, in generate
async for output in await self.add_request(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 112, in generator
raise result
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 54, in _log_task_completion
return_value = task.result()
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 851, in run_engine_loop
result = task.result()
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 774, in engine_step
request_outputs = await self.engine.step_async(virtual_engine)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 346, in step_async
outputs = await self.model_executor.execute_model_async(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 181, in execute_model_async
return await self._driver_execute_model_async(execute_model_req)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 224, in _driver_execute_model_async
return await self.driver_exec_model(execute_model_req)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/concurrent/futures/thread.py", line 52, in run
result = self.fn(*self.args, **self.kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 303, in execute_model
inputs = self.prepare_input(execute_model_req)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 291, in prepare_input
return self._get_driver_input_and_broadcast(execute_model_req)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 253, in _get_driver_input_and_broadcast
self.model_runner.prepare_model_input(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1586, in prepare_model_input
model_input = self._prepare_model_input_tensors(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1196, in _prepare_model_input_tensors
return builder.build() # type: ignore
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 867, in build
attn_metadata = self.attn_metadata_builder.build(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/attention/backends/utils.py", line 215, in build
input_block_tables[i, :len(block_table)] = block_table
ValueError: [address=0.0.0.0:36473, pid=931290] could not broadcast input array from shape (516,) into shape (512,)
2024-11-27 11:41:34,736 xinference.model.llm.vllm.utils 931290 INFO Detecting vLLM is not health, prepare to quit the process
2024-11-27 11:41:34,736 xinference.model.llm.vllm.core 931290 INFO Stopping vLLM engine
INFO 11-27 11:41:34 multiproc_worker_utils.py:133] Terminating local vLLM worker processes
2024-11-27 11:41:35,202 xinference.api.restful_api 929546 ERROR Remote server 0.0.0.0:36473 closed
Traceback (most recent call last):
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1998, in create_chat_completion
data = await model.chat(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/backends/context.py", line 230, in send
result = await self._wait(future, actor_ref.address, send_message) # type: ignore
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/backends/context.py", line 115, in _wait
return await future
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/backends/context.py", line 106, in _wait
await asyncio.shield(future)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/backends/core.py", line 84, in _listen
raise ServerClosed(
xoscar.errors.ServerClosed: Remote server 0.0.0.0:36473 closed
2024-11-27 11:41:36,476 xinference.core.worker 929804 WARNING Process 0.0.0.0:36473 is down.
(VllmWorkerProcess pid=931684) WARNING 11-27 11:42:27 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=931682) WARNING 11-27 11:42:27 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=931683) WARNING 11-27 11:42:27 shm_broadcast.py:396] No available block found in 60 second.
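For context, the ValueError at the root of the trace is a plain NumPy shape mismatch: the attention metadata builder preallocates `input_block_tables` with a fixed number of block slots per sequence, and here one request's KV-cache block table had grown to 516 entries while the array only had room for 512. A minimal standalone sketch of the failing pattern (hypothetical sizes mirroring the log, not vLLM's actual allocation logic):

```python
import numpy as np

# Preallocated capacity per sequence (hypothetical; mirrors the 512 in the log).
max_blocks_per_seq = 512
# One sequence whose KV-cache block table grew past that capacity.
block_table = list(range(516))

input_block_tables = np.zeros((1, max_blocks_per_seq), dtype=np.int32)

# NumPy clamps the slice [:516] to the array's 512 columns, so the destination
# has shape (512,) while the right-hand side has shape (516,), reproducing:
#   ValueError: could not broadcast input array from shape (516,) into shape (512,)
input_block_tables[0, : len(block_table)] = block_table
```

In other words, the crash fires whenever a scheduled sequence needs more KV-cache blocks than the builder sized `input_block_tables` for. The exception kills the engine background loop, the driver process exits, and the remaining worker processes then stall waiting on shared-memory broadcast blocks, which matches the hang described above.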
Expected behavior
Fix the bug so that a chat request does not crash the vLLM engine.