server logs
[2024-02-23 15:54:44.147][2184239][serving][error][modelinstance.cpp:1193] Async caught an exception Internal inference error: Exception from src/inference/src/infer_request.cpp:256:
clFlush
[2024-02-23 15:55:08.725][2184301][serving][error][modelinstance.cpp:1193] Async caught an exception Internal inference error: Exception from src/inference/src/infer_request.cpp:256:
clFlush
[2024-02-23 15:55:33.291][2184634][serving][error][modelinstance.cpp:1193] Async caught an exception Internal inference error: Exception from src/inference/src/infer_request.cpp:256:
clFlush
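To check whether the `clFlush` failures recur at a steady interval, the server log lines above can be parsed for their timestamps. This is only a rough sketch for triage, not part of OVMS: the regular expression is inferred from the three lines shown here, and the helper names (`error_timestamps`, `gaps_seconds`) are made up for illustration.

```python
import re
from datetime import datetime

# Pattern inferred from the log lines in this report (an assumption, not an
# official OVMS log format specification).
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]"
    r"\[(?P<tid>\d+)\]\[serving\]\[error\]\[(?P<src>[^\]]+)\]"
)


def error_timestamps(lines):
    """Return the timestamp of every [serving][error] line."""
    stamps = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            stamps.append(datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f"))
    return stamps


def gaps_seconds(stamps):
    """Seconds elapsed between consecutive errors."""
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


# The three error lines from this report, plus one continuation line that
# should be skipped because it carries no timestamp.
sample_lines = [
    "[2024-02-23 15:54:44.147][2184239][serving][error][modelinstance.cpp:1193] Async caught an exception",
    "clFlush",
    "[2024-02-23 15:55:08.725][2184301][serving][error][modelinstance.cpp:1193] Async caught an exception",
    "[2024-02-23 15:55:33.291][2184634][serving][error][modelinstance.cpp:1193] Async caught an exception",
]

stamps = error_timestamps(sample_lines)
gaps = gaps_seconds(stamps)
print(gaps)
```

Run against a full log file, a near-constant gap would hint at a timeout or retry period rather than random failures, but that reading is a guess to verify, not a conclusion.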
kernel logs
[1331510.701350] i915 0000:03:00.0: [drm] GPU HANG: ecode 12:10:85def5fa, in ovms [809080]
[1331510.701372] i915 0000:03:00.0: [drm] ovms[809080] context reset due to GPU hang
[1331516.943270] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f0e!
[1331517.368428] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f16!
[1331517.543874] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f18!
[1331531.202434] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f1a!
[1331531.204035] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f1c!
[1331531.204263] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f1e!
[1331531.204844] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f20!
[1331531.210043] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f22!
[1331531.210182] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f24!
[1331531.212604] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f26!
[1331531.212840] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f28!
[1331531.214194] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f2a!
[1331531.214293] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f2e!
[1331531.214379] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f2c!
[1331531.218911] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f30!
[1331531.224320] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f32!
[1331531.224845] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f36!
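For longer captures, the kernel output can be condensed into one hang record plus a count of fence timeouts per process. The sketch below is a hypothetical helper, not an i915 or OpenVINO tool; both regular expressions are inferred from the lines above and may need adjusting for other kernel versions.

```python
import re
from collections import Counter

# Patterns inferred from the dmesg lines in this report (assumptions).
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\S+), in (?P<proc>\S+) \[(?P<pid>\d+)\]"
)
FENCE_RE = re.compile(
    r"Fence expiration time out i915-(?P<dev>[0-9a-f:.]+):"
    r"(?P<proc>[^\[]+)\[(?P<pid>\d+)\]:(?P<fence>[0-9a-f]+)!"
)


def summarize(lines):
    """Return (hangs, fence_timeouts).

    hangs is a list of (process, pid, ecode) tuples; fence_timeouts counts
    fence expirations per (process, pid).
    """
    hangs = []
    fence_timeouts = Counter()
    for line in lines:
        m = HANG_RE.search(line)
        if m:
            hangs.append((m["proc"], int(m["pid"]), m["ecode"]))
            continue
        m = FENCE_RE.search(line)
        if m:
            fence_timeouts[(m["proc"], int(m["pid"]))] += 1
    return hangs, fence_timeouts


# A few lines copied from the kernel log in this report.
sample_dmesg = [
    "[1331510.701350] i915 0000:03:00.0: [drm] GPU HANG: ecode 12:10:85def5fa, in ovms [809080]",
    "[1331510.701372] i915 0000:03:00.0: [drm] ovms[809080] context reset due to GPU hang",
    "[1331516.943270] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f0e!",
    "[1331517.368428] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f16!",
    "[1331517.543874] Fence expiration time out i915-0000:03:00.0:ovms[809080]:24f18!",
]

hangs, fence_timeouts = summarize(sample_dmesg)
print(hangs, dict(fence_timeouts))
```

Feeding it the output of `dmesg` split into lines gives a quick view of how many fences timed out after each hang.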
Configuration
OpenVINO Model Server 2023.3.4e91aac76
OpenVINO backend 2023.3.0.13775.ceeafaf64f3
Bazel build flags: --strip=always --define MEDIAPIPE_DISABLE=0 --cxxopt=-DMEDIAPIPE_DISABLE=0 --define PYTHON_DISABLE=1 --cxxopt=-DPYTHON_DISABLE=1
My environment is fairly complicated.
My host server runs Debian with the i915 kernel driver. I pass the GPU through to an LXC container that has the Ubuntu 22.04 Intel GPU dependencies installed.
I run multiple models on a single GPU (I tweaked the compute runtime parameters to use multi-CCS mode, which should help), and each model is part of an inference pipeline. When the pipeline is under high load, the model server sometimes hangs.
Describe the bug
Inference hangs when using A770