[BUG] Concurrent calls to the create-VM API occasionally fail to allocate a GPU #21508

Open
66545shiwo opened this issue Nov 2, 2024 · 1 comment
Labels
bug Something isn't working state/awaiting processing

What happened:
When the create-VM API is called concurrently, GPU allocation occasionally fails:

221940 [warning 2024-11-02 05:56:50 predicates.(*PredicateHelper).GetResult(predicates.go:89)]Filter Result: candidate: "0a70d90d-f1d5-4dc5-8aaa-0306d88936f9", filter: "host_isolated_device", is_ok: false, reason: "no enough resource: test, requested: 1, total: 8, free: 0, IsolatedDevice count not enough, request: 1, hostTotal: 8, hostFree: 0", error: <nil>

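The host has 8 GPUs in total. The create-VM API was called 8 times concurrently, each VM requesting 1 GPU, and the scheduler reported the error above.
After all calls completed, the host showed 7 GPUs actually in use and 1 still free.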
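
A minimal harness for reproducing this kind of concurrent call pattern could look like the sketch below. It is only an illustration: `createFunc`, `runConcurrent`, and `errNoDevice` are hypothetical names, and the fake backend in `main` stands in for the real create-VM API, which is not shown in this issue.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errNoDevice is a hypothetical error mirroring the scheduler's rejection reason.
var errNoDevice = errors.New("IsolatedDevice count not enough")

// createFunc is a hypothetical stand-in for one create-VM call that
// requests a single GPU. It is not a real Cloudpods client function.
type createFunc func(i int) error

// runConcurrent fires n calls at the same moment and reports how many
// succeeded and how many failed.
func runConcurrent(n int, create createFunc) (ok, failed int) {
	var wg sync.WaitGroup
	var mu sync.Mutex
	start := make(chan struct{})
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			<-start // block until all goroutines are ready
			err := create(i)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				failed++
			} else {
				ok++
			}
		}(i)
	}
	close(start) // release every goroutine together
	wg.Wait()
	return ok, failed
}

func main() {
	// Fake backend (assumption): pretends one of the 8 requests is rejected.
	fake := func(i int) error {
		if i == 3 {
			return errNoDevice
		}
		return nil
	}
	ok, failed := runConcurrent(8, fake)
	fmt.Printf("succeeded=%d failed=%d\n", ok, failed)
}
```

Replacing `fake` with a function that calls the real API would let the success and failure counts be compared against the host's actual GPU usage.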

Environment:
v3.10.15

  • OS (e.g. cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Host: (e.g. dmidecode | egrep -i 'manufacturer|product' |sort -u)
  • Service Version (e.g. kubectl exec -n onecloud $(kubectl get pods -n onecloud | grep climc | awk '{print $1}') -- climc version-list):
@66545shiwo 66545shiwo added the bug Something isn't working label Nov 2, 2024

66545shiwo commented Nov 4, 2024

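We added a retry strategy to the scheduler's memory and GPU predicates. It works as a simple fix for the concurrent allocation problem for both memory (#21301) and GPUs.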
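
The actual change is not shown in this issue, so the following is only a rough sketch of what a retry around a resource check might look like. `checkFunc`, `retryCheck`, and `errNotEnough` are hypothetical names, not part of the scheduler's code, and the attempt count and delay are arbitrary.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotEnough is a hypothetical error for "not enough free devices".
var errNotEnough = errors.New("IsolatedDevice count not enough")

// checkFunc is a hypothetical resource check: it returns nil when the
// host currently reports enough free capacity for the request.
type checkFunc func() error

// retryCheck re-runs check up to attempts times, sleeping delay between
// tries. Only errNotEnough is retried; any other error is returned at once.
func retryCheck(check checkFunc, attempts int, delay time.Duration) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = check(); err == nil {
			return nil
		}
		if !errors.Is(err, errNotEnough) {
			return err
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return err
}

func main() {
	// Simulated check (assumption): the free device only becomes
	// visible on the third look.
	calls := 0
	check := func() error {
		calls++
		if calls < 3 {
			return errNotEnough
		}
		return nil
	}
	err := retryCheck(check, 5, 10*time.Millisecond)
	fmt.Println("calls:", calls, "err:", err)
}
```

A retry like this only helps when the shortage is transient, for example when the free count seen by the check lags behind the real state. It does not by itself prevent two requests from being handed the same device.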
