-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vmcompute issue with latest patch - encountered an error during hcs::System::Start: context deadline exceeded related to (Splunk Universal Forwarder) #426
Comments
I am trying to repro the issue: PS C:\Users\Administrator> docker run jenkins/agent:jdk17-windowsservercore-ltsc2022 powershell.exe -c echo hello Source Description HotFixID InstalledBy InstalledOn WIN-35ILNJ... Update KB5022507 5/5/2023 12:00:00 AM Can you call get-hotfix to see the packages that installed on your system? Thanks, |
when the server gets deployed (without updates) it works and has the following. get-hotfix Source Description HotFixID InstalledBy InstalledOn #----------------------------------------- Package Identity : Package_for_RollupFix Package Identity : Package_for_ServicingStack_1663 #----------------------------------------- Then as part of the deployment process when we install the latest patch it breaks. Get-HotFix Source Description HotFixID InstalledBy InstalledOn
I have tested on Server Core and with GUI and same result. Based on the other thread it likely wasn't this latest roll up or patch that had the breaking change... however I haven't been able to test on different patch versions. Also when I uninstalled this update the server was still broken, I noticed vmcompute.exe was an even earlier version in this case (after uninstall). I am using the exact same process to deploy all of the servers so they are identical other than the working servers did not have the updates applied at build time. Then I noticed the vmcomute.exe did have some changes between the versions. we should test with this since it's a smaller image and straight from mcr... not that it makes that much difference. docker run mcr.microsoft.com/windows/nanoserver:ltsc2022 ping localhost -r 5 also This is the patch to install top test a repro... |
@Howard-Haiyang-Hao also thank you for the follow up. If there are some lower level tests or logs that we can look into, please let us know. |
Adding below log from the Application logs: 6:25:02PM - Warning 6:25:02PM - Error |
Hi @Howard-Haiyang-Hao were you able to get the update installed? Any luck with the repro? |
Adding some further information.
Event logswarningSyscall did not complete within operation timeout. This may indicate a platform issue. If it appears to be making no forward progress, obtain the stacks and see if there is a syscall stuck in the platform API for a significant length of time. [spanID=be1a8d4cdaa4cc79 traceID=3c3d99e94347624bac4b3aa9b63f8d8b timeout=4m0s] errorsfailed to start container [module=libcontainerd namespace=moby container=62f69d70bdfc7a8bcb86e4d824221cb3ed0381e48fb2f6b2aebefab59a12a7b8 error=container 62f69d70bdfc7a8bcb86e4d824221cb3ed0381e48fb2f6b2aebefab59a12a7b8 encountered an error during hcs::System::Start: context deadline exceeded] failed to cleanup after a failed Start [module=libcontainerd namespace=moby container=62f69d70bdfc7a8bcb86e4d824221cb3ed0381e48fb2f6b2aebefab59a12a7b8 error=container 62f69d70bdfc7a8bcb86e4d824221cb3ed0381e48fb2f6b2aebefab59a12a7b8 encountered an error during hcs::System::Terminate: context deadline exceeded] Handler for POST /v1.43/containers/62f69d70bdfc7a8bcb86e4d824221cb3ed0381e48fb2f6b2aebefab59a12a7b8/start returned error: container 62f69d70bdfc7a8bcb86e4d824221cb3ed0381e48fb2f6b2aebefab59a12a7b8 encountered an error during hcs::System::Start: context deadline exceeded OSinfoWindowsBuildLabEx : 20348.1.amd64fre.fe_release.210507-1500
WindowsCurrentVersion : 6.3
WindowsEditionId : ServerStandard
WindowsInstallationType : Server
WindowsInstallDateFromRegistry : 10/2/2023 3:38:09 PM
WindowsProductId : 00454-10000-00001-AA777
WindowsProductName : Windows Server 2022 Standard
OsName : Microsoft Windows Server 2022 Standard
OsType : WINNT
OsOperatingSystemSKU : StandardServerEdition
OsVersion : 10.0.20348
get-hotfix | ft -AutoSize
Source Description HotFixID InstalledBy InstalledOn
------ ----------- -------- ----------- -----------
BWILKINSON-VM1 Update KB5029928 NT AUTHORITY\SYSTEM 10/2/2023 12:00:00 AM
BWILKINSON-VM1 Security Update KB5012170 NT AUTHORITY\SYSTEM 10/2/2023 12:00:00 AM
BWILKINSON-VM1 Security Update KB5030216 NT AUTHORITY\SYSTEM 10/2/2023 12:00:00 AM
BWILKINSON-VM1 Security Update KB5025314 4/5/2023 12:00:00 AM
BWILKINSON-VM1 Update KB5030369 NT AUTHORITY\SYSTEM 10/2/2023 12:00:00 AM |
Today I installed the base image that had been tested as working. A fresh image, not the "corporate" image, via a standard deployment. Then installed each monthly patch, and tested each iteration of the updates. SW_DVD9_Win_Server_STD_CORE_2022_2108.20_64Bit_English_DC_STD_MLF_X23-42932 C:\Windows\WinSxS\amd64_hyperv-compute-host-service_31bf3856ad364e35_10.0.20348.1668_none_a001fbb8ddff3a5d\vmcompute.exe C:\Windows\WinSxS\amd64_hyperv-compute-host-service_31bf3856ad364e35_10.0.20348.1726_none_9ff35834de0abdb0\vmcompute.exe C:\Windows\WinSxS\amd64_hyperv-compute-host-service_31bf3856ad364e35_10.0.20348.1787_none_9ff94328de056f5b\vmcompute.exe C:\Windows\WinSxS\amd64_hyperv-compute-host-service_31bf3856ad364e35_10.0.20348.1850_none_9fec11d6de0f8be0\vmcompute.exe C:\Windows\WinSxS\amd64_hyperv-compute-host-service_31bf3856ad364e35_10.0.20348.1906_none_9fdbb7c8de1cc2e4\vmcompute.exe C:\Windows\WinSxS\amd64_hyperv-compute-host-service_31bf3856ad364e35_10.0.20348.1970_none_9fe3427ede15da7f\vmcompute.exe I was not able to reproduce the error. I am continuing to investigate. I am leaning towards some "scanning" and "security" software as a possible cause, however will update once I have completed more testing. |
I am out of cycle for deploying more nodes right now, however I expect to continue doing so in the next few days. Once I have more data, I will provide a further update. |
I will close this for now. At this time, I don't have a root cause... however I will come back later and add notes in case it's useful for others. I do have a way to make the vmcompute start working again... however I am just not definitive on exactly why or which single things breaks the service. 1 time restarting vmcompute (and then rebooting) brought the containers online I can only assume that the vmcompute is sensitive to something, however I am unsure what exactly OR why uninstalling the apps allowed the containers to start as normal. Also reinstalling the same combination of apps did not break the container services. So I have a workaround, however not any sure solution on what exactly the issue is or was. I will provide an update in the future. |
In case it's helpful for others... In our environment I have been able to reproduce the issue many times.
I have not seen any regression on this afterwards. I have a Guess that it maybe related to the network setup for the "Microsoft-Windows-Hyper-V-VmSwitch" etc.
If I have more time I will continue to capture these states as I deploy more new resources in the future. |
@Howard-Haiyang-Hao @akarshm just a ping for this issue, which I closed, however is worth follow up in: |
@brwilkinson , Any further issues? Reoccurrences? |
This is being tracked over on |
#------------------------------------------------------------------
older (unpatched) server - working
hostname
docker info
Server: Docker Engine - Community
Engine:
Version: 24.0.6
gi C:\windows\System32\vmcompute.exe | foreach Target
C:\Windows\WinSxS\amd64_hyperv-compute-host-service_31bf3856ad364e35_10.0.20348.1906_none_9fdbb7c8de1cc2e4\vmcompute.exe
vmcompute.exe is 10.0.20348.1906
docker run jenkins/agent:jdk17-windowsservercore-ltsc2022 pwsh -c echo hello
✅
#------------------------------------------------------------------
new server (patched) - not working
hostname
docker info
Server: Docker Engine - Community
Engine:
Version: 24.0.6
gi C:\windows\System32\vmcompute.exe | foreach Target
C:\Windows\WinSxS\amd64_hyperv-compute-host-service_31bf3856ad364e35_10.0.20348.1970_none_9fe3427ede15da7f\vmcompute.exe
vmcomute.exe is 10.0.20348.1906
docker run jenkins/agent:jdk17-windowsservercore-ltsc2022 pwsh -c echo hello
encountered an error during hcs::System::Start: context deadline exceeded. 🟧
The only difference i can see is the build version of the vmcompute.exe ?
Which was updated as part of this:
https://support.microsoft.com/en-us/topic/september-12-2023-kb5030216-os-build-20348-1970-34d4aff3-fd05-4270-b288-4ab6379c7f81
The text was updated successfully, but these errors were encountered: