Multiple SHIR containers with cascading failures - 0x80010002 (RPC_E_CALL_CANCELED) #7

Open
nickcva opened this issue Sep 22, 2022 · 1 comment

Comments


nickcva commented Sep 22, 2022

We are running several Windows SHIR containers on the same physical machines. All containers use the same network and the default Docker NAT switch. Once one container becomes unhealthy, the failure slowly cascades to the rest of the SHIR containers. We do not use a proxy, and there are no network issues between the on-premises environment and the Azure ADF/Synapse instance.

Are there any known issues with running multiple SHIR containers on the same host when each connects to a different Azure ADF/Synapse instance? We need to scale this out to hundreds of SHIR containers.

Windows Server 2019 Standard, version 1809, build 17763.3406

The Dockerfile is the latest version, with this addition:
RUN MD C:\Download
ADD https://github.com/adoptium/temurin8-binaries/releases/download/jdk8u345-b01/OpenJDK8U-jdk_x64_windows_hotspot_8u345b01.zip C:/Download
RUN MD "C:\Program Files\Eclipse Adoptium\jdk8u345-b01"
RUN tar -xf C:/Download/OpenJDK8U-jdk_x64_windows_hotspot_8u345b01.zip -C "C:\Program Files\Eclipse Adoptium"
RUN SETX PATH "%PATH%;C:\Program Files\Eclipse Adoptium\jdk8u345-b01\bin;C:\Program Files\Eclipse Adoptium\jdk8u345-b01\jre\bin\server" /m
RUN SETX JAVA_HOME "C:\Program Files\Eclipse Adoptium\jdk8u345-b01\" /m


The only Docker warning logged on the host server:
Health check for container 39fbbf4f690da051145d18f9d4df16b6666108c76dd39cf73d177179bf961f60 error: context deadline exceeded
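
To see how far the failure has spread at a given moment, the host can be asked which containers are currently reporting an unhealthy status. The following is only a sketch: it assumes the `docker` CLI is on the host's PATH, and the helper names `parse_unhealthy` and `list_unhealthy` are made up for illustration.

```python
import subprocess


def parse_unhealthy(ps_output):
    """Turn lines of '<container id> <container name>' into (id, name) pairs."""
    pairs = []
    for line in ps_output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:  # skip blank or malformed lines
            pairs.append((parts[0], parts[1]))
    return pairs


def list_unhealthy():
    """Ask Docker for running containers whose health check is failing."""
    result = subprocess.run(
        [
            "docker", "ps",
            "--filter", "health=unhealthy",
            "--format", "{{.ID}} {{.Names}}",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_unhealthy(result.stdout)
```

Calling `list_unhealthy()` every few minutes and recording the result would show the order in which containers turn unhealthy, which may help confirm whether the cascade follows a pattern.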

This shows up on all the unhealthy containers:
`[09/22/2022 12:23:08] Registering SHIR node with the node key: redacted@ServiceEndpoint=usgovva.frontend.datamovement.azure.us@Vredacted
[09/22/2022 12:23:09] Registering SHIR node with the node name: redacted
[09/22/2022 12:23:09] Registering SHIR node with the enable high availability flag: true
[09/22/2022 12:23:09] Registering SHIR node with the tcp port: 8060
[09/22/2022 12:25:54] Start registering a new SHIR node
[09/22/2022 12:25:54] Enable High Availability
[09/22/2022 12:25:54] Remote Access Port: 8060
[09/22/2022 12:31:59] Waiting 60 seconds for connecting
Get-WmiObject : Call was canceled by the message filter. (Exception from
HRESULT: 0x80010002 (RPC_E_CALL_CANCELED))
At C:\SHIR\setup.ps1:17 char:22
+ ... essResult = Get-WmiObject Win32_Process -Filter "name = 'diahost.exe' ...
+             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [Get-WmiObject], COMExcept
   ion
    + FullyQualifiedErrorId : GetWMICOMException,Microsoft.PowerShell.Commands
   .GetWmiObjectCommand
[09/22/2022 12:34:02] diahost.exe is not running
Get-WmiObject : Call was canceled by the message filter. (Exception from
HRESULT: 0x80010002 (RPC_E_CALL_CANCELED))
At C:\SHIR\setup.ps1:17 char:22
+ ... essResult = Get-WmiObject Win32_Process -Filter "name = 'diahost.exe' ...
+             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [Get-WmiObject], COMExcept
   ion
    + FullyQualifiedErrorId : GetWMICOMException,Microsoft.PowerShell.Commands
   .GetWmiObjectCommand
[09/22/2022 12:36:06] diahost.exe is not running
Get-WmiObject : Call was canceled by the message filter. (Exception from
[09/22/2022 12:38:09] diahost.exe is not running
HRESULT: 0x80010002 (RPC_E_CALL_CANCELED))
At C:\SHIR\setup.ps1:17 char:22
+ ... essResult = Get-WmiObject Win32_Process -Filter "name = 'diahost.exe' ...
+             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [Get-WmiObject], COMExcept
   ion
    + FullyQualifiedErrorId : GetWMICOMException,Microsoft.PowerShell.Commands
   .GetWmiObjectCommand
Get-WmiObject : Call was canceled by the message filter. (Exception from
HRESULT: 0x80010002 (RPC_E_CALL_CANCELED))
At C:\SHIR\setup.ps1:17 char:22
+ ... essResult = Get-WmiObject Win32_Process -Filter "name = 'diahost.exe' ...
+             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [Get-WmiObject], COMExcept
   ion
    + FullyQualifiedErrorId : GetWMICOMException,Microsoft.PowerShell.Commands
   .GetWmiObjectCommand
[09/22/2022 12:40:11] diahost.exe is not running
Get-WmiObject : Call was canceled by the message filter. (Exception from
HRESULT: 0x80010002 (RPC_E_CALL_CANCELED))
At C:\SHIR\setup.ps1:17 char:22
+ ... essResult = Get-WmiObject Win32_Process -Filter "name = 'diahost.exe' ...
+             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [Get-WmiObject], COMExcept
   ion
    + FullyQualifiedErrorId : GetWMICOMException,Microsoft.PowerShell.Commands
   .GetWmiObjectCommand
[09/22/2022 12:42:12] diahost.exe is not running
Get-WmiObject : Call was canceled by the message filter. (Exception from
HRESULT: 0x80010002 (RPC_E_CALL_CANCELED))
At C:\SHIR\setup.ps1:17 char:22
+ ... essResult = Get-WmiObject Win32_Process -Filter "name = 'diahost.exe' ...
+             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[09/22/2022 12:44:12] diahost.exe is not running
    + CategoryInfo          : InvalidOperation: (:) [Get-WmiObject], COMExcept
   ion
    + FullyQualifiedErrorId : GetWMICOMException,Microsoft.PowerShell.Commands
   .GetWmiObjectCommand`
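
To compare many containers' logs, it can help to reduce each one to a few numbers: how often the WMI call was cancelled, how often `diahost.exe` was reported as not running, and how far apart those reports are. This is a rough sketch that assumes the timestamp format shown above (month/day/year); the function name `summarize` and the returned keys are hypothetical.

```python
import re
from datetime import datetime

# Matches lines such as "[09/22/2022 12:34:02] diahost.exe is not running"
STAMPED_LINE = re.compile(r"^\[(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\] (.*)$")


def summarize(log_text):
    """Count RPC_E_CALL_CANCELED errors and 'not running' reports in a SHIR log."""
    not_running = []
    rpc_cancelled = 0
    for raw in log_text.splitlines():
        line = raw.strip().strip("`")
        if "RPC_E_CALL_CANCELED" in line:
            rpc_cancelled += 1
        match = STAMPED_LINE.match(line)
        if match and match.group(2) == "diahost.exe is not running":
            stamp = datetime.strptime(match.group(1), "%m/%d/%Y %H:%M:%S")
            not_running.append(stamp)
    # Seconds between consecutive "not running" reports
    gaps = [(b - a).total_seconds() for a, b in zip(not_running, not_running[1:])]
    return {
        "rpc_cancelled": rpc_cancelled,
        "not_running": len(not_running),
        "gaps_seconds": gaps,
    }
```

In the log above, the "not running" reports arrive roughly two minutes apart, so the check at `setup.ps1` line 17 appears to fail on every pass rather than intermittently.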


nickcva commented Feb 14, 2024

I found a workaround that allows mass deployments of SHIR containers. We are currently running about 80 SHIR containers.

Use "--isolation=hyperv" in your docker run command.

docker run -d --isolation=hyperv --restart unless-stopped --name="name" -e NODE_NAME="name" -e AUTH_KEY="key" -e ENABLE_HA=false -e HA_PORT=8060 -e ENABLE_AE=false -e AE_TIME=600 "someimage:latest"
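
For a large number of nodes, the same command can be generated per container instead of typed by hand. This is a minimal sketch that mirrors the flags in the command above; the helper names `build_run_command` and `launch`, and the idea of passing a name-to-key mapping, are assumptions for illustration only.

```python
import subprocess


def build_run_command(node_name, auth_key, image="someimage:latest"):
    """Build the docker run argument list for one SHIR node with Hyper-V isolation."""
    return [
        "docker", "run", "-d",
        "--isolation=hyperv",
        "--restart", "unless-stopped",
        f"--name={node_name}",
        "-e", f"NODE_NAME={node_name}",
        "-e", f"AUTH_KEY={auth_key}",
        "-e", "ENABLE_HA=false",
        "-e", "HA_PORT=8060",
        "-e", "ENABLE_AE=false",
        "-e", "AE_TIME=600",
        image,
    ]


def launch(nodes, image="someimage:latest"):
    """Start one container per (node name, auth key) entry in the mapping."""
    for node_name, auth_key in nodes.items():
        # Passing a list (not a shell string) avoids quoting problems in keys.
        subprocess.run(build_run_command(node_name, auth_key, image), check=True)
```

A call such as `launch({"name": "key"})` would start one container; a longer mapping would start one per entry, each with its own name and key.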
