Adding second container doesn't work for HA due to order of operations and incorrect remote access command #4

Open
mdrakiburrahman opened this issue Jun 5, 2021 · 2 comments


mdrakiburrahman commented Jun 5, 2021

When a second container is added with ENABLE_HA and HA_PORT specified, the container is not registered correctly and also fails the health check:
image

I found the following fixes the issue:

  1. Use -EnableRemoteAccessInContainer instead of -EnableRemoteAccess.
  2. The order of operations matters on the second node: -EnableRemoteAccessInContainer must run first, before -RegisterNewNode is executed.
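
The ordering described in the two fixes above can be sketched as follows. This is only an illustration in Python, not the repository's setup script: the `dmgcmd.exe` path, the helper name `build_registration_commands`, and the way arguments follow each flag (port after `-EnableRemoteAccessInContainer`, key and node name after `-RegisterNewNode`) are assumptions. Only the two flag names and their order come from this issue. The helper builds the command lines without running them.

```python
from typing import List

# Hypothetical path: the real location depends on the installed SHIR version.
DMGCMD = r"C:\Program Files\Microsoft Integration Runtime\5.0\Shared\dmgcmd.exe"


def build_registration_commands(
    auth_key: str,
    node_name: str,
    enable_ha: bool,
    ha_port: str = "8060",  # assumed default port
) -> List[List[str]]:
    """Return the dmgcmd invocations in the order they should run."""
    commands: List[List[str]] = []
    if enable_ha:
        # Remote access is enabled first, using the in-container variant.
        commands.append([DMGCMD, "-EnableRemoteAccessInContainer", ha_port])
    # Registration of the node runs only after remote access is enabled.
    commands.append([DMGCMD, "-RegisterNewNode", auth_key, node_name])
    return commands


if __name__ == "__main__":
    for cmd in build_registration_commands("<auth key>", "Node_2", True):
        print(" ".join(cmd))
```

Returning the commands as data rather than executing them keeps the ordering easy to check before anything is run inside the container.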

With this setup, the second node registers successfully:
image

HA is also enabled:
image

@curtis-durrett

I made the changes in this PR; however, high availability still does not work.

This is the log for the first node:
[01/06/2022 06:53:21] Registering SHIR with the node key: IR@8c0caf9f-7864-4443-9432-ab08946a227d@TmutDataFactory@ServiceEndpoint=tmutdatafactory.westus2.datafactory.azure.net@DX5HMJGj+Re+CY1jb44stt6oUTwBLkhik9kvVy7GCQc=
[01/06/2022 06:53:21] Registering SHIR with the node name: Node_1
[01/06/2022 06:53:21] Registering SHIR with the enable high availability flag: true
[01/06/2022 06:53:21] Registering SHIR with the tcp port: 8060
[01/06/2022 06:53:33] Start registering the new SHIR node
[01/06/2022 06:53:33] Enable High Availability
[01/06/2022 06:53:39] High Availability Enabled
[01/06/2022 06:53:39] Registering New Node Node_1
[01/06/2022 06:53:50] Node Node_1 Registered
[01/06/2022 06:53:50] Waiting 180 seconds before attempting first health check
[01/06/2022 06:56:50] Check-Main-Process return true
[01/06/2022 06:56:53] Check-Node-Connection return true ConnectionResult: Connected
[01/06/2022 06:56:53] Node Health Check Pass

This is the log for the second node:
[01/06/2022 07:34:08] Registering SHIR with the node key: IR@8c0caf9f-7864-4443-9432-ab08946a227d@TmutDataFactory@ServiceEndpoint=tmutdatafactory.westus2.datafactory.azure.net@DX5HMJGj+Re+CY1jb44stt6oUTwBLkhik9kvVy7GCQc=
[01/06/2022 07:34:08] Registering SHIR with the node name: Node_2
[01/06/2022 07:34:08] Registering SHIR with the enable high availability flag: true
[01/06/2022 07:34:08] Registering SHIR with the tcp port: 8060
[01/06/2022 07:34:20] Start registering the new SHIR node
[01/06/2022 07:34:20] Enable High Availability
[01/06/2022 07:34:25] High Availability Enabled
[01/06/2022 07:34:25] Registering New Node Node_2
[01/06/2022 07:34:36] Node Node_2 Registered
[01/06/2022 07:34:36] Waiting 180 seconds before attempting first health check
[01/06/2022 07:37:36] Check-Main-Process return true
[01/06/2022 07:37:39] Check-Node-Connection error Node is offline ConnectionResult: Connecting
[01/06/2022 07:37:39] Node Health Check Failed
[01/06/2022 07:37:39] Stop the node connection
[01/06/2022 07:37:56] Stop the node connection successfully

Notice that the connection result for Node_2 shows as Connecting, whereas Node_1 reports Connected.
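
For comparing the two logs, a small script can pull out the last `ConnectionResult` value. This is a minimal sketch in Python that assumes the `[timestamp] message` line format shown in the logs above. The function name `last_connection_result` and the sample constant are hypothetical and not part of the container image.

```python
import re
from typing import Optional

# Matches lines such as "[01/06/2022 07:37:39] Node Health Check Failed".
LINE_RE = re.compile(r"^\[(?P<timestamp>[^\]]+)\]\s+(?P<message>.*)$")
RESULT_RE = re.compile(r"ConnectionResult:\s*(?P<result>\w+)")


def last_connection_result(log_text: str) -> Optional[str]:
    """Return the most recent ConnectionResult in the log, or None."""
    result = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that are not in the expected format
        found = RESULT_RE.search(match.group("message"))
        if found:
            result = found.group("result")
    return result


# Excerpt copied from the second node's log above.
NODE_2_LOG = """\
[01/06/2022 07:37:36] Check-Main-Process return true
[01/06/2022 07:37:39] Check-Node-Connection error Node is offline ConnectionResult: Connecting
[01/06/2022 07:37:39] Node Health Check Failed
"""

if __name__ == "__main__":
    print(last_connection_result(NODE_2_LOG))  # prints "Connecting"
```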

@curtis-durrett

IR_Status
