Primary cannot stay connected to worker behind a load balancer #732
Comments
Hi there! Thank you for the very detailed issue report. Having read everything, I think this is the key part right here:
It is pretty clear to me from this log excerpt that something is closing the WebSocket connections that Cronicle needs to keep alive between the primary and worker servers. This is most likely security related, i.e. some kind of security software, or perhaps a setting in your load balancer or proxy software. Some load balancers / proxies do not support WebSocket connections unless you explicitly allow them. See this recent issue regarding nginx: #535

I also recall someone else reporting this recently: #725. The only solution there was to run separate Cronicle instances, with no WebSocket connections spanning a load balancer / proxy.

I'm sorry if this doesn't help, but I don't know what else to try here. Things like this almost always come down to an incompatibility between WebSockets (and sometimes specifically socket.io, which Cronicle uses on top of WebSockets) and a piece of network security, proxy software, or hardware in your environment closing the sockets prematurely. I hope this helps. Best of luck to you.
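For reference, here is a hedged sketch of what "explicitly allow them" can look like when nginx is the proxy in question (the situation in #535): forward the WebSocket upgrade headers and lengthen the read timeout so long-lived socket.io connections aren't dropped. The upstream address and port below are placeholders, not values from this issue:

```nginx
# Sketch of an nginx proxy block that permits WebSocket upgrades.
# "cronicle-worker.internal:3012" is a placeholder upstream, not from this issue.
location / {
    proxy_pass http://cronicle-worker.internal:3012;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;    # pass the WebSocket upgrade request through
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
    proxy_read_timeout 3600s;                  # don't reap idle long-lived sockets too early
}
```

Load balancers such as AWS ELB/ALB have analogous knobs (listener/target protocol and idle timeout), which is likely where a setup like the reporter's would need similar attention.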
Yeah, I figured that was the part that was going wrong here. This hostname/IP is not directly reachable from the primary node. The primary initially connects to a load balancer, which forwards the packets to the worker node behind it. The worker then responds with its internal IP, which is inaccessible from the primary, so the primary tries to connect to that instead, and fails.

I guess what I was hoping to hear was that there is a way to have it actually use the fully qualified DNS name I supplied in the initial Add Server dialog. I gave it a domain of (scrubbed example) xxdevvpc-lbxx.us-east-1.elb.amazonaws.com, and it turns around and tries to use the worker's hostname (and related IP) instead. If it actually used the DNS name provided, or even the base_url in config.json, there wouldn't be a problem. If that doesn't exist, is it a feature that could be added relatively easily? I feel like I wouldn't be alone in wanting such a feature.
Hi @jhuckaby, I wanted to follow up and see if you had read my reply. I think the real problem is that the worker node replaces the initial DNS entry with either its IP or hostname: the initial connection works because the DNS correctly routes it to the worker, but then the worker "corrects" the primary with a non-routable hostname/IP, and that's when it goes blank. I'd like to see a feature where the worker doesn't overwrite the primary's connection string, or at least makes that "correction" configurable.
Try overriding the worker's hostname and IP by using these two top-level config properties in the worker's config.json file:

"hostname": "corrected-hostname.mydomain.com",
"ip": "1.2.3.4"

These are undocumented properties, but they will override the "auto-detection" that happens on startup, where the system tries to figure out its own hostname and IP.
Thanks @jhuckaby! This worked for my situation! If these are undocumented, it would seem worthwhile to add them to the documentation, as they would probably help a number of people who have nodes behind load balancers or NATs.
Summary
Hi, I am evaluating Cronicle for use at my organization. In initial tests using a few throwaway VMs, it worked great and has just about all the features I was looking for! The plan is to put the primary node in our primary VPC, and then run worker nodes inside our various environments (which are separate VPCs and aren't directly accessible from the primary node). However, in attempting to deploy it into our development environment, I am running into a problem where I cannot keep the worker connected to the primary node.
Steps to reproduce the problem
I tried multiple ways of reaching the worker from the primary, including an ssh tunnel through a middle-man bastion server, before settling on a locked-down port on the load balancer into that environment. In this configuration, adding the worker using the Add Server button on the server page allows it to connect, but it then almost immediately goes grey. It appears that when I enter the load balancer address in the worker hostname box, Cronicle grabs the hostname from the worker server's environment and then tries to connect to that, instead of the FQDN I gave it initially. Additionally, the worker's logs appear to suggest that it thinks the load balancer IP is the primary, and it tries to connect back to that instead of reaching out to the actual primary.
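To illustrate what "grabs the hostname from the worker server's environment" means in practice, here is a rough sketch, not Cronicle's actual code, of how a Node.js service typically auto-detects its own identity at startup. On a worker behind a load balancer this yields the internal hostname and IP, not the DNS name typed into the Add Server dialog:

```js
// Illustrative sketch only -- not Cronicle's actual startup code.
// Shows why "auto-detection" on a worker reports its internal, non-routable identity.
const os = require('os');

function detectSelf() {
    const hostname = os.hostname(); // the OS hostname, e.g. "ip-10-0-1-23", not the LB's DNS name

    // Return the first non-loopback IPv4 address found on any interface.
    for (const addrs of Object.values(os.networkInterfaces())) {
        for (const addr of addrs) {
            if (addr.family === 'IPv4' && !addr.internal) {
                return { hostname, ip: addr.address };
            }
        }
    }
    return { hostname, ip: '127.0.0.1' }; // fallback if no external interface is found
}

console.log(detectSelf()); // e.g. { hostname: 'ip-10-0-1-23', ip: '10.0.1.23' }
```

If the worker then hands this identity back to the primary, the primary ends up dialing an address that only exists inside the worker's VPC, which matches the behavior described above.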
Your Setup
Operating system and version?
Both primary and worker are running Ubuntu 22.04
Node.js version?
v21.7.0 & v20.11.1
Cronicle software version?
Version 0.9.44
Are you using a multi-server setup, or just a single server?
Single Primary, single worker (but want to expand to multiple workers)
Are you using the filesystem as back-end storage, or S3/Couchbase?
local filesystem
Can you reproduce the crash consistently?
It's not crashing per se, but the issue is consistent.
Log Excerpts
Primary
Cronicle.log:
Worker
Cronicle.log:
WebServer.log:
Any help you could provide would be greatly appreciated!