This is a sign that your worker server is overloaded, and it cannot maintain the websocket connection to the Cronicle master server.
It's possible that the Cronicle process was killed by the kernel (OOM) to free up memory. I don't know anything about Rocky Linux specifically, but I would look through the kernel OOM logs to see if it is killing processes.
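To check whether the OOM killer is responsible, you can search the kernel log directly. A minimal sketch, assuming a systemd-based distro (such as Rocky Linux); exact log locations and message wording can vary by kernel version:

```shell
# Search the kernel ring buffer and the systemd journal for OOM-killer events.
# "|| true" keeps the exit status at 0 when nothing matches.
dmesg 2>/dev/null | grep -i -E 'out of memory|oom-killer|killed process' || true
journalctl -k --no-pager 2>/dev/null | grep -i -E 'out of memory|oom-killer|killed process' || true
```

If a match names the Cronicle (node) process, that confirms the kernel terminated it for memory pressure.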
I'd also recommend monitoring CPU, memory and network connections while your job is running. I made a free app called Performa which does this, but there are many others that do it too.
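For lightweight ad-hoc monitoring without installing anything, a shell sketch like the following can sample the basics while the job runs. The specific fields, interval, and log file name here are assumptions, not anything Cronicle provides; a dedicated tool such as Performa adds history and graphing on top of the same data:

```shell
# sample_once prints one line of metrics: timestamp, 1-minute load average,
# available memory (kB), and established TCP connection count.
sample_once() {
  printf '%s load=%s mem_avail_kB=%s tcp_conns=%s\n' \
    "$(date -Is)" \
    "$(cut -d' ' -f1 /proc/loadavg)" \
    "$(awk '/MemAvailable/{print $2}' /proc/meminfo)" \
    "$(ss -tn state established 2>/dev/null | tail -n +2 | wc -l)"
}

# Example: sample every 5 seconds while the heavy job runs.
#   while true; do sample_once >> job-monitor.log; sleep 5; done
sample_once
```

A sudden drop in available memory or a spike in TCP connections right before the "shut down unexpectedly" abort would point at resource exhaustion on the worker.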
If my understanding is correct, the "Retries" option re-runs the job for all kinds of failures. Is there any way to automatically re-run a job specifically for server-related issues like this, so that scheduled executions are not impacted?
Summary
Jobs failing with "Aborted Job: Server 'worker1' shut down unexpectedly".
Steps to reproduce the problem
Run a heavy network-IO job (transferring a large file, or spawning 10,000 SSH connections to remote hosts) and schedule a cron job in parallel.
Your Setup
Operating system and version?
Rocky Linux release 8.9
Node.js version?
v16.20.2
Cronicle software version?
Version 0.9.25
Are you using a multi-server setup, or just a single server?
Multi-server: a single primary with multiple workers
Are you using the filesystem as back-end storage, or S3/Couchbase?
Local FS
Can you reproduce the crash consistently?
No
Log Excerpts
Sharing some failure events.