I've encountered an interesting (kinda edge) case while testing Charmed Aether SD-Core. If the POD sandbox changes, Traefik restarts without fetching the TLS certificates from the relation data. As a result, SD-Core's GUI becomes unavailable and we're getting an internal server error in Traefik (more details below).
From the Traefik charm's code I see that the certs are pushed to the workload container in 2 cases:
- Certs updated in the relation data
- Config changed (only if the stored state hash changes)
The problem is that neither of these happens when the sandbox changes (Juju logs attached below).
There can be different reasons for the POD sandbox to go down. In my case it was suspending my laptop for the night (locally-running MicroK8s).
After restarting the Traefik Pod, everything comes back to normal, because in this case config changed gets fired.
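The behaviour described above can be sketched with a small, self-contained toy model. This is not the Traefik charm's code: `FakeContainer`, `CharmModel`, the `push_on_pebble_ready` toggle and the `external_hostname` config value are all hypothetical names used for illustration, and the model assumes that a sandbox change re-creates the workload container with an empty filesystem. It only shows why certs pushed on "certs updated" or "config changed" would be missing after a restart that fires just `start` and `pebble-ready`.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FakeContainer:
    """Stand-in for the workload container's filesystem (hypothetical)."""
    files: dict = field(default_factory=dict)

    def recreate(self):
        # Assumption: a sandbox change re-creates the container,
        # so files pushed earlier are gone.
        self.files.clear()


@dataclass
class CharmModel:
    """Toy model of the two cert-push triggers; not the real charm."""
    container: FakeContainer
    relation_certs: dict = field(default_factory=dict)
    stored_config_hash: str = ""
    push_on_pebble_ready: bool = False  # hypothetical mitigation toggle

    CA_DIR = "/usr/local/share/ca-certificates"

    def _push_certs(self):
        for name, pem in self.relation_certs.items():
            self.container.files[f"{self.CA_DIR}/{name}"] = pem

    def on_certs_changed(self, certs):
        # Trigger 1: certs updated in the relation data.
        self.relation_certs = dict(certs)
        self._push_certs()

    def on_config_changed(self, config):
        # Trigger 2: config changed, but only if the stored hash differs.
        new_hash = hashlib.sha256(repr(sorted(config.items())).encode()).hexdigest()
        if new_hash != self.stored_config_hash:
            self.stored_config_hash = new_hash
            self._push_certs()

    def on_start(self):
        # Nothing cert-related happens on start in this model.
        pass

    def on_pebble_ready(self):
        # Only pushes certs when the hypothetical toggle is enabled.
        if self.push_on_pebble_ready:
            self._push_certs()


def simulate_sandbox_change(push_on_pebble_ready):
    container = FakeContainer()
    charm = CharmModel(container, push_on_pebble_ready=push_on_pebble_ready)
    # Initial deployment: both triggers fire and the certs land in the container.
    charm.on_config_changed({"external_hostname": "example.local"})
    charm.on_certs_changed({"ca.crt": "-----BEGIN CERTIFICATE-----..."})
    # Sandbox change: container is re-created, then only start and
    # pebble-ready run (matching the hooks seen in the Juju logs).
    container.recreate()
    charm.on_start()
    charm.on_pebble_ready()
    return sorted(container.files)


print(simulate_sandbox_change(push_on_pebble_ready=False))
print(simulate_sandbox_change(push_on_pebble_ready=True))
```

In this model the first call leaves the CA directory empty, while the second restores the cert from the data the charm already holds. Whether re-pushing on `pebble-ready` is the right fix for the real charm is for the maintainers to decide.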
Cheers,
Bartek
### To Reproduce
As mentioned in the description, change of a POD sandbox can happen due to a bunch of reasons (e.g. insufficient resources), but the easiest way to reproduce the problem is this:
Symptom from the Traefik Pod's `describe` output:

`Normal SandboxChanged 100s kubelet Pod sandbox changed, it will be killed and re-created.`

Effect of the issue when trying to access the SD-Core NMS web page:

`2025-01-13T09:17:28.445Z [traefik] time="2025-01-13T09:17:28Z" level=debug msg="'500 Internal Server Error' caused by: tls: failed to verify certificate: x509: certificate signed by unknown authority"`
Juju logs for Traefik starting after POD sandbox change, showing that config changed is not happening:
unit-traefik-0: 08:51:33 INFO juju.cmd running containerAgent [3.6.1 cdb5fe45b78a4701a8bc8369c5a50432358afbd3 gc go1.23.4]
unit-traefik-0: 08:51:33 INFO juju.cmd.containeragent.unit start "unit"
unit-traefik-0: 08:51:33 INFO juju.worker.upgradesteps upgrade steps for 3.6.1 have already been run.
unit-traefik-0: 08:51:33 INFO juju.worker.probehttpserver starting http server on 127.0.0.1:65301
unit-traefik-0: 08:51:33 ERROR juju.worker.dependency "api-caller" manifold worker returned unexpected error: [1353c1] "unit-traefik-0" cannot open api: unable to connect to API: dial tcp 10.152.183.149:17070: connect: connection refused
unit-traefik-0: 08:51:38 ERROR juju.worker.dependency "api-caller" manifold worker returned unexpected error: [1353c1] "unit-traefik-0" cannot open api: unable to connect to API: dial tcp 10.152.183.149:17070: connect: connection refused
unit-traefik-0: 08:51:43 INFO juju.api cannot resolve "controller-service.controller-microk8s-localhost.svc.cluster.local": lookup controller-service.controller-microk8s-localhost.svc.cluster.local: operation was canceled
unit-traefik-0: 08:51:43 INFO juju.api connection established to "wss://10.152.183.149:17070/model/1353c1e2-6fb4-4669-8f77-3712b9b64faa/api"
unit-traefik-0: 08:51:43 INFO juju.worker.apicaller [1353c1] "unit-traefik-0" successfully connected to "10.152.183.149:17070"
unit-traefik-0: 08:51:43 INFO juju.worker.migrationminion migration migration phase is now: NONE
unit-traefik-0: 08:51:43 INFO juju.worker.logger logger worker started
unit-traefik-0: 08:51:43 WARNING juju.worker.proxyupdater unable to set snap core settings [proxy.http= proxy.https= proxy.store=]: exec: "snap": executable file not found in $PATH, output: ""
unit-traefik-0: 08:51:43 INFO juju.agent.tools ensure jujuc symlinks in /var/lib/juju/tools/unit-traefik-0
unit-traefik-0: 08:51:43 INFO juju.worker.leadership traefik/0 promoted to leadership of traefik
unit-traefik-0: 08:51:43 INFO juju.worker.caasupgrader abort check blocked until version event received
unit-traefik-0: 08:51:43 INFO juju.worker.caasupgrader unblocking abort check
unit-traefik-0: 08:51:43 INFO juju.worker.uniter unit "traefik/0" started
unit-traefik-0: 08:51:43 INFO juju.worker.uniter hooks are retried true
unit-traefik-0: 08:51:43 INFO juju.worker.uniter reboot detected; triggering implicit start hook to notify charm
unit-traefik-0: 08:51:44 INFO unit.traefik/0.juju-log Running legacy hooks/start.
(Removed warnings about deprecation of calling ops.main.main())
unit-traefik-0: 08:51:47 INFO juju.worker.uniter.operation ran "start" hook (via hook dispatching script: dispatch)
unit-traefik-0: 08:51:49 INFO unit.traefik/0.juju-log Kubernetes service 'traefik' patched successfully
(Removed warnings about deprecation of calling ops.main.main())
unit-traefik-0: 08:51:54 INFO juju.worker.uniter.operation ran "traefik-pebble-ready" hook (via hook dispatching script: dispatch)
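A quick way to check which hooks ran in a log excerpt like the one above is to pull the hook names out of the `ran "<hook>" hook` lines. The snippet below is a minimal sketch: the `hooks_run` helper is a hypothetical name, and the sample input is just the two relevant lines copied from the log above.

```python
import re

# Matches the hook name in uniter lines such as:
#   ... juju.worker.uniter.operation ran "start" hook (via hook dispatching script: dispatch)
HOOK_RE = re.compile(r'ran "([^"]+)" hook')


def hooks_run(log_text):
    """Return the hook names, in order, that the uniter reports as run."""
    return HOOK_RE.findall(log_text)


LOG = """\
unit-traefik-0: 08:51:47 INFO juju.worker.uniter.operation ran "start" hook (via hook dispatching script: dispatch)
unit-traefik-0: 08:51:54 INFO juju.worker.uniter.operation ran "traefik-pebble-ready" hook (via hook dispatching script: dispatch)
"""

hooks = hooks_run(LOG)
print(hooks)
print("config-changed ran:", "config-changed" in hooks)
```

For this excerpt only `start` and `traefik-pebble-ready` are reported, which matches the observation that config changed does not fire after the sandbox change.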
### Additional context
_No response_
Reproduction step: in the `traefik` container, see that there are no certs under `/usr/local/share/ca-certificates`.
### Environment
Required tools and versions are as described in the Charmed Aether SD-Core Getting started tutorial.