
Ability to configure readinessProbe and livenessProbe for vault agent injector #540

Open
psibi opened this issue May 31, 2021 · 2 comments
Labels
enhancement New feature or request

Comments


psibi commented May 31, 2021

Is your feature request related to a problem? Please describe.

Currently, running the generated manifest from the Helm chart through a tool like kube-score results in the following error:

apps/v1/Deployment vault-agent-injector in my-namespace             💥
    [CRITICAL] Pod Probes
        · Container has the same readiness and liveness probe
            Using the same probe for liveness and readiness is very likely
            dangerous. Generally it's better to avoid the livenessProbe than
            re-using the readinessProbe.
            More information: https://github.com/zegl/kube-score/blob/master/README_PROBES.md
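
For reference, the probes the chart currently renders for the injector Deployment both point at the same endpoint, which is exactly what kube-score flags. A rough sketch of the rendered container spec (from memory, not a verbatim copy; /health/ready on port 8080 is what the injector exposes):

  livenessProbe:
    httpGet:
      path: /health/ready
      port: 8080
      scheme: HTTPS
  readinessProbe:
    httpGet:
      path: /health/ready
      port: 8080
      scheme: HTTPS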

Given that the vault-agent-injector is already running as PID 1, a better option for the liveness check would be to rely on the default behaviour of Kubernetes: restart the container if PID 1 has exited.

Describe the solution you'd like

Ability to configure the readiness and liveness probes. We could support them the same way they are already supported for the Vault server:

  # Used to define custom readinessProbe settings
  readinessProbe:
    enabled: true
    # If you need to use a http path instead of the default exec
    path: /v1/sys/health?standbyok=true

    # When a probe fails, Kubernetes will try failureThreshold times before giving up
    failureThreshold: 2
    # Number of seconds after the container has started before probe initiates
    initialDelaySeconds: 5
    # How often (in seconds) to perform the probe
    periodSeconds: 5
    # Minimum consecutive successes for the probe to be considered successful after having failed
    successThreshold: 1
    # Number of seconds after which the probe times out.
    timeoutSeconds: 3
  # Used to enable a livenessProbe for the pods
  livenessProbe:
    enabled: false
    path: "/v1/sys/health?standbyok=true"
    # When a probe fails, Kubernetes will try failureThreshold times before giving up
    failureThreshold: 2
    # Number of seconds after the container has started before probe initiates
    initialDelaySeconds: 60
    # How often (in seconds) to perform the probe
    periodSeconds: 5
    # Minimum consecutive successes for the probe to be considered successful after having failed
    successThreshold: 1
    # Number of seconds after which the probe times out.
    timeoutSeconds: 3
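
For illustration, a hedged sketch of how those values could be wired into the injector's Deployment template, modelled on the server chart; the .Values.injector.readinessProbe paths and the /health/ready endpoint are assumptions here, not existing chart code:

  {{- if .Values.injector.readinessProbe.enabled }}
  readinessProbe:
    httpGet:
      path: /health/ready
      port: 8080
      scheme: HTTPS
    failureThreshold: {{ .Values.injector.readinessProbe.failureThreshold }}
    initialDelaySeconds: {{ .Values.injector.readinessProbe.initialDelaySeconds }}
    periodSeconds: {{ .Values.injector.readinessProbe.periodSeconds }}
    successThreshold: {{ .Values.injector.readinessProbe.successThreshold }}
    timeoutSeconds: {{ .Values.injector.readinessProbe.timeoutSeconds }}
  {{- end }}

with an analogous {{- if .Values.injector.livenessProbe.enabled }} block for the liveness probe.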

Describe alternatives you've considered

kube-score itself makes various recommendations regarding alternatives to liveness probes in general: https://github.com/zegl/kube-score/blob/master/README_PROBES.md#livenessprobe

Additional context

No additional context, but I can provide some if required. Also, liveness and readiness probes can legitimately be similar in some situations, and there is no strict need to follow the kube-score recommendations. But if that's the case here, I would like to understand the reasoning behind it.

@psibi psibi added the enhancement New feature or request label May 31, 2021

Freyert commented Aug 16, 2022

I would recommend not setting an HTTP-based readiness/liveness check for Vault Agents. Currently I think this endpoint just forwards to the Vault server, so if the Vault server goes down, the Vault Agents will crash.

The Vault Agents should be resilient to failures from the Vault Server so that operations continue despite Vault Server failures.

Since the Vault Agent doesn't serve HTTP traffic, liveness/readiness checks aren't really important here. A potentially more appropriate solution is to exit if template rendering fails too many times, via the vault.hashicorp.com/template-config-exit-on-retry-failure annotation (see the sketch below).

You should be able to identify failures with this, and if the problem goes unfixed the entire pod should fail eventually.
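
A minimal sketch of opting an injected workload into that behaviour (the annotation names are real injector annotations; the workload itself is illustrative and abbreviated):

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: my-app
  spec:
    template:
      metadata:
        annotations:
          vault.hashicorp.com/agent-inject: "true"
          # Make the agent exit (and the pod eventually restart) when
          # template rendering keeps failing, instead of retrying forever.
          vault.hashicorp.com/template-config-exit-on-retry-failure: "true"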


If you want to run the Vault Agent as a central HTTP caching proxy (which is really quite useful, but outside the scope of the injector), the pod is really "ready" as soon as it comes online; Vault requests will just occasionally fail.

I think the Vault Agent HTTP caching proxy is potentially simple enough that you just monitor/manage either end of the transaction, and if either end has a problem you figure it out.


👉 Forgot about TCP readiness checks! For the HTTP proxy it's good to just check whether the port is open. Still not sure there's a great option for a Vault Agent configured to render templates.
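
A minimal sketch of such a TCP check for an agent running as a caching proxy (port 8200 here is an assumption; use whatever port the agent listener is configured to):

  readinessProbe:
    tcpSocket:
      port: 8200
    initialDelaySeconds: 5
    periodSeconds: 5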

mgamsjager commented

How would you handle the case where the vault container is stopped/crashed/removed and the secret is also removed from the application container?
A probe on the application side will only restart the application container, not the whole deployment.
