
Ability to configure readinessProbe and livenessProbe for vault agent injector #540

Open
psibi opened this issue May 31, 2021 · 2 comments
Labels
enhancement New feature or request

Comments


psibi commented May 31, 2021

Is your feature request related to a problem? Please describe.

Currently, running the generated manifest from the Helm chart through a tool like kube-score results in the following error:

apps/v1/Deployment vault-agent-injector in my-namespace             💥
    [CRITICAL] Pod Probes
        · Container has the same readiness and liveness probe
            Using the same probe for liveness and readiness is very likely
            dangerous. Generally it's better to avoid the livenessProbe than
            re-using the readinessProbe.
            More information: https://github.com/zegl/kube-score/blob/master/README_PROBES.md
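
For reference, the probes the chart currently renders for the injector Deployment both point at the same endpoint, which is exactly what kube-score flags. A rough sketch of the rendered container spec (from memory, not a verbatim copy; /health/ready on port 8080 is what the injector exposes):

  livenessProbe:
    httpGet:
      path: /health/ready
      port: 8080
      scheme: HTTPS
  readinessProbe:
    httpGet:
      path: /health/ready
      port: 8080
      scheme: HTTPS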

Given that the vault-agent-injector is already running as PID 1, a better option for the liveness check would be to rely on the default behaviour of Kubernetes: restart the container if PID 1 has exited.

Describe the solution you'd like

Ability to configure the readiness and liveness probes. We could support them the same way they are already supported for the Vault server:

  # Used to define custom readinessProbe settings
  readinessProbe:
    enabled: true
    # If you need to use a http path instead of the default exec
    path: /v1/sys/health?standbyok=true

    # When a probe fails, Kubernetes will try failureThreshold times before giving up
    failureThreshold: 2
    # Number of seconds after the container has started before probe initiates
    initialDelaySeconds: 5
    # How often (in seconds) to perform the probe
    periodSeconds: 5
    # Minimum consecutive successes for the probe to be considered successful after having failed
    successThreshold: 1
    # Number of seconds after which the probe times out.
    timeoutSeconds: 3
  # Used to enable a livenessProbe for the pods
  livenessProbe:
    enabled: false
    path: "/v1/sys/health?standbyok=true"
    # When a probe fails, Kubernetes will try failureThreshold times before giving up
    failureThreshold: 2
    # Number of seconds after the container has started before probe initiates
    initialDelaySeconds: 60
    # How often (in seconds) to perform the probe
    periodSeconds: 5
    # Minimum consecutive successes for the probe to be considered successful after having failed
    successThreshold: 1
    # Number of seconds after which the probe times out.
    timeoutSeconds: 3
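
For illustration, a hedged sketch of how those values could be wired into the injector's Deployment template, modelled on the server chart; the .Values.injector.readinessProbe paths and the /health/ready endpoint are assumptions here, not existing chart code:

  {{- if .Values.injector.readinessProbe.enabled }}
  readinessProbe:
    httpGet:
      path: /health/ready
      port: 8080
      scheme: HTTPS
    failureThreshold: {{ .Values.injector.readinessProbe.failureThreshold }}
    initialDelaySeconds: {{ .Values.injector.readinessProbe.initialDelaySeconds }}
    periodSeconds: {{ .Values.injector.readinessProbe.periodSeconds }}
    successThreshold: {{ .Values.injector.readinessProbe.successThreshold }}
    timeoutSeconds: {{ .Values.injector.readinessProbe.timeoutSeconds }}
  {{- end }}

with an analogous {{- if .Values.injector.livenessProbe.enabled }} block for the liveness probe.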

Describe alternatives you've considered

kube-score itself makes various recommendations regarding alternatives to liveness probes in general: https://github.com/zegl/kube-score/blob/master/README_PROBES.md#livenessprobe

Additional context

No additional context, but I can provide some if required. Also, liveness and readiness probes can legitimately be similar in some situations, and there is no strict need to follow the kube-score recommendations. But if that's the case here, I would like to understand the reasoning behind it.

@psibi psibi added the enhancement New feature or request label May 31, 2021

Freyert commented Aug 16, 2022

I would recommend not setting an HTTP-based readiness/liveness check for Vault Agents. Currently I think this endpoint just forwards to the Vault server, so if the Vault server goes down, the Vault Agents will crash.

The Vault Agents should be resilient to failures from the Vault Server so that operations continue despite Vault Server failures.

Since the Vault Agent doesn't serve HTTP traffic, liveness/readiness checks aren't really important here. A potentially more appropriate solution is to exit if template rendering fails too many times, via the vault.hashicorp.com/template-config-exit-on-retry-failure annotation (see the sketch below).

You should be able to identify failures with this, and if the problem goes unfixed the entire pod should fail eventually.
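
A minimal sketch of opting an injected workload into that behaviour (the annotation names are real injector annotations; the workload itself is illustrative and abbreviated):

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: my-app
  spec:
    template:
      metadata:
        annotations:
          vault.hashicorp.com/agent-inject: "true"
          # Make the agent exit (and the pod eventually restart) when
          # template rendering keeps failing, instead of retrying forever.
          vault.hashicorp.com/template-config-exit-on-retry-failure: "true"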


If you want to run the Vault Agent as a central HTTP caching proxy (which is really quite useful, but outside the scope of the injector), the pod is really "ready" as soon as it comes online; Vault requests will just occasionally fail.

I think the Vault Agent HTTP caching proxy is potentially simple enough that you just monitor/manage either end of the transaction, and if either end has a problem you figure it out.


👉 Forgot about TCP readiness checks! For the HTTP proxy it's good to just check whether the port is open. Still not sure there's a great option for a Vault Agent configured to render templates.
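
A minimal sketch of such a TCP check for an agent running as a caching proxy (port 8200 here is an assumption; use whatever port the agent listener is configured to):

  readinessProbe:
    tcpSocket:
      port: 8200
    initialDelaySeconds: 5
    periodSeconds: 5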

mgamsjager commented

How would you handle the case where the vault container is stopped/crashed/removed and the secret is also removed from the application container?
A probe on the application side will only restart the application container, not the whole deployment.
