VSO Controller Stopped Renewing Vault Auth Token Unexpectedly #858

Open
kdw174 opened this issue Jul 17, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@kdw174

kdw174 commented Jul 17, 2024

Describe the bug
Our GCP dynamic credentials were deleted prematurely.

The Vault Secrets Operator controller leader stopped renewing the Vault token used to read GCP dynamic secrets. When that token expired, Vault revoked the leases for the GCP dynamic credentials and deleted the credentials in GCP. Neither the GCP secrets engine lease nor the Kubernetes auth Vault token should have been anywhere near its max TTL.

We run 2 controller pods with the direct-encrypted persistent cache and leader election configured:

      --leader-elect
      --client-cache-persistence-model=direct-encrypted
      --client-cache-size=10000
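
For completeness, this is roughly how we confirm those flags on the running controller; the namespace, deployment, and container names below are assumptions based on a default install and may differ:

      # Hypothetical check (names assume a default vault-secrets-operator install):
      # print the args passed to the controller's manager container.
      kubectl -n vault-secrets-operator-system \
        get deployment vault-secrets-operator-controller-manager \
        -o jsonpath='{.spec.template.spec.containers[?(@.name=="manager")].args}'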

Let me preface this by acknowledging that long-lived TTLs are not a recommended approach. We plan to move to shorter TTLs.

  • Vault Kubernetes auth mount configured with a 1-year max TTL
  • Vault Kubernetes auth role configured with an initial 3600s TTL and no max TTL
  • GCP secrets engine configured with a default lease TTL of 1 year and a max TTL of 1 year
  • VaultDynamicSecret configured to renew at 51% of the TTL (a rough configuration sketch follows this list)
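
For context, here is a rough sketch of the configuration described above. The mount names, role name, service account bindings, and the VaultDynamicSecret metadata are illustrative placeholders; only the TTLs and renewalPercent reflect our actual settings:

      # Hypothetical reconstruction of the setup above; names are placeholders.
      vault auth enable kubernetes
      vault auth tune -max-lease-ttl=8760h kubernetes      # ~1 year max TTL on the auth mount
      vault write auth/kubernetes/role/vso-role \
          bound_service_account_names=vault-secrets-operator \
          bound_service_account_namespaces=vault-secrets-operator-system \
          token_ttl=3600 \
          token_max_ttl=0                                  # 1h initial TTL, no max TTL on the role

      vault secrets enable gcp
      vault secrets tune -default-lease-ttl=8760h -max-lease-ttl=8760h gcp

      # VaultDynamicSecret renewing at 51% of the lease TTL
      kubectl apply -f - <<'EOF'
      apiVersion: secrets.hashicorp.com/v1beta1
      kind: VaultDynamicSecret
      metadata:
        name: gcp-sa-key        # placeholder
        namespace: my-app       # placeholder
      spec:
        mount: gcp
        path: static-account/my-account/key
        renewalPercent: 51
        destination:
          create: true
          name: gcp-sa-key
      EOF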

We confirmed this with Vault audit logs. We were able to capture the hashed token, showing when the token was first used to read the GCP dynamic secret and when it was last renewed. One hour after the last renewal, the Vault GCP engine leases created from the Kubernetes auth Vault token were revoked.
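
For anyone else digging through audit logs for something similar, this is roughly the kind of query we ran; it assumes a JSON file audit device and jq, and the HMAC value is a placeholder:

      # Hypothetical sketch: list every request made with a specific (HMAC'd) token,
      # printing time, operation, and path from a JSON audit log.
      TOKEN_HMAC='hmac-sha256:REPLACE_WITH_HASHED_TOKEN'
      jq -c --arg t "$TOKEN_HMAC" \
        'select(.auth.client_token == $t) | {time: .time, op: .request.operation, path: .request.path}' \
        /var/log/vault/audit.log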

We run the same configuration in a second cluster and use GCP dynamic credentials with the same configuration elsewhere, and we have not seen this happen anywhere else. As expected, the original Vault token used to generate the same GCP dynamic secrets in the other cluster is still being renewed to this day.

The Vault token and GCP credentials were both 17 days old when the token stopped being renewed and the GCP leases were revoked.

Additional context

  • We use the GCP static-account setup with service account keys (see the sketch after this list)
  • At the time, we were at GCP's 10-key limit for the service account and were trying to generate 2 additional keys in this cluster. VSO had been throwing errors about not being able to generate new keys because of that limit for more than 10 days, but this did not appear to affect anything with the existing keys.
  • The non-leader VSO pod was rolled during the renewal window for the impacted Vault token
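
For reference, a rough sketch of the static-account setup mentioned above; the static-account name, service account email, and project are placeholders:

      # Hypothetical static-account setup; names and emails are placeholders.
      vault write gcp/static-account/my-account \
          service_account_email="app-sa@my-project.iam.gserviceaccount.com" \
          secret_type="service_account_key"

      # Reading the key endpoint (the path the VaultDynamicSecret sketch above points at)
      # creates a new service account key, which counts against GCP's
      # 10-keys-per-service-account limit.
      vault read gcp/static-account/my-account/key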

To Reproduce
Steps to reproduce the behavior:

  1. Still trying to reproduce on my end. I will update if/when I'm able to reproduce the issue. Please let us know if there are additional logs, metrics, or configuration we should dig into, or other things to consider, to help determine what happened here.

Application deployment:

Expected behavior
The Vault Secrets Operator controller continues to renew the Vault token and does not let the dynamic credential leases expire before their TTL.

Environment

  • Kubernetes version: 1.25
  • vault-secrets-operator version: 0.5.1
@kdw174 added the bug label Jul 17, 2024
@benashz
Collaborator

benashz commented Jul 17, 2024

Hi @kdw174 - sorry to hear you encountered some issues with VSO.

We have made a lot of improvements to the way Vault tokens are handled for dynamic secrets, including support for tokens with max TTLs. The bulk of those fixes landed in v0.6.0. Would it be possible for you to upgrade to the latest release, which is currently v0.7.1?

Thanks,

Ben
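
(For reference, upgrading the Helm release to that version looks roughly like the following; the release name and namespace are placeholders for whatever the install uses.)

      # Hypothetical upgrade sketch; release name and namespace are placeholders.
      helm repo update
      helm upgrade vault-secrets-operator hashicorp/vault-secrets-operator \
        --version 0.7.1 \
        --namespace vault-secrets-operator-system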
