Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix discovery overwriting dynamic resources #44316

Merged

Conversation

GavinFrazar
Copy link
Contributor

Changelog: Prevent discovery service from overwriting Teleport dynamic resources that have the same name as discovered resources.

Closes #29828

This fix relies on checking the existing resource to see if we can update it - when we add optimistic locking for db/kube_cluster/app then we can do a conditional update. It's very unlikely that a user will update the resource during such a small time window anyway. I also consider using conditional updates out of scope to fix the linked issue because it's a fundamentally different problem.

@rosstimothy
Copy link
Contributor

rosstimothy commented Jul 17, 2024

when we add optimistic locking for db/kube_cluster/app then we can do a conditional update. It's very unlikely that a user will update the resource during such a small time window anyway. I also consider using conditional updates out of scope to fix the linked issue because it's a fundamentally different problem.

There are no plans to extend optimistic locking to presence related resources. Conditional writes are a more expensive operation than a plain write and at scale can(and have) taken down a backend. Optimistic locking is only intended to be leveraged for cluster configuration(roles, users, auth preference, auth connectors, etc.) that may accidentally be simultaneously updated by humans.

Copy link
Contributor

@rosstimothy rosstimothy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come we opted not to proceed with your suggestion from #29828? I think the Create, Get, Update sequence introduced here could pose other issues since none of the operations are atomic and we may be operating on outdated state by the time the Get on creation failure occurs.

@rosstimothy rosstimothy self-requested a review July 17, 2024 12:38
@GavinFrazar GavinFrazar requested a review from greedy52 July 17, 2024 17:23
@GavinFrazar
Copy link
Contributor Author

GavinFrazar commented Jul 17, 2024

How come we opted not to proceed with your suggestion from #29828? I think the Create, Get, Update sequence introduced here could pose other issues since none of the operations are atomic and we may be operating on outdated state by the time the Get on creation failure occurs.

I opted to not do that because the reconciler would have to be changed to avoid deleting current resources in a different discovery_group or of a different origin - it seemed to me like that behavior was too specific to put in reconciler code and it would affect a ton of things that use the reconciler.

If I did do that then we would still be operating on potentially outdated state, right?
Reconciler could call GetCurrentResources(), then one of those resources is updated before it takes some action like create/delete/update.

edit:
Another motivating reason that I forgot to include before, is that if I change the reconciler config such that it gets unfiltered resources, then it could still have a resource created during the time window between fetching current resources and creating a new resource. If that happens, then it needs to have special handling in "OnCreate" to handle AlreadyExists, and then we're right back to this solution but with a bunch of extra code to handle the unfiltered resources and avoid deleting resources that aren't managed by discovery.

When the resource already exists, check its origin as well as
discovery group.
If it's not of discovery origin, then don't update it.
If it's not in the same discovery group, and its discovery group is not
blank, then don't update it.
@GavinFrazar GavinFrazar force-pushed the gavinfrazar/fix-discovery-service-overwriting-resources branch from f1dd572 to 1190fed Compare July 25, 2024 17:30
@GavinFrazar GavinFrazar requested a review from rosstimothy July 26, 2024 20:17
@GavinFrazar GavinFrazar added this pull request to the merge queue Jul 29, 2024
Merged via the queue into master with commit d0a2a0d Jul 29, 2024
37 checks passed
@GavinFrazar GavinFrazar deleted the gavinfrazar/fix-discovery-service-overwriting-resources branch July 29, 2024 22:27
@public-teleport-github-review-bot

@GavinFrazar See the table below for backport results.

Branch Result
branch/v14 Failed
branch/v15 Create PR
branch/v16 Create PR

@milos-teleport
Copy link
Contributor

milos-teleport commented Oct 16, 2024

Example error if someone stumbles upon this with v14 or earlier

2024-01-01T15:20:00Z WARN [DISCOVERY] Unable to reconcile database resources. error:[
ERROR REPORT:
Original Error: trace.aggregate failed to create db dbname
database "dbname" doesn't exist

Solution: upgrade to at least v15.4.12

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Discovery Service overwrites dynamic resources of same name
4 participants