RFD 151: Database Permission Management #33734
Conversation
Each target node will have its distinct set of relevant permissions, contingent on its type. For instance, tables may be associated with "SELECT" permissions, while procedures might involve "EXECUTE" permissions. These permissions will be represented as edges in the graph, each tagged with the appropriate permission type.

The representation of allowed but unapplied permissions remains unclear within TAG. Expanding wildcards in permissions into individual nodes and edges is essential since TAG's representation is flat and explicit, lacking further interpretation.
I don't think this will actually hold, because you can move the same design to the `DATABASE` node type.
TAG currently has the following nodes:
- User: represents a user.
- UserGroup: represents a group of users that should share the same modeled permissions (this might not hold true for expressions with traits like SSH logins...). Users can be members of UserGroups, and UserGroups can be members of another UserGroup.
- Resource: Nodes, Kube, and Database resources.
- ResourceGroup: similarly to UserGroups, ResourceGroups define a set of resources that share the same properties, i.e. environment, location...
- Action: an Action establishes a relationship between UserGroups and ResourceGroups. An Action can be SSH with the SSH principal root, allow session forwarding, etc. An Action's left-hand side must be a UserGroup and its right-hand side must be a ResourceGroup. The allow/deny is defined by the edge type.
A simple linear representation can be:
[User{name:"tiago"}] - (member_of) -> [UserGroup{name:"prod"}] - (allow) -> [Action{name:...}] - (allow) -> [ResourceGroup{name:"prod"}] <- (member_of) - [Resource{name:"prod-1"}]
If you move your permissions to the action, you don't need to replicate the permissions per database and the number of permutations will reduce significantly.
You can also choose to represent the data inside the Database node instead of having external nodes, so everything can be concise, but I don't dislike representing them as nodes either.
We will have similar problems with Kubernetes
If you move your permissions to the action, you don't need to replicate the permissions per database and the number of permutations will reduce significantly.
Makes sense; I think we can have (protocol type, permission type) tuples, like (Postgres, SELECT). This does materially reduce the number of nodes in the graph.
You can also choose to represent the data inside the Database node instead of having external nodes, so everything can be concise, but I don't dislike representing them as nodes either.
This is an interesting option. Having a single Database node, instead of a number of nodes for each object in the database, would certainly be easier from the performance point of view. I think we would have to push the detailed permissions into the Action itself: (Postgres, SELECT, "public.sales") or (Postgres, INSERT, "topsecret.*"). This would need joining with the database schema to obtain an actual list of affected objects, but it would not be represented inside the graph itself.
I think this is a nice fallback scenario in case the "fully unfolded" option is unfeasible due to performance considerations. It would also work well with the policy-based permissions (row-level security), which are difficult to express in a graph.
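To make this concrete, here is a purely illustrative sketch of an Action carrying such tuples; the field names and YAML serialization are hypothetical and not an existing TAG format:

```yaml
# Hypothetical sketch only; TAG does not define this serialization.
kind: action
metadata:
  name: db-read-sales
spec:
  permissions:
  - protocol: postgres       # protocol type
    permission: SELECT       # permission type
    object: 'public.sales'   # resolved against the live database schema, not stored in the graph
  - protocol: postgres
    permission: INSERT
    object: 'topsecret.*'    # wildcard; expansion would require joining with the schema
```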
@tigrato The updated design is now based on labeled resources and label matching. I think this makes it more in line with the rest of resources, so hopefully easier to integrate with TAG, but can you please take another look at this?
4. Additional permission synchronizations can be triggered as necessary, such as on new connections, detected schema changes (which may be detected using database-specific methods), or based on a scheduled timer.
5. For consistency, we ensure that new connections receive an equivalent set of permissions. Notably, permissions for other protocols or unsupported versions do not affect what is applied to the database. Changes in the database schema can lead to previously "identical" permission sets diverging, for example, when a new table is added. The permissions are equivalent if they are stable under schema changes.
6. Once the last connection for the user is terminated, they are either removed or deactivated.
A connection seems like a natural way to identify a database session, but in reality queries, subroutines, stored procedures, and likely other actions can continue to run after the user disconnects. We also need to consider things like MySQL Events which can run on a schedule: https://dev.mysql.com/doc/refman/8.0/en/events-overview.html
Have we considered how these potentially persistent conditions will be handled?
Good question. I think this is relevant for the automated user provisioning feature too. In any case, once the last connection for a given user is dropped, all roles and permissions should be removed from the user. If possible, the user should be removed too.
This means that any lingering processes running as a user should no longer have the ability to meaningfully interact with the database - as long as we are thorough with the removal of permissions.
I don't believe this is true. I believe that in Postgres and MySQL the permissions are checked before execution starts, and that deleting the user mid-execution will not impact operations that have already started.
If we remove the permissions, but the database in question does not propagate those, then I'd say this is a problem (or a deliberate design) of the given database engine. I'm not sure what more we can do in this case, but I'm open to suggestions.
The biggest thing we need to consider is possible interactions when adding the user back. I am not aware of any reasons this will be a serious issue or of any security risks in this regard, but we should keep it in mind.
Besides that, we need to document this concern. This and #33734 (comment) can likely be documented together: "You need to understand your database implementation, and how permissions are controlled and revoked" (in many more words).
5. For consistency, we ensure that new connections receive an equivalent set of permissions. Notably, permissions for other protocols or unsupported versions do not affect what is applied to the database. Changes in the database schema can lead to previously "identical" permission sets diverging, for example, when a new table is added. The permissions are equivalent if they are stable under schema changes.
6. Once the last connection for the user is terminated, they are either removed or deactivated.
7. When a user is removed, all permissions are automatically revoked. In cases where full removal is not possible (example: a user is the owner of database objects in Postgres), a process is in place to ensure that all permissions are removed from the user, regardless of their source. This approach acts as a precaution against potential lingering permissions and simplifies the management of permissions.
What is this process?
Hmm, yeah, this came out rather vague. This is implementation-specific, but with the current implementations we trigger a stored procedure that drops privileges and roles. I'll rephrase this.
I know we are not able to prevent tampering with this stored procedure. But can we do any detection to ensure it's running and validate it has not been tampered with?
Have you considered any further how we can detect and report if the wiring for this feature has been tampered with or otherwise is no longer functioning correctly?
I wonder if it makes sense to add an audit event when the permission sync results in an error?
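As a purely hypothetical illustration of what such an event could carry (the event name, code, and fields below are made up, not an existing Teleport audit event):

```yaml
# Hypothetical audit event sketch; all names and fields are illustrative only.
event: db.permissions.sync.failure
code: TDB00XE                     # placeholder event code
db_service: postgres-prod
db_user: alice
error: 'revoke failed: permission denied for schema "public"'
success: false
time: '2023-11-01T12:00:00Z'
```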
The introduction of the new feature has no direct impact on the security of Teleport itself. However, it does have implications for the security of connected databases within the supported configurations. For environments requiring heightened security, there may be value in explicitly excluding specific databases from this feature, potentially using a resource label for this purpose.

It's important to recognize that with sufficiently broad permissions, a user might have the potential to elevate their database permissions further, including creating additional users and granting permissions. Given that this feature does not attempt to model the implications of individual database permissions, there is no foolproof mechanism to prevent such excessive permissions. Therefore, it falls to the system administrator to ensure that the granted permissions are kept to a minimum.
We should make sure that when documenting this feature we highlight that administrators are taking on this responsibility, and that they need to understand their databases and the RBAC of their database systems carefully.
Agreed. I also wonder about utility vs. security here; we could refuse to implement the ability to grant certain permissions, to avoid potential footguns, but that might mean users who need those are left out. We might also gate this behind a resource label or other mechanism (to avoid stumbling into this without knowing), but that would necessarily complicate the UX as well as the implementation itself.
It may make sense to prevent user creation. I am not sure if there would be a legitimate need for that, considering Teleport should be doing that management.
I have not thought about this deeply yet; are there other escalation or persistence paths you have considered?
That said, I still think regardless of what limits we put in, documenting this responsibility is important.
This feature has non-trivial consequences for security. Being able to disable it at the resource level (e.g., for all production databases) might be useful: there is no way to misconfigure a disabled feature. I suppose it is somewhat similar in purpose to permission boundaries: https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_boundaries.html.
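As a minimal sketch of such an opt-out, assuming a dedicated resource label is chosen for this purpose (the label name below is hypothetical):

```yaml
# Hypothetical opt-out via a resource label; the label name is illustrative only.
kind: db
version: v3
metadata:
  name: prod-postgres
  labels:
    env: prod
    teleport.dev/db-permissions: disabled   # hypothetical label excluding this database from the feature
spec:
  protocol: postgres
  uri: prod-postgres.internal:5432
```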
I think we should explore the label based design like I mentioned in my comment.
The redesign is complete. PTAL @r0mant @smallinsky @jentfoo @tigrato. POC: #35034
Additionally, the imports will be done on a predetermined schedule (e.g. every 10 minutes), and stored in the backend.
Did you think about applying the change on insert? Otherwise people will still be able to access the DB using the old permissions for a while.
Schema change is mentioned as a potential trigger, but this needs to be detected somehow. There is no general way to do it, so a cron job is a sensible baseline to have. But yeah, this is definitely an extension we want to support at some point.
For Postgres schema changes, we can probably use PostgreSQL trigger+notify:
https://medium.com/launchpad-lab/postgres-triggers-with-listen-notify-565b44ccd782
But agreed, there is no general way to do it for all protocols and object kinds.
Assuming the periodic scan happens on the Database Service (as opposed to the Discovery Service or another service), how do we coordinate the scan in an HA setup (multiple db services serving the same database)?
For Postgres schema changes, we can probably use PostgreSQL trigger+notify:
https://medium.com/launchpad-lab/postgres-triggers-with-listen-notify-565b44ccd782
Yup, something like this should work for Postgres, assuming permissions align etc. In case of errors, the periodic scan would provide a viable fallback. The actual notify implementation may be unreliable, too.
Assuming the periodic scan happens on the Database Service (as opposed to the Discovery Service or another service), how do we coordinate the scan in an HA setup (multiple db services serving the same database)?
As a baseline, a random jitter would be added to the timer, and objects would be discovered multiple times. This should look just like regular discovery, except one running with increased frequency.
This hints at one possible solution to increased backend load: provide a tuneable for discovery frequency, and decrease it automatically as a function of the number of servers performing the discovery.
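As a sketch, assuming the knob lives in the Database Service configuration (the key name and default below are hypothetical):

```yaml
# Hypothetical configuration sketch; the import interval knob does not exist today.
db_service:
  enabled: true
  resources:
  - labels:
      env: prod
  # Hypothetical: how often each Database Service re-imports database objects.
  # Could be backed off automatically as more services serve the same database.
  object_import_interval: 10m
```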
We should make sure these details are clearly described in the docs. I believe that our customers will be shouldering a lot here: understanding their specific database implementation, the permission model within it, and the schema design and update process.
Example database object, with applied labels:

```yaml
kind: db_object
```
If there is an error in the schema definition, how will the administrator know? For allow functions the failure mode is obvious, but a typo in a deny action could result in an expected control not functioning.
Can we validate this schema at the point of configuration? There is still a concern around database changes happening after configuration, but it would at least prevent typo like failures.
I'm not sure what we can do except provide some facilities to debug the algorithm, visualising the intermediates, etc. This is not a new problem: the existing deny rules that depend on matching have no guardrails either. There is also no existing functionality to help debug these or warn in case of mismatches.
mappings:
- match:
    databases:
    - 'Widget*'
    db_service_names:
    - all-things-widget
    names:
    - '*sales*'
    object_kinds:
    - table
    - view
    - procedure
    protocol_names:
    - postgres
    schemas:
    - widget
    - sales
    - public
    - secret
I feel we could further simplify this. Similar to db objects, `db_service_names` and `protocol_names` are "attributes" of the database service. `db_labels` should be enough to match the database service without needing the matching on these attributes.
This is a little repetitive to me:
names:
- '*sales*'
object_kinds:
- table
- view
- procedure
What do you think about something like this instead?
# search in databases/schemas, default '*'
databases:
- 'Widget*'
schemas:
- public
# import tables/views/procedures, default null
tables:
- '*sales*'
views:
- '*'
procedures:
Though it's kind of blurry without the comments which parts are search scopes vs. which are getting imported. Maybe `in_databases`/`in_schemas` to differentiate.
Hmm, so basically combine (object kind, name) matchers by defining different branches?
It functionally renders `object kind` an enum, but that isn't necessarily a bad thing.
I agree this is more readable in general.
Regarding the search scope vs import - maybe something like this?
scope: # search in databases/schemas, default '*'
  databases:
  - 'Widget*'
  schemas:
  - public
objects: # import tables/views/procedures, default null
  tables:
  - '*sales*'
  views:
  - '*'
  procedures:
  - '*'
(I'm not sure if `objects` or `import` is better here)
If splitting scopes from this list, it might be easier to move them outside `mappings`?
kind: db_object_import_rule
version: v1
metadata:
  name: rule_widget_prod
spec:
  priority: 10
  db_labels:
    env: prod
  database_names: ['Widget']
  mappings:
  - match:
      table_names:
      - 'sales'
    add_labels:
      env: prod
      dept: sales
(as `db_labels` is kind of a scope 🤔?) Will `database` and `schema` be imported as db_objects in the future?
As others pointed out, to be consistent with Okta rules, we may want to use something like `table_names` and `table_name_regexes`. Any concern with performance using regexes vs. simple globs?
Another point from the Okta rules is that each match only allows one import/object kind:
Lines 180 to 184 in 4f77140:

	// CheckAndSetDefaults checks and sets default values
	func (o *OktaImportRuleMatchV1) CheckAndSetDefaults() error {
		if len(o.AppIDs) > 0 && len(o.GroupIDs) > 0 {
			return trace.BadParameter("only one of App IDs or Group IDs can be set")
		}
The tradeoff is the user may need many `match` blocks, but it's clear what each `match` is importing. Might be easier for the implementation too. What do you think? Personally I am fine with either way.
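To illustrate the tradeoff, a hypothetical sketch of the one-object-kind-per-match style under the structure discussed above (field names are not final):

```yaml
# Hypothetical sketch; each match imports a single object kind, mirroring the Okta restriction.
mappings:
- match:
    table_names: ['*sales*']
  add_labels:
    dept: sales
- match:
    view_names: ['*']
  add_labels:
    dept: sales
```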
If splitting scopes from this list, it might be easier to move them outside mappings?
I've experimented with this a bit, and I feel it works better if you can have a set of different scopes per import rule mapping. I'm not entirely sure what the best way to model this is, so I'll keep looking into this.
(as db_labels is kind of a scope 🤔?) Will database and schema be imported as db_objects in the future?
Both are possible, since these are entities recognised by db engines that we can grant permissions to.
As others pointed out, to be consistent with Okta rules, we may want to use something like table_names and table_name_regexes. Any concern with performance using regexes vs. simple globs?
Good point regarding naming, this will make it more consistent. No performance penalty: we compile globs to regexes anyway.
Another point from the Okta rules is that each match only allows one import/object kind:
I'm not sure why this is a restriction for Okta rules, but I think it makes sense to allow multiple entities and kinds.
@Tener Thank you for putting up with all the naming/structure nitpicks and for continuing to make these changes :) I left a couple more comments but from my standpoint we're almost there.
names:
- '*sales*'
object_kinds:
- table
- view
- procedure
I would consider doing `table_names`, `view_names`, etc., which seems like it will be a little bit easier to "parse" visually and comprehend.
Do you think something along these lines: #33734 (comment)?
db_service_names:
- all-things-widget
What is this matcher for? Wouldn't the database service that uses this import rule be selected based on "db_labels" above?
The labels may match multiple services; this allows you to match by name. I do see your point that this is somewhat redundant, so we can simplify the design by dropping this capability.
Any particular attribute can be omitted from the import rule; an import rule with empty `match` part will match all objects.
I would probably err on the side of caution and require an explicit wildcard to match all objects instead of implicit "empty means all".
I don't see a natural way to require at least one non-empty match clause as it is right now, but I think it may change in light of @greedy52's comment #33734 (comment).
I think empty-means-all is ok for `scope`. For `match`, I still think it should be explicit.
Imagine the scenario where we add support for a new db object type (say `sequence_names`). If empty-means-all, the auto-provisioned user may automatically get permissions to SEQUENCE after the upgrade to the new version. Then the customer has to manually go edit each import rule and add an empty `sequence_names` to preserve the same behavior as before.
--- Update
After reading the role part again, I don't think the user will automatically be granted this new kind. But the above scenario still holds in that the backend will get the new objects automatically. Also, as an example, the customer would have to update `import_all_staging_tables` below to make sure all other object types are disabled.
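For example, a hypothetical `import_all_staging_tables` rule written with explicit names would only import tables, so a newly supported object kind (say, sequences) would not be picked up silently after an upgrade:

```yaml
# Hypothetical rule illustrating explicit matching; field names follow the structure
# discussed in this thread, not a finalized schema.
kind: db_object_import_rule
version: v1
metadata:
  name: import_all_staging_tables
spec:
  priority: 20
  db_labels:
    env: staging
  mappings:
  - match:
      table_names: ['*']   # explicit: tables only
      # view_names, procedure_names, sequence_names, ... left unset => not imported
    add_labels:
      env: staging
```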
🚀 🚀 🚀 🚀 🚀
Overall looks good!
Just one comment on that empty-match-matches-all in the import rule. Personally I would still prefer it to be explicit, but I recall you mentioned some usage/user-experience reasons when recording the demo. Let me know what you think. Thanks!
spec:
  allow:
    db_permissions:
    - match:
        confidential: 'false'
        kind:
        - table
        - view
      permissions:
      - SELECT
      - INSERT
      - UPDATE
  deny:
    db_permissions:
    - match:
        confidential: 'true'
      permissions:
      - '*'
Which of these fields support wildcards and which don't? Is it only the `permissions` in `deny` that supports wildcards? Thanks.
Yep, only `deny` allows you to use a wildcard, explicitly for the purpose of denying all possible permissions.
Empty list of names should match no objects, as this is more intuitive and future-proof. Empty scope should match anything, as the scope may be redundant with other attributes which are mandatory.
As discussed offline, I agree having the names be explicit is better UX (less surprising) as well as more future proof (new object types will be excluded by default, which is less surprising, and thus better UX as well). I did leave "empty matches all" for the scope part (db and schema names), as I feel this works better, but we can always revisit this detail down the line. I'll merge the RFD as is for now. 🎉
This is an initial proposal for implementing #32627.
PR with the proof of concept: #35034
Rendered.