Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Improving Dls/Fls with Views #13108

Open
stephen-crawford opened this issue Apr 6, 2024 · 4 comments
Open

[RFC] Improving Dls/Fls with Views #13108

stephen-crawford opened this issue Apr 6, 2024 · 4 comments
Labels
enhancement Enhancement or improvement to existing feature or request RFC Issues requesting major changes Roadmap:Security Project-wide roadmap label Search Search query, autocomplete ...etc

Comments

@stephen-crawford
Copy link
Contributor

stephen-crawford commented Apr 6, 2024

Preempting questions

  • What are views

#6181

  • What is DLS/FLS

https://opensearch.org/docs/latest/security/access-control/document-level-security/

Basically, DLS let you filter out documents while FLS lets you filter out fields from docs.

  • Why is this RFC in core?

While the Security plugin is home to DLS/FLS, Views lives in OpenSearch core and further development would likely involve a combination of changes in both repos. Further, I want to get lots of people's input and core sees the most foot-traffic.

  • Why base on Views?

I think it is easiest to make a change when you have some existing framework in place. In this case, we already have Views which was initially discussed as a potential replacement for Dls/Fls amongst other things.

  • Why now?

The Amazon-affiliated maintainers of the Security plugin are currently looking for some new directions to drive our development. This seems like a prime opportunity to address some of main areas we hear user complaints about. If we are looking to establish potential new directions I think this could be a really good one.

  • What are you looking for with this RFC/Why don't you just do it?

Before investing a sizeable chunk of time and effort into making this shift in model, I want to gauge whether the community as a whole would be interested in something like that and what types of other considerations they may want addressed. While there are some issues that suggest the current DLS/FLS system is lacking, maybe this is selection bias and it is not as big of an issue as I think. Likewise, I want to provide the opportunity for everyone to voice their opinion on using Views as the replacement.

Background

This issue is meant to be a request for comments around a shift in how OpenSearch implements DLS/FLS.

Recently, @peternied created #11957 which abstracts indices from searches. I cannot do as well of a job explaining all the intricacies of Views as Peter himself, so I encourage everyone to visit #6181 to learn more.

This RFC is then meant to be the next step towards addressing one of the common pain-points OpenSearch Security users share: DLS/FLS can be hard to use, limited in its application, and non-intuitive in advanced cases.

As noted on the original RFC for future work around Views (#12424), the Security plugin frequently gets new issues focused around extending DLS/FLS nested fields.

That being said, I propose OpenSearch (and the Security plugin) pivot to remodel the DLS/FLS system to use views as opposed to the current system.

What is the current system for DLS/FLS

Currently, DLS/FLS takes place inside the Security plugin using a dual-pronged approach.

DLS

There are two forms of DLS which occur depending on the type of request being made.

  1. The first type of DLS is request interception DLS. This DLS takes place when the Security plugin intercepts search requests and adds a DLS query to it. This process occurs on the OpenSearch-level as opposed to the Lucene-level (more about that later).

The main place interception DLS takes place is inside the Security plugin with the DlsFlsRequestValve interface and its respective DlsFlsValveImpl. In this case, the onIndexModule method inside the OpenSearchSecurityPlugin modifies the query sent by the user. The apply method of the SecurityFilter class then verifies the action is allowed (there are a couple actions such as Updates which are not allowed while using DLS/FLS).

  1. The second form of DLS is deletion DLS. This type of DLS is used when handling get/mget requests and does not allowing caching unlike interception DLS. Deletion DLS operates on the Lucene-level by making an index appear to be deleted and empty even when it is not. This approach ties-into the way that Lucene handles document deletion. Notably, if a document is deleted in Lucene -- it is not deleted immediately; it is just marked as deleted. These marked docs are then truly deleted during segment merges. However, in order to provide on-the-fly document deletion and field anonymization/filtering, OpenSearch wraps the Lucene class responsible for performing search (it is called the DirectoryReader). OpenSearch handles the wrapping of the DirectoryReader inside core's IndexModule.setReaderWrapper. The Security plugin then takes this and uses the onIndexModule to set the reader wrapper for each index to a new instance of the SecurityFlsDlsIndexSearcherWrapper which is able to return a new DlsFlsFilterLeafReader as part of the dlsFlsWrap method.

Finally, using the instances of the DlsFlsFilterLeafReader, OpenSearch is able to modify the behavior of Lucene and customize its segment merge policy. This lets OpenSearch mark documents as deleted on the fly. An example of how this is used is the Configuration index which is only accessible via the Admin TLS certificate. Anytime a non-admin user tries to access the index, the wrapped DirectoryReader uses the EmptyFilterLeafReader and the SecurityIndexSearcherWrapper.apply method to mark all docs in the index as deleted. This same process happens for all other protected or system indices. Behind the scenes, the SecurityIndexSearcherWrapper calls on the EmptyFilterLeafReader to return it a bit set showing all docs as deleted.

FLS

FLS is performed in a similar manner to the on-the-fly deletion of sensistive documents. FLS is based on the FieldFilterLeafReader class in Lucene. Using the DlsFlsFilterLeafReader in the Security plugin, when you execute a query, the LeafReader wrapper modifies the underlying Lucene bit set to hold only field info for the allowed fields. Within the category of FLS there are two additional considerations: source field handling and anonymization/field masking.

Source Field Handling

When using FLS we are not only dealing with the stored fields but also the source fields (an OpenSearch concept). Because you are able to include source fields with your search results, we need to account for these fields when performing FLS. The FlsStoredFieldVisitor removes filtered fields from the source fields if they are included in the search result.

Anonymization

Anonymization/field masking is based on the StoredFieldVisitor which utilizes the Visitor pattern (a Lucene construct for looking at stored fields) and adds hashing to the target fields. The security plugin handles this in the HashingStoredFieldVisitor class. There the overridden stringField method and the binaryField method allow for the anonymization of any target fields.

Reworking with Views

As discussed there are two types of DLS (interception and deletion) as well as FLS which need to be replaced for Views to cover the existing DLS/FLS functionality. There are a few different patterns which look promising.

The options for these patterns are:

Perform DLS by returning a View with only the allowed docs

For this operation flow, DLS would be peformed by whole-scale removing the existing DLS logic and dropping a customized ViewService into the Security plugin. In this scenario, the DlsFlsValveImpl would be swapped out for a siimilar class (DlsFlsViewImpl?) which could make use of Views for resolving Dls requests instead of EvaluatedDlsFlsConfigs. Rather than reading the EvaluatedDlsFlsConfigs, a request could be associated with a View (resolved from the roles similar to Dls/Fls but with a structured object) and then the specified indices could be returned before the search request is performed against only that subset. The View framework for this already exists with the SearchViewAction and GetViewAction.

This option has an added benefit over the existing implementation since it would not be necessary to use the on-the-fly-deletion form of DLS anymore. Because the search request would be restricted internally to the View(s) associated with the user's role there should not be the need to dynamically filter out protected or system indices.

Fls in this scenario would likely need to be handled as part of the view preprocessing or in a similar way to how it is handled now

Pros: Removes the second type of DLS processing, makes results more predictable since Views could be given a graphical component for visualization etc.

Cons: Pretty complicated changes, different user experience, not backwards compatible at all

Perform DLS by replacing the internal filtering system with reroutes to Views

For this option, the idea would be to keep the general Dls/Fls system the same but replace the internal components of the DlsFlsValveImpl with calls to a ViewService implementation which would be added to the Security plugin. Instead of making use of the current implementation, the Security plugin could handle conversion of Dls/Fls specs into Views. This would let a Dls/Fls configuration be behind-the-scenes converted into a View which would then be searched similar to the previous option. By converting the configurations into Views, operations like comparisons and combinations could be done in a memory efficient way. The results could be cached and memoized.

The problem with this option is that it does not directly address many of the common complaints with Dls/Fls. It is not really any more simple since all the conversion would be behind the scenes. Further, it does not remove the need for the dual form of Dls since it does not direclty address the chance that a search request would show protected indices.

Pros: Easier implementation, stored objects may boost repeated Dls/Fls performance

Cons: Does not directly address common concerns, still requires dual Dls system

Expand DLS by offering View support

A third option to consider is an effort to simply offer DLS through Views without directly replacing the entire system. For this approach, we could avoid a complete overhaul of the DLS system and simply connect the new concept of Views to the existing DLS framework. In this case, we would likely want to add an additional class to the Security plugin which again extended the ViewService from core but was called as part of the DLS process. Leaving FLS alone for now, this approach could see improvements to the usability of DLS by offering similar support to the first option (above) but not replacing all the existing code. Inside the DlsFlsValveImpl we could add conditional checking against the type of request and resolve a DLS filtered request to a View if one existed. This would likely require a new way to convert between a DLS pattern and a View, but would let users replace the somewhat cumbersone DLS definitons with the more straighforward View representation. For instance, while DLS currently requires a JSON formatted definition for each pattern associated with a role, we could offer View-based definitions as an alternative. As static references, we could then perform resolvable operations on View-based DLS definitions when they are defined.

For example, given two Views (1 and 2), we could provide simple arithmetic definitions for DLS. Because View 1 and View 2 are each objects inside OpenSearch, we could let users reference DLS patterns by saying "in View 1 or View 2" etc. We would also be able to provide administrators the ability to preview a DLS result since we can resolve the Views to their assigned indices at any given point. This would help address some of the more confusing scenarios of DLS where the interaction between multiple DLS-enforced roles is not clear.

Another benefit of this approach is that it would potentially lower the amount of instances where we are forced to interact on Lucene-level objects during DLS. Because this approach has some similarities with the first option, we would again be able to avoid the on-the-fly deletion work we currently perform. If the protected indices were not in any of the allowed Views you would not need to dynamically delete them.

Pros: Helps with user experience, less code changes

Cons: Interacts with the complicated current DLS system, does not address FLS

How would views need to change?

The most liekly requirements for any of the options for reworking DLS/FLS with Views is create new constructs which let OpenSearch track the indices associated with a View more effectively. Looking at the current View implementation, it does not seem there is a great way for accessing the indices inside the View as a batch. Adding an abstraction for ViewableIndex or something similar which was able to have its field names checked against from the ViewService may be helpful.

What is next?

Having overviewed a couple ideas for incorporating Views into an improved DLS/FLS system, I would love to hear community feedback around interest (should we bother?), feature requests around this area (is there something you want addressed I did not mention?), and pitfalls (did I overlook something or am I completely wrong about something?).

Outstanding questions for the community

Here are some questions I still have as the author of this RFC and things I hope the community may be able to help answer.

  • The current implementation is heavily intertwined with Lucene, is this needed?

  • Can we get to point where DLS is performed in SQL-style syntax, and is this desired?

  • Can Views help us resolve permissions before execution of an action and then tell whether a request will pass before it is sent?

  • Can we add negated logic to DLS if we use Views? (Can we append 'must not match View 1', etc.)

@stephen-crawford stephen-crawford added enhancement Enhancement or improvement to existing feature or request untriaged labels Apr 6, 2024
@github-actions github-actions bot added the Search Search query, autocomplete ...etc label Apr 6, 2024
@peternied
Copy link
Member

Great write up on the state of the system. For going forward it sounds like there are a couple of different options, my gut is asking for a proof of concept that can be played with a little to get a sense of the options. What do you think about implementing the a POC of the simplest version of FLS?

Naïvely field level access control doesn't change the query structure "much", maybe it can be implemented easily with query re-writing or a pipelines. Even without the fuller set of feature I think it could be shipped as a valuable improvement to OpenSearch core and create a blueprint for more data visibility features.

@stephen-crawford
Copy link
Contributor Author

stephen-crawford commented Apr 9, 2024

Hi @peternied, thanks for leaving a comment! I guess I should not be surprised since Views it your idea after all! I am all for starting with an example if there is some interest. I do want to clarify what you had in mind for the Fls only implementation. Maybe I am missing something obvious, but I had not considered a way to support Fls only through OpenSearch core.

Most of what I had considered would live more heavily in the Security plugin and likely focus more on Dls which I have a better idea of how to use Views for.

I can definitely add field-level concepts and filtering to the View framework we already have in core, but I want to make sure this is in-line with what you were thinking.

Mind sharing a bit more about what you were thinking?

@peternied peternied added RFC Issues requesting major changes and removed untriaged labels Apr 10, 2024
@peternied
Copy link
Member

[Triage - attendees 1 2 3 4 5 6]
@scrawfor99 Thanks for creating this issue, looking forward to seeing how this progresses.

@peternied
Copy link
Member

I'm suggesting starting from a smaller problem space - just talking about document/field level security requires a forward slash that implies complexity.

I would suggest targeting specific scenarios rather than copy functionality directly from the security plugin. In my example, I selected just filtering fields based on rules.

To speak to the complexity on filtering field, in the security plugin there are inclusive and exclusive rules for allowing field matches and field aren't validated. It might be a good design choice to only allow inclusive rules, to prevent newly add fields from leaking unintentionally. Likewise, if the view doesn't have a matching field on the view data, maybe it should throw an exception instead of returning a result that isn't properly filtered, as well as validate them on view modification.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request RFC Issues requesting major changes Roadmap:Security Project-wide roadmap label Search Search query, autocomplete ...etc
Projects
Status: New
Status: Later (6 months plus)
Development

No branches or pull requests

3 participants