Optimize the Fls/Dls/FieldMasking data structure to only include the concrete indices from the current request #4706

cwperks · 2024-09-04T14:54:16Z

Description

This PR is an optimization for DLS/FLS/Field Masking implementation in the security plugin. Currently, the security plugin computes a data structure called EvaluatedDlsFlsConfig which is a mapping of all concrete indices -> Set dls/fls/fieldmasking rules that are applicable to the concrete index.

The keys in this map are all of the concrete indices that can be resolved to from roles mapped to the user. For instance, if a user is mapped to a role that has index_patterns: '*', then this data structure will always include all concrete indices in the cluster on every request, regardless if the request is only on a small set of indices.

This PR optimizes this to include filtering. It takes the existing data structure, but filters it down to the concrete indices from the request. So if a search request is on a single index, this data structure will have a single key with the Dls/Fls/Field Masking rules that apply to that single index.

Example of the data structure

EvaluatedDlsFlsConfig [dlsQueriesByIndex={}, flsByIndex={fls_index=[~stars], fls_index_2=[~stars]}, fieldMaskingByIndex={}]

Category (Enhancement, New feature, Bug fix, Test fix, Refactoring, Maintenance, Documentation)

Enhancement

Issues Resolved

Related to: #4031

Check List

New functionality includes testing
New functionality has been documented
New Roles/Permissions have a corresponding security dashboards plugin PR
API changes companion pull request created
Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Craig Perkins <[email protected]>

… not fully resolve to concrete index for datastream Signed-off-by: Craig Perkins <[email protected]>

Signed-off-by: Craig Perkins <[email protected]>

cwperks · 2024-09-04T15:17:27Z

src/main/java/org/opensearch/security/resolver/IndexResolverReplacer.java

@@ -309,7 +309,7 @@ private void resolveIndexPatterns(
                final boolean isDebugEnabled = log.isDebugEnabled();
                try {
                    matchingAllIndices = Arrays.asList(
-                        resolver.concreteIndexNames(state, indicesOptions, localRequestedPatterns.toArray(new String[0]))
+                        resolver.concreteIndexNames(state, indicesOptions, true, localRequestedPatterns.toArray(new String[0]))


This change is needed to ensure that the concrete index corresponding to any data streams in the request is resolved to.

Do we also want to set this as true in line 489/is there a reason this was not passed as true before?

codecov · 2024-09-04T15:39:36Z

Codecov Report

Attention: Patch coverage is 40.00000% with 15 lines in your changes missing coverage. Please review.

Project coverage is 65.56%. Comparing base (6735b5c) to head (d8cf257).
Report is 15 commits behind head on main.

Files with missing lines	Patch %	Lines
...search/security/configuration/DlsFlsValveImpl.java	38.88%	8 Missing and 3 partials ⚠️
...arch/security/configuration/ClusterInfoHolder.java	20.00%	3 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4706      +/-   ##
==========================================
+ Coverage   65.20%   65.56%   +0.35%     
==========================================
  Files         318      319       +1     
  Lines       22327    22457     +130     
  Branches     3591     3602      +11     
==========================================
+ Hits        14558    14723     +165     
+ Misses       5971     5924      -47     
- Partials     1798     1810      +12

Files with missing lines	Coverage Δ
.../opensearch/security/OpenSearchSecurityPlugin.java	`84.61% <100.00%> (+0.13%)`	⬆️
...earch/security/resolver/IndexResolverReplacer.java	`66.93% <100.00%> (ø)`
...arch/security/configuration/ClusterInfoHolder.java	`57.69% <20.00%> (-8.98%)`	⬇️
...search/security/configuration/DlsFlsValveImpl.java	`60.72% <38.88%> (+0.79%)`	⬆️

... and 16 files with indirect coverage changes

shikharj05 · 2024-09-05T02:28:23Z

Thanks for the changes @cwperks ! I think this should help reduce consumed memory per request.

@nibix in #3870 has proposed a different data-structure (pre-computed) per user and hence this computation would not be required per request. Also, we need to discuss special handling for requests with all indices wildcard - *.

Can you check #3870 and if DLS/FLS changes can be tested? Can we run some micro-benchmark to confirm which is better?

cwperks · 2024-09-05T04:08:42Z

I think the changes in this PR and the pre-computed data structure can both provide performance benefits to the FLS/DLS/Field Masking implementation. The primary gain from this PR is changing the data structure that is set in the ThreadContext headers here. Instead of passing the full data structure (which can contain indices that are not in the request) it sets only the portion of the Fls/Dls/Field Masking data structure that is pertinent to the request.

If we can also avoid computing the full EvaluatedDlsFlsConfig on every request, then that would be another good performance boost.

nibix · 2024-09-05T06:07:06Z

How does this affect requests that do not carry index information (e.g. the scroll request)? So far, I believed we computed permissions for all indices because there are a couple of operations where you cannot tell on the DlsFlsValve level which indices are requested, only on lower levels.

nibix · 2024-09-05T06:26:51Z

src/main/java/org/opensearch/security/configuration/DlsFlsValveImpl.java

+                        threadContext.getHeader(ConfigConstants.OPENDISTRO_SECURITY_FLS_FIELDS_HEADER),
+                        threadContext.getTransient(ConfigConstants.USE_JDK_SERIALIZATION)
+                    );
+                    if (!isSubMap(flsFields, deserializedMap)) {


This is for clusters with mixed OpenSearch versions, right?

I think there's a catch: This will work in the case that a request first hits a node with an old OpenSearch version which then gets forwarded to a node with a newer OpenSearch version.

However, in the reverse case, there will be trouble: If a request first hits an OpenSearch version with this change, this node will calculate the filtered map and pass it on. If the request is then passed on to an older OpenSearch version, it will calculate a different map and throw an exception.

A different perspective: TBH, it is totally unclear to me what's actually the purpose of this equality check. So far, I could not think of any real purpose. I would have the tendency to say that this check could be removed alltogether. (Still, this would not fix the mixed cluster issue directly). That would also help performance as it avoids the back and forth serialization.

Added a check for min node version in. If the cluster has nodes <= 2.17 it will place in the full map as expected. If the min version is 2.18 or beyond it will place the filtered map in as headers.

Signed-off-by: Craig Perkins <[email protected]>

cwperks · 2024-09-05T12:54:08Z

I added a test with scroll, by the time that DlsFlsValveImpl.invoke was called, the action being executed was a SearchRequest.

Here is some of the debugging info I captured from the test:

action: indices:data/read/search
Resolved indices: [fls_index]
expectedLogMessage: Filtered DLS/FLS Config: EvaluatedDlsFlsConfig [dlsQueriesByIndex={}, flsByIndex={fls_index=[~stars]}, fieldMaskingByIndex={}]
All log messages: [DlsFlsValveImpl.invoke()
request: SearchRequest{searchType=QUERY_THEN_FETCH, indices=[fls_index], indicesOptions=IndicesOptions[ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, expand_wildcards_hidden=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false, ignore_throttled=true], routing='null', preference='null', requestCache=null, scroll=Scroll{keepAlive=1m}, maxConcurrentShardRequests=5, batchedReduceSize=512, preFilterShardSize=null, allowPartialSearchResults=null, localClusterAlias=null, getOrCreateAbsoluteStartMillis=-1, ccsMinimizeRoundtrips=true, source={"size":1,"query":{"match_all":{"boost":1.0}},"sort":[{"_id":{"order":"asc"}}]}, cancelAfterTimeInterval=null, pipeline=null, phaseTook=null}
evaluatedDlsFlsConfig: EvaluatedDlsFlsConfig [dlsQueriesByIndex={}, flsByIndex={other_index=[~stars], fls_index=[~stars], fls_index_2=[~stars]}, fieldMaskingByIndex={}]
resolved: Resolved [aliases=[], allIndices=[fls_index], types=[*], originalRequested=[fls_index], remoteIndices=[]]
mode: ADAPTIVE, Doing lucene-level DLS because the query does not contain a TLQ, attach FLS info: {fls_index=[~stars]}, Filtered DLS/FLS Config: EvaluatedDlsFlsConfig [dlsQueriesByIndex={}, flsByIndex={fls_index=[~stars]}, fieldMaskingByIndex={}]]

I'm currently looking more into when DlsFlsValveImpl.invoke() is called in the flow.

…fore Signed-off-by: Craig Perkins <[email protected]>

Signed-off-by: Craig Perkins <[email protected]>

cwperks · 2024-10-07T16:50:11Z

Closing in favor of #4380 which introduces a new DLS implementation and will not rely on the current per-request Data Structure for FLS/DLS/Field Masking implementation.

cwperks added 12 commits August 30, 2024 11:16

WIP on FLS and Field Masking optimization

0b9e0ed

Signed-off-by: Craig Perkins <[email protected]>

Keep resolved datastreams because IndexResolverReplacer.Resolved does…

24b642b

… not fully resolve to concrete index for datastream Signed-off-by: Craig Perkins <[email protected]>

Same with DLS

cec16db

Signed-off-by: Craig Perkins <[email protected]>

Remove debug statement

29d8d9c

Signed-off-by: Craig Perkins <[email protected]>

Remove change in ConfigModelV7

7e32bcf

Signed-off-by: Craig Perkins <[email protected]>

Resolve data streams as well

9cd69d1

Signed-off-by: Craig Perkins <[email protected]>

Revert change in test

93a3604

Signed-off-by: Craig Perkins <[email protected]>

Handle null

27282b6

Signed-off-by: Craig Perkins <[email protected]>

Remove sysout

944d26f

Signed-off-by: Craig Perkins <[email protected]>

Merge branch 'main' into filtered-fls

8be5c83

Add FlsAndFieldMaskingLogsTests

f6c51a7

Signed-off-by: Craig Perkins <[email protected]>

Add more tests

d0c84e2

Signed-off-by: Craig Perkins <[email protected]>

cwperks marked this pull request as ready for review September 4, 2024 15:16

cwperks requested review from DarshitChanpura, nibix, peternied, RyanL1997, stephen-crawford, reta and willyborankin as code owners September 4, 2024 15:16

cwperks commented Sep 4, 2024

View reviewed changes

nibix reviewed Sep 5, 2024

View reviewed changes

Add test with scroll

aa8e720

Signed-off-by: Craig Perkins <[email protected]>

cwperks added 2 commits September 16, 2024 14:08

Keep full header map if min node version in the cluster is 2.17 or be…

47e1062

…fore Signed-off-by: Craig Perkins <[email protected]>

Remove subMap check as it does not add much value

d8cf257

Signed-off-by: Craig Perkins <[email protected]>

Merge branch 'main' into filtered-fls

c1574a7

cwperks closed this Oct 7, 2024

cwperks mentioned this pull request Dec 3, 2024

[2.x Backport] Optimized Privilege Evaluation #4898

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize the Fls/Dls/FieldMasking data structure to only include the concrete indices from the current request #4706

Optimize the Fls/Dls/FieldMasking data structure to only include the concrete indices from the current request #4706

cwperks commented Sep 4, 2024 •

edited

Loading

cwperks Sep 4, 2024

derek-ho Oct 7, 2024

codecov bot commented Sep 4, 2024 •

edited

Loading

shikharj05 commented Sep 5, 2024

cwperks commented Sep 5, 2024

nibix commented Sep 5, 2024

nibix Sep 5, 2024 •

edited

Loading

cwperks Sep 16, 2024

cwperks commented Sep 5, 2024 •

edited

Loading

cwperks commented Oct 7, 2024

Optimize the Fls/Dls/FieldMasking data structure to only include the concrete indices from the current request #4706

Optimize the Fls/Dls/FieldMasking data structure to only include the concrete indices from the current request #4706

Conversation

cwperks commented Sep 4, 2024 • edited Loading

Description

Issues Resolved

Check List

cwperks Sep 4, 2024

Choose a reason for hiding this comment

derek-ho Oct 7, 2024

Choose a reason for hiding this comment

codecov bot commented Sep 4, 2024 • edited Loading

Codecov Report

shikharj05 commented Sep 5, 2024

cwperks commented Sep 5, 2024

nibix commented Sep 5, 2024

nibix Sep 5, 2024 • edited Loading

Choose a reason for hiding this comment

cwperks Sep 16, 2024

Choose a reason for hiding this comment

cwperks commented Sep 5, 2024 • edited Loading

cwperks commented Oct 7, 2024

cwperks commented Sep 4, 2024 •

edited

Loading

codecov bot commented Sep 4, 2024 •

edited

Loading

nibix Sep 5, 2024 •

edited

Loading

cwperks commented Sep 5, 2024 •

edited

Loading