-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add request labeling service/framework #14388
base: main
Are you sure you want to change the base?
add request labeling service/framework #14388
Conversation
Signed-off-by: Kaushal Kumar <[email protected]>
Signed-off-by: Kaushal Kumar <[email protected]>
Signed-off-by: Kaushal Kumar <[email protected]>
Signed-off-by: Kaushal Kumar <[email protected]>
❌ Gradle check result for 4c3bdd7: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
@msfroh @sohami @jainankitk @reta |
* this class will facilitate access and interactions to different implementations | ||
* Usage: This class should be used as a member to orchestrate the working of your {@link LabelingPlugin} | ||
*/ | ||
public class LabelingService { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would still recommend adding the request
keyword in all the related names, cause essentially what we are doing is adding labels to requests (search for now, but bulk, index etc in the future). Maybe RequestLabelingService
?
/** | ||
* Enum to define what all are currently implementing the plugin | ||
*/ | ||
public enum LabelingImplementationType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be moved to a separate file? Also are we suggesting a 1-1 mapping here? (a plugin can only be used for one certain purpose and one implementation can only be mapped to 1 plugin) Can multiple plugin have QUERY_GROUP_RESOURCE_MANAGEMENT
type? If not, why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I want this to be present since invoke triggers for feature might not be uniform, e,g; for QueryGrouping We will need to label the request at two places i,e; at co-ordinator request and at shard request.
Given this will fall on the hot path it will be redundant re-compute the labels for all of the implementations
Regarding whether there should be a 1:1 mapping between feature and Implementation.
Do you think it makes sense to utilise the same implementation across multiple features ? I am assuming each feature could be specific
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it make sense to allow each feature to have their specific implementation.
public LabelingService(List<LabelingPlugin> loadedPlugins) { | ||
implementations = new EnumMap<>(LabelingImplementationType.class); | ||
for (LabelingPlugin plugin : loadedPlugins) { | ||
if (implementations.containsKey(plugin.getImplementationName())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might introduce some confusion and ineffiency when implementing a new LabelingPlugin
, in theory, if I'm developing a new LabelingPlugin
, I shouldn't be worried about if someone else has already implemented a plugin for the same type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you think so, can you elaborate more with help of examples ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, if in your feature/plugin A, you alreayd have a LabelingImplementationType1
implemetation and you have already registered with the LabelingService
. When I'm implementing my feature/plugin B, if I also want to have a LabelingImplementationType1
, but I don't know there's already a LabelingImplementationType1
exists, there might be possibilities your implementation will override my implementation in prod but I won't be able to capture that during development.
if (plugin == null) { | ||
throw new IllegalArgumentException(type + " implementation is not enabled"); | ||
} | ||
plugin.labelRequest(request, threadContext); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still in favor of having multiple plugins labeling a requests at the same time like https://github.com/opensearch-project/OpenSearch/pull/14282/files#diff-f02ca59c8d99a70fdc277b1b311ba8fc0135920ef5a114578187e9388e922f6bR60,
Can we define something like a "order/priority" that a plugin can apply the labels like we did in the ActionFilter
:
int order(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you are thinking of is I think from a different perspective, I think you are assuming that all of the consuming feature needs are uniform but they are not as I had explained it above for QueryGrouping feature.
So rather than push-down-to-syn mechanism to apply this event. I am more inclined towards pull-to-sync mechanism
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, could you also give some examples on where this function will be called? if we are using this on demand, we might need to pass labelingService
to all plugins that need it? possibly by adding it as a default constructor parameter for LabelingPlugin
?
* @param request | ||
* @param threadContext | ||
*/ | ||
void labelRequest(final IndicesRequest request, final ThreadContext threadContext); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we planning to create a new plugin for the query grouping feature?
public class QueryGroupPlugin extends Plugin implements LabelingPlugin
{
...
}
Or are we planning to reuse any existing ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this doesn't concern the specific feature. Yes, This feature is supposed to be plugin driven to maintain segregation and strong cohesion with the features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered trying to use the ActionPlugin.getActionFilters extension point? This allows you to run a filter before a transport action is executed on a node.
Here's an example of how Security overrides it and registers a SecurityFilter. The filter needs to implement a method called apply that has access to the Request object. You can also instantiate a filter within createComponents
where you have access to the threadPool
that core passes to plugins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯 this is the way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see!, I missed this comment. This could also work
@cwperks Will the threadContext have the authN/authZ info ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the following code to sort the filters based on the order()
method implementation
public ActionFilters(Set<ActionFilter> actionFilters) {
this.filters = actionFilters.toArray(new ActionFilter[0]);
Arrays.sort(filters, new Comparator<ActionFilter>() {
@Override
public int compare(ActionFilter o1, ActionFilter o2) {
return Integer.compare(o1.order(), o2.order());
}
});
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Security must be the first action filter to run since that is where authz is performed and that is also where the authenticated user info is serialized to the thread context
Security uses RequestHandlerWrapper
if I am not mistaken
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@reta Authentication is done in the RestHandlerWrapper and authorization is done as an action filter.
Since 2.11, authentication has been moved further up the netty pipeline into the header_verifier step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kaushalmahi12 Security reserves the lowest order (first)
@Override
public int order() {
return Integer.MIN_VALUE;
}
Any other filter should have a higher value to occur after the SecurityFilter
The SecurityFilter calls on PrivilegesEvaluator.evaluate which is the method in which the authenticated user is serialized into the ThrreadContext. This is the line in PrivilegesEvaluator.evaluate where setUserInfoIntoThreadContext is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
|
||
/** | ||
* populates the threadContext with the labels yielded by the {@param type} against the {@link LabelingHeader} keys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are delegating the logic to "add labels into thread context" to plugin implementation, which IMO is a little bit dangerous. Plugins can add their arbitrary labels into thread context without complying with the LabelingHeader
enum. I think we should have the logic to check/validate and put labels into thread context in this service, plugins should only compute the labels they want to add. Like: https://github.com/opensearch-project/OpenSearch/pull/14282/files#diff-f02ca59c8d99a70fdc277b1b311ba8fc0135920ef5a114578187e9388e922f6bR66
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good Point! I agree with this, I think with return types we can easily enforce the header types as key.
@kaushalmahi12 @ansjcy I honestly don't understand the design decisions of this change:
We have all the pieces already (I believe), they just have to be linked together. |
@reta yes, we can utilize So, there can be two different approaches to do rule-based labeling, one is let the plugins expose their rules and the core service apply the rules for each request if possible (#14282). Another approach is using plugin interface to make it more generic, especially for Query Grouping features use cases (@kaushalmahi12 can explain more about the advantages here). |
@ansjcy my point is - there are all the tools in the disposable, there is not need to build framework or alike in core (in my opinion), it is sufficient to have separate plugin (labeling-plugin or tagging-plugin, you name it) that would do labels based on rules or other criteria. |
@reta Thanks for providing detailed response. I agree that existing constructs might be enough for majority of the use cases but it will not suffice for something which want to classify/tag requests at co-ordinator and data node level. For QueryGrouping (Query Sandboxing previously) we need to label the request at both levels for resource tracking and enforcement purposes. I see that the ActionPlugin::getRestHandlerWrapper might do the job for co-ordinator requests but Based on the javadoc of this method only one installed plugin can implement this.
The implementation here should ideally perform authN/authZ I think, ref: https://github.com/opensearch-project/security/blob/main/src/main/java/org/opensearch/security/OpenSearchSecurityPlugin.java#L666. |
@kaushalmahi12 if I am not mistaken, the headers are propagated from REST (coordinator) to transport layer and we support that, what is missing here to classify/tag requests at co-ordinator and data node level?
I completely missed that part, thank you for correcting me, the
|
* compatible open source license. | ||
*/ | ||
|
||
package org.opensearch.querygroup; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use the wlm as package name. QueryGroup as an independent entity is not enough to warrant a plugin/top level package
/** | ||
* Main enum define various headers introduced by {@link org.opensearch.plugins.LabelingPlugin}s | ||
*/ | ||
public enum LabelingHeader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @ansjcy mentioned elsewhere, might be good to add Request
prefix to class names
* Main enum define various headers introduced by {@link org.opensearch.plugins.LabelingPlugin}s | ||
*/ | ||
public enum LabelingHeader { | ||
QUERY_GROUP_ID("queryGroupId"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally this enum should be part of core as it is used by the RequestLabelingService
, but it has plugin implementation specific values. Hence, I am slightly confused whether this should be in core or part of plugin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be part of core since these headers will be consumed in core.
* Enum to define what all are currently implementing the plugin | ||
*/ | ||
public enum LabelingImplementationType { | ||
QUERY_GROUP_RESOURCE_MANAGEMENT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again if LabelingImplementationType
class is part of core, does the plugin developer need to modify the core for adding their specific value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This construct is little debatable, and probably need more thought. Let me think a bit about this
@kaushalmahi12 - It seems we are doing too much in single PR which is probably adding to the confusion. Ideally there should be two separate PRs, one adding the support for labeling extensions within core, second one demonstrating the usage of LabelingService from |
@reta Yes! I was also thinking of using IMO This approach provides more flexibility on how and when a plugin can label/tag the requests. |
I think so, it looks right for a job
We do not provide any ordering guarantees and we could not fulfill this requirement fairly. Why do you need ordering? |
@reta We will need the ordering because we want the authN/authZ information present in the context. That is why we want this logic to be executed after the authN/authZ filter. |
@kaushalmahi12 the use case does not feel right, we have security plugin in place to handle such cases and it uses the proper abstraction |
@kaushalmahi12 - I don't think we should enforce the ordering assumption between plugins.
This is a limitation of
@reta - Labeling logic should not be part of security plugin IMO, and others plugins might not have access to the security context. Do you have any suggestions other than building this in core? |
@kaushalmahi12 as per my understanding, we enrich request with the headers (tags) which would be propagated to thread context (using
@jainankitk I don't understand the dependency on authZ/authN and need for ordering here: we run action filters one by one so the end result should be the (thread) context populated with all necessary data before the request is processed on coordinator. |
@reta - The labeling logic will be based on the security context. For example - if the principal is Joe, the tenant is A. If the principal is Dave, tenant is B.
As mentioned above, the labeling action filter will need the principal information provided by the security plugin for populating the tenant header. |
@jainankitk here is my understanding of the flow:
If I am not mistaken, the |
This PR is stalled because it has been open for 30 days with no activity. |
Description
This PR provides the foundational support to add multitenant labels to
ThreadContext
. This foundation provides a way to implement theLabelingPlugin
to define how to compute these labels using implicit/explicit attributes.Related Issues
Resolves #13341
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.