Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Cloud Native SQL Plugin #13274

Closed
fddattal opened this issue Apr 18, 2024 · 53 comments
Closed

[RFC] Cloud Native SQL Plugin #13274

fddattal opened this issue Apr 18, 2024 · 53 comments
Labels
enhancement Enhancement or improvement to existing feature or request extensions Plugins RFC Issues requesting major changes Roadmap:Modular Architecture Project-wide roadmap label

Comments

@fddattal
Copy link

Is your feature request related to a problem? Please describe

I want to run the SQL plugin in a cloud native environment, but I am unable to do so without forking the code.

Today, plugin execution is tightly coupled to OpenSearch core. Access to metadata of index, aliases, templates, node roles, and routing is tightly coupled with cluster state. Furthermore, plugins are vended full access to OpenSearch core including direct access to cluster state internals. This make it difficult to evolve plugin APIs to be more cloud native provider friendly, and eventually run plugins in a stateless mode.

Describe the solution you'd like

I would like to propose an interface to plugins for accessing OpenSearch core. The interface should expose the minimal fine grained state management operations and the operations needed to support plugins. This will ensure both cluster and cloud native implementation can support the plugin uniformly.

Then the SQL plugin will migrate to this interface and be enabled to run in cloud native environments.

Related component

Plugins

Describe alternatives you've considered

No response

Additional context

FAQ

How would the community benefit from this?

  1. The work here is foundational work which is applicable to other plugins. The plugin interface can be expanded upon to enable other plugins them to become cloud native.
  2. This proposal aids in decoupling plugins from OpenSearch Core by restricting their core access through an abstract interface. Having this interface enables:
    2.a. The possibility for alternative implementations of core APIs.
    2.b. Improve stability guarantees of the plugin APIs.
    2.c. Complete audibility of plugin’s access to core APIs via this interface.
    2.d. The enabling of alternative runtime environments to run plugins unmodified.

What other RFCs does this align with?

  1. [RFC] Refactor Metadata management in Cluster Manager code as separate lib #13197
    1.a. This work adds upon the RFC proposed for cluster state management. Similar to how cluster state management is being refactored to enable cloud native environments, so will plugin to core access.
  2. [META] Split and modularize the server monolith #8110
    2.a. In addition to code separation, a standard for plugin to core interaction is now defined to make core module implementations swappable.
  3. [RFC] Path to Deprecating the Use of Thread Context for Security Information security#2860
    3.a. In a future where all plugin to core access is done via the interface. Security negotiation can be mediated by the interface and eventually lead to the deprecation of the mutable thread context for security management.
@fddattal fddattal added enhancement Enhancement or improvement to existing feature or request untriaged labels Apr 18, 2024
@saratvemulapalli saratvemulapalli added the RFC Issues requesting major changes label Apr 18, 2024
@dblock
Copy link
Member

dblock commented Apr 18, 2024

I like this proposal because it's effectively identical to https://github.com/opensearch-project/opensearch-sdk-java except that you are offering to refactor core vs. exposing a new interface on top of core. Generally, refactoring core is a costly undertaking, everything takes 10x more work. We're seeing that even relatively small changes to enable things like #10684 require weeks of iterating, just to get gradle checks to pass. You also will need to deal with things like settings, therefore the proposed plugin interface should include that (and possibly other things).

Wouldn't it be easier to make https://github.com/opensearch-project/opensearch-sdk-java the plugin interface you're proposing and implementing the SQL plugin on top of that? If you shortcut/remove/bypass all the remote aspects of that SDK, you effectively have a clean plugin interface, and you can iterate on that completely independently from core.

@dbwiddis
Copy link
Member

@dblock's comments about SDK prompted me to think of the fact that we really wanted to implement those using OpenSearch Java Client. That continues on to the whole conversation around generated code and reference specifications.

Which brings me to the question: do we need to write these interfaces or can we generate them from the OpenAPI spec?

@dblock
Copy link
Member

dblock commented Apr 22, 2024

Which brings me to the question: do we need to write these interfaces or can we generate them from the OpenAPI spec?

I think we can. It's a ton of boilerplate. That project has moved ahead quite well!

https://github.com/opensearch-project/opensearch-api-specification

@Xtansia is working on a java generator from API, it could generate anything

@fddattal
Copy link
Author

Hey folks! Thanks for the comments!

To summarize my understanding:

@dblock's feedback is that we should use the opensearch-sdk-java as our target for a cloud native sql plugin.
This helps in two ways:

  1. It avoids needing to refactor core allowing for faster iteration time
  2. We would end up defining an interface which is already similar to the SDK

@dbwiddis' feedback is that we should generate the core api interface using the API spec.
This helps to avoid manually maintaining the java code for interfacing with core.


I've taken a look at the opensearch-sdk-java and I do believe this would offer us a path
forward if we invest a bit more into it. There are two main challenges which I see:

  1. Today there's no in-process version of Extensions. I'd like to run the code in the same
    process to avoid the performance overhead of networking. We would need to prioritize this
    issue [PROPOSAL - IN PROC Support] Run a process within the same JVM opensearch-sdk-java#688

  2. The SDKClient which the opensearch-sdk-java exposes to Extensions is a concrete class which
    leaves little flexibility for myself to inject custom cloud native behavior. An interface
    would allow me that flexibility. Similar to the client generation, we could also generate this
    interface and completely rely on the spec for its definition.


In a similar vein, a solution which I have considered is migrating the SQL plugin to the
Client interface rather than the concrete NodeClient. This would enable me to inject custom logic
to my Client subclass to support the plugin.

I see this as a similar approach to the opensearch-sdk-java as both solutions are API based and
would allow for functionality being swapped at an API action level.

Given that this Client is already in core and exposed to plugins, I would not expect additional
core refactoring to enable this solution (except for below).

@dblock @saratvemulapalli I want to understand another point: I see in the opensearch-sdk-java that the
Client interface is intended for deprecation. Could you help me understand why it's being considered
for deprecation? What are its downsides? I assume the replacement would be a generated interface/client?


To add to the API discussion. I've been having discussions with @shwetathareja on decomposing cluster
state access in core, and we feel that the existing public API's are not sufficiently fine-grained
for accessing cluster state components.

To be compatible with her RFC here: #13197 we would
need to introduce or extend the existing cluster state API so that we may access the
following (not an exhaustive list, this is just for SQL):

  1. Individual indices metadata
  2. Individual index settings
  3. Individual cluster setting values

This is fine grained access is particularly a problem for cluster state because

  1. Cluster state can be large. Thus, accessing these APIs, even locally, can have a
    performance overhead
  2. The full cluster state may not be supported in cloud native
    environments (eg: routing table may not be supported)

These APIs changes would be applicable to both SDK and Client based approaches.


Looking forward to hearing your thoughts!

@dblock
Copy link
Member

dblock commented Apr 23, 2024

Thanks for keeping an open mind @fddattal!

In picking similar-ish solutions in terms of complexity, risk and cost I would always take the one that's viable most long term.

Today there's no in-process version of Extensions.

Correct. There's probably a very minimal solution that assumes that both sides are the same version of the message being sent and passes it in-memory rather than through serialization/deserialization.

@dblock @saratvemulapalli I want to understand another point: I see in the opensearch-sdk-java that the
Client interface is intended for deprecation. Could you help me understand why it's being considered
for deprecation? What are its downsides? I assume the replacement would be a generated interface/client?

I think you are looking for opensearch-project/opensearch-sdk-java#732 that's all @dbwiddis.

@fddattal
Copy link
Author

Hey folks! I've taken a look at #13336 and I think that there's a shared solution that would position us well in the long term for a full extension migration.


Proposal

What I am proposing is the following:

  1. We introduce a new client interface which plugins and extensions will use to interact with core.
  2. The interface itself can be maintained as a separate library or module.
  3. Data exchange over the interface is done using simple Java Pojo's.
  4. The interface is asynchronous.
  5. The library would maintain all the Pojo's for its Request/Response objects.
  6. OpenSearch and cloud native environments may provide alternative implementations of this interface to suit their needs.

Tenets

  1. The API definition should not depend on core
  2. The API should function using message passing "as-if" executing remotely
  3. Internal communication details are abstracted by the interface

Interface Definition

package org.opensearch.sdk;

import org.opensearch.sdk.model.ClearScrollRequest;
import org.opensearch.sdk.model.ClearScrollResponse;
import org.opensearch.sdk.model.DeleteCustomRequest;
import org.opensearch.sdk.model.DeleteCustomResponse;
import org.opensearch.sdk.model.GetCustomRequest;
import org.opensearch.sdk.model.GetCustomResponse;
import org.opensearch.sdk.model.GetIndexMappingsRequest;
import org.opensearch.sdk.model.GetIndexMappingsResponse;
import org.opensearch.sdk.model.GetIndexSettingRequest;
import org.opensearch.sdk.model.GetIndexSettingResponse;
import org.opensearch.sdk.model.GetSystemSettingRequest;
import org.opensearch.sdk.model.GetSystemSettingResponse;
import org.opensearch.sdk.model.MultiSearchRequest;
import org.opensearch.sdk.model.MultiSearchResponse;
import org.opensearch.sdk.model.PutCustomRequest;
import org.opensearch.sdk.model.PutCustomResponse;
import org.opensearch.sdk.model.ResolveIndicesAndAliasesRequest;
import org.opensearch.sdk.model.ResolveIndicesAndAliasesResponse;
import org.opensearch.sdk.model.ScrollRequest;
import org.opensearch.sdk.model.SearchRequest;
import org.opensearch.sdk.model.SearchResponse;

import java.util.concurrent.CompletionStage;

public interface Client {

    // for sql plugin

    CompletionStage<SearchResponse> search(SearchRequest request);

    CompletionStage<SearchResponse> scroll(ScrollRequest request);

    CompletionStage<ClearScrollResponse> clearScroll(ClearScrollRequest request);

    CompletionStage<GetIndexMappingsResponse> getIndexMappings(GetIndexMappingsRequest request);

    CompletionStage<GetIndexSettingResponse> getIndexSetting(GetIndexSettingRequest request);

    CompletionStage<GetSystemSettingResponse> getSystemSetting(GetSystemSettingRequest request);

    CompletionStage<ResolveIndicesAndAliasesResponse> resolveIndicesAndAliases(ResolveIndicesAndAliasesRequest request);

    CompletionStage<MultiSearchResponse> multiSearch(MultiSearchRequest request);

    // for data store interface

    CompletionStage<PutCustomResponse> putCustom(PutCustomRequest request);

    CompletionStage<GetCustomResponse> getCustom(GetCustomRequest request);

    CompletionStage<DeleteCustomResponse> deleteCustom(DeleteCustomRequest request);
}

Pros / Cons

Pros:

  1. This proposal gives us the flexibility for building a solution that would work for OpenSearch and cloud native envrionments
  2. It is aligned with efforts to decompose cluster state - [RFC] Refactor Metadata management in Cluster Manager code as separate lib #13197
  3. The interface can be maintained outside of the core allowing us to develop it independently

Cons:

  1. This will introduce a competing standard and require plugin migration
  2. Additional effort would be needed to enhance the interface to expose new core capabilities.
    1. Albiet this can be seen as a good thing because it would be a forcing function for API evolution stewardship
  3. The implementers of the interface (core, cloud native) would need to perform object transformation to convert from the API Pojo to their internal core representations

What's Not Addressed Here

  1. Versioning - The API itself is not currently versioned, but the API could be extended to add versioning at either the client or api action level.
  2. Cross-plugin/module calls - Currently there's no way for a plugin or module to expose api actions for other plugins or modules to consume. We could expose a mechanism to register api action handlers with the client and execute those in a way similar to org.opensearch.Client#execute(ActionType, Request, ActionListener).

@dblock
Copy link
Member

dblock commented Apr 25, 2024

@fddattal I like this. Do you think it makes sense to reuse https://github.com/opensearch-project/opensearch-sdk-java for that and possibly produce multiple artifacts (that eventually merge)?

@fddattal
Copy link
Author

@dblock I like that approach and it's certainly feasible to do so. We would modify the SDK package to be a multi-module gradle project, and the existing extension code would become one module, while the API would be another. Each releasing their own artifacts.

@fddattal
Copy link
Author

fddattal commented Apr 26, 2024

I've put together a POC for this approach here:

  1. https://github.com/fddattal/opensearch-sdk-java
  2. https://github.com/fddattal/OpenSearch/tree/sdk_in_core_2_12_v2
  3. https://github.com/fddattal/sql/tree/sql_on_sdk_2_12_0

I was able to:

  1. Refactor the opensearch-sdk-java and consume it in sql and implement the new api interface in core.
  2. Refactor enough of the SQL core and implement the SDK Client API to get a simple SQL select working
Screenshot 2024-04-26 at 5 13 43 PM

I found I spent a lot of time creating the model objects and writing the code to convert to/from the model and core data structures. I think having some standard tooling in the API to aide with this would help drive adoption.

@shwetathareja
Copy link
Member

Data exchange over the interface is done using simple Java Pojo's.

@fddattal : one clarification, these Pojo are re-using existing OpenSearch cluster state sub objects/ data structures or altogether new Pojo's

@fddattal
Copy link
Author

one clarification, these Pojo are re-using existing OpenSearch cluster state sub objects/ data structures or altogether new Pojo's

@shwetathareja These Pojos are new altogether and would live outside of core in the SDK package.

@peternied
Copy link
Member

[Triage - attendees 1 2 3 4 5 6 7 8]
@fddattal Thanks for creating this RFC

@fddattal
Copy link
Author

fddattal commented May 1, 2024

Next Steps

Hey folks, I don't see any blocking concerns so I am moving ahead with a full implementation.

Below is what to expect regarding high level changes. Please let me know if you have any concerns!

Investigation Work

  1. Investigate the level of effort required to fully refactor SQL to only use the SDK Pojo's throughout versus translating to SDK Pojo's at the API layer. Today the plugin is using core's objects which may contain features not supported by the SDK APIs. Depending on how this goes it would inform us on whether we need to add a generic request/response transformation layer in the SDK API now versus simply have the sql plugin migrate. A generic request transformation layer would be needed later to enable plugin-to-plugin interactions via the org.opensearch.sdk.Client.

OpenSearch Java SDK

  1. Package will be refactored as a gradle multi-module project
  2. New module opensearch-java-sdk-api will be introduced to contain the o.o.sdk.Client and associated Request/Response objects
  3. [Contingent on Investigation 1 finding reader/writer interface is needed] Add a token based reader/writer interface similar to the xcontent API in core. All Request/Response objects would support interfacing with these token streams.

SQL Plugin

  1. SQL plugin will access the the o.o.sdk.Client dependency exposed transitively through core.
  2. [Contingent on Investigation 1 finding reader/writer interface is not needed] Refactor all read only paths to use the sdk.Client and Pojo's entirely
  3. Cloud native compatible SQL integ tests will be renamed from *IT.java to *CloudNativeIT.java
  4. A new remote cluster test gradle task will be added to run all *CloudNativeIT.java's

Core

  1. Core :server will consume the org.opensearch.sdk:opensearch-java-sdk-api as an api dependency
  2. A new parameter and overloaded method will be added to Plugin.java to recieve a concrete org.opensearch.sdk.Client implementation
  3. Core :server will maintain a private implementation of org.opensearch.sdk.Client
  4. Updates to the test framework will be made so that cloud native providers can inject infrastructure provisioning steps during integ tests
  5. Updates to the test framework to allow core to inject custom RestClient implementations

@shwetathareja
Copy link
Member

Core :server will consume the org.opensearch.sdk:opensearch-java-sdk-api as an api dependency

@fddattal can you elaborate why :server module would need this dependency?

@shwetathareja These Pojos are new altogether and would live outside of core in the SDK package.

How are you going to generate these Pojo? Using OpenAPI specification? What would be the performance overhead of translating across objects?

@fddattal
Copy link
Author

fddattal commented May 2, 2024

Thanks for your comments @shwetathareja !

can you elaborate why :server module would need this dependency?

Core :server needs to provide an implementation of the o.o.sdk.Client interface to plugins. An instance of this class would need to be injected into plugins during their initialization in Node.java.

How are you going to generate these Pojo? Using OpenAPI specification?

For now, I plan to hand write the Java classes by hand and use Lombok to generate the boilerplate. We certainly could use a spec to generate the API in the future as more target runtimes are supported.

What would be the performance overhead of translating across objects?

I've benchmarked (on my laptop) translating a large search request with 10k boolean clauses and a large search response with 10k hits and found the request took 0.03 ms and response took 0.1 ms. If performance overhead due to object translation becomes a problem we could introduce interfaces in the API which map 1:1 with their corresponding core objects and have the core object implement those interfaces. For such implementations, the translation would be a noop.

Benchmark                                                Mode  Cnt       Score       Error  Units
SDKTranslatorBenchmark.testTranslateLargeSearchRequest   avgt    5   38929.944 ±   456.177  ns/op
SDKTranslatorBenchmark.testTranslateLargeSearchResponse  avgt    5  114267.749 ± 10530.752  ns/op

@owaiskazi19
Copy link
Member

Sorry to chime late. @fddattal since the RFC is around running SQL independently than core. And one of option provided was opensearch-sdk-java. 2 years back when we started working on sdk, SQL was the first plugin I integrated with sdk to run independently. Commit history of my fork for any help around.

@dblock
Copy link
Member

dblock commented May 3, 2024

@owaiskazi19 whoa I totally forgot about that one! full circle

@andrross
Copy link
Member

andrross commented May 5, 2024

SQL plugin will access the the o.o.sdk.Client dependency exposed transitively through core.

@fddattal I'm assuming "core" here means the :server module. Correct me if I'm wrong. But isn't the whole point here to break the dependency that the plugin has on the :server module?

@fddattal
Copy link
Author

fddattal commented May 6, 2024

Thanks for your comment @andrross !

Correct me if I'm wrong. But isn't the whole point here to break the dependency that the plugin has on the :server module?

Yes it's true that "core" means the :server in this context.

I'd state our goal slightly differently - Refactor the read-only paths in SQL plugin which access core to one unifying cloud native interface. In doing so we would remove deep core access channels such as direct cluster state access and unnecessary core details being exposed to the plugin.

The steps taken here will help us to fully remove the core dependency in the future. However, the work will not be sufficient to do so because both non-read-only paths and accesses from core to plugin will not be refactored.

@andrross
Copy link
Member

andrross commented May 6, 2024

I'd state our goal slightly differently - Refactor the read-only paths in SQL plugin which access core to one unifying cloud native interface. In doing so we would remove deep core access channels such as direct cluster state access and unnecessary core details being exposed to the plugin.

@fddattal We currently have the org.opensearch.plugins java namespace in the :server module (with the main entry point being org.opensearch.plugins.Plugin), which on the surface looks like a nice clean interface for extending the functionality of OpenSearch. However, the problem, as has been stated many times, is that many very low level details of the server are exposed through this interface resulting in very tight coupling. With opensearch-java-sdk-api and o.o.sdk.Client will you essentially be implementing a "version 2" of this Plugin interface that doesn't expose any :server dependencies? Are there additional complexities here that I'm missing?

@saratvemulapalli
Copy link
Member

Thanks @fddattal for the RFC.
I've worked through extensions[1] which was essentially de-coupled the compute from OpenSearch but still relied on internal implementations of opensearch :server.
I see this proposal as plugins will be de-coupled from internal implementations of core and yet access the information they'd like.
I love the proposal, essentially see it as PluginsV2 (I agree with @andrross).

Here are my thoughts and proposal:

Tenets (in addition to the ones listed by @fddattal)

  • Interface abstracts inner implementations (including Pojo's) of core (:server)
  • The interface exposed to clients does the heavy lifting for sync and async implementations.
  • Core (:server) would have a default implementation in min distribution.

I'd like this interface to de-couple plugins in the following verticals, and deliver in phases:

  1. Decouple In Memory Store (a.k.a cluster state): As we move the cluster management to Cloud Native[2], plugins should be agnostic of local vs remote implementations of cluster state, which this RFC already covers and your usecase for SQL.
  2. Decouple plugin storage: Plugins rely on OpenSearch to store metadata in indices for persistence. This configuration doesn’t have to be in opensearch index. @dbwiddis's RFC[3] proposes an abstraction with alternative cloud native storage.
  3. Decouple plugin configuration: Plugins rely on OpenSearch settings[4] to expose knobs for users to configure features. Plugins should be agnostic of local vs remote implementations of Cluster settings.
  4. Decouple plugin security and tenancy: Plugins use OpenSearch security plugin and implement custom authorization mechanisms[5]. I'd like an interface which offloads AuthZ to a central authority. Also tangentially I am seeing use cases of plugins trying to support multi-tenancy[6], which could be solved.

I would want to deliver these verticals in phases and incrementally. Eventually once we have adoption we can explore the options of deprecating existing access's to the core :server

I would like to have the interface defined in OpenSearch repo as a library and default implementation in :server.
Additionally plugins will take a dependency on the library. I do not mind calling it org.opensearch.plugins.Pluginv2 or org.opensearch.sdk.

Few things I'd like to push it down the road:

@dbwiddis' feedback is that we should generate the core api interface using the API spec.
This helps to avoid manually maintaining the java code for interfacing with core.

  1. I would not prioritize API generation until the interface is widely adopted and we are seeing pains of manually maintaining it. With the experience of Plugin interface[6], it really hasn't changed a lot for a long time.
  2. Extensions are another way of extending features of OpenSearch but for phased and incremental approach I'd like the interface to be flexible enough to support extensions in future.

I would also like to get feedback from @reta @AmiStrn @samuel-oci @rursprung @abseth-amzn who worked in the area of plugins.

[1] #2447
[2] #13197
[3] #13336
[4] https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/common/settings/Settings.java
[5] opensearch-project/security#1895
[6] opensearch-project/ml-commons#2358

@dblock
Copy link
Member

dblock commented May 7, 2024

With opensearch-java-sdk-api and o.o.sdk.Client will you essentially be implementing a "version 2" of this Plugin interface that doesn't expose any :server dependencies? Are there additional complexities here that I'm missing?

I think that's it. We put it in a different repo (a facade) to reduce blast radius when API/implementation/internals change in core. Instead of having to figure out how to fix N plugins we fix the SDK, which itself then depends on a stable version of core rather than on the next moving version of core.

@andrross
Copy link
Member

andrross commented May 7, 2024

@saratvemulapalli I think we're aligned and I really like how you've broken much of this work down. A couple points to follow up on though:

I would like to have the interface defined in OpenSearch repo as a library and default implementation in :server

I agree with this. Probably something like a new library in libs/pluginv2 or libs/plugin-api (or whatever we want to name it). I think this creates the groundwork for integration with JPMS where we can explicitly declare what gets publicly exported by these libraries.

Interface abstracts inner implementations (including Pojo's) of core (:server)

Just to clarify, as we build on the vision and work started in #5910 we should be moving many parts of the code in :server into a respective library (like libs/common and libs/core) and that would be fair game for the new plugin API to take a dependency (provided the library is exporting the class as a public API). This would avoid duplicating and translating POJOs for things like the Java object representation of the query DSL. Is this in line with your thinking as well?

@fddattal
Copy link
Author

fddattal commented May 9, 2024

The use of the Custom class is a bit confusing here, as I see two different imports for the Custom class related to Cluster State (which I assume is where you got the name), one as a subclass of ClusterState and one as a subclass of Metadata. Both inherit from ToXContentFragment among others.

@dbwiddis That's a good catch! I think its in our best interest to define two separate APIs for modifying index level and cluster level customs. What are your thoughts?

@dbwiddis
Copy link
Member

dbwiddis commented May 10, 2024

To tackle the direct issue of ClusterService usage, I propose the following:

@fddattal I've already started down the road of implementing against the proposed CompletionStage style here but now I see you're proposing an ActionFuture based implementation with specific arguments closer to my proposal in #13336.

I can work with either and am focusing on the lower-level implementations first, but I'd really like us to align on an interface style sooner rather than later. I thought we had.

The CompletionStage variant encapsulates arguments inside the Request and return values inside the Response classes, which is much more future-proof. For example, I initially added just "Index" and "id" strings to my GetCustomRequest implementation and later realized I needed FetchSourceContext. This was an easy addition to the request object and did not change the implementation in the code. (I'm currently only using one, but there will soon be dozens.)

The interface is also completely independent of core, as it uses base Java functionality

Your new proposal requires changing method arguments for the interface if an additional field is required. I don't like this at all.

I strongly prefer the request/response object model. I'm flexible on including ActionListeners and there's some benefit to PlainActionFuture which we don't get currently with CompletableFuture/CompletionStage, but that's a relatively easy fix if we want to make a PlainActionCompletableFuture compatible with the CompletionStage style.

Can we align on an approach?

@sam-herman
Copy link
Contributor

Sorry to chime in late, I agree with @reta we shouldn't be doing pluginsV2, there are already too many things in place. In particular I would like to understand better what kind of specific problem this suggestion is solving.
@fddattal can you please elaborate to explain the problem you encountered and why this solves it?

@andrross andrross added the Roadmap:Modular Architecture Project-wide roadmap label label May 14, 2024
@fddattal
Copy link
Author

I would like to understand better what kind of specific problem this suggestion is solving. @fddattal can you please elaborate to explain the problem you encountered and why this solves it?

Absolutely @samuel-oci! The particular problem that I have comes down to the fact that I deploy a custom distribution of OpenSearch. In my distribution, cluster based components don't exist. So, all code which depends on those direct cluster object do not work.

For example the following would not work because accessing clusterService.state() does not work in my environment.

MappingMetadata mappings = clusterService.state().metadata().index("foo").mapping();

So what I am proposing is we introduce an interface to abstract these core details to allow plugins to run
without needing knowledge of the distribution they're running on top of.

Taking the same example, you can imagine an interface that exposes an operation:

CompletionStage<GetMappingMetadataResponse> getMappingMetadata(GetMappingMetadataRequest);

// request/response objects

class GetMappingMetadataRequest {
  String index;
}

class GetMappingMetadataResponse {
  MappingMetadata metadata;
}

In the standard OpenSearch distribution, this interface would just thinly wrap the existing access pattern via cluster service.

However, if I had this interface, then I would cleanly be able to stub out its implementation so that it works in my custom distribution and the plugin would then be able to run without a fork.

The full extend of what this interface would need to support for the SQL plugin is detailed in this comment: #13274 (comment)

OpenSearch core would likewise benefit in having this interface because of the greater degree of decoupling which it provides.


I strongly prefer the request/response object model. ... Can we align on an approach?

Agreed @dbwiddis ! I am also inclined to prefer CompletionStage's and the request/response object model! I am happy to standardize on this style.

@andrross
Copy link
Member

@reta @samuel-oci The "plugin V2" term isn't really precisely defined and can have different meaning to different folks so I'm going to avoid using it (I definitely think reimplementing the whole plugin interface would be the wrong approach anyway...)

What @fddattal is describing here seems pretty reasonable to me. He wants to run the SQL plugin in some way where the index metadata comes from some place other than cluster state, so tightly coupling the SQL plugin code to call clusterService.state().metadata().index("foo").mapping() doesn't make it easy to inject custom logic. The fix for this is pretty straightforward from a Java perspective: introduce some generic interface that exposes a getIndexMappings(String indexName) method with the default implementation continuing to do clusterService.state().metadata().index("foo").mapping() but it allows custom environments to inject whatever implementation they want (note the scope of his problem is a little bit bigger than just index mappings, but the problem is generally the same and I think its helpful to highlight one specific use case for discussion).

The primary question as I see it is where should this interface live? It absolutely could be a library completely external to OpenSearch core that plugins could choose to use if they want and it could evolve independently from core. However, the SQL plugin likely isn't the only plugin that could benefit from reduced compile-time dependency on core details. I see this as a potential opportunity to add something like an "IndexMetadataProvider" interface in the plugin API that could replace plugins' tight coupling on the entire ClusterService public Java API, for example. So should this be a part of an effort by the core to present a more constrained API to plugins?

I'll also note that the general shape of the problem is the same that @dbwiddis is looking at, except in his case he's dealing with wanting a more abstract way to store plugin metadata that isn't tightly coupled to the concept of a system index or cluster state.

@sam-herman
Copy link
Contributor

What @fddattal is describing here seems pretty reasonable to me. He wants to run the SQL plugin in some way where the index metadata comes from some place other than cluster state, so tightly coupling the SQL plugin code to call clusterService.state().metadata().index("foo").mapping() doesn't make it easy to inject custom logic.

@andrross @fddattal Not sure I understand the "why", in what scenario this is helpful to not be able to access cluster state from a plugin? If you want to run a plugin not from within an OpenSearch node why not just leverage extensions?

@anastead
Copy link

@samuel-oci why would we have to rewrite a core plugin to a new extensions interface when we just want to abstract some state namely cluster state? it seems like an overkill, am i missing something

@reta
Copy link
Collaborator

reta commented May 15, 2024

@samuel-oci why would we have to rewrite a core plugin to a new extensions interface when we just want to abstract some state namely cluster state? it seems like an overkill, am i missing something

Thanks @anastead , I think we have 2 alternative routes to pursue, largely with following tradeoffs:

  • extension: do more on SQL plugin (or better to say, extension), less (if anything) on OpenSearch core side
  • plugin (continue to be): do less (if anything) on SQL plugin, do more on OpenSearch core side

@fddattal @dbwiddis (and I think you as well) lean towards the 2nd option: do less (if anything) on SQL plugin, do more on OpenSearch core side. To be fair, I am OK with the approach, but I am not satisfied with the APIs being suggested to become part of OpenSearch. The API is very much oriented at specific narrow problem at hand, or to say it in another words - bringing somethings you need right now and who knows what is coming next. The functionality of PluginMetadataClient is already provided (or could be provided if missed somehow) by Client / NodeClient in a more generic way. So I don't really see why we even need that.

@sam-herman
Copy link
Contributor

@samuel-oci why would we have to rewrite a core plugin to a new extensions interface when we just want to abstract some state namely cluster state? it seems like an overkill, am i missing something

@reta I do not understand why we would like to prevent access to cluster state from a plugin? unless we are not planning to run it from within OpenSearch node. (I think I am missing something? :) ). I am just trying to understand the benefits of it and understand the "why" part.
If we do not intend to run it from an OpenSearch node I am proposing extensions as the mechanism of choice.

bringing somethings you need right now and who knows what is coming next.

This is my feeling as well, but I try not to jump straight away into that conclusion, which is why I am trying to make sure I understand the use case.

@reta
Copy link
Collaborator

reta commented May 15, 2024

Thanks @samuel-oci

@reta I do not understand why we would like to prevent access to cluster state from a plugin?

We are not preventing that, I think the "cluster state" is somewhat misleading here, the plugin needs cluster wide data (source of truth) which is already exposed / could be accessed through different means.

@sam-herman
Copy link
Contributor

The particular problem that I have comes down to the fact that I deploy a custom distribution of OpenSearch. In my distribution, cluster based components don't exist. So, all code which depends on those direct cluster object do not work.

This sounds like a very specific problem in a private modified opensearch code running somewhere in @fddattal environment. What's the incentive to change public core in that case?

@andrross
Copy link
Member

What's the incentive to change public core in that case?

Great question! As @fddattal stated "This proposal aids in decoupling plugins from OpenSearch Core by restricting their core access through an abstract interface...". But I agree with @reta that the details matter here. If the proposed interface doesn't get us towards those goals then it probably should not go in core. For me, providing a way for plugins to get index metadata (a common use case) that doesn't require interacting with the entirety of ClusterService and ClusterState seems like a win for decoupling.

@sam-herman
Copy link
Contributor

This proposal aids in decoupling plugins from OpenSearch Core by restricting their core access through an abstract interface

I re-read the previous comments and I think I better understand the intention now. So let me try again :)
@fddattal are you trying to re-implement elements of core such as clusterState, and therefore you want to have a higher level interface that wraps on top of that?
Hypothetical example: if you want the entire cluster state to be stored in a MYSQL database and never loaded into memory or see such object ever again?

@shwetathareja
Copy link
Member

shwetathareja commented May 15, 2024

I want to add my 2 cents to the discussion:

Today, OpenSearch Cluster Manager has multiple responsibilities like metadata management, shard management, health checks, node membership, serving admin APIs along with the overall cluster management.

One of the goal which RFC #13197 aims to achieve is to refactor the ClusterManager in a way to separate out the metadata management (along with pluggable storage) from other aspects of the cluster as the first step. It could enable cloud native providers to simply build on top metadata management lib and may not need to set it up as a cluster infrastructure.

Lets take create-index API as an example:
MetadataCreateIndexService class has most of the logic around running validation, applying the template, submits a task in pending queue, implements the ClusterStateTaskExecutor which actually executes the logic to create process task (create-index) and creates a new cluster state object with updated metadata etc. Then, eventually wait for the state to be published to all the nodes, shards to be assigned and then send the response back.

Now, the intention is to separate out the validations, applying the template and generate the new index object (resource) as Metadata management, processing this resource to generate the new over all cluster state as cluster management aspect and finally the pluggable storage layer (local or remote). The Shard management can itself be separated out. But, for now lets keep it out of the discussion.

With this context, Plugins can have both read and write access to Cluster State. What @fddattal wants to separate out the read access to Cluster State or simply put the metadata access layer as the first step. So, this metadata access layer has to be inside core. Also, I don't think we should be doing un-necessary transformation by creating new Pojos. The access layer performance characteristics with in the cluster setup should be almost as same as access to ClusterService.state() object.
One of the issue that i see with NodeClient is the overhead of serde.

@reta
Copy link
Collaborator

reta commented May 15, 2024

Thanks @shwetathareja, we basically need to make Cluster Management core APIs fine grained (fe cluster view vs cluster management), right? I think this is would be useful and than we could may be introduce more lightweight ClusterViewPlugin (just a rough idea) for the plugins that do not need access to full fledged cluster management. Did I captured that correctly? Thank you.

One of the issue that i see with NodeClient is the overhead of serde.

It has executeLocally family of methods, no serde.

@shwetathareja
Copy link
Member

Thanks @reta yes, you captured the essence right 👍 more on the lines of Metadata.

@fddattal was exploring the performance characteristics of NodeClient. Yes, we can use executeLocally so the transport overhead goes away but lets take TransportGetIndexAction, when executed locally as well it will have the overhead of generating response object which would add certain overhead as opposed to directly accessing the object from state() object. Also, transport actions can trip circuit breakers which will not be desirable here.

@reta
Copy link
Collaborator

reta commented May 15, 2024

Thanks @reta yes, you captured the essence right 👍 more on the lines of Metadata.

Got it, thanks. So I think we should take a step back and focus on Cluster Management APIs first, that would help us to design the plugin related abstractions after.

@fddattal
Copy link
Author

I've created a draft PR containing the code for this proposal. Please take a look! #13712

@sam-herman
Copy link
Contributor

We are currently trying to finalize the new feature process so the process is not fully in place yet. But a few notes on this discussion so far:

  1. Problem statement - I would like to see a much clearer problem statement to consider the RFC as concluded. Terminology such as "better decoupling" and "better cloud native" with no concrete examples are quite opaque and abstract and do not provide sufficient information about a problem being addressed.
  2. Design - Next step I would like to see a detailed design document that explains how we avoid backward compatibility issues for existing plugins and how that ties to the holistic methodologies of extending the architecture with initiatives such as Extensions and external writer

@fddattal
Copy link
Author

Hey folks!

I want to let you all know that I've decided to pursue a path which does not require modifications to OpenSearch core.

I will be vending a custom org.opensearch.Client in my environment which the SQL plugin will use to get what it needs from core. I'm also refactoring the SQL plugin such that all core access is funneled through the actions on the org.opensearch.Client interface - see opensearch-project/sql#2696.

Thanks you for all of your inputs! It's great to see such a lively discussion!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request extensions Plugins RFC Issues requesting major changes Roadmap:Modular Architecture Project-wide roadmap label
Projects
Status: New
Development

No branches or pull requests