[RFC] gRPC-based API for Search #15190

Closed
amberzsy opened this issue Aug 9, 2024 · 11 comments
Assignees
Labels
enhancement Enhancement or improvement to existing feature or request RFC Issues requesting major changes Roadmap:Search Project-wide roadmap label Search:Performance Search Search query, autocomplete ...etc v2.19.0 Issues and PRs related to version 2.19.0

Comments

@amberzsy

amberzsy commented Aug 9, 2024

Is your feature request related to a problem? Please describe

Inspiration

Building on the effort in #6844 and the benchmarking result in #10684 (comment) (~20%), we can go a step further and add support for a gRPC-based API that uses protobuf for serialization and deserialization. To validate our assumption that protobuf, which should be more efficient and compact than JSON, yields a performance gain, we ran a PoC of client <> server protobuf on the Search API with specific query types, and we saw promising results in opensearch-project/opensearch-clients#69.
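To make the "more compact than JSON" point concrete, here is a minimal sketch that hand-encodes a tiny search request in the protobuf wire format and compares its size with compact JSON. The request shape and the field numbers (1 = index, 2 = size, 3 = from) are assumptions made up for illustration; they are not the schema used in the PoC, and a real implementation would use generated protobuf classes rather than hand-written encoders.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Length-delimited field (wire type 2): tag, byte length, UTF-8 payload."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def encode_int_field(field_number: int, value: int) -> bytes:
    """Varint field (wire type 0); proto3 omits fields holding the default 0."""
    if value == 0:
        return b""
    return encode_varint(field_number << 3) + encode_varint(value)


# Hypothetical request: search index "logs", return 10 hits starting at 0.
request = {"index": "logs", "size": 10, "from": 0}

json_bytes = json.dumps(request, separators=(",", ":")).encode("utf-8")
proto_bytes = (
    encode_string_field(1, request["index"])
    + encode_int_field(2, request["size"])
    + encode_int_field(3, request["from"])
)

print(f"JSON: {len(json_bytes)} bytes, protobuf: {len(proto_bytes)} bytes")
```

Most of the saving in this toy case comes from replacing field names with numeric tags and from omitting default values. Real search requests and responses are far larger and more nested, so the ratio here should not be read as a benchmark.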

Proposal

The ongoing node-to-node communication effort focuses on the transport layer, implementing StreamInput and StreamOutput with protobuf serializers/deserializers. We can expand that effort and add client <> server protobuf support in parallel to achieve a more significant performance gain.

The proto definitions for the search API, including the parts that overlap with the transport layer, should follow opensearch-api-specification, which is widely adopted by clients.

For the server-side change, there are two options:

  1. Introduce a new content type and give end users the option to send and receive protobuf binary payloads.
    Pros: a faster development cycle to begin with, since this is potentially an extension of the existing
    searchRequest/Response, builder, and XContent code.
    Cons: potentially requires significant code refactoring, which adds complexity during development.

  2. Implement a new streaming-style search API (gRPC) using protobuf and expose a new gRPC endpoint for the search API.
    Pros:
    a) gRPC natively supports client-side, server-side, and bidirectional streaming, allowing for real-time
    communication. This is more efficient than the HTTP/1.1 used by REST.
    b) gRPC generates client and server code in multiple programming languages from the proto files. This reduces
    boilerplate code and ensures consistency across different languages and platforms.
    c) Less code refactoring.
    Cons:
    a) The development cycle might not be as fast as with approach 1.
    b) Although bringing up a new gRPC service and hooking it into the internal transport layer might not be too
    complicated, there will be unknowns in the overall integration with the existing ecosystem, e.g. related plugins
    (security, knn, sql, monitoring, etc.).
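As a rough illustration of option 1, the sketch below picks a request parser based on the Content-Type header, so JSON and protobuf bodies end up in the same internal representation. Everything here is an assumption for illustration: the media type `application/x-protobuf`, the toy schema (1 = index, 2 = size), and the `parse_request` helper are hypothetical, and the actual server-side change would live in the Java REST layer rather than look like this.

```python
import json
from typing import Callable, Dict, Tuple

# Hypothetical media type; the RFC does not name one.
PROTOBUF_TYPE = "application/x-protobuf"


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_protobuf(body: bytes) -> dict:
    """Decode the toy schema: field 1 = index (string), field 2 = size (varint)."""
    names = {1: "index", 2: "size"}
    request = {}
    pos = 0
    while pos < len(body):
        tag, pos = read_varint(body, pos)
        field, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(body, pos)
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(body, pos)
            value = body[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if field in names:  # unknown fields are skipped
            request[names[field]] = value
    return request


def parse_json(body: bytes) -> dict:
    return json.loads(body)


PARSERS: Dict[str, Callable[[bytes], dict]] = {
    "application/json": parse_json,
    PROTOBUF_TYPE: parse_protobuf,
}


def parse_request(content_type: str, body: bytes) -> dict:
    """Select a parser from the media type, ignoring parameters such as charset."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    parser = PARSERS.get(media_type)
    if parser is None:
        raise ValueError(f"unsupported content type: {media_type}")
    return parser(body)


print(parse_request("application/json; charset=utf-8", b'{"index":"logs","size":10}'))
print(parse_request(PROTOBUF_TYPE, b"\x0a\x04logs\x10\x0a"))
```

Both calls produce the same dictionary, which is the appeal of option 1: everything behind the parser can stay unchanged. Option 2 would instead put a separate gRPC endpoint in front of the same internal layer.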

Clients (Java, Go, Python, etc.) would optionally support the new protobuf-based server API with minimal changes (i.e. no need to rewrite an application that already uses the client).
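One way to read "minimal changes" is that the wire protocol becomes a client configuration option while application code stays the same. The sketch below shows that shape with stand-in transports that do no network I/O. `SearchClient`, its `protocol` option, and both transport classes are hypothetical names, not the API of any existing OpenSearch client.

```python
from typing import Protocol


class Transport(Protocol):
    def search(self, index: str, body: dict) -> dict:
        ...


class RestJsonTransport:
    """Stand-in for the existing REST/JSON path (no network I/O in this sketch)."""

    def search(self, index: str, body: dict) -> dict:
        return {"via": "rest+json", "index": index, "body": body}


class GrpcProtobufTransport:
    """Stand-in for a gRPC/protobuf path.

    A real transport would translate the dict body into protobuf messages
    and call a generated gRPC stub; that translator is the adaptor layer.
    """

    def search(self, index: str, body: dict) -> dict:
        return {"via": "grpc+protobuf", "index": index, "body": body}


TRANSPORTS = {"rest": RestJsonTransport, "grpc": GrpcProtobufTransport}


class SearchClient:
    """Hypothetical client whose wire protocol is chosen by configuration."""

    def __init__(self, protocol: str = "rest") -> None:
        if protocol not in TRANSPORTS:
            raise ValueError(f"unknown protocol: {protocol}")
        self._transport: Transport = TRANSPORTS[protocol]()

    def search(self, index: str, body: dict) -> dict:
        return self._transport.search(index, body)


def find_everything(client: SearchClient) -> dict:
    """Application code: identical regardless of the configured protocol."""
    return client.search("logs", {"query": {"match_all": {}}})


print(find_everything(SearchClient())["via"])
print(find_everything(SearchClient(protocol="grpc"))["via"])
```

Only the constructor argument differs between the two calls; `find_everything` is untouched, which is the property the proposal is aiming for.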

Next Steps

  1. Generate the proto definitions from opensearch-api-specification (refer: https://github.com/nytimes/openapi2proto)
  2. Bootstrap / create the gRPC SearchService (SearchGRPCService) and hook it into the internal layer (ClusterService, ActionListener, etc.)
  3. gRPC handlers for searchAction: add grpc/action/search and register it in ActionModule
  4. There are ~40+ queryBuilder types; we need to target the knn-related ones (? CorrelationQuery)
  5. ?? integrate with transport layer protobuf implementation (node-to-node)
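Since step 4 implies rolling out query types incrementally, one possible structure is a registry that maps each supported query type to a converter and rejects everything else explicitly. The sketch below is only an illustration of that idea: the `register` decorator, the `convert` function, and the dict used as a stand-in for a server-side builder are all hypothetical, and only `match_all` is registered to mirror the E2E PoC scope.

```python
from typing import Callable, Dict

# Registry of converters, keyed by query type name.
CONVERTERS: Dict[str, Callable[[dict], dict]] = {}


def register(query_type: str):
    """Decorator that registers a converter for one query type."""

    def decorator(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        CONVERTERS[query_type] = fn
        return fn

    return decorator


@register("match_all")
def convert_match_all(params: dict) -> dict:
    # A dict stands in for the server-side query builder object.
    return {"builder": "MatchAllQueryBuilder", "boost": params.get("boost", 1.0)}


def convert(query: dict) -> dict:
    """Convert a single-key query such as {"match_all": {}}."""
    if len(query) != 1:
        raise ValueError("expected exactly one query type")
    query_type, params = next(iter(query.items()))
    converter = CONVERTERS.get(query_type)
    if converter is None:
        # Types not migrated yet fail loudly instead of being silently dropped.
        raise NotImplementedError(f"query type not supported yet: {query_type}")
    return converter(params)


print(convert({"match_all": {}}))
```

Adding another query type later means registering one more function, so the set of supported types can grow one at a time without touching the dispatch logic.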

Timeline

2.17 release: (09/03/2024 ~ 09/17/2024)
[Experimental Feature]

  1. Protobuf definitions.
  2. A simple matchAll query for an E2E PoC.
  3. The feature will be marked as experimental.

Related

Transport layer Protobuf support: #6844

@amberzsy amberzsy added enhancement Enhancement or improvement to existing feature or request untriaged labels Aug 9, 2024
@github-project-automation github-project-automation bot moved this to Issues and PR's in OpenSearch Roadmap Aug 9, 2024
@github-actions github-actions bot added the Search Search query, autocomplete ...etc label Aug 9, 2024
@Pallavi-AWS Pallavi-AWS added the Roadmap:Search Project-wide roadmap label label Aug 9, 2024
@getsaurabh02 getsaurabh02 added the RFC Issues requesting major changes label Aug 9, 2024
@getsaurabh02
Member

Thanks @amberzsy for the proposal. Should we also highlight that the new 'gRPC SearchService' abstraction will be under an 'Experimental Flag' for the proposed timeline of this feature?

@getsaurabh02 getsaurabh02 added Roadmap:Search Project-wide roadmap label and removed Roadmap:Search Project-wide roadmap label labels Aug 9, 2024
@dblock
Member

dblock commented Aug 12, 2024

For client (Java, Go, Python etc), would have support to optionally use new protobuf-based server API with minimal changes (i.e. no need to rewrite an application already using the client)

I really like this. Do I understand correctly that the stated goal of this implementation is that a user can switch from REST/HTTP/application/(nd)json to HTTP2/grpc/protobuf via a configuration option on the client (and then it just works(TM) for all APIs)?

@amberzsy
Author

For client (Java, Go, Python etc), would have support to optionally use new protobuf-based server API with minimal changes (i.e. no need to rewrite an application already using the client)

I really like this. Do I understand correctly that the stated goal of this implementation is that a user can switch from REST/HTTP/application/(nd)json to HTTP2/grpc/protobuf via a configuration option on the client (and then it just works(TM) for all APIs)?

Correct. Some lightweight translator/adaptor would be needed.

@reta
Collaborator

reta commented Aug 16, 2024

@amberzsy @dblock I have two questions please:

  1. 3.x comes with HTTP/2 support (clients + server) out of the box, what are the tangible benefits of using gRPC here?
  2. 2.x does not support HTTP/2 (server side) nor have any client libraries that could handle that (AHC 4.x does not support HTTP/2), what is our plan here?

@dblock
Member

dblock commented Aug 16, 2024

Re: benefits I expect grpc + protobuf to improve both performance and throughput over HTTP/2 JSON. You're right to call this out though, @amberzsy were your benchmarks using HTTP/2?

@andrross
Member

@dblock The previous benchmarks for the REST API were just sending binary protobuf blobs over the HTTP/1.1 protocol. It essentially showed that parsing protobuf was more performant than XContent parsing JSON (no surprise there). I expect any solution that is able to replace XContent parsing with protobuf to show performance improvements. I don't know if gRPC would show better performance when compared to any other HTTP/2-based solution that sent protobuf blobs but I think it is worth experimenting with some prototypes.

@reta
Collaborator

reta commented Aug 16, 2024

I don't know if gRPC would show better performance when compared to any other HTTP/2-based solution that sent protobuf blobs but I think it is worth experimenting with some prototypes.

Thanks @andrross , this is exactly what we need to figure out: tangible benefits of using gRPC vs HTTP/2 + JSON (since this RFC specifically focuses on gRPC and not HTTP/1.1 + Protobuf). Thank you.

@amberzsy
Author

amberzsy commented Aug 16, 2024

Re: benefits I expect grpc + protobuf to improve both performance and throughput over HTTP/2 JSON. You're right to call this out though, @amberzsy were your benchmarks using HTTP/2?

with HTTP/1.

@amberzsy @dblock I have two questions please:

  1. 3.x comes with HTTP/2 support (clients + server) out of the box, what are the tangible benefits of using gRPC here?
  2. 2.x does not support HTTP/2 (server side) nor have any client libraries that could handle that (AHC 4.x does not support HTTP/2), what is our plan here?

gRPC uses HTTP/2 as its transfer protocol, and it has built-in protobuf support as its default serialization format.
From the benchmarks of both client-server and node-to-node, we've seen a perf gain from adopting protobuf and replacing the xContent parser logic. I think HTTP/2 alone (HTTP/2 + JSON) might not achieve a similar improvement. HTTP/2 + proto possibly could, though I'm not sure it's commonly adopted, since I guess we would need to manually handle serialization, deserialization, and method invocation across different languages, which adds complexity.
Beyond that, gRPC also provides abstraction and simplifies development: it abstracts the underlying communication details, and clients are generated from gRPC for free in multiple programming languages, which reduces boilerplate. I guess we would need to write and maintain that manually with plain HTTP/2. It also provides other benefits, such as built-in support for load balancing, distributed tracing, and authentication.
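As a small illustration of the plumbing that gRPC handles on top of HTTP/2, the sketch below implements gRPC's length-prefixed message framing by hand: each message is preceded by a 1-byte compressed flag and a 4-byte big-endian length. The function names and the sample payload are made up for illustration, and this covers only the framing; HTTP/2 streams, headers, trailers, and status handling are out of scope. With plain HTTP/2 + protobuf, an equivalent convention would have to be defined and maintained in every client language.

```python
import struct
from typing import List, Tuple

HEADER = struct.Struct(">BI")  # 1-byte compressed flag, 4-byte big-endian length


def frame_message(payload: bytes, compressed: bool = False) -> bytes:
    """Prefix a serialized message with the 5-byte gRPC frame header."""
    return HEADER.pack(int(compressed), len(payload)) + payload


def unframe_messages(buffer: bytes) -> List[Tuple[bool, bytes]]:
    """Split a buffer of back-to-back frames into (compressed, payload) pairs."""
    messages = []
    pos = 0
    while pos < len(buffer):
        if len(buffer) - pos < HEADER.size:
            raise ValueError("truncated frame header")
        flag, length = HEADER.unpack_from(buffer, pos)
        pos += HEADER.size
        if len(buffer) - pos < length:
            raise ValueError("truncated message body")
        messages.append((bool(flag), buffer[pos:pos + length]))
        pos += length
    return messages


# Two hypothetical serialized protobuf messages sent on one stream.
stream = frame_message(b"\x0a\x04logs") + frame_message(b"\x10\x0a")
print(unframe_messages(stream))
```

Because every message carries its own length, several messages can share one stream, which is what makes the streaming modes mentioned in the proposal possible.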

@prudhvigodithi
Member

Thanks for the proposal @amberzsy. I just went through some of the OpenSearch issue links that talk about the Protobuf implementation.
#6844
#15190

Regarding this RFC proposal, the input and output will be in Protobuf binary format (including the streaming-style search API with a gRPC endpoint). For OpenSearch users, to ensure that the API behavior remains unchanged, is there a plan to implement a generic interface that converts Protobuf messages back to a JSON-friendly format for output? Additionally, could this interface be used to read user input as JSON and convert it back to Protocol Buffers?
Thank you

@andrross
Member

@prudhvigodithi

For OpenSearch users, to ensure that the API behavior remains unchanged, is there a plan to implement a generic interface that converts Protobuf messages back to a JSON-friendly format for output?

There are no plans to remove the existing JSON APIs.

@amberzsy
Author

amberzsy commented Dec 5, 2024

Closing the issue and moving on to the execution and implementation details listed in #16787.

@amberzsy amberzsy closed this as completed Dec 5, 2024
@github-project-automation github-project-automation bot moved this from Now(This Quarter) to ✅ Done in Search Project Board Dec 5, 2024

8 participants