-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Protobuf in OpenSearch #6844
Comments
Super excited about this!!! It's non-invasive which is fantastic! Enables us to further refactor the transport layer for extensions to support this. This has my full endorsement! Nice work! |
@saratvemulapalli thanks for the RFC, how beneficial is gRPC for extensions actually? I am not against it but I have doubts we have a compelling use case now for it:
The unknown part for me is the role of |
Assuming extensions get very chatty transport improvements will be increasingly useful, but I am reading this proposal as 1) make transport pluggable today in OpenSearch, and 2) implement gRPC as an option. I think this has the potential to significantly improve node-to-node communication today passing around all kinds of cluster state. Let's see by how much! |
thanks @reta for taking a look.
I dont know how much gRPC will help extensions (yet, we'll learn more) but with protobuf I've listed down the benefits we see so far for extensions. Infact extensions will be similar to clients for Rest APIs as they use clients internally.
We definitely don't want to force using gRPC, as an example with clients. It should be just another option which we will offer in addition to Json based Rest APIs. Mostly with the RFC, I am looking for feedback to see if there is something fundamental I am missing vs give it a shot and lets get some numbers to decide if this worth chasing in the longer term. |
Transport is already pluggable today (e.g., plugins/transport-nio)
Sort of. We would implement protobuf as an option (just like CBOR, SMILE, YAML, and JSON). The only difference here is that we would add an additional This is why I've been pushing the XContent refactor. The next PR will refactor the StreamInput/StreamOutput classes into a library and the xcontent/protobuf implementation can provide its concrete implementation that uses its own serialize logic. It's an elegant approach that gets us even further toward modularizing massive amounts of code out of
Exactly. Let's keep this simple and focus just on protobuf as an optional transport (that as a bonus supports versioning!) than expanding the aperture to say it's a silver bullet for extensions (progress not perfection). In a followup (after vetting and going GA) we can discuss protobuf as the default transport format over JSON. I think it's clearly a direction worth considering. |
Thanks a lot @saratvemulapalli , I was a bit confused by gRPC mentions but @nknize clearly clarified that we are only talking about serialization mechanism (protobufs), and not the communication protocol change (gRPC, at least for now). Thanks. |
Thanks @reta. I probably took you off mentioning about gRPC (sorry about that) but mostly intended this issue for Protobuf and put in list of opportunities this will enable and one of them is gRPC (if choose to make it happen down the line). I've updated the RFC to clarify this. Tagging @peternied @cwperks @wbeckler @seanneumann who had thoughts/feedback. |
I'm not quite as up-to-speed on gRPC and other options as the more experienced folk up above, but I will say I'm all for protobuf because:
|
Great proposal, I'm all for starting as an optional transport layer and we can figure out where it best benefits the project. |
All in for protobuf! Great proposal @saratvemulapalli! In the future it would be great to see JSON response with versioning support for extension points APIs using protobuf (No more of XContent!). |
Thanks for the proposal. |
@Bukhtawar I don't recall there was any evaluation being done |
Another one that I guess should be considered in favour of low garbage https://github.com/OpenHFT/Chronicle-Wire |
Thanks @Bukhtawar for the feedback. [1] https://protobuf.dev/programming-guides/techniques/#large-data |
The ones I have seen quite often in place of protobuf, if it could help, are:
|
I did a POC (see draft PR #9097 for the POC code changes for changing the API request response de/serialization and between the nodes) with protobuf for 2 nodes cluster Average improvement of 16.14% 5 nodes cluster Average improvement of 18.11% 10 nodes cluster Average improvement of 21.35% 15 nodes cluster Average improvement of 31.39% |
This is significant. Can we ship this? |
Also another great data point from @dbwiddis and @dblock, while they were trying to write https://github.com/opensearch-project/opensearch-sdk-py, all extension interfaces were autogenerated in python and seamlessly could work over transport with OpenSearch de-serializing this data using Protobuf java. |
Thank you @dblock! I am working on getting the changes from the POC in #9097 to merge in the repo. @saratvemulapalli and I are also working on getting the numbers for APIs like search. |
@VachaShah the numbers are very convincing, one thing we should keep in mind (which you definitely know about) is how to support migration from 2.x to 3.x when some nodes will talk old protocol and new ones will use Protobuf (or in general, any new protocol). In continuation to this, it may not be feasible to migrate all transport actions to Protobuf in one shot so even in 3.x we would need to maintain this mix of transport protocols. |
As a strategy, I would 1) support multiple protocols in 2.x in a way where we can migrate actions one-by-one, 2) rip out the existing transport protocol implementation in 3.0 and fully replace it with Protobuf. IMO, we only need an upgrade path in which a 2.x node can do just enough transport protocol to upgrade itself to protobuf and then never look back. |
@reta we might have a way to get inter-operable protocols as protobuf can write and readfrom bytesArray but it be could lots of manual effort to get all actions into the new protocol :/. |
It would be worth to see what actions can benefit the most from protobuf and as Sarat mentioned, the 2 protocols can co-exist with each other with some effort so we can make the upgrade scenarios work. |
Is the current effort on this RFC being done on some feature branch that I can follow and take a look at? |
Hi @austintlee, currently there is a draft PR #9097 with the POC and the changes from the draft PR would be PRed out incrementally. |
Great proposal! Do we know what's the reason behind this improvement? Is it due to reduced CPU time during serialization, or reduced network time due to difference in payload size? |
Is there already commitment to protobuf without considering alternatives per @reta?
Avro/Thrift have been around for quite some time. I'd like to see some comparison of some of these. |
@macohen I think the idea is that we 1) make the transport protocol pluggable, 2) ship protobuf support as an experimental feature with an upgrade path, 3) offer other protocols, 4) make the best one the default. |
We believe most of the benefits are with data compression, we couldn't really analyze frame graphs due to multiple threads. |
@saratvemulapalli @VachaShah can you please confirm if this change can be included in 2.x without breaking existing API? Basically can this change be added in a backward compatible manner in 2.x line? We are evaluating if this change requires 3.0 release or can be included in 2.x line so need your inputs. |
@bbarani we do not know yet. Once the changes are pushed to 3.x for experimental feature, we will look at backward compatibility. In theory we do see ways where native transport protocol can move to protobuf, but there are few unknowns with mixed cluster scenarios which we have to dive into. For now I would rather say it would be a breaking change. |
this sounds very interesting! what i didn't quite gather from this issue is whether you plan on using this only on the transport layer or whether you also see this as an option for the public API in the future (e.g. i'm not sure whether #10684 is only for the distribution of the search requests to the nodes or also for consumers which want to trigger the search request)? especially if this is then hidden behind clients like (just talking as a nerd here, i currently don't have performance problems caused by the REST/JSON overhead) |
Inspiration
Plugins are very tightly version coupled with OpenSearch #1707 and relaxing them to work for patch versions is still in the works.
While working on extensions #2447 we really wanted to support multiple versions (including major/minor/patch) of OpenSearch with one OpenSearch SDK[1].
Proposal
Exploring for opensource solutions, Protobuf[2] which is built by Google and is widely adopted for serializing/de-serializing and used as RPC. It was built out of the box to support forward and backward compatibility seamlessly.
With an initial experiment of integrating protobuf in OpenSearch/Extensions opensearch-project/opensearch-sdk-java#414 (comment), we see:
.proto
definitions.For extensions, protobuf solves a lot of problems but has a tiny overhead for serialization/de-serialization over existing OpenSearch's
StreamInput
StreamOutput
Next Steps
With the learnings we have seen in SDK/Extensions, there is more potential for Protobuf integration in OpenSearch and would like to propose offering Protobuf as a new type:
StreamInput
,StreamOutput
with protobuf serializer/de-serializers. This will help offer another type within the transport ecosystem similar toByteBufferStreamInput
[3] etc.This would seamlessly plugin into
Writable
[4] interface which is used across the repo for transporting custom messages.Adding in transport will enable communication between OpenSearch nodes to have significant benefits in performance and seamless versioning compatibility.
@nknize already started making changes to enable this with restructuring
XContent
#6470XContent.Type
to add protobuf as an option. Historically converting Json <-> Protobuf has performance implications but for transporting on the Rest Layer with clients, OpenSearch Dashboards and ingestion tools might have benefit when talking over binary format. (Yet to experiment)Additionally, having protobuf at Rest layer will unblock OpenSearch to support gRPC (if we choose this path).
FAQ
Q Is Protobuf higher performant?
A. We moved 2 APIs a. Cat Nodes b. _search, both APIs with protobuf had atleast 20% better performance compared to native protocol, and we see linear improvements with increase in cluster size.
Q. What are the benchmark numbers for search ?
A. See OpenSearch benchmark results for querying with Protobuf : #10684 (comment)
Q. What are the benchmark numbers for Cat Nodes (Operational APIs)
A. See benchmarking results : #6844 (comment)
Q. Is Protobuf in OpenSearch necessary to support GRPC
A. Protobuf works at transport layer, while GRPC is a layer 7 protocol. GRPC internally uses protobuf as transport which makes it a dependency. We presume there will be significant performance benefits with GRPC as data would be transmitted binary instead of JSON.
cc: @VachaShah @prudhvigodithi
[1] https://github.com/opensearch-project/opensearch-sdk-java
[2] https://protobuf.dev/overview/
[3] https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/common/io/stream/ByteBufferStreamInput.java
[4] https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/common/io/stream/Writeable.java
The text was updated successfully, but these errors were encountered: