Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pdata] Find replacement for now deprecated gogo/protobuf #7095

Open
yurishkuro opened this issue Feb 2, 2023 · 10 comments
Open

[pdata] Find replacement for now deprecated gogo/protobuf #7095

yurishkuro opened this issue Feb 2, 2023 · 10 comments
Labels
area:pdata pdata module related issues dependencies Pull requests that update a dependency file priority:p1 High

Comments

@yurishkuro
Copy link
Member

I don't know the full background of using gogo/protobuf, I suspect it was similar to Jaeger's use of gogoproto for serialization efficiency (in particular, the ability to embed structs by value, as in []KeyValue instead of []*KeyValue, which significantly reduces memory allocations).

Well, the bad news, which was long coming, is that GoGo is officially deprecated since Oct'22. Continue to depend on it going forward presents security and compatibility risks.

This ticket is to kick off a discussion for a viable path forward, since I don't have a solution in mind right now.

Some past related tickets:

@tigrannajaryan
Copy link
Member

tigrannajaryan commented Feb 2, 2023

Yes, Collector uses Gogoproto instead of Google's implementation for performance reasons.

Here are a few replacement options that I can think of:

  • Move to Google Protobuf library. This will most likely result in worse performance for the the Collector (about 1.3x worse in our past benchmarks IIRC). The newer version of Google Protobuf (v2?) is even slower than the older version.
  • Otel to take responsibility of entire Gogoproto. This is a big undertaking, I don't think we want to subscribe to that.
  • Fork Gogoproto and maintain an Otel fork just for ourselves. It is still a pretty large codebase to maintain and my cursory assessment of it was that it was not a codebase that I would want to take responsibility to maintain myself.
  • Create our own Protobuf library. This may sound crazy but may be tractable if we limit it to only support what OTLP needs. I managed to write an experimental compiler/generator in just a few days that works on OTLP Logs and may be faster than Gogoproto (and with optional fields support, BTW). Obviously a production quality library requires much bigger investment and I definitely won't claim that I alone want to be a sole author and maintainer of such a library.

Unfortunately I do not see great options here, all have some downsides. We will need to decide how much effort we are willing to invest into this.

There is probably also an option to find some other alternate Protobuf library but I have not explored this.

@yurishkuro yurishkuro transferred this issue from open-telemetry/opentelemetry-proto Feb 2, 2023
@yurishkuro
Copy link
Member Author

As far as replacements, there are other projects that seem to provide similar capabilities to what Gogo had:

At minimum, they seem to be maintained at the moment, and hopefully are more narrow in scope than gogo.

@jpkrohling
Copy link
Member

What is Jaeger using nowadays? Any experience you can share with us?

@yurishkuro
Copy link
Member Author

Jaeger hasn't changed from gogo/proto

@codeboten
Copy link
Contributor

I've not tried using csproto, but there were a few investigations I'd done in the past including trying to use vtproto with the standard google protobuf library here: #4658

It's probably worth another go at it now that it's been over a year.

@yurishkuro
Copy link
Member Author

@codeboten I saw that PR, but can you share some findings / conclusions? It was closed without any explanation, so not clear if you ran into a dead end with something.

@codeboten
Copy link
Contributor

codeboten commented Feb 7, 2023

The scope of the work then was to add support in the collector for optional fields which was achieved through a workaround (#4258 (comment)). That was the reason this PR was closed.

From what I recall the performance numbers were not encouraging at first. Additionally I ran into issues w/ lack of support foroneof fields in pools.

@mx-psi mx-psi added dependencies Pull requests that update a dependency file priority:p1 High area:pdata pdata module related issues labels Apr 13, 2023
mx-psi added a commit that referenced this issue Dec 11, 2023
**Description:** Reorganize `VERSIONING.md`, specify compatibility
guarantees that have already been discussed and answer questions on
#8002.

Three new guarantees in the document had already been discussed
elsewhere:

- String representation:
#7625 (comment)
- Go version compatibility: Mentioned [on
README.md](https://github.com/open-telemetry/opentelemetry-collector?tab=readme-ov-file#compatibility)
- Dependency updates: Discussed in private, most recently in relation to
whether #7095 should block `pdata`'s 1.0 (consensus seems to be that it
should not).

Regarding the questions mentioned on #8002:

> Does adding new validation rules mean a breaking change?

Generally, yes except when the configuration was already invalid (i.e.
the Collector did not work properly with it).

> Should we offer a "NewDefault...." for each config and say that the
behavior of the config is stable only if initialized with NewDefault?

I did not add anything for this one. I think we should discuss it, but:
- I don't think there is a sensible output for `NewDefault...` for all
configurations (e.g. for `confignet.TCPAddr`, what would the default
be?)
- It seems to me that we should handle configuration validity through
`Validate`, and not through `NewDefault...`. `NewDefault` may help but
ultimately we need to verify through `Validate`.

> Is adding new fields means breaking change? Let's assume someone
"squash" the message into another message, then adding a field with same
mapstruct value will be a breaking change, what do we do with that? What
are the rules we follow.

This was already in this document, where we had decided that adding new
fields was okay. I think it would be too stringent to bar us from adding
new fields.

**Link to tracking Issue:** Fixes #8002

---------

Co-authored-by: Yang Song <[email protected]>
@yurishkuro
Copy link
Member Author

yurishkuro commented Aug 29, 2024

This may be time to revisit this issue. I was dealing with grpc upgrade to v1.66.0 in Jaeger where gRPC introduced CodecV2 and memory buffers that broke out integration with gogoproto. I was able to fix it by forcing the new buffers back to []byte, which is probably going to degrate the efficiency of marshaling, thus partially negating the benefits of gogoproto. And I also noticed new dependency on https://github.com/planetscale/vtprotobuf coming from grpc via envoyproxy/go-control-plane, which led me to this issue envoyproxy/go-control-plane#824 showing large perf improvements by switching to vtprotobuf (which it calls "it is gogo protobuf, but built on top of protobuf v2, instead of replacing it.")

@yurishkuro
Copy link
Member Author

https://go.dev/blog/protobuf-opaque looks very promising as a path towards removing gogo/protobuf

@tigrannajaryan
Copy link
Member

https://go.dev/blog/protobuf-opaque looks very promising as a path towards removing gogo/protobuf

Yes, let's see if performance is on par.

Also, yay for lazy decoding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:pdata pdata module related issues dependencies Pull requests that update a dependency file priority:p1 High
Projects
None yet
Development

No branches or pull requests

5 participants