Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support binary serialization of ProtoBuf schemas #755

Merged
merged 1 commit into from
Nov 14, 2023

Conversation

tvainika
Copy link
Contributor

@tvainika tvainika commented Nov 3, 2023

About this change - What it does

Add support for base64 encoded binary encodings of ProtoBuf schemas when registering new schemas and with format=serialized query parameter when fetching the schemas. Fixes #742 #398

Why this way

Current implementation uses protobuf python package for binary serialization. On top of that it needs both ways converting between internal protobuf schema presentation and protobuf library definition of types.

@tvainika tvainika force-pushed the protobuf-binary-format branch 2 times, most recently from 369698e to 8d6d7d1 Compare November 6, 2023 14:50
@tvainika tvainika marked this pull request as ready for review November 6, 2023 15:04
@tvainika tvainika requested review from a team as code owners November 6, 2023 15:04
@tvainika tvainika force-pushed the protobuf-binary-format branch from 8d6d7d1 to 2f40f12 Compare November 9, 2023 10:20
Copy link
Contributor

@jjaakola-aiven jjaakola-aiven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some missing handling of elements in the serialization:

  • ExtendElement not supported.
  • ExtensionsElement not supported, this is mostly proto2 but can be used in proto3 for declaring custom options.
  • GroupElement not supported, for proto2.
  • ReservedElement not supported.

requirements/requirements-typing.in Outdated Show resolved Hide resolved
karapace/protobuf/schema.py Outdated Show resolved Hide resolved
d.nested_type.append(_serialize_type(nt))
elif isinstance(nt, EnumElement):
d.enum_type.append(_serialize_enumtype(nt))
if isinstance(t, MessageElement):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would the else be here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added else branch throwing an exception

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, understood your point wrongly first. Now fixed by renaming _serialize_type -> _serialize_msgtype and only takes MessageElement

Copy link
Contributor

@eliax1996 eliax1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we take other open repos containing the serialized format to do kinda like an integration test? I'm wondering if we can expand the testing coverage in an un-expensive way to automatically catch missing part. For the protobuf @jjaakola-aiven have parsed in an automatic way a huge amount of protobuf file. In that case maybe we can do the same by comparing an existing and working serializer with your

karapace/protobuf/location.py Show resolved Hide resolved
karapace/protobuf/schema.py Outdated Show resolved Hide resolved
karapace/protobuf/schema.py Outdated Show resolved Hide resolved
karapace/protobuf/serialization.py Outdated Show resolved Hide resolved
karapace/protobuf/serialization.py Outdated Show resolved Hide resolved
@tvainika tvainika force-pushed the protobuf-binary-format branch from 2f40f12 to 9ffa7e9 Compare November 13, 2023 12:55
@tvainika
Copy link
Contributor Author

Added support for ReservedElement but for other mentioned: those don't seem to be supported by CSR even, so I think it is safe to ignore those.

Somehow reusing open repository of different schemas would be nice, but is there any repository that would have also base64 encoded binary schemas in parallel readymade? If not, then testing against such requires somewhat manual effort to verify multiple things for each schema.

eliax1996
eliax1996 previously approved these changes Nov 13, 2023
Copy link
Contributor

@eliax1996 eliax1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me looks fine, waiting @jjaakola-aiven feedback before merging

Add support for base64 encoded binary encodings of ProtoBuf schemas
when registering new schemas and with format=serialized query
parameter when fetching the schemas. Fixes #742
@jjaakola-aiven jjaakola-aiven merged commit 327407d into main Nov 14, 2023
8 checks passed
@jjaakola-aiven jjaakola-aiven deleted the protobuf-binary-format branch November 14, 2023 09:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Confluent's ProtobufSerializer fails to register a schema
3 participants