Specify Avro encoding approach #40

Closed
dalelane opened this issue Dec 18, 2020 · 18 comments

@dalelane
Collaborator

There is a need to describe the Avro encoding when using Apache Avro as the payload schema format.

For context, when using an Avro schema to serialize data, a Kafka producer can choose between two ways of encoding the data:

  • json - the data is still human-readable which is sometimes useful
  • binary - the encoded data is smaller and can be faster to deserialize

A Kafka consumer needs to know which encoding mechanism was used to be able to deserialize the messages.
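
To make the difference concrete, here is a minimal sketch that encodes the same record both ways using only the Python standard library. The record schema (a string `name` followed by an int `age`) and all helper names are assumptions made up for illustration; a real producer would use an Avro library rather than hand-rolled encoding.

```python
import json


def zigzag_varint(n: int) -> bytes:
    """Encode an integer as Avro does for int/long: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag maps small negatives to small unsigned values
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def avro_string(s: str) -> bytes:
    """Avro binary strings are a length (as a long) followed by UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data


def encode_binary(record: dict) -> bytes:
    # Hypothetical schema: fields are written in schema order, with no field names.
    return avro_string(record["name"]) + zigzag_varint(record["age"])


def encode_json(record: dict) -> bytes:
    # For a record of plain string/int fields, the Avro JSON encoding is a JSON object.
    return json.dumps(record, separators=(",", ":")).encode("utf-8")


record = {"name": "Ada", "age": 36}
binary = encode_binary(record)
as_json = encode_json(record)
print(len(binary), binary)
print(len(as_json), as_json)
```

The binary form carries no field names, so it is smaller, but it cannot be read without the schema. The two byte strings are not interchangeable, which is why the consumer has to be told which one to expect.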

One of the most commonly used Kafka serializer libraries, Confluent's io.confluent.kafka.serializers.KafkaAvroSerializer, only supports binary encoding, so I think it's reasonable to assume binary as a default where not specified, and not try to treat this as a required field.

But it should be possible to specify when json encoding is being used with other serdes libraries, such as Apicurio's.
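
As a rough sketch of the "optional, defaulting to binary" idea: the snippet below shows how a consumer could resolve the encoding from a binding object. The field name `avroEncoding` is purely hypothetical and is not taken from any AsyncAPI binding; it only illustrates the default-when-absent behaviour.

```python
# "avroEncoding" is a hypothetical field name used only for illustration.
VALID_ENCODINGS = {"binary", "json"}


def resolve_avro_encoding(binding: dict) -> str:
    """Return the declared encoding, falling back to binary when none is given."""
    encoding = binding.get("avroEncoding", "binary")
    if encoding not in VALID_ENCODINGS:
        raise ValueError(f"unsupported Avro encoding: {encoding!r}")
    return encoding


print(resolve_avro_encoding({}))                        # nothing declared
print(resolve_avro_encoding({"avroEncoding": "json"}))  # explicitly declared
```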

(related Slack question)

Fran's suggestion in Slack was that capturing this in the operation binding would allow this to be specified at the topic level.

@dalelane dalelane added the enhancement New feature or request label Dec 18, 2020
@dalelane
Collaborator Author

I had a go at illustrating what this would look like. It got a bit long, so instead of putting it in a super-long comment here, I've put it in a stand-alone gist at https://gist.github.com/dalelane/3931c17b14c51fa4a1cf25496237d188

(I also addressed the related issue #41 as part of the same exploration)

dalelane added a commit to dalelane/bindings that referenced this issue Feb 23, 2021
Adding enough information about how schemas are being
used in Kafka messages to enable an application that
consumes messages serialized using schemas to be
implemented purely from an AsyncAPI spec.

Also introduces two new common, reusable definitions
that can be used as message traits, to make it easier
to quickly specify headers used to capture schema
details.

Contributes to: asyncapi#40
Contributes to: asyncapi#41

Signed-off-by: Dale Lane <[email protected]>
dalelane added a commit to dalelane/bindings that referenced this issue Feb 25, 2021
Moving the message traits definitions into an examples folder to
make it clear that they are not part of the spec or bindings.

Contributes to: asyncapi#40

Signed-off-by: Dale Lane <[email protected]>
dalelane added a commit to dalelane/bindings that referenced this issue Mar 3, 2021
There is some debate about the best place to host and manage these,
and as these aren't directly required for this PR I'd like to move
them out so that they're not a blocker to this being merged.

Contributes to: asyncapi#40

Signed-off-by: Dale Lane <[email protected]>
dalelane added a commit to dalelane/bindings that referenced this issue Mar 3, 2021
It's easier to add fields in a future PR than to remove them, so
I'm erring on the cautious side for now and removing this (as I
think there is some duplication and potential for conflict with
useSchemaRegistry)

Contributes to: asyncapi#40

Signed-off-by: Dale Lane <[email protected]>
@github-actions

This issue has been automatically marked as stale because it has not had recent activity 😴
It will be closed in 60 days if no further activity occurs. To unstale this issue, add a comment with detailed explanation.
Thank you for your contributions ❤️

@github-actions github-actions bot added the stale label Apr 17, 2021
@derberg derberg removed enhancement New feature or request stale labels Jun 17, 2021
@derberg derberg reopened this Jun 17, 2021
@github-actions github-actions bot added the stale label Aug 17, 2021
@derberg
Member

derberg commented Aug 17, 2021

@dalelane is it Kafka-specific only? Why not at the schemaFormat level?

@dalelane
Collaborator Author

Yeah, that's a fair point. I've only heard of this being an issue in Kafka, but in theory there's no technical reason why it could only apply to Kafka.

@derberg
Member

derberg commented Aug 17, 2021

In the end, you just need to specify in the document what content type is sent over the wire, right? Something like avro/binary, while the schema format would still be JSON or YAML?

@dalelane
Collaborator Author

@derberg When you're using a schema registry (which is the most common approach when Kafka developers are using Avro schemas), developers will specify the ID and version for the schema they're using. There are lots of different places that this can be specified.

It could be in a header. Or it could be in the message body: typically some number of bytes are reserved at the start of the body for the ID and version, ahead of the rest of the message contents. But there are different conventions in the Kafka ecosystem for how many bytes to use for this.

The challenge here is that you don't know how to parse/deserialise the message body without knowing the convention that was used.

e.g. If the message publisher put the schema ID and version in the message body, and used two bytes to do this, then the message subscriber needs to know to skip the first two bytes of the message body and then use Avro to deserialize only the data after that. (Even if you already have the schema, provided through AsyncAPI, you still need to know to skip those first two bytes before you can deserialize anything, as they're not part of the Avro-serialized data.)
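
A minimal sketch of that consumer-side step, assuming the two-byte prefix from the example above. The function name, the prefix contents, and the sample payload are all made up for illustration; the only point is that the prefix length has to be known before any Avro decoding can start.

```python
from typing import Tuple


def split_schema_prefix(body: bytes, prefix_len: int) -> Tuple[bytes, bytes]:
    """Separate the schema ID/version prefix from the Avro-serialized data."""
    if len(body) < prefix_len:
        raise ValueError("message body is shorter than the expected prefix")
    return body[:prefix_len], body[prefix_len:]


# Hypothetical message: two prefix bytes, then Avro binary data.
message_body = b"\x00\x07" + b"\x06AdaH"

prefix, avro_payload = split_schema_prefix(message_body, prefix_len=2)
print(prefix)        # bytes identifying the schema, by whatever convention applies
print(avro_payload)  # only this part should be handed to the Avro decoder
```

If the subscriber guessed the wrong prefix length, the Avro decoder would start reading in the wrong place and either fail or return garbage, which is why the convention needs to be described somewhere in the AsyncAPI document.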

@derberg
Member

derberg commented Aug 17, 2021

It does sound tricky. I would love to see this at the code level; it might help us figure out how to present it in the AsyncAPI file so that the right code can be generated.

Did you explore the Java template we have (not the Spring one)? It needs some love and we are looking for maintainers there 😆

@dalelane
Collaborator Author

dalelane commented Sep 2, 2021

@derberg Where is the Java template, please? I was only aware of the two Spring generators (Java Spring and Java Spring Cloud Stream)

@derberg
Member

derberg commented Sep 2, 2021

Oh, sorry for the confusion: by "not Spring one" I meant not Java Spring Cloud Stream.

@dalelane
Collaborator Author

dalelane commented Sep 2, 2021

@derberg Brill, thanks. I had a go at adding support for the new security scheme types as a way of familiarising myself with the generator

@github-actions github-actions bot removed the stale label Oct 5, 2021
@github-actions

github-actions bot commented Feb 2, 2022

This issue has been automatically marked as stale because it has not had recent activity 😴

It will be closed in 120 days if no further activity occurs. To unstale this issue, add a comment with a detailed explanation.

There can be many reasons why some specific issue has no activity. The most probable cause is lack of time, not lack of interest. AsyncAPI Initiative is a Linux Foundation project not owned by a single for-profit company. It is a community-driven initiative ruled under open governance model.

Let us figure out together how to push this issue forward. Connect with us through one of many communication channels we established here.

Thank you for your patience ❤️

@github-actions github-actions bot added the stale label Feb 2, 2022
@derberg derberg removed the stale label Feb 3, 2022
@github-actions github-actions bot added the stale label Jun 4, 2022
@derberg
Member

derberg commented Jun 7, 2022

@dalelane are you still working on this one?

@github-actions github-actions bot removed the stale label Jun 8, 2022
@dalelane
Collaborator Author

This will be addressed by #115

@github-actions github-actions bot added the stale label Jan 23, 2023
@dalelane
Collaborator Author

closing as I think this has been covered by the changes in #115
