docs: update Kafka bindings to describe schema usage #55

dalelane · 2021-02-23T14:25:58Z

Adding enough information about how schemas are being
used in Kafka messages to enable an application that
consumes messages serialized using schemas to be
implemented purely from an AsyncAPI spec.

Also introduces two new common, reusable definitions
that can be used as message traits, to make it easier
to quickly specify headers used to capture schema
details.

Contributes to: #40
Contributes to: #41

Signed-off-by: Dale Lane [email protected]

Adding enough information about how schemas are being used in Kafka messages to enable an application that consumes messages serialized using schemas to be implemented purely from an AsyncAPI spec. Also introduces two new common, reusable definitions that can be used as message traits, to make it easier to quickly specify headers used to capture schema details. Contributes to: asyncapi#40 Contributes to: asyncapi#41 Signed-off-by: Dale Lane <[email protected]>

github-actions

Welcome to AsyncAPI. Thanks a lot for creating your first pull request. Please check out our contributors guide and the instructions about a basic recommended setup useful for opening a pull request.

Keep in mind there are also other channels you can use to interact with AsyncAPI community. For more details check out this issue.

dalelane · 2021-02-23T14:26:57Z

I've put an illustration of how this could be used in https://gist.github.com/dalelane/3931c17b14c51fa4a1cf25496237d188

kafka/traits/apicurio-v0.0.1.yaml

lbroudoux

Thanks @dalelane for great work. In my opinion, there's still the need for useRegistry flag dissociated from servers.

kafka/README.md

Moving the message traits definitions into an examples folder to make it clear that they are not part of the spec or bindings. Contributes to: asyncapi#40 Signed-off-by: Dale Lane <[email protected]>

Signed-off-by: Dale Lane <[email protected]>

There is some debate about the best place to host and manage these, and as these aren't directly required for this PR I'd like to move them out so that they're not a blocker to this being merged. Contributes to: asyncapi#40 Signed-off-by: Dale Lane <[email protected]>

It's easier to add fields in a future PR than to remove them, so I'm erring on the cautious side for now and removing this (as I think there is some duplication and potential for conflict with useSchemaRegistry) Contributes to: asyncapi#40 Signed-off-by: Dale Lane <[email protected]>

kafka/README.md

Also sorted the list alphabetically to avoid indicating any other significance to the ordering. Signed-off-by: Dale Lane <[email protected]>

derberg · 2021-04-16T06:22:48Z

where are we with this PR? looks good to me

M3lkior · 2021-04-19T16:33:02Z

it is a good start ! some users (like me) needs to provide extra informations about topics naming strategy like this (confluent specific)

x-key.subject.name.strategy:
           type: string
           description: |
             We use the **RecordNameStrategy** to infer the messages schema.
             Use `io.confluent.kafka.serializers.subject.RecordNameStrategy` for your consumer.
         x-value.subject.name.strategy:
           type: string
           description: |
             We use the **RecordNameStrategy** to infer the messages schema.
             Use `io.confluent.kafka.serializers.subject.RecordNameStrategy` for your consumer.

So maybe it will be nice to allow bindings extension by x- prefixes

A Slack community discussion is started here : https://asyncapi.slack.com/archives/C34F2JV0U/p1618495734389300

derberg · 2021-04-20T06:34:09Z

@M3lkior where does it say extensions in bindings are not allowed? I mean that additional properties are not allowed?

when you do

    bindings:
      kafka:
        x-test: test

all is good, but for now of course because of the fact that bindings do not have schemas

M3lkior · 2021-04-20T07:33:18Z

@M3lkior where does it say extensions in bindings are not allowed? I mean that additional properties are not allowed?

when you do
    bindings:
      kafka:
        x-test: test
all is good, but for now of course because of the fact that bindings do not have schemas

i try to set bindings with x- prefixes, but they are not displayed or returned by the bindings() method

derberg · 2021-04-20T07:49:45Z

@M3lkior you sure? I recently updated playground with latest HTML where you contributed bindings info rendering and I can see the extension is rendered 👇🏼

If spec doesn't say extensions are not allowed, then we are good. Bindings object can be extended and protocol-specific binding objects too. There is some work on schemas started #60

Can I suggest you create a separate issue in the spec repo to clarify if we should not have explicitly described in the spec that specific bindings must always be extendable? Instead of having it as a decision per binding?

M3lkior · 2021-04-20T09:24:59Z

@M3lkior you sure? I recently updated playground with latest HTML where you contributed bindings info rendering and I can see the extension is rendered 👇🏼

If spec doesn't say extensions are not allowed, then we are good. Bindings object can be extended and protocol-specific binding objects too. There is some work on schemas started #60

Can I suggest you create a separate issue in the spec repo to clarify if we should not have explicitly described in the spec that specific bindings must always be extendable? Instead of having it as a decision per binding?

yes sorry :

(by the way, there is a rendering issue with too long text in the declaration)

derberg · 2021-04-20T10:23:04Z

@M3lkior no worries. Fix for the bug asyncapi/html-template#183

dalelane · 2021-04-20T19:59:37Z

where are we with this PR? looks good to me

@derberg Sorry, been super busy recently and this one got away from me.

I've changed the state to Draft to reflect that there are a few bits of feedback that I need to consider and reflect in the PR:

Apicurio pointed out that they have different approaches to encoding schema info in the body that I'm not adequately capturing
@M3lkior's pointed out that I haven't captured Confluent strategies

lbroudoux · 2021-04-29T08:57:13Z

Hi there!

Had a review this morning and I'm struggling with the useSchemaRegistry and schemaIdLocation properties when this later value is none.

From the binding README, definition of useSchemaRegistry is clear but the examples from @dalelane Gist confuse me as they seems to favor the schemaIdLocation=none approach. Especially the first one here: https://gist.github.com/dalelane/3931c17b14c51fa4a1cf25496237d188#binary-encoded-avro

Also the default value for schemaIdLocation is none which - as I understand from the Gist - means useSchemaRegistry is equivalent to false - but this one has no default value specified.

I think we should solve this ambiguity and I see 2 options at the moment:

Keep both properties and have useSchemaRegistry default to false and restrict schemaIdLocation to either payload - the default because the most common - or header,
Remove the useSchemaRegistry and just keep schemaIdLocation with none as default, + the payload and header values.

From my perspective, I prefer option 1. as I find it more easy to use for non-expert of SerDes library and more straightforward for most common cases. That way, you may have something like:

channels:
  user-signedup:
    publish:
      bindings:
        kafka:
          groupId:
            type: string
            enum: ['myGroupId']
          clientId:
            type: string
            enum: ['myClientId']
          bindingVersion: '0.2.0'

and this is telling that this operation/channel is not using any registry. Then we can enable registry usage explicitly:

channels:
  user-signedup:
    publish:
      bindings:
        kafka:
          groupId:
            type: string
            enum: ['myGroupId']
          clientId:
            type: string
            enum: ['myClientId']
          useSchemaRegistry: true
          bindingVersion: '0.2.0'

and this mean that a Schema registry is used and consumer will have to pick the SerDes implementation corresponding to registry vendor. The Schema ID Location will thus be the payload of the message. Then we can also refine things adding:

channels:
  user-signedup:
    publish:
      bindings:
        kafka:
          groupId:
            type: string
            enum: ['myGroupId']
          clientId:
            type: string
            enum: ['myClientId']
          useSchemaRegistry: true
          bindingVersion: '0.2.0'
      message:
        bindings:
          kafka:
            schemaIdLocation: 'header'
            bindingVersion: '0.2.0'

and this is necessary only is we're not sticking with most common usage.

What do you think?

nictownsend · 2021-04-30T12:13:14Z

@lbroudoux I think there are two use cases this is looking to handle:

Inform API developers to enable them to pick the right vendor serdes
Inform API developers how to bypass if schema registry is unavailable

For use case 1 - schemaRegistryURL + schemaRegistryVendor is sufficient to pick a SERDES library. useSchemaRegistry flags if the SERDES library should be used for the operation (if not, assumption is message has been produced without using a SERDES library (e.g basic producer using AVRO).

Missing:

Naming strategies in each operation allow additional SERDES config for Confluent + Apicurio. These maybe just need to be class names so that the key can be vendor agnostic (naming/lookupStrategy)?
Apicurio ID encoding types

For use case 2 - to deserialize from the payload without using a SERDES library you need to know:

schemaIdLocation - so that you know if you have to skip any bytes (Apicurio looks for a magic byte to automatically determine if in header or not - confluent only use payload, Event Streams uses header only). Maybe this just needs to be schemaIdInPayload = true/false
schemaRegistryVendor - so that for each wire format e.g Confluent, Apicurio (find in page 'magic byte') - you know how many bytes to skip (magic + schema id)
Apicurio ID encoding type to skip the right number of bytes for the schema ID

I think this means that there are two sets of properties - with overlap but no conflict. You could combine the two sets to describe both use cases on the same spec without any clashes (though for code generation purposes I guess you'd generate using SERDES if all required SERDES properties are described).

I think the only change to the constraint is that schemaRegistryVendor is required if either schemaIdInPayload or schemaRegistryURL are supplied.

Does that clear up the ambiguity by separating properties into the two different concerns?

lbroudoux · 2021-04-30T14:08:06Z

Thanks @nictownsend : this makes things clearer.

However, I'd thought that schemaIdLocation is still useful for use case 1 - enabling developers to correctly setup SERDES properties. I especially think to the Apicurio usage of headers for ID.

props.putIfAbsent(SerdeConfig.ENABLE_HEADERS, "true")

Is it correct?

+1 for the missing lookup strategies!

nictownsend · 2021-04-30T17:27:55Z

Thanks @nictownsend : this makes things clearer.

However, I'd thought that schemaIdLocation is still useful for use case 1 - enabling developers to correctly setup SERDES properties. I especially think to the Apicurio usage of headers for ID.

Good point, yes for Apicurio case - schemaIdLocation is needed for producer configuration (I had stupidly focused on consume only).

So do we then have the following properties:

schemaRegistryURL - url for the registry (use case 1 only)
schemaRegistryVendor - SERDES library to use (use case 1 only)
schemaIdLocation - where to find the ID (and thus using this property means useSchemaRegistry is no longer needed as use of this is stating "this operation uses SERDES") (both use cases)
schemaIdPayloadEncoding - length of bytes or vendor specific values (e.g confluent/apicurio-legacy/apicurio-new) (both use cases)
schemaLookupStrategy - freeform string for any naming strategy class to use (use case 1 only) and clients default to the vendor default if not supplied

Only query is whether it can be assumed that every vendor uses a magic byte when writing schema to payload. If not, then schemaRegistryVendor becomes use case 2 as well.

github-actions · 2021-06-30T00:45:07Z

This pull request has been automatically marked as stale because it has not had recent activity 😴
It will be closed in 60 days if no further activity occurs. To unstale this pull request, add a comment with detailed explanation.
Thank you for your contributions ❤️

github-actions bot reviewed Feb 23, 2021

View reviewed changes

dalelane changed the title ~~docs: Update Kafka bindings to describe schema usage~~ docs: update Kafka bindings to describe schema usage Feb 23, 2021

dalelane mentioned this pull request Feb 24, 2021

docs: add essential Kafka client security parameters to bindings #56

Closed

fmvilas reviewed Feb 24, 2021

View reviewed changes

kafka/traits/apicurio-v0.0.1.yaml Outdated Show resolved Hide resolved

lbroudoux mentioned this pull request Feb 25, 2021

Specify how Avro schema ID is serialized on Kafka #41

Closed

lbroudoux reviewed Feb 25, 2021

View reviewed changes

kafka/README.md Outdated Show resolved Hide resolved

dalelane added 2 commits February 25, 2021 13:53

chore: address review comments

8788ff5

Moving the message traits definitions into an examples folder to make it clear that they are not part of the spec or bindings. Contributes to: asyncapi#40 Signed-off-by: Dale Lane <[email protected]>

fix: address review comments

65ba2c1

Signed-off-by: Dale Lane <[email protected]>

dalelane mentioned this pull request Feb 26, 2021

AsyncAPI open meeting, Tuesday Mar 2 2021 4PM UTC asyncapi/spec#504

Closed

dalelane added 2 commits March 3, 2021 20:48

lornajane reviewed Mar 9, 2021

View reviewed changes

kafka/README.md Outdated Show resolved Hide resolved

fix: Add additional vendor example suggested in review

c051f7b

Also sorted the list alphabetically to avoid indicating any other significance to the ordering. Signed-off-by: Dale Lane <[email protected]>

dalelane marked this pull request as draft April 20, 2021 19:56

github-actions bot added the stale label Jun 30, 2021

github-actions bot closed this Aug 30, 2021

lbroudoux mentioned this pull request Feb 4, 2022

feat: add schema registry infos to Kafka binding #115

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: update Kafka bindings to describe schema usage #55

docs: update Kafka bindings to describe schema usage #55

dalelane commented Feb 23, 2021

github-actions bot left a comment

dalelane commented Feb 23, 2021

lbroudoux left a comment

derberg commented Apr 16, 2021

M3lkior commented Apr 19, 2021

derberg commented Apr 20, 2021

M3lkior commented Apr 20, 2021

derberg commented Apr 20, 2021

M3lkior commented Apr 20, 2021

derberg commented Apr 20, 2021

dalelane commented Apr 20, 2021

lbroudoux commented Apr 29, 2021

nictownsend commented Apr 30, 2021

lbroudoux commented Apr 30, 2021

nictownsend commented Apr 30, 2021

github-actions bot commented Jun 30, 2021

docs: update Kafka bindings to describe schema usage #55

docs: update Kafka bindings to describe schema usage #55

Conversation

dalelane commented Feb 23, 2021

github-actions bot left a comment

Choose a reason for hiding this comment

dalelane commented Feb 23, 2021

lbroudoux left a comment

Choose a reason for hiding this comment

derberg commented Apr 16, 2021

M3lkior commented Apr 19, 2021

derberg commented Apr 20, 2021

M3lkior commented Apr 20, 2021

derberg commented Apr 20, 2021

M3lkior commented Apr 20, 2021

derberg commented Apr 20, 2021

dalelane commented Apr 20, 2021

lbroudoux commented Apr 29, 2021

nictownsend commented Apr 30, 2021

lbroudoux commented Apr 30, 2021

nictownsend commented Apr 30, 2021

github-actions bot commented Jun 30, 2021