[refactor][client] Introduce PulsarApiMessageId to access fields of MessageIdData #18890

BewareMyPower
Contributor

Motivation

Currently the MessageId interface hides all fields of the MessageIdData struct defined in PulsarApi.proto. That is usually enough for application users because they don't need to access the fields. But for client developers and developers of other projects in the Pulsar ecosystem (e.g. the built-in Kafka connector and the Flink connector in another repo), the MessageId interface is too simple and there is no commonly used abstraction. Code like the following is common:

if (msgId instanceof BatchMessageIdImpl) {
    // Do type cast and then access fields like ledger id...
} else if (msgId instanceof MessageIdImpl) {
    // Do type cast and then access fields like ledger id...
    // NOTE: don't put this else if before the previous one because
    // BatchMessageIdImpl is also a MessageIdImpl
} // ...

These MessageId implementation classes are used directly. It's a very bad design because any change to the public APIs of these implementations could break the code that depends on them.

Also, for a TopicMessageIdImpl, the getInnerMessageId() method must be called each time to get the underlying MessageId object, followed by another type check and cast. This makes the code unnecessarily complicated.

Modifications

Introduce the PulsarApiMessageId interface into the pulsar-common module. All MessageId implementations so far (except MultiMessageId) should implement this interface so that the following conversion is safe in client code or other modules:

long ledgerId = ((PulsarApiMessageId) msgId).getLedgerId();
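
A minimal sketch of how a shared accessor interface removes the instanceof chain shown in the motivation. Everything here is hypothetical and is not the code in this PR: FieldAccessibleId stands in for PulsarApiMessageId, only getLedgerId() is confirmed by the description, and getEntryId(), getPartition(), getBatchIndex() with their -1 defaults are assumptions modeled on the fields of MessageIdData. PlainId, BatchedId, and IdFormatter are illustrative names.

```java
// Hypothetical stand-in for the proposed interface. Only getLedgerId() is
// confirmed by the PR description; the other accessors and their defaults are
// assumptions modeled on the entryId, partition, and batch_index fields.
interface FieldAccessibleId {
    long getLedgerId();
    long getEntryId();
    default int getPartition() { return -1; }   // assumed: -1 means "no partition"
    default int getBatchIndex() { return -1; }  // assumed: -1 means "not in a batch"
}

// Illustrative id of a non-batched message.
class PlainId implements FieldAccessibleId {
    private final long ledgerId;
    private final long entryId;

    PlainId(long ledgerId, long entryId) {
        this.ledgerId = ledgerId;
        this.entryId = entryId;
    }

    @Override public long getLedgerId() { return ledgerId; }
    @Override public long getEntryId() { return entryId; }
}

// Illustrative id of a message inside a batch. It subclasses PlainId, which
// mirrors the BatchMessageIdImpl / MessageIdImpl relationship noted earlier.
class BatchedId extends PlainId {
    private final int batchIndex;

    BatchedId(long ledgerId, long entryId, int batchIndex) {
        super(ledgerId, entryId);
        this.batchIndex = batchIndex;
    }

    @Override public int getBatchIndex() { return batchIndex; }
}

final class IdFormatter {
    // One code path for every implementation: no instanceof chain, and no
    // dependency on the order in which concrete classes are checked.
    static String describe(FieldAccessibleId id) {
        return id.getLedgerId() + ":" + id.getEntryId()
                + ":" + id.getPartition() + ":" + id.getBatchIndex();
    }
}
```

With this shape, a caller only depends on the interface, so the concrete classes can change without breaking it.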

Regarding the ack_set field, use a BitSet instead of the BatchMessageAcker to record whether each message in the batch is acknowledged.
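
As a rough illustration of tracking batch acknowledgments with java.util.BitSet, here is a small sketch. BatchAckTracker and its methods are hypothetical names, not the PR's code, and the convention that a set bit means "not yet acknowledged" is an assumption of this sketch.

```java
import java.util.BitSet;

// Illustrative tracker, not the PR's code: one bit per message in a batch.
// Assumed convention: a set bit means "not yet acknowledged".
final class BatchAckTracker {
    private final BitSet pending;

    BatchAckTracker(int batchSize) {
        pending = new BitSet(batchSize);
        pending.set(0, batchSize); // every message starts unacknowledged
    }

    // Acknowledge one message; returns true once the whole batch is acknowledged.
    boolean ack(int batchIndex) {
        pending.clear(batchIndex);
        return pending.isEmpty();
    }

    // Cumulative ack: acknowledge every message up to and including batchIndex.
    boolean ackCumulative(int batchIndex) {
        pending.clear(0, batchIndex + 1);
        return pending.isEmpty();
    }

    // Number of messages in the batch that are still unacknowledged.
    int outstanding() {
        return pending.cardinality();
    }

    // The bits packed into longs, the natural shape for a repeated int64 field.
    long[] toAckSet() {
        return pending.toLongArray();
    }
}
```

A BitSet keeps the per-message state in one compact value that converts directly to a long array, which is one plausible reason to prefer it over a dedicated acker object.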

Since TopicMessageId is just a proxy for other MessageId implementations, it is stored directly as a key or value in maps, because its compareTo/equals/hashCode methods have the same semantics as those of the underlying MessageId. There is no need to cast the type and call getInnerMessageId.
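
The proxy idea can be sketched as follows. Position, TopicPosition, and PendingAcks are hypothetical names used only for illustration. Delegating equality and hashing entirely to the wrapped id (so the topic name is ignored) is an assumption of this sketch, based on the statement that the semantics match the underlying MessageId.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative position type standing in for a concrete MessageId.
final class Position implements Comparable<Position> {
    final long ledgerId;
    final long entryId;

    Position(long ledgerId, long entryId) {
        this.ledgerId = ledgerId;
        this.entryId = entryId;
    }

    @Override public int compareTo(Position o) {
        int c = Long.compare(ledgerId, o.ledgerId);
        return c != 0 ? c : Long.compare(entryId, o.entryId);
    }

    @Override public boolean equals(Object o) {
        return o instanceof Position && compareTo((Position) o) == 0;
    }

    @Override public int hashCode() {
        return 31 * Long.hashCode(ledgerId) + Long.hashCode(entryId);
    }
}

// Hypothetical proxy: carries a topic name but delegates ordering, equality,
// and hashing to the wrapped position (assumption: the topic is ignored).
final class TopicPosition implements Comparable<TopicPosition> {
    final String topic;
    final Position inner;

    TopicPosition(String topic, Position inner) {
        this.topic = topic;
        this.inner = inner;
    }

    @Override public int compareTo(TopicPosition o) {
        return inner.compareTo(o.inner);
    }

    @Override public boolean equals(Object o) {
        return o instanceof TopicPosition && inner.equals(((TopicPosition) o).inner);
    }

    @Override public int hashCode() {
        return inner.hashCode();
    }
}

// The proxy is used as a map key directly; nothing unwraps it first.
final class PendingAcks {
    private final Map<TopicPosition, String> states = new HashMap<>();

    void track(TopicPosition id, String state) { states.put(id, state); }

    String stateOf(TopicPosition id) { return states.get(id); }
}
```

Because lookups go through the proxy's own equals and hashCode, callers never need to reach for the inner id.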

Remove all other usages and mark the public methods as deprecated to avoid breaking changes. They could be removed in the next major release.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: BewareMyPower#11

@BewareMyPower added the area/client and type/refactor labels Dec 12, 2022
@BewareMyPower added this to the 2.12.0 milestone Dec 12, 2022
@BewareMyPower self-assigned this Dec 12, 2022
@github-actions bot added the doc-not-needed label Dec 12, 2022
@BewareMyPower force-pushed the bewaremypower/introduce-msg-id-data-wrapper branch from ed8933d to 539b150 Dec 12, 2022 16:46
import org.testng.annotations.Test;

@Test(groups = "broker-api")
public class CustomMessageIdTest extends ProducerConsumerBase {
Contributor Author

@BewareMyPower BewareMyPower Dec 12, 2022

This test shows another benefit of this design: we can now implement PulsarApiMessageId to create our own MessageId implementations for the seek and acknowledge APIs, so the PulsarApiMessageId interface is very flexible and extensible. Take this PR (apache/flink#21069) for example: we don't need to use the APIs from the pulsar-client module to create a message id that represents a previous message id. /cc @tisonkun @syhily

Add a `CustomMessageIdTest` to verify any valid `MessageId` implementation
works for `seek` and `acknowledge` APIs.
@BewareMyPower force-pushed the bewaremypower/introduce-msg-id-data-wrapper branch from 539b150 to 4029416 Dec 13, 2022 16:23
@BewareMyPower
Contributor Author

I will open a PIP first.

@BewareMyPower
Contributor Author

Close this PR and continue the work after PIP-229 is approved.
