[refactor][client] Introduce PulsarApiMessageId to access fields of MessageIdData #18890
import org.testng.annotations.Test;

@Test(groups = "broker-api")
public class CustomMessageIdTest extends ProducerConsumerBase {
This test shows another benefit of this design. Now we can extend `PulsarApiMessageId` to create our own `MessageId` implementations for the `seek` and `acknowledge` APIs, so the `PulsarApiMessageId` interface is very flexible and extensible. Take this PR (apache/flink#21069) for example: we don't need to use the APIs from the `pulsar-client` module to create a message id that represents a previous message id. /cc @tisonkun @syhily
### Motivation

Currently the `MessageId` interface hides all fields of the `MessageIdData` struct defined in `PulsarApi.proto`. That is usually enough for application users because they don't need to access the fields. But for client developers and developers of other Pulsar ecosystem projects (e.g. the built-in Kafka connector and the Flink connector in another repo), the `MessageId` interface is too simple and there is no commonly used abstraction. We can see many code usages like:

```java
if (msgId instanceof BatchMessageIdImpl) {
    // Do type cast and then access fields like ledger id...
} else if (msgId instanceof MessageIdImpl) {
    // Do type cast and then access fields like ledger id...
    // NOTE: don't put this else if before the previous one because
    // BatchMessageIdImpl is also a MessageIdImpl
}
// ...
```

These `MessageId` implementations are used directly. It's a very bad design because any change to the public APIs of these implementations could bring breaking changes.

Also, with `TopicMessageIdImpl`, the `getInnerMessageId()` method must be called each time to get the underlying `MessageId` object, followed by another type check and cast. It makes code unnecessarily complicated.

### Modifications

Introduce the `PulsarApiMessageId` interface into the `pulsar-common` module. All `MessageId` implementations so far (except `MultiMessageId`) should extend this interface so we can do the following conversion safely in client code or other modules:

```java
long ledgerId = ((PulsarApiMessageId) msgId).getLedgerId();
```

Regarding the `ack_set` field, use a `BitSet` instead of the `BatchMessageAcker` to record whether each message in the batch is acknowledged.

Since the `TopicMessageId` is just a proxy of other `MessageId` implementations, it's stored as a key or value in the map directly because the `compareTo`/`equals`/`hashCode` methods have the same semantics as those of the underlying `MessageId`. There is no need to cast the type and call `getInnerMessageId`.

Remove all other usages and mark the public methods as deprecated to avoid breaking changes. They could be removed in the next major release.

Add a `CustomMessageIdTest` to verify that any valid `MessageId` implementation works for the `seek` and `acknowledge` APIs.
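To make the idea above concrete, here is a minimal, self-contained sketch of interface-based field access and `BitSet`-based batch acknowledgment. It does not depend on Pulsar: the `PulsarApiMessageId` interface below is a hypothetical stand-in whose method set is assumed (only `getLedgerId()` appears in this description), and `CustomMessageId`, `describe`, `newAckSet`, and `ackIndividual` are illustrative names, not part of this PR. The convention that a set bit means "not yet acknowledged" is also an assumption of the sketch.

```java
import java.util.BitSet;

class MessageIdSketch {

    /** Hypothetical stand-in for the proposed interface; the method set is assumed. */
    interface PulsarApiMessageId {
        long getLedgerId();
        long getEntryId();
        // Assumed defaults for ids that do not belong to a batch.
        default int getBatchIndex() { return -1; }
        default int getBatchSize() { return 0; }
    }

    /** A user-defined id: it only has to expose the fields, not extend a client class. */
    static final class CustomMessageId implements PulsarApiMessageId {
        private final long ledgerId;
        private final long entryId;
        private final int batchIndex;
        private final int batchSize;

        CustomMessageId(long ledgerId, long entryId, int batchIndex, int batchSize) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
            this.batchIndex = batchIndex;
            this.batchSize = batchSize;
        }

        @Override public long getLedgerId() { return ledgerId; }
        @Override public long getEntryId() { return entryId; }
        @Override public int getBatchIndex() { return batchIndex; }
        @Override public int getBatchSize() { return batchSize; }
    }

    /** One cast to the interface replaces the instanceof chain over concrete classes. */
    static String describe(Object msgId) {
        if (!(msgId instanceof PulsarApiMessageId)) {
            throw new IllegalArgumentException("Unsupported message id: " + msgId);
        }
        PulsarApiMessageId id = (PulsarApiMessageId) msgId;
        return id.getLedgerId() + ":" + id.getEntryId() + ":" + id.getBatchIndex();
    }

    /** Creates an ack set in which every message of the batch is still unacknowledged. */
    static BitSet newAckSet(int batchSize) {
        BitSet ackSet = new BitSet(batchSize);
        ackSet.set(0, batchSize);
        return ackSet;
    }

    /** Clears the bit for one message; returns true once the whole batch is acknowledged. */
    static boolean ackIndividual(BitSet ackSet, int batchIndex) {
        ackSet.clear(batchIndex);
        return ackSet.isEmpty();
    }
}
```

In this sketch, code that consumes message ids depends only on the interface, so adding another implementation does not require touching `describe`, and the order of type checks no longer matters.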
I will open a PIP first.
Close this PR and continue the work after PIP-229 is approved.
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: BewareMyPower#11