PIP-224: Introduce TopicMessageId for consumer's MessageId related APIs #18616
Comments
It should be […] And we also missed the topic name when serializing the […]

Fixed.

No. We should not. The topic name should only be added in […] The […]
Serializing the topic name could also make it harder to interact with clients of other languages because the topic name is not described in […] If users want to serialize the topic name, they should define their own protocol like:

```java
public static byte[] serialize(String topic, MessageId messageId) {
    byte[] topicBytes = topic.getBytes(StandardCharsets.UTF_8);
    byte[] messageIdBytes = messageId.toByteArray();
    byte[] bytes = new byte[topicBytes.length + messageIdBytes.length];
    System.arraycopy(topicBytes, 0, bytes, 0, topicBytes.length);
    System.arraycopy(messageIdBytes, 0, bytes, topicBytes.length, messageIdBytes.length);
    return bytes;
}
```
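A plain concatenation like the one above cannot be split back into its two parts, because the length of the topic name is not recorded. One possible user-defined layout, shown here only as a sketch, prefixes the topic bytes with a 4-byte length. `TopicAndMessageIdCodec` and its methods are hypothetical names, not part of the Pulsar client; the `messageIdBytes` argument is assumed to come from `MessageId#toByteArray()`, and the bytes returned by `messageIdBytesOf` could be handed to `MessageId.fromByteArray`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the Pulsar client API.
final class TopicAndMessageIdCodec {
    private TopicAndMessageIdCodec() {
    }

    // Layout: [4-byte topic length][topic as UTF-8][message id bytes]
    static byte[] serialize(String topic, byte[] messageIdBytes) {
        byte[] topicBytes = topic.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Integer.BYTES + topicBytes.length + messageIdBytes.length)
                .putInt(topicBytes.length)
                .put(topicBytes)
                .put(messageIdBytes)
                .array();
    }

    // Reads the length prefix, then decodes that many bytes as the topic name.
    static String topicOf(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        byte[] topicBytes = new byte[buffer.getInt()];
        buffer.get(topicBytes);
        return new String(topicBytes, StandardCharsets.UTF_8);
    }

    // Everything after the topic name is the serialized message id.
    static byte[] messageIdBytesOf(byte[] bytes) {
        int topicLength = ByteBuffer.wrap(bytes).getInt();
        return Arrays.copyOfRange(bytes, Integer.BYTES + topicLength, bytes.length);
    }
}
```

The length prefix is a design choice for this sketch only; any framing that both sides agree on would serve the same purpose.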
I will use 3 PRs to complete this proposal.
Master Issue: apache#18616

### Motivation

Introduce `TopicMessageId` to support getting the owner topic of a `MessageId`. When a `MessageId` is retrieved from a received message, the owner topic will be correctly set by the client library. When it's returned by `Producer#send`, this PR provides a `TopicMessageId#create` method to configure the owner topic.

`acknowledge` APIs are affected only for the error cases: when a `MessageId` other than a `TopicMessageId` is accepted on a multi-topics consumer, `PulsarClientException.NotAllowedException` will be thrown.

The semantic of the `seek(MessageId)` API is changed. Now if a `TopicMessageId` is accepted on a multi-topics consumer, the seek behavior will happen on the internal consumer of the owner topic.

### Modifications

- Add the `TopicMessageId` interface.
- In `MultiTopicsConsumerImpl#doAcknowledge`, complete the future with `NotAllowedException` if the argument is not a `TopicMessageId`.
- In `MultiTopicsConsumerImpl#seekAsync`, when the argument is a `TopicMessageId`, find the internal consumer according to the owner topic and pass the argument to it if it exists.
- In `ConsumerImpl#seekAsync`, get the inner message ID of the `TopicMessageId` so that now a single-topic consumer can also accept a `TopicMessageId` to seek.

Besides the main modifications above, this patch does some refactorings to avoid direct access to `TopicMessageIdImpl`:

- Deprecated `getTopicName` method by trimming the partition suffix of the owner topic in `getOriginTopicNameStr`.
- Deprecated `getTopicPartitionName` by `getOwnerTopic`.
- `getInnerMessageId` cannot be deprecated because we still need to convert `TopicMessageId` to `MessageIdImpl` in many cases (because we cannot get the fields like ledger id). Instead of deprecating it, use `MessageIdImpl.convertToMessageIdImpl` to replace it.
- In `convertToMessageIdImpl`, for a customized `TopicMessageId` implementation, use serialization and deserialization to get the `MessageIdImpl` object.

### Verifications

Add the following tests to `MultiTopicsConsumerTest`:

- `testAcknowledgeWrongMessageId`: verify the correct exceptions are thrown in `acknowledge` APIs
- `testSeekCustomTopicMessageId`: verify the new seek semantics for a `TopicMessageId`, including the existing `TopicMessageIdImpl` and the customized implementation by `TopicMessageId#create`

### TODO

- Add a standard SerDes class for `TopicMessageId`
- Apply `TopicMessageId` into `getLastMessageId` related APIs.
- Deprecate the `getInnerMessageId` after PIP-229 is approved.
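The acknowledge behavior listed under Modifications can be pictured with the following sketch. It is not the code of `MultiTopicsConsumerImpl`: every type here is a reduced stand-in declared locally, `MultiTopicsAckSketch` and `InternalConsumer` are hypothetical names, and the handling of an unsubscribed owner topic is an assumption, since the description above does not specify it.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Stand-ins for the Pulsar types named above, reduced to what the sketch needs.
interface MessageId {
}

interface TopicMessageId extends MessageId {
    String getOwnerTopic();
}

// Stand-in for PulsarClientException.NotAllowedException.
class NotAllowedException extends Exception {
    NotAllowedException(String message) {
        super(message);
    }
}

// Hypothetical view of an internal single-topic consumer.
interface InternalConsumer {
    CompletableFuture<Void> acknowledgeAsync(MessageId messageId);
}

final class MultiTopicsAckSketch {
    private final Map<String, InternalConsumer> consumers = new ConcurrentHashMap<>();

    void subscribe(String topic, InternalConsumer consumer) {
        consumers.put(topic, consumer);
    }

    CompletableFuture<Void> acknowledgeAsync(MessageId messageId) {
        // A plain MessageId carries no owner topic, so the future fails.
        if (!(messageId instanceof TopicMessageId)) {
            return failed(new NotAllowedException(
                    "a TopicMessageId is required on a multi-topics consumer"));
        }
        String topic = ((TopicMessageId) messageId).getOwnerTopic();
        InternalConsumer consumer = consumers.get(topic);
        if (consumer == null) {
            // Assumption: the behavior for an unsubscribed topic is not stated above.
            return failed(new IllegalStateException("not subscribed to " + topic));
        }
        return consumer.acknowledgeAsync(messageId);
    }

    private static CompletableFuture<Void> failed(Throwable cause) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        future.completeExceptionally(cause);
        return future;
    }
}
```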
The issue had no activity for 30 days, mark with Stale label.
@BewareMyPower: serializing and deserializing is expensive, and on top of that, having different APIs for different use cases creates a really bad experience for users. I strongly feel we should avoid such APIs and complexity if things can be solved with a simple, straightforward change to the same API, without creating a bad user experience.
…LastMessageId

Master Issue: apache#18616

Fixes apache#4940

NOTE: This implementation is different from the original design of PIP-224 that the method name is `getLastMessageIds` instead of `getLastTopicMessageId`.

### Motivation

When a multi-topics consumer calls `getLastMessageId`, a `MultiMessageIdImpl` instance will be returned. It contains a map of the topic name and the latest message id of the topic. However, the `MultiMessageIdImpl` cannot be used in any place of the API that accepts a `MessageId` because all methods of the `MessageId` interface are not implemented, including `compareTo` and `toByteArray`. Therefore, users cannot do anything on such a `MessageId` implementation except casting `MessageId` to `MultiMessageIdImpl` and get the internal map.

### Modifications

- Throw an exception when calling `getLastMessageId` on a multi-topics consumer instead of returning a `MultiMessageIdImpl`.
- Remove the `MultiMessageIdImpl` implementation and its related tests.
- Add the `getLastMessageIds` methods to `Consumer`. It returns a list of `TopicMessageId` instances, each of them represents the last message id of the owner topic.
- Mark the `getLastMessageId` API as deprecated.

### Verifications

- Modify the `TopicsConsumerImplTest#testGetLastMessageId` to test the `getLastMessageIds` for a multi-topics consumer.
- Modify the `TopicReaderTest#testHasMessageAvailable` to test the `getLastMessageIds` for a single topic consumer.
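Since `getLastMessageIds` is described as returning one `TopicMessageId` per owner topic, code that previously cast to `MultiMessageIdImpl` to reach its map could rebuild an equivalent lookup from the list. The sketch below assumes exactly that shape of result; the two interfaces are local stand-ins and `LastMessageIdsSketch` is a hypothetical helper, not a Pulsar class.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-ins for the Pulsar types named above, reduced to what the sketch needs.
interface MessageId {
}

interface TopicMessageId extends MessageId {
    String getOwnerTopic();
}

// Hypothetical helper, not part of the Pulsar client API.
final class LastMessageIdsSketch {
    private LastMessageIdsSketch() {
    }

    // Index the last message ids by owner topic, keeping the order of the input list.
    static Map<String, TopicMessageId> byOwnerTopic(List<? extends TopicMessageId> lastIds) {
        Map<String, TopicMessageId> result = new LinkedHashMap<>();
        for (TopicMessageId id : lastIds) {
            result.put(id.getOwnerTopic(), id);
        }
        return result;
    }
}
```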
I think it is necessary to introduce […] If users know […]

```java
MessageId topicMsgId = TopicMessageId.create(topicName,
        DefaultImplementation.getDefaultImplementation().newMessageId(ledgerId, entryId, partitionIndex));
consumer.acknowledge(topicMsgId);
```

Without […]
### Motivation

There are two ways to get a `MessageId` from a consumer:

- `Message#getMessageId` from a received message
- `Consumer#getLastMessageId`

The returned `MessageId` can be used in `seek` and `acknowledge` (including `acknowledgeCumulative`). A `MessageId` represents the position of a message. For a consumer that subscribes to a single topic, things are usually right because the `MessageId` from a received message or returned by `getLastMessageId` always belongs to the topic. However, things are different for a consumer that subscribes to multiple topics (a multi-topics consumer).

For a multi-topics consumer:

- `getLastMessageId` returns a `MultiMessageIdImpl`. However, it's a meaningless implementation. It maintains a map that maps each topic name to a `MessageId`. It's neither comparable nor serializable. And it's very weird to compare a `MessageId` with a `MultiMessageIdImpl` that represents multiple messages' positions.
- `seek` only accepts an earliest or latest position. Maybe it's because there is no way to know which topic the message belongs to.
- `acknowledge` only accepts a `TopicMessageIdImpl` implementation with a `getTopicPartitionName()` method to know which topic the message belongs to. However, when the topic is not subscribed by the consumer, the behavior is undefined. The current behaviors are:
  - If the `MessageId` is not a `TopicMessageIdImpl`, `IllegalArgumentException` will be thrown.

### Goal
This proposal will introduce a `TopicMessageId` interface that exposes a method to get a message's owner topic. Therefore, we can make these behaviors more clear with `TopicMessageId`.

### API Changes
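As a rough picture of the interface this section discusses, here is a minimal sketch. It uses the `getOwnerTopic` and `create` names that appear elsewhere in this thread; `MessageId` is a local stand-in for the Pulsar interface, and `SimpleTopicMessageId` is a hypothetical wrapper invented for illustration, so the real declaration may differ.

```java
import java.util.Objects;

// Stand-in for org.apache.pulsar.client.api.MessageId, reduced to what the sketch needs.
interface MessageId extends Comparable<MessageId> {
    byte[] toByteArray();
}

// Sketch of the proposed interface: a MessageId that also knows its owner topic.
interface TopicMessageId extends MessageId {
    String getOwnerTopic();

    static TopicMessageId create(String topic, MessageId messageId) {
        return new SimpleTopicMessageId(topic, messageId);
    }
}

// Hypothetical wrapper: delegates everything except the owner topic.
final class SimpleTopicMessageId implements TopicMessageId {
    private final String ownerTopic;
    private final MessageId delegate;

    SimpleTopicMessageId(String ownerTopic, MessageId delegate) {
        this.ownerTopic = Objects.requireNonNull(ownerTopic);
        this.delegate = Objects.requireNonNull(delegate);
    }

    @Override
    public String getOwnerTopic() {
        return ownerTopic;
    }

    // Serialization is unchanged: the topic name is not written.
    @Override
    public byte[] toByteArray() {
        return delegate.toByteArray();
    }

    @Override
    public int compareTo(MessageId other) {
        // Unwrap another wrapper so that the underlying positions are compared.
        MessageId target = other instanceof SimpleTopicMessageId
                ? ((SimpleTopicMessageId) other).delegate
                : other;
        return delegate.compareTo(target);
    }
}
```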
The `create` method can be used to replace the `TopicMessageIdImpl` implementation. Once we have a `MessageId` returned by `Message#getMessageId()`, we can use `TopicMessageId.create(topic, msgId)` to have a message id to seek or acknowledge.

Add a `TopicMessageIdSerDes` class to serialize and deserialize a `TopicMessageId`:

Here we don't override the `toByteArray()` method for backward compatibility because there might be existing code like:

This interface doesn't add any `acknowledge` overload because the overloads are already too many. But it will make the behavior clear.

For `seek` operations, the semantics will be modified when the argument is a `TopicMessageId`:

i.e. for a multi-topics consumer, we can now seek the message id of a received message without any code change:
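The following is a minimal sketch of how such a seek could be routed by owner topic. It is not the Pulsar implementation: the interfaces are local stand-ins, `MultiTopicsSeekSketch` is a hypothetical class, the earliest/latest positions are left out, and rejecting an unsubscribed topic with `IllegalArgumentException` is an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Stand-ins for the Pulsar types named above, reduced to what the sketch needs.
interface MessageId {
}

interface TopicMessageId extends MessageId {
    String getOwnerTopic();
}

// Hypothetical model of a multi-topics consumer that records seek positions per topic.
final class MultiTopicsSeekSketch {
    private final Set<String> subscribedTopics;
    private final Map<String, MessageId> seekPositions = new HashMap<>();

    MultiTopicsSeekSketch(String... topics) {
        this.subscribedTopics = new HashSet<>(Arrays.asList(topics));
    }

    void seek(MessageId messageId) {
        if (!(messageId instanceof TopicMessageId)) {
            // Earliest/latest handling is omitted from this sketch.
            throw new IllegalArgumentException("cannot tell which topic to seek");
        }
        String topic = ((TopicMessageId) messageId).getOwnerTopic();
        if (!subscribedTopics.contains(topic)) {
            // Assumption: an unsubscribed owner topic is rejected.
            throw new IllegalArgumentException("not subscribed to " + topic);
        }
        // Only the internal consumer of the owner topic is affected.
        seekPositions.put(topic, messageId);
    }

    // Returns the last position seeked to on the topic, or null if none.
    MessageId positionOf(String topic) {
        return seekPositions.get(topic);
    }
}
```

With a real client the call site would simply pass the id of a received message to `seek`, which is the "no code change" case described above.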
It's because the `MessageId` of a message received from a multi-topics consumer is a `TopicMessageId` that already contains the correct topic name.

Those who know the topic name and want to convert a `MessageId`, which might not be a `TopicMessageId`, to a `TopicMessageId` can perform the conversion explicitly like:

It's used when users want to seek a `MessageId` returned by `Producer#send`. In this case, the `MessageId` is not a `TopicMessageId` and we have to pass the topic name to tell the multi-topics consumer which topic it belongs to.

For the `getLastMessageIdAsync` method, change the semantics to disallow it on a multi-topics consumer. Instead, add these two APIs:

### Implementation
The main changes are removing the `TopicMessageIdImpl` and `MultiMessageIdImpl`. Then add the ability to seek or acknowledge a specific topic with a `MessageId` on a multi-topics consumer. Much code and logic could be reused.

### Alternatives
https://lists.apache.org/thread/vydpjgyrfzkr45ho9r12sd2lw5c4mw6s tries to seek or acknowledge a topic and a `MessageId`. But it will increase many public APIs. With this proposal, new APIs are only added for the missed features (e.g. seek a `MessageId` other than `earliest` or `latest` on a multi-topics consumer).

Adding a `seek(String, MessageId)` overload is also denied for two reasons:

- […] `MessageId`. Users have to get the correct topic name, e.g. `my-topic-partition-2`. Otherwise, the seek command might be sent to a wrong topic and unexpected behavior might happen.
- […] `seek(MessageId)` method. It's an inconsistency for user experience.

Adding a `serialize` method and a static `deserialize` method (like the `toByteArray()` and `fromByteArray` methods of `MessageId`) to `TopicMessageId` is denied: the SerDes implementation of `TopicMessageId` should be unique. We should not allow other implementations.

Inheriting a `Schema<TopicMessageId>` is also denied: there are some other methods in `Schema` like `getSchemaInfo` that might make users confused. We only need a clear SerDes implementation of `TopicMessageId`.

### Anything else?
See previous discussions related to the `MessageId`: