This repository has been archived by the owner on Apr 1, 2024. It is now read-only.
Motivation
There are two ways to get a MessageId from a consumer:
Message#getMessageId from a received message
Consumer#getLastMessageId
The returned MessageId can be used in seek and acknowledge (including acknowledgeCumulative). A MessageId represents the position of a message. For a consumer that subscribes to a single topic, this works because the MessageId from a received message or returned by getLastMessageId always belongs to that topic. However, the situation is different for a consumer that subscribes to multiple topics (a multi-topics consumer).
For a multi-topics consumer:
getLastMessageId returns a MultiMessageIdImpl. However, this implementation is of little use. It maintains a map from topic name to MessageId, and it is neither comparable nor serializable. Comparing a MessageId with a MultiMessageIdImpl, which represents the positions of multiple messages, makes little sense anyway.
seek only accepts the earliest or latest position, probably because there is no way to know which topic a given MessageId belongs to.
acknowledge only accepts a TopicMessageIdImpl, whose getTopicPartitionName() method tells the consumer which topic the message belongs to. However, when the topic is not subscribed by the consumer, the behavior is undefined. The current behaviors are:
If the topic is not subscribed, no error is returned.
If the MessageId is not a TopicMessageIdImpl, IllegalArgumentException will be thrown.
Goal
This proposal introduces a TopicMessageId interface that exposes a method to get a message's owner topic. With TopicMessageId, the behaviors above can be defined clearly.
API Changes
    /**
     * The MessageId used for a consumer that subscribes multiple topics or partitioned topics.
     *
     * <p>
     * It's guaranteed that {@link Message#getMessageId()} must return a TopicMessageId instance if the Message is received
     * from a consumer that subscribes multiple topics or partitioned topics.
     * The topic name used in APIs related to this class like `getOwnerTopic` and `create` must be the full topic name. For
     * example, "my-topic" is invalid while "persistent://public/default/my-topic" is valid.
     * If the topic is a partitioned topic, the topic name should be the name of the specific partition, e.g.
     * "persistent://public/default/my-topic-partition-0".
     * </p>
     */
    public interface TopicMessageId extends MessageId {

        /**
         * Return the owner topic name of a message.
         *
         * @return the owner topic
         */
        String getOwnerTopic();

        static TopicMessageId create(String topic, MessageId messageId) {
            if (messageId instanceof TopicMessageId) {
                return (TopicMessageId) messageId;
            }
            return new TopicMessageId() {
                @Override
                public String getOwnerTopic() {
                    return topic;
                }

                @Override
                public byte[] toByteArray() {
                    return messageId.toByteArray();
                }

                @Override
                public int compareTo(MessageId o) {
                    return messageId.compareTo(o);
                }
            };
        }
    }
The create method can be used to replace the TopicMessageIdImpl implementation. Once we have a MessageId returned by Message#getMessageId(), we can use TopicMessageId.create(topic, msgId) to obtain a message ID that can be passed to seek or acknowledge.
NOTE
This create method is not actually required for acknowledgment, because the MessageId in a received message should already be the correct implementation. Existing code therefore keeps working without explicitly using the new API.
It's only used for new seek APIs because they accept a TopicMessageId.
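To make the behavior of create concrete, here is a minimal, self-contained sketch. MessageId and TopicMessageId are re-declared as reduced stand-ins for the Pulsar client interfaces so the example compiles on its own, and LongMessageId is a hypothetical implementation invented here only to have something to wrap; it is not a Pulsar class.

```java
import java.nio.ByteBuffer;

// Stand-in for Pulsar's MessageId, reduced to what this sketch needs.
interface MessageId extends Comparable<MessageId> {
    byte[] toByteArray();
}

// Stand-in for the proposed interface, following the definition above.
interface TopicMessageId extends MessageId {
    String getOwnerTopic();

    static TopicMessageId create(String topic, MessageId messageId) {
        if (messageId instanceof TopicMessageId) {
            return (TopicMessageId) messageId;
        }
        return new TopicMessageId() {
            @Override public String getOwnerTopic() { return topic; }
            @Override public byte[] toByteArray() { return messageId.toByteArray(); }
            @Override public int compareTo(MessageId o) { return messageId.compareTo(o); }
        };
    }
}

// Hypothetical MessageId whose position is a single long (an assumption of this sketch).
final class LongMessageId implements MessageId {
    private final long position;

    LongMessageId(long position) {
        this.position = position;
    }

    @Override
    public byte[] toByteArray() {
        return ByteBuffer.allocate(Long.BYTES).putLong(position).array();
    }

    @Override
    public int compareTo(MessageId o) {
        // Compare through the serialized form so that wrapped ids can be compared too.
        // Assumes the other id also serializes to a single long.
        return Long.compare(position, ByteBuffer.wrap(o.toByteArray()).getLong());
    }
}

class CreateDemo {
    public static void main(String[] args) {
        String topic = "persistent://public/default/my-topic-partition-0";
        MessageId plain = new LongMessageId(42L);

        TopicMessageId wrapped = TopicMessageId.create(topic, plain);
        System.out.println(wrapped.getOwnerTopic());        // prints the topic name
        System.out.println(wrapped.compareTo(plain) == 0);  // prints true

        // An id that already carries an owner topic is returned unchanged.
        TopicMessageId again = TopicMessageId.create("persistent://public/default/other", wrapped);
        System.out.println(again == wrapped);               // prints true
    }
}
```

Note that when the argument is already a TopicMessageId, the topic passed to create is ignored and the existing owner topic is kept.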
Add a TopicMessageIdSerDes class to serialize and deserialize a TopicMessageId:
    /**
     * To keep the backward compatibility, {@link TopicMessageId#toByteArray()} should not serialize the owner topic. This
     * class provides a convenient way for users to serialize a TopicMessageId with its owner topic serialized.
     */
    class TopicMessageIdSerDes {
        public static byte[] serialize(TopicMessageId topicMessageId) { /* ... */ }
        public static TopicMessageId deserialize(byte[] bytes) { /* ... */ }
    }
For backward compatibility, toByteArray() is not changed to include the owner topic, because there might be existing code like:
    var msg = multiTopicsConsumer.receive();
    // msg.getMessageId() is a TopicMessageId
    serialize(multiTopicsConsumer.getTopic(), msg.getMessageId().toByteArray());
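The proposal leaves the body of TopicMessageIdSerDes unspecified. As an illustration only, the sketch below assumes a simple layout: a 4-byte length prefix, the owner topic in UTF-8, then the unchanged toByteArray() bytes. The layout and the RawTopicMessageId helper are assumptions made here, not the format Pulsar uses; a real implementation would rebuild the inner id with MessageId.fromByteArray. The interfaces are reduced stand-ins so the example compiles on its own.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Stand-ins for the Pulsar interfaces, reduced to what this sketch needs.
interface MessageId extends Comparable<MessageId> {
    byte[] toByteArray();
}

interface TopicMessageId extends MessageId {
    String getOwnerTopic();
}

// Hypothetical TopicMessageId backed by the raw bytes of the inner MessageId.
final class RawTopicMessageId implements TopicMessageId {
    private final String ownerTopic;
    private final byte[] idBytes;

    RawTopicMessageId(String ownerTopic, byte[] idBytes) {
        this.ownerTopic = ownerTopic;
        this.idBytes = idBytes.clone();
    }

    @Override
    public String getOwnerTopic() {
        return ownerTopic;
    }

    // Unchanged contract: the owner topic is NOT part of toByteArray().
    @Override
    public byte[] toByteArray() {
        return idBytes.clone();
    }

    @Override
    public int compareTo(MessageId o) {
        // Lexicographic comparison of the raw bytes; good enough for the sketch.
        return Arrays.compare(idBytes, o.toByteArray());
    }
}

final class TopicMessageIdSerDes {
    private TopicMessageIdSerDes() { }

    // Assumed layout: [int topicLength][topic as UTF-8][MessageId bytes].
    public static byte[] serialize(TopicMessageId topicMessageId) {
        byte[] topic = topicMessageId.getOwnerTopic().getBytes(StandardCharsets.UTF_8);
        byte[] id = topicMessageId.toByteArray();
        return ByteBuffer.allocate(Integer.BYTES + topic.length + id.length)
                .putInt(topic.length)
                .put(topic)
                .put(id)
                .array();
    }

    public static TopicMessageId deserialize(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        byte[] topic = new byte[buffer.getInt()];
        buffer.get(topic);
        byte[] id = new byte[buffer.remaining()];
        buffer.get(id);
        return new RawTopicMessageId(new String(topic, StandardCharsets.UTF_8), id);
    }
}
```

Keeping the topic out of toByteArray() means bytes stored by existing code, like the snippet above, still deserialize as a plain MessageId, while code that needs the topic opts in through the SerDes class.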
This proposal doesn't add any acknowledge overload because there are already too many. It does, however, document the behavior explicitly:
    /**
     * ...
     * @throws PulsarClientException.NotAllowedException if `messageId` is not a {@link TopicMessageId} when multiple topics are subscribed.
     */
    void acknowledge(MessageId messageId) throws PulsarClientException;

    // NOTE: the same goes for acknowledgeCumulative

    /**
     * ...
     * @throws PulsarClientException.NotAllowedException if any message id in the list is not a {@link TopicMessageId} when multiple topics are subscribed.
     */
    void acknowledge(List<MessageId> messageIdList) throws PulsarClientException;
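The documented contract can be sketched with a toy consumer that only records acknowledgments per topic. Everything here is a stand-in written for illustration: NotAllowedException mimics PulsarClientException.NotAllowedException, MultiTopicsAckSketch is a hypothetical class, and silently ignoring an id whose owner topic is not subscribed is an assumption that mirrors the current behavior described earlier, not something the proposal specifies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-ins for the Pulsar types, reduced to what this sketch needs.
interface MessageId { }

interface TopicMessageId extends MessageId {
    String getOwnerTopic();
}

// Stand-in for PulsarClientException.NotAllowedException.
class NotAllowedException extends Exception {
    NotAllowedException(String message) {
        super(message);
    }
}

// Hypothetical multi-topics consumer that only records what was acknowledged per topic.
class MultiTopicsAckSketch {
    private final Map<String, List<MessageId>> ackedByTopic = new HashMap<>();

    MultiTopicsAckSketch(List<String> topics) {
        for (String topic : topics) {
            ackedByTopic.put(topic, new ArrayList<>());
        }
    }

    void acknowledge(MessageId messageId) throws NotAllowedException {
        if (!(messageId instanceof TopicMessageId)) {
            throw new NotAllowedException(
                    "Only TopicMessageId is allowed when multiple topics are subscribed");
        }
        String topic = ((TopicMessageId) messageId).getOwnerTopic();
        List<MessageId> acked = ackedByTopic.get(topic);
        if (acked != null) {
            acked.add(messageId);
        }
        // Assumption: an id whose owner topic is not subscribed is silently ignored.
    }

    void acknowledge(List<MessageId> messageIdList) throws NotAllowedException {
        // Validate the whole list first so a bad id does not leave a partial acknowledgment.
        for (MessageId messageId : messageIdList) {
            if (!(messageId instanceof TopicMessageId)) {
                throw new NotAllowedException(
                        "Every id must be a TopicMessageId when multiple topics are subscribed");
            }
        }
        for (MessageId messageId : messageIdList) {
            acknowledge(messageId);
        }
    }

    int ackedCount(String topic) {
        List<MessageId> acked = ackedByTopic.get(topic);
        return acked == null ? 0 : acked.size();
    }
}
```

Validating the list before acknowledging anything is a design choice of this sketch; the proposal only states which exception is thrown.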
For seek operations, the semantics will be modified when the argument is a TopicMessageId:
    /**
     * ...
     * <p>Note: For multi-topics consumer, if `messageId` is a {@link TopicMessageId}, the seek operation will happen
     * on the owner topic of the message, which is returned by {@link TopicMessageId#getOwnerTopic()}. Otherwise, you
     * can only seek to the earliest or latest message.
     * ...
     */
    void seek(MessageId messageId) throws PulsarClientException;
That is, for a multi-topics consumer, we can now seek to the message ID of a received message without any code change:
    var msgId = multiTopicsConsumer.receive().getMessageId(); // it's a TopicMessageId
    multiTopicsConsumer.seek(msgId); // seek(TopicMessageId) will be called, before this PIP an exception would be thrown
This works because the MessageId of a message received from a multi-topics consumer is a TopicMessageId that already contains the correct topic name.
Those who know the topic name and want to convert a MessageId, which might not be a TopicMessageId, to a TopicMessageId can perform the conversion explicitly with TopicMessageId.create(topic, msgId).
This is useful when users want to seek to a MessageId returned by Producer#send. In this case, the MessageId is not a TopicMessageId, so the topic name must be passed to tell the multi-topics consumer which topic it belongs to.
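The routing rule for seek can be sketched with a toy consumer that keeps one seek position per subscribed topic. The types are reduced stand-ins, and MultiTopicsSeekSketch and WrappedTopicMessageId are hypothetical names. Throwing IllegalArgumentException, including for an owner topic that is not subscribed, is an assumption made to keep the example short; the real API declares PulsarClientException.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for Pulsar's MessageId; EARLIEST and LATEST mimic MessageId.earliest and MessageId.latest.
interface MessageId {
    MessageId EARLIEST = new MessageId() { };
    MessageId LATEST = new MessageId() { };
}

interface TopicMessageId extends MessageId {
    String getOwnerTopic();

    static TopicMessageId create(String topic, MessageId messageId) {
        if (messageId instanceof TopicMessageId) {
            return (TopicMessageId) messageId;
        }
        return new WrappedTopicMessageId(topic, messageId);
    }
}

// Hypothetical wrapper that pairs an owner topic with an arbitrary MessageId.
final class WrappedTopicMessageId implements TopicMessageId {
    private final String ownerTopic;
    final MessageId inner;

    WrappedTopicMessageId(String ownerTopic, MessageId inner) {
        this.ownerTopic = ownerTopic;
        this.inner = inner;
    }

    @Override
    public String getOwnerTopic() {
        return ownerTopic;
    }
}

// Hypothetical multi-topics consumer that only remembers the last seek position per topic.
class MultiTopicsSeekSketch {
    private final Map<String, MessageId> seekPositions = new LinkedHashMap<>();

    MultiTopicsSeekSketch(String... topics) {
        for (String topic : topics) {
            seekPositions.put(topic, MessageId.LATEST);
        }
    }

    void seek(MessageId messageId) {
        if (messageId instanceof TopicMessageId) {
            // Seek only on the owner topic of the message.
            String topic = ((TopicMessageId) messageId).getOwnerTopic();
            if (!seekPositions.containsKey(topic)) {
                throw new IllegalArgumentException("Topic is not subscribed: " + topic);
            }
            seekPositions.put(topic, messageId);
        } else if (messageId == MessageId.EARLIEST || messageId == MessageId.LATEST) {
            // Without an owner topic, only earliest or latest is allowed; apply it to every topic.
            seekPositions.replaceAll((topic, previous) -> messageId);
        } else {
            throw new IllegalArgumentException(
                    "A MessageId without an owner topic must be earliest or latest");
        }
    }

    MessageId positionOf(String topic) {
        return seekPositions.get(topic);
    }
}
```

With this shape, a MessageId returned by Producer#send would be passed as consumer.seek(TopicMessageId.create(topic, msgId)), while an id taken from a received message can be passed to seek directly.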
For the getLastMessageIdAsync method, change the semantics to disallow it on a multi-topics consumer. Instead, add these two APIs:
    // NOTE: it's guaranteed that no duplicated owner topics should be returned
    List<TopicMessageId> getLastTopicMessageId() throws PulsarClientException;
    CompletableFuture<List<TopicMessageId>> getLastTopicMessageIdAsync();
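One way the asynchronous variant could gather its result is sketched below. The perTopicLastIds map is a hypothetical stand-in for the per-topic consumers that a multi-topics consumer holds internally, and OwnedMessageId is a hypothetical wrapper; neither is part of the proposal. Because the map is keyed by owner topic, the returned list cannot contain duplicated owner topics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Stand-ins for the Pulsar interfaces, reduced to what this sketch needs.
interface MessageId { }

interface TopicMessageId extends MessageId {
    String getOwnerTopic();
}

// Hypothetical wrapper that pairs an owner topic with the last MessageId of that topic.
final class OwnedMessageId implements TopicMessageId {
    private final String ownerTopic;
    final MessageId inner;

    OwnedMessageId(String ownerTopic, MessageId inner) {
        this.ownerTopic = ownerTopic;
        this.inner = inner;
    }

    @Override
    public String getOwnerTopic() {
        return ownerTopic;
    }
}

final class LastTopicMessageIds {
    private LastTopicMessageIds() { }

    // perTopicLastIds maps each owner topic to the pending "last message id" of that topic.
    static CompletableFuture<List<TopicMessageId>> getLastTopicMessageIdAsync(
            Map<String, CompletableFuture<MessageId>> perTopicLastIds) {
        List<CompletableFuture<TopicMessageId>> futures = new ArrayList<>();
        perTopicLastIds.forEach((topic, future) ->
                futures.add(future.thenApply(id -> new OwnedMessageId(topic, id))));

        // Complete once every per-topic lookup has completed, then collect the results in order.
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0]))
                .thenApply(ignored -> {
                    List<TopicMessageId> result = new ArrayList<>();
                    for (CompletableFuture<TopicMessageId> future : futures) {
                        result.add(future.join());
                    }
                    return result;
                });
    }
}
```

If any per-topic lookup fails, the combined future completes exceptionally, so callers see a single failure rather than a partial list.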
Implementation
The main changes are removing TopicMessageIdImpl and MultiMessageIdImpl, then adding the ability to seek or acknowledge on a specific topic with a MessageId on a multi-topics consumer. Much of the existing code and logic can be reused.
Alternatives
https://lists.apache.org/thread/vydpjgyrfzkr45ho9r12sd2lw5c4mw6s proposes seeking or acknowledging with a topic and a MessageId, but that would add many public APIs. With this proposal, new APIs are added only for the missing features (e.g. seeking to a MessageId other than earliest or latest on a multi-topics consumer).
Adding a seek(String, MessageId) overload is also rejected for two reasons:
There is only one valid topic for a given MessageId. Users have to get the correct topic name, e.g. my-topic-partition-2. Otherwise, the seek command might be sent to the wrong topic and unexpected behavior might happen.
Users have to distinguish whether the consumer is a multi-topics consumer. If not, they need to use the original seek(MessageId) method. This makes the user experience inconsistent.
Adding a serialize method and a static deserialize method (like the toByteArray() and fromByteArray methods of MessageId) to TopicMessageId is rejected: the SerDes implementation of TopicMessageId should be unique. We should not allow other implementations.
Implementing Schema<TopicMessageId> is also rejected: Schema has other methods, such as getSchemaInfo, that might confuse users. We only need a clear SerDes implementation of TopicMessageId.
Anything else?
See previous discussions related to the MessageId:
Original Issue: apache#18616