-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fix][broker] Ignore and remove the replicator cursor when the remote cluster is absent #19972
[fix][broker] Ignore and remove the replicator cursor when the remote cluster is absent #19972
Conversation
@BewareMyPower Please add the following content to your PR description and select a checkbox:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ShadowReplicator
hits this issue, too. Could you please also fix it?
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java
Outdated
Show resolved
Hide resolved
I think it's another issue. You can open another PR for it. |
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java
Outdated
Show resolved
Hide resolved
… cluster is absent ### Motivation Sometimes when a remote cluster is deleted, the replication cursor might still exist for some topics. In this case, creating producers or consumers on these topics will fail. Here is a log observed in a production environment: > WARN org.apache.pulsar.broker.service.BrokerService - Replication or > dedup check failed. Removing topic from topics list > persistent://public/__kafka/__consumer_offsets-partition-40, > java.util.concurrent.CompletionException: java.lang.RuntimeException: > org.apache.pulsar.metadata.api.MetadataStoreException$NotFoundException: > kop If it happened, unloading the topic or restarting the broker could not help. We have to remove the cursor manually. ### Modificatons In `addReplicationCluster`, before getting the replication client, check the namespace policy and topic policy first. If the remote cluster does not exist, skip adding the replication client and remove the cursor. ### Verifications `PersistentTopicTest#testCreateTopicWithZombieReplicatorCursor` is added to verify `PersistentTopic#initialize` will succeed and the zombie replicator cursor will be removed.
1469b85
to
8aeb37d
Compare
Hi, @poorbarcode @315157973 Now I checked the namespace and topic policies to determine whether to delete the cursor. |
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java
Show resolved
Hide resolved
Codecov Report
@@ Coverage Diff @@
## master #19972 +/- ##
============================================
- Coverage 72.89% 72.85% -0.05%
+ Complexity 31619 31613 -6
============================================
Files 1861 1861
Lines 137356 137375 +19
Branches 15117 15120 +3
============================================
- Hits 100131 100089 -42
- Misses 29271 29329 +58
- Partials 7954 7957 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
|
.thenCompose(__ -> checkReplicationCluster(remoteCluster)) | ||
.thenCompose(clusterExists -> { | ||
if (!clusterExists) { | ||
log.warn("Remove the replicator because the cluster '{}' does not exist", remoteCluster); | ||
return removeReplicator(remoteCluster).thenApply(__ -> null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we clean up when the Policy is removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I missed this comment when I merged. And yes, we should do that as well. WDYT? @poorbarcode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we clean up when the Policy is removed?
When the policy is removed, the cursor will not be clean If this topic is not online. see
pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java
Line 1589 in d1fc732
futures.add(removeShadowReplicator(shadowTopic)); |
This PR introduced a new flaky test, reported as #20010 . @BewareMyPower do you have a chance to take a look? thanks |
Okay, I assigned that issue to me first. |
replicators.get(remoteCluster).disconnect().thenRun(() -> { | ||
|
||
Optional.ofNullable(replicators.get(remoteCluster)).map(Replicator::disconnect) | ||
.orElse(CompletableFuture.completedFuture(null)).thenRun(() -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
orElseGet
is better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think not. See
public static <U> CompletableFuture<U> completedFuture(U value) {
return new CompletableFuture<U>((value == null) ? NIL : value);
}
so completedFuture(null)
won't create a new instance.
… cluster is absent (#19972)
… cluster is absent (apache#19972) (cherry picked from commit d1fc732) (cherry picked from commit b93176c)
As discussed on the mailing list https://lists.apache.org/thread/w4jzk27qhtosgsz7l9bmhf1t7o9mxjhp, there is no plan to release 2.9.6, so I am going to remove the release/2.9.6 label |
…ting logic for better handle the checkpoint. (apache#19972) * Change the initial start cursor and stop cursor to better handle the consuming behaviors. * Create the initial subscription instead seek every time. This should fix the wrong position setting. * Fix the wrong stop cursor, make sure it stops at the correct space * Drop Consumer.seek() for apache#16171
Motivation
Sometimes when a remote cluster is deleted, the replication cursor might still exist for some topics. In this case, creating producers or consumers on these topics will fail.
Here is a log observed in a production environment:
If it happened, unloading the topic or restarting the broker could not help. We have to remove the cursor manually.
Modifications
In
addReplicationCluster
, before getting the replication client, check the namespace policy and topic policy first. If the remote cluster does not exist, skip adding the replication client and remove the cursor.Verifications
PersistentTopicTest#testCreateTopicWithZombieReplicatorCursor
is added to verifyPersistentTopic#initialize
will succeed and the zombie replicator cursor will be removed.Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: BewareMyPower#23