Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

message: Handle message deletion event. #766

Merged
merged 2 commits into from
Jul 2, 2024
Merged

Conversation

PIG208
Copy link
Member

@PIG208 PIG208 commented Jun 26, 2024

When handling a message deletion event, we just simply remove the messages from the per account store and wipe them from the message list view if they are present.

Because message deletion can affect what items are to be seen (date separator, recipient header, etc.), if the view has any message deleted, a _reprocessAll call is triggered. It would be ideal to just removed the affected items for better performance.

@PIG208 PIG208 linked an issue Jun 26, 2024 that may be closed by this pull request
Copy link
Collaborator

@chrisbobbe chrisbobbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Small comments below; they're small enough that I'll go ahead and mark this for @gnprice's review too.

test/model/message_test.dart Outdated Show resolved Hide resolved
test/example_data.dart Outdated Show resolved Hide resolved
lib/model/message_list.dart Outdated Show resolved Hide resolved
test/model/message_test.dart Outdated Show resolved Hide resolved
Comment on lines 163 to 180
bool _removeMessagesById(Iterable<int> messageIds) {
bool hasRemovedMessage = false;
for (final messageId in messageIds) {
final messageIndex = _findMessageWithId(messageId);
if (messageIndex == -1) continue;

assert(contents.length == messages.length);
messages.removeAt(messageIndex);
contents.removeAt(messageIndex);
assert(contents.length == messages.length);
hasRemovedMessage = true;
}
if (hasRemovedMessage) _reprocessAll();
return hasRemovedMessage;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I notice in this implementation is that the List.removeAt calls can be expensive, if messageIndex is low and the list's length is large. From the doc; this part in particular:

[…] and moves all later objects down by one position.

There are ways we could be more efficient in that scenario (low messageIndex, large length), but I think they'd all come with tradeoffs in other scenarios. Also, bulk-delete events probably aren't very common anyway. But mentioning this in case you or Greg see something that's obviously an improvement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the cost of this loop is quadratic, which is enough to make me nervous — with $n$ messages in the list, if the event has $k$ messages to delete and $d$ of them are in this message list, it's $\Theta(k \log n + d n)$. (The first term is from the _findMessageWithId lookups.)

In particular if you're looking at a conversation of $n$ messages, and an admin decides to delete the whole conversation so $d = k = n$, that's $\Theta(n^2)$.

An algorithm that would fix that is:

  • First do the lookups, to determine if there are any affected messages in this list at all. That costs $\Theta(k \log n)$. (And saves us from spending $\Theta(n)$ on the next step if $d = 0$. If $d &gt; 1$, we're resigned to spending that much time anyway.)
  • Then, if there are any affected messages, use List.removeWhere to remove them from each of the lists all at once. (Build a Set out of messageIds first, to make that efficient.) This costs $\Theta(k + n)$. (The first term is for building the Set; the second for the removeWhere calls.)

That makes the total cost $O(k \log n + n)$ — no more quadratics. And it keeps it $O(k \log n)$ for the case where no messages were affected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the time complexity analysis! So List.removeWhere performs in linear time? I couldn't find reference for how it is implemented. My guess is that they just make a copy of the list with the items removed.

Copy link
Member Author

@PIG208 PIG208 Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like removing content with List.removeWhere wouldn't be quite straightforward because it is a list parallel to messages that don't have the message ID. A solution is to keep on using removeWhere but create a set of ZulipContent beforehand. This is based on the assumption that hashing ZulipContent should be cheap.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So List.removeWhere performs in linear time? I couldn't find reference for how it is implemented. My guess is that they just make a copy of the list with the items removed.

My guess is it's a bit more efficient than that in that it reuses the existing array, and just moves items down to lower indexes to cover gaps. But yeah, should definitely be linear time.

The docs on List are a bit lacking in spelling out the performance characteristics, I think in part because there isn't a great place to put that information; List itself is actually an interface, and implementations can have different performance characteristics. But on List it does say:

The default growable list, as created by [], keeps an internal buffer, and grows that buffer when necessary. This guarantees that a sequence of add operations will each execute in amortized constant time. Setting the length directly may take time proportional to the new length, and may change the internal capacity so that a following add operation will need to immediately increase the buffer capacity. Other list implementations may have different performance behavior.

So basically for each method you should think of it as doing whatever is the natural most efficient thing it could be doing for that method, given that data structure.

That data structure is essentially the same as a Python list or a JS array, if that helps in thinking about it.

(btw that's an old Dart version in that docs link; if you delete the "2.9.3/" for the version so it just says "stable", it'll give you the current version.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like removing content with List.removeWhere wouldn't be quite straightforward because it is a list parallel to messages that don't have the message ID.

Hmm, I see.

In that case probably the best thing is to write the loop ourselves. Essentially do this:

reuses the existing array, and just moves items down to lower indexes to cover gaps.

but do the testing only on messages, and the moving on both messages and content.

@chrisbobbe chrisbobbe added the integration review Added by maintainers when PR may be ready for integration label Jun 26, 2024
Copy link
Member

@gnprice gnprice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @PIG208, and thanks @chrisbobbe for the initial review!

Comments below.

Because message deletion can affect what items are to be seen (date
separator, recipient header, etc.), if the view has any message deleted, a
`_reprocessAll` call is triggered. It would be ideal to just removed the
affected items for better performance.

Yeah, I'm content to have deletion take linear time in the length of the existing list, which is what _reprocessAll does. We're already doomed to linear time just for updating messages itself, so long as it's a plain list.

(And as Chris pointed to above and I go into more detail about below, this current revision can actually spend quadratic time updating messages — so _reprocessAll is cheap by comparison with that.)

lib/model/message.dart Outdated Show resolved Hide resolved
lib/model/message.dart Outdated Show resolved Hide resolved
Comment on lines 163 to 180
bool _removeMessagesById(Iterable<int> messageIds) {
bool hasRemovedMessage = false;
for (final messageId in messageIds) {
final messageIndex = _findMessageWithId(messageId);
if (messageIndex == -1) continue;

assert(contents.length == messages.length);
messages.removeAt(messageIndex);
contents.removeAt(messageIndex);
assert(contents.length == messages.length);
hasRemovedMessage = true;
}
if (hasRemovedMessage) _reprocessAll();
return hasRemovedMessage;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the cost of this loop is quadratic, which is enough to make me nervous — with $n$ messages in the list, if the event has $k$ messages to delete and $d$ of them are in this message list, it's $\Theta(k \log n + d n)$. (The first term is from the _findMessageWithId lookups.)

In particular if you're looking at a conversation of $n$ messages, and an admin decides to delete the whole conversation so $d = k = n$, that's $\Theta(n^2)$.

An algorithm that would fix that is:

  • First do the lookups, to determine if there are any affected messages in this list at all. That costs $\Theta(k \log n)$. (And saves us from spending $\Theta(n)$ on the next step if $d = 0$. If $d &gt; 1$, we're resigned to spending that much time anyway.)
  • Then, if there are any affected messages, use List.removeWhere to remove them from each of the lists all at once. (Build a Set out of messageIds first, to make that efficient.) This costs $\Theta(k + n)$. (The first term is for building the Set; the second for the removeWhere calls.)

That makes the total cost $O(k \log n + n)$ — no more quadratics. And it keeps it $O(k \log n)$ for the case where no messages were affected.

@@ -375,6 +375,40 @@ void main() {
});
});

group('handleDeleteMessageEvent', () {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will also need tests for the behavior of MessageListView, as well as that of MessageStoreImpl. After all, the former is where most of the code this change introduces is. 🙂

Those tests can go in model/message_list_test.dart; search for MessageEvent there for examples to follow. I think, given that we're using _reprocessAll rather than trying to do something more clever (as handleMessageEvent does), the only tests needed here are analogous to the tests with "MessageEvent" in their names.

Note one key thing those tests do is they call that checkInvariants function, in particular on every notifyListeners. It conducts a rather fierce set of consistency checks on the model, in particular recomputing everything that derives from messages and confirming it all has the right values.

Copy link
Member Author

@PIG208 PIG208 Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note one key thing those tests do is they call that checkInvariants function, in particular on every notifyListeners. It conducts a rather fierce set of consistency checks on the model, in particular recomputing everything that derives from messages and confirming it all has the right values.

I see that in the 'MessageEvent, before fetch' test, but not the other 'MessageEvent' tests. Is that intended?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's there 🙂 Via this part:

in particular on every notifyListeners.

See prepare for that call site.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Quite possibly we should have checkNotNotified call checkInvariants too, and then the individual test cases will be more uniform in not needing to explicitly do so)

Copy link
Member Author

@PIG208 PIG208 Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see! So the call is added in a listener. Then the post event invariant check of 'MessageEvent, not in narrow' seems to be missing because it isn't expecting a notification. Yeah it makes sense to add it to checkNotNotified too.

@PIG208 PIG208 force-pushed the deleted-msg branch 4 times, most recently from 6dc2eca to 90b2e6a Compare June 27, 2024 18:28
@PIG208
Copy link
Member Author

PIG208 commented Jun 27, 2024

The PR has been updated to address the reviews.

  • Used the proposed new algorithm to delete messages from the message list
  • Added tests to test/model/message_list_test.dart

Copy link
Member

@gnprice gnprice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the revision! All looks good modulo a couple of small comments.

lib/model/message_list.dart Outdated Show resolved Hide resolved
/// If none of [messageIds] are found, this is a no-op.
bool _removeMessagesById(Iterable<int> messageIds) {
final messagesToRemoveById = <int>{};
final contentToRemove = <ZulipContent>{};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
final contentToRemove = <ZulipContent>{};
final contentToRemove = Set<ZulipContent>.identity();

That has no direct effect, because ZulipContent doesn't override == or hashCode. But it's not totally obvious that it wouldn't — indeed some of the other ContentNode subclasses do — so best to be explicit.

And comparing for identity, rather than general equality, is crucial to this algorithm's behavior both for performance and, in edge cases, for correctness.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that's interesting. So an identity set uses identical to distinguish elements, which compares the object references instead.

Comment on lines +174 to +175
messages.removeWhere((message) => messagesToRemoveById.contains(message.id));
contents.removeWhere((content) => contentToRemove.contains(content));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat solution. I think this algorithm works fine.

I might replace it with a direct loop if I find that convenient in the course of building the other features (like #421) that will reuse a version of this method, but I'll cross that bridge if I come to it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Leaving it as-is for this revision.

test/model/message_list_test.dart Show resolved Hide resolved
@PIG208
Copy link
Member Author

PIG208 commented Jun 28, 2024

The PR has been updated. Thanks for the review @gnprice !

Copy link
Member

@gnprice gnprice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the revision! Remaining comments are all in the tests; one flaky test (at least in principle), and a few nits.

test/model/message_list_test.dart Outdated Show resolved Hide resolved
test/model/message_list_test.dart Outdated Show resolved Hide resolved
test/model/message_list_test.dart Outdated Show resolved Hide resolved
test/model/message_list_test.dart Outdated Show resolved Hide resolved
When handling a message deletion event, we just simply remove the
messages from the per account store and wipe them from the message list
view if they are present.

Because message deletion can affect what items are to be seen (date
separator, recipient header, etc.), if the view has any message deleted, a
`_reprocessAll` call is triggered. It would be ideal to just removed the
affected items for better performance.

Fixes zulip#120.

Signed-off-by: Zixuan James Li <[email protected]>
@PIG208
Copy link
Member Author

PIG208 commented Jul 2, 2024

Thanks for the review! I have updated the PR and also removed an outdated checkInvariants call.

The `length` check isn't needed because it's covered by the
`deepEquals`.  (It was needed in a previous draft of this test).

Then we can also inline `expected`, and reformat slightly.
@gnprice
Copy link
Member

gnprice commented Jul 2, 2024

Thanks! All looks great; merging, with one added commit for a couple of nits:
1f97563 message test [nfc]: Tighten message-deletion test slightly

@gnprice gnprice merged commit 1f97563 into zulip:main Jul 2, 2024
@PIG208 PIG208 deleted the deleted-msg branch July 2, 2024 01:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integration review Added by maintainers when PR may be ready for integration
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handle deleted messages (delete_message events)
3 participants