msglist: Throttle fetchOlder retries #1050

PIG208 · 2024-11-06T22:23:32Z

This approach is different from how a BackoffMachine is typically used,
because the message list doesn't send and retry requests in a loop; its
caller retries rapidly on scroll changes, and we want to ignore the
excessive requests.

The test drops irrelevant requests with connection.takeRequests
without checking, as we are only interested in verifying that no request
was sent.

Fixes: #945

chrisbobbe · 2024-11-07T02:52:59Z

(Rerunning CI.)

chrisbobbe

Thanks! Good to get this bug fixed. Comments below.

Also, this change breaks an invariant we had before: whenever the top of the message list is scrolled into view, we would show a "start marker", which was either a loading indicator or some text saying there aren't any older messages to load.

We currently only show the loading indicator when fetchingOlder is true, and that's false when we're in the new "cooldown" period after a failed fetch-older request. How about also showing the loading indicator during the cooldown period? When a request is failing over and over, the cooldown period will quickly become multiple seconds long, with fetchingOlder flickering to true in between for perhaps milliseconds at a time; possibly not long enough for a frame.

Then, either here or as a followup, we could give the UI an "error"/"problem" state that's distinct from the loading state. If the request has failed, and especially if it's failed several times, that's an important sign that maybe the user should stop expecting it to succeed. Maybe MessageListLoadingItem could get another param like bool problem, for which we pass true when the BackoffMachine.waitsCompleted exceeds a certain value, like 4. And in that case make the loading indicator look like (screenshot not preserved) or something, instead of just (screenshot not preserved).

chrisbobbe · 2024-11-12T06:45:13Z

lib/model/message_list.dart

+  bool _waitBeforeRetry = false;
+  BackoffMachine? _backoffMachine;


I think we can find a more fitting name than _waitBeforeRetry. I think that name might make it harder to see that no code is actually scheduling a retry after waiting for anything. It's also not clear from the name that it's about fetching the next batch of older messages, i.e., fetchOlder.

What if we:

Move this up near fetchingOlder, which is related and has a similar role

Make this public, with a name like fetchOlderCoolingDown

Give this a dartdoc and expand on fetchingOlder's dartdoc to make it clearer that they have similar roles

So, for example:

--- lib/model/message_list.dart
+++ lib/model/message_list.dart
@@ -92,9 +92,30 @@ mixin _MessageSequence {
   bool _haveOldest = false;
 
   /// Whether we are currently fetching the next batch of older messages.
+  ///
+  /// When this is true, [fetchOlder] is a no-op.
+  /// That method is called frequently by Flutter's scrolling logic,
+  /// and this field helps us avoid spamming the same request just to get
+  /// the same response each time.
+  ///
+  /// See also [fetchOlderCoolingDown].
   bool get fetchingOlder => _fetchingOlder;
   bool _fetchingOlder = false;
 
+  /// Whether [fetchOlder] had a request error recently.
+  ///
+  /// When this is true, [fetchOlder] is a no-op.
+  /// That method is called frequently by Flutter's scrolling logic,
+  /// and this field mitigates spamming the same request and getting
+  /// the same error each time.
+  ///
+  /// "Recently" is decided by a [BackoffMachine] that resets
+  /// when a [fetchOlder] request succeeds.
+  ///
+  /// See also [fetchingOlder].
+  bool get fetchOlderCoolingDown => _fetchOlderCoolingDown;
+  bool _fetchOlderCoolingDown = false;
+
   /// The parsed message contents, as a list parallel to [messages].
   ///
   /// The i'th element is the result of parsing the i'th element of [messages].
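To make the interplay of the two flags concrete, here is a minimal sketch of the behavior the dartdoc describes. It is not the PR's code: `SketchBackoff`, `SketchModel`, and `requestsSent` are hypothetical names, the plain doubling delay only stands in for the real BackoffMachine, and the sketch swallows the request error where the real method appears to propagate it.

```dart
import 'dart:async';

/// Hypothetical stand-in for BackoffMachine: each wait is twice as long
/// as the previous one (no jitter, no upper bound).
class SketchBackoff {
  int waitsCompleted = 0;

  Future<void> wait() async {
    await Future<void>.delayed(
      Duration(milliseconds: 10 * (1 << waitsCompleted)));
    waitsCompleted++;
  }
}

/// Hypothetical model with the two flags discussed above.
class SketchModel {
  SketchModel(this.request);

  final Future<void> Function() request;
  bool fetchingOlder = false;
  bool fetchOlderCoolingDown = false;
  SketchBackoff? _backoff;
  int requestsSent = 0;

  Future<void> fetchOlder() async {
    // The caller invokes this on every scroll change; ignore the excess.
    if (fetchingOlder || fetchOlderCoolingDown) return;
    fetchingOlder = true;
    var hasFetchError = false;
    try {
      requestsSent++;
      await request();
    } catch (_) {
      hasFetchError = true; // Swallowed here only to keep the sketch short.
    } finally {
      fetchingOlder = false;
      if (hasFetchError) {
        // Start (or continue) a backoff series; nothing retries after it.
        fetchOlderCoolingDown = true;
        unawaited((_backoff ??= SketchBackoff()).wait().then((_) {
          fetchOlderCoolingDown = false;
        }));
      } else {
        _backoff = null; // A success ends the backoff series.
      }
    }
  }
}

Future<void> main() async {
  final model = SketchModel(() async { throw Exception('request failed'); });
  await model.fetchOlder();
  await model.fetchOlder(); // Ignored: still cooling down.
  print(model.requestsSent); // 1
  await Future<void>.delayed(const Duration(milliseconds: 50));
  await model.fetchOlder(); // Cooldown is over, so this sends a request.
  print(model.requestsSent); // 2
}
```

The point of the sketch is that the backoff never schedules a retry itself; it only decides how long calls to `fetchOlder` are ignored.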

chrisbobbe · 2024-11-12T06:51:16Z

lib/model/message_list.dart

@@ -528,6 +545,7 @@ class MessageListView with ChangeNotifier, _MessageSequence {

      _insertAllMessages(0, fetchedMessages);
      _haveOldest = result.foundOldest;
+      _backoffMachine = null;


Should we also do this in _reset?

PIG208 · 2024-11-12T23:05:09Z

The PR has been updated with the proposed changes, and additionally, a check for this.generation == generation to avoid potential races after adding _updateEndMarkers and notifyListeners to the backoff callback.

I have also moved the backoff machine variable, renamed to _fetchOlderCooldownBackoffMachine with Cooldown as a noun, to _MessageSequence right next to fetchOlderCoolingDown, because they are relevant to each other. The backoff machine does not have a public getter and we keep it private.

The loading indicator change might be out of scope for this PR, and we should work on that as a follow-up.

chrisbobbe · 2024-11-13T00:04:13Z

Looks like CI is failing, could you take a look?

PIG208 · 2024-11-13T00:29:55Z

Updated the PR. Thanks!

chrisbobbe

Thanks! Small comments below.

chrisbobbe · 2024-12-03T20:18:15Z

lib/model/message_list.dart

+        // ignore: control_flow_in_finally
+        return;


When disabling a lint rule (especially one that says it's meant to prevent bugs or unexpected behavior), please explain in a code comment why it's OK.

chrisbobbe · 2024-12-03T20:23:41Z

test/model/message_list_test.dart

+    checkNotified(count: 2);
+    check(model).fetchOlderCoolingDown.isTrue();
+
+    connection.takeRequests();


msglist: Throttle fetchOlder retries

This approach is different from how a BackoffMachine is typically used,
because the message list doesn't send and retry requests in a loop; its
caller retries rapidly on scroll changes, and we want to ignore the
excessive requests.

The test drops irrelevant requests with `connection.takeRequests`
without checking, as we are only interested in verifying that no request
was sent.

Fixes: https://github.com/zulip/zulip-flutter/issues/945

Signed-off-by: Zixuan James Li <[email protected]>

The paragraph about the tests is a pretty small implementation detail that I think doesn't need a special mention in the commit message. On the other hand, it seems maybe helpful as an implementation comment directly on the code it's about.

chrisbobbe · 2024-12-03T20:35:16Z

test/model/message_list_test.dart

+        await store.handleEvent(eg.updateMessageEventMoveTo(
+          origTopic: movedMessages[0].topic,
+          origStreamId: otherStream.streamId,
+          newMessages: movedMessages,
+        ));


We currently respond to this event by calling _reset, but there's a comment suggesting we might stop doing that in some cases:

void _messagesMovedIntoNarrow() {
  // If there are some messages we don't have in [MessageStore], and they
  // occur later than the messages we have here, then we just have to
  // re-fetch from scratch. That's always valid, so just do that always.
  // TODO in cases where we do have data to do better, do better.
  _reset();
  notifyListeners();
  fetchInitial();
}

The test is meant to exercise the _reset case (as it says in its description); can we make it fail if it doesn't actually exercise that? Maybe:

// check that _reset was called
check(model).fetched.isFalse();

PIG208 · 2024-12-06T00:32:51Z

Thanks for the review! This has been updated.

chrisbobbe

Thanks! One comment below.

chrisbobbe · 2024-12-10T01:23:56Z

lib/model/message_list.dart

+        // We need the finally block always clean up regardless of errors
+        // occured in the try block, and returning early here is necessary
+        // if such cleanup must be skipped, as the fetch is considered stale.
+        // ignore: control_flow_in_finally
+        return;


if (this.generation != generation) {
  // We need the finally block always clean up regardless of errors
  // occured in the try block, and returning early here is necessary
  // if such cleanup must be skipped, as the fetch is considered stale.
  // ignore: control_flow_in_finally
  return;

This explanation doesn't make sense to me. The cleanup "always" needs to be done…but there are times when it "must be skipped"? Your reasoning might be correct but I don't understand it yet.

As a reminder, the task is to explain why the problems flagged by control_flow_in_finally either don't exist or are acceptable.

How about:

// This lint rule is more helpful for flagging confusing uses of
// control flow keywords, such as `break` in a block nested in a loop,
// or `return` a value in a finally block when a value would have been
// returned in the `try` block. Returning early to skip the cleanup
// is none of these.
// ignore: control_flow_in_finally

I was also tempted to refactor the if block into if (this.generation == generation) instead of having an early return, but the code will become more nested.

Ooh, if (this.generation == generation) might actually be nicer. I guess it does basically the same thing, but without making the reader think through a lint rule and whether it's helpful here or not. I'd like to hear @gnprice's thoughts; I'll mark for his review.

Yeah. The version in main uses if (this.generation == generation) here instead of our usual early return, and I think this lint rule was the main reason I went for that.

One extra layer of nesting is pretty tolerable, particularly as the indentation size is just 2 spaces.
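For readers weighing the two shapes, here is a minimal sketch of the nested form, with cleanup guarded by `if (this.generation == generation)` so that no control flow keyword appears in the finally block. `SketchView`, `reset`, and the `request` parameter are hypothetical names for illustration, not the app's API.

```dart
import 'dart:async';

/// Hypothetical view model; only the generation bookkeeping is modeled.
class SketchView {
  int generation = 0;
  bool fetchingOlder = false;
  final List<int> messages = [];

  /// Discards all state; any fetch still in flight becomes stale.
  void reset() {
    generation++;
    messages.clear();
    fetchingOlder = false;
  }

  Future<void> fetchOlder(Future<List<int>> Function() request) async {
    final generation = this.generation;
    fetchingOlder = true;
    try {
      final result = await request();
      if (this.generation > generation) return; // Stale: drop the result.
      messages.insertAll(0, result);
    } finally {
      // Clean up only if this fetch still belongs to the current
      // generation. One extra level of nesting, but no `return` here,
      // so control_flow_in_finally has nothing to flag.
      if (this.generation == generation) {
        fetchingOlder = false;
      }
    }
  }
}

Future<void> main() async {
  final view = SketchView();
  final stale = Completer<List<int>>();
  final staleFetch = view.fetchOlder(() => stale.future);

  view.reset();
  final fresh = Completer<List<int>>();
  final freshFetch = view.fetchOlder(() => fresh.future);

  // The stale fetch finishes while the fresh one is still in flight.
  stale.complete([1, 2, 3]);
  await staleFetch;
  print(view.messages); // []
  print(view.fetchingOlder); // true

  fresh.complete([4, 5]);
  await freshFetch;
  print(view.messages); // [4, 5]
  print(view.fetchingOlder); // false
}
```

In this sketch, the stale fetch neither inserts its messages nor clears the flag that now belongs to the newer fetch.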

chrisbobbe · 2024-12-10T03:09:51Z

Marking for Greg's review; Greg, please see in particular a question above: #1050 (comment)

gnprice · 2024-12-10T05:29:47Z

lib/model/message_list.dart

@@ -528,9 +567,21 @@ class MessageListView with ChangeNotifier, _MessageSequence {

      _insertAllMessages(0, fetchedMessages);
      _haveOldest = result.foundOldest;
+      _fetchOlderCooldownBackoffMachine = null;


nit: let's put this line that ends a backoff series right next to the logic that starts a backoff — as an else to the if (hasFetchError).

I believe that's equivalent to this way. And that way it's somewhat easier to see how this whole cooldown state machine operates.

gnprice · 2024-12-10T05:33:20Z

lib/model/message_list.dart

+    final startMarker = switch ((fetchingOlder, haveOldest, fetchOlderCoolingDown)) {
+      (true, _, _) => const MessageListLoadingItem(MessageListDirection.older),
+      (_, true, _) => const MessageListHistoryStartItem(),
+      (_, _, true) => const MessageListLoadingItem(MessageListDirection.older),


nit: the fact these two fetchOlder-related conditions produce the same result isn't a coincidence; so let's handle them together.

… Oh hmm but is the point that the order of these cases is significant? Is it possible to have haveOldest and fetchOlderCoolingDown both true?

If that is possible then I guess let's make that explicit. In the existing version in main, the assert(!(haveOldest && fetchingOlder)) is there precisely to reassure the reader that the order of the existing two cases doesn't matter.

I believe in fact it's the case in this revision that haveOldest and fetchOlderCoolingDown can't both be true. Which is good — it seems like that'd be a confused state to be in.

So let's (a) assert that, and (b) combine the two fetchOlder-related cases here. Could have a line like

final effectiveFetchingOlder = fetchingOlder || fetchOlderCoolingDown;

and then the same switch cases as in main.
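A rough sketch of how (a) and (b) could fit together follows. The `StartMarker` enum and the `startMarker` function are hypothetical stand-ins for the widget items, and the exact switch cases in main are assumed here rather than quoted.

```dart
/// Hypothetical stand-in for the start-marker widgets.
enum StartMarker { loading, historyStart, none }

StartMarker startMarker({
  required bool fetchingOlder,
  required bool fetchOlderCoolingDown,
  required bool haveOldest,
}) {
  // Both flags mean "a fetch-older is underway or was just attempted",
  // so they are folded into one condition before the switch.
  final effectiveFetchingOlder = fetchingOlder || fetchOlderCoolingDown;

  // Reassures the reader that the order of the first two cases is moot.
  assert(!(effectiveFetchingOlder && haveOldest));

  return switch ((effectiveFetchingOlder, haveOldest)) {
    (true, _) => StartMarker.loading,
    (_, true) => StartMarker.historyStart,
    (_, _) => StartMarker.none,
  };
}

void main() {
  print(startMarker(
    fetchingOlder: false,
    fetchOlderCoolingDown: true,
    haveOldest: false,
  )); // StartMarker.loading
  print(startMarker(
    fetchingOlder: false,
    fetchOlderCoolingDown: false,
    haveOldest: true,
  )); // StartMarker.historyStart
}
```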

gnprice · 2024-12-10T05:54:25Z

test/model/message_list_test.dart

@@ -1793,7 +1869,7 @@ void checkInvariants(MessageListView model) {
  if (model.haveOldest) {


Apropos of my previous comment #1050 (comment) : there's a new invariant, so let's have this checkInvariants function check that invariant.

gnprice · 2024-12-10T05:58:39Z

lib/model/message_list.dart

        _fetchingOlder = false;
+        if (hasFetchError) {
+          assert(!fetchOlderCoolingDown);
+          _fetchOlderCoolingDown = true;
+          unawaited((_fetchOlderCooldownBackoffMachine ??= BackoffMachine())
+            .wait().then((_) {
+              if (this.generation != generation) return;
+              _fetchOlderCoolingDown = false;


Or perhaps in fact we can avoid introducing more flags in the first place. How about we fuse the new flag with the old one — or put another way, instead of adding a new flag, we adjust the semantics of the existing fetchingOlder? Like this:

Suggested change

-        _fetchingOlder = false;
-        if (hasFetchError) {
-          assert(!fetchOlderCoolingDown);
-          _fetchOlderCoolingDown = true;
-          unawaited((_fetchOlderCooldownBackoffMachine ??= BackoffMachine())
-            .wait().then((_) {
-              if (this.generation != generation) return;
-              _fetchOlderCoolingDown = false;
+        if (!hasFetchError) {
+          _fetchingOlder = false;
+        } else {
+          unawaited((_fetchOlderCooldownBackoffMachine ??= BackoffMachine())
+            .wait().then((_) {
+              if (this.generation != generation) return;
+              _fetchingOlder = false;

gnprice

Thanks to you both! Here's the rest of a full review.

gnprice · 2024-12-10T06:03:11Z

test/model/message_list_test.dart

+    checkNotified(count: 2);
+    check(model).fetchOlderCoolingDown.isTrue();
+
+    // Drop irrelevant requests with without checking,


nit:

Suggested change

-    // Drop irrelevant requests with without checking,
+    // Drop irrelevant requests without checking,

gnprice · 2024-12-10T06:04:04Z

test/model/message_list_test.dart

+    // Drop irrelevant requests with without checking,
+    // as we are only interested in verifying if any request was sent.
+    connection.takeRequests();


This line and its comment are a bit puzzling. We're "interested in verifying if any request was sent" — but does this verify that? It looks like it just discards that information.

gnprice · 2024-12-10T06:06:52Z

test/model/message_list_test.dart

+    // The first backoff is expected to be short enough to complete.
+    async.elapse(const Duration(seconds: 1));


Suggested change

-    // The first backoff is expected to be short enough to complete.
-    async.elapse(const Duration(seconds: 1));
+    // Wait long enough that a first backoff is sure to finish.
+    async.elapse(const Duration(seconds: 1));

"short enough to complete" sounded to me like saying it's possible to complete it. But the point here is that this specific line definitely does complete it.

gnprice · 2024-12-10T06:17:01Z

test/model/message_list_test.dart

+    check(connection.lastRequest).isNotNull();
+  }));


I see, this is really the line that comment is most about.

It's fine to not inspect the details of the request. Let's include a check on model.messages.length, though — that gives the reader of the test a nice quick legible confirmation that we ended up getting all the messages, and therefore must have made the usual request and handled the response in the usual way.

gnprice · 2024-12-10T06:21:36Z

test/model/message_list_test.dart

+        connection.prepare(httpStatus: 400, json: {
+          'result': 'error', 'code': 'BAD_REQUEST', 'msg': 'Bad request'});
+        await check(model.fetchOlder()).throws<ZulipApiException>();
+        final backoffTimer = async.pendingTimers.single;


Hmm, clever!

gnprice · 2024-12-10T06:24:53Z

test/model/message_list_test.dart

+        // check that _reset was caleed
+        check(model).fetched.isFalse();
+        check(model).fetchOlderCoolingDown.isFalse();
+        check(async.pendingTimers).contains(backoffTimer);


Suggested change

-        check(async.pendingTimers).contains(backoffTimer);
+        check(backoffTimer.isActive()).isTrue();

gnprice · 2024-12-10T06:38:12Z

test/model/message_list_test.dart

+        async.elapse(const Duration(seconds: 1));
+        check(model).fetchOlderCoolingDown.isFalse();
+        check(async.pendingTimers).not((x) => x.contains(backoffTimer));
+        checkNotNotified();


When writing a test case, ideally we think through what scenarios would really stress the logic, and in particular would show clearly buggy symptoms if we had various plausible bugs.

So for example in the existing test just above this one, we let the fetch complete, and then check the model has no messages in its list:

await fetchFuture;
checkHasMessages([]);
checkNotNotified();

(and then call checkNotNotified, mainly to ensure that checkInvariants gets called). For that test case that's good because if the fetchOlder call finished and then carried on, forgetting to check generation, then the symptom would be that it puts the fetched messages into the list.

Here, the current revision of this test checks that fetchOlderCoolingDown is still false. But that's not very informative, because although false is the value that flag had before this step, it's also exactly what the code under test would set the flag to if it did suffer from the race.

So to make a version of this test that really stresses the logic as it should, we should arrange a scenario where by the time the backoff completes, the flag has been set to true. Then we get to check here that it's still true — which confirms we don't have the bug where the code cheerfully sets it to false heedless of the new generation.

Made a new test, "fetchOlder backoff A starts, _reset, move fetch finishes, fetchOlder backoff B starts, fetchOlder backoff A ends", for this. This also needed a refactor to make it easier to test with BackoffMachine, by offering a way to override the wait duration.
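As an illustration of what such an override might look like, here is a small sketch. It borrows the `debugDuration` name and the one-microsecond floor from the diff quoted later in this thread; everything else (`SketchBackoffMachine`, `nextDuration`, the `firstBound` and `maxBound` constants) is assumed for the example and is not the app's actual implementation.

```dart
import 'dart:math';

/// Hypothetical sketch of a backoff helper with a test-only override.
class SketchBackoffMachine {
  SketchBackoffMachine({this.debugDuration});

  /// When non-null, every wait lasts exactly this long (useful in tests).
  final Duration? debugDuration;

  // Assumed constants, chosen only for illustration.
  static const Duration firstBound = Duration(milliseconds: 100);
  static const Duration maxBound = Duration(seconds: 10);

  int waitsCompleted = 0;

  /// Upper bound for the next wait: doubles each time, capped at [maxBound].
  Duration get bound {
    final doubled = firstBound * pow(2, min(waitsCompleted, 20));
    return doubled < maxBound ? doubled : maxBound;
  }

  /// The duration the next [wait] call would sleep for.
  Duration nextDuration(Random random) {
    final override = debugDuration;
    if (override != null) return override;
    // Random jitter below the bound, but never a zero-length wait.
    final jittered = bound * random.nextDouble();
    const floor = Duration(microseconds: 1);
    return jittered > floor ? jittered : floor;
  }

  Future<void> wait() async {
    await Future<void>.delayed(nextDuration(Random()));
    waitsCompleted++;
  }
}

void main() {
  final fixed =
    SketchBackoffMachine(debugDuration: const Duration(seconds: 1));
  print(fixed.nextDuration(Random(0)) == const Duration(seconds: 1)); // true

  final jittered = SketchBackoffMachine();
  final d = jittered.nextDuration(Random(0));
  print(d > Duration.zero && d <= SketchBackoffMachine.firstBound); // true
}
```

With a fixed duration, a test can tell exactly when two overlapping backoffs end, which is what the "backoff A" and "backoff B" scenario needs.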

gnprice · 2024-12-10T06:39:49Z

test/model/message_list_test.dart

@@ -985,6 +1022,45 @@ void main() {
        checkNotifiedOnce();
      }));

+      test('fetchOlder backoff start, _reset, fetchOlder backoff ends, move fetch finishes', () => awaitFakeAsync((async) async {


nit: let's put this after both the existing "fetchOlder, _reset" test cases, instead of between the two of them. Those two are very similar and I feel like I want to read them together, comparing them.

chrisbobbe · 2024-12-10T19:49:04Z

Then, either here or as a followup, we could give the UI an "error"/"problem" state that's distinct from the loading state. If the request has failed, and especially if it's failed several times, that's an important sign that maybe the user should stop expecting it to succeed.

(Filed #1126 for this.)

gnprice

Thanks for the revision! All looks good now except three small comments.

gnprice · 2024-12-11T22:26:57Z

lib/model/message_list.dart

+          _fetchOlderCoolingDown = true;
+          unawaited((_fetchOlderCooldownBackoffMachine ??= BackoffMachine())
+            .wait().then((_) {
+              if (this.generation != generation) return;


nit:

Suggested change

-              if (this.generation != generation) return;
+              if (this.generation > generation) return;

(matching the similar checks elsewhere)

gnprice · 2024-12-11T22:29:01Z

test/model/message_list_test.dart

+    await model.fetchOlder();
+    checkNotified(count: 2);
+    check(connection.takeRequests()).single;
+  }));


bump #1050 (comment)

gnprice · 2024-12-11T22:32:38Z

test/model/message_list_test.dart

+        // When `backoffTimerA` ends, `fetchOlderCoolingDown` remains `true`
+        // because the backoff was from a previous generation.
+        async.elapse(const Duration(seconds: 1));
+        check(model).fetchOlderCoolingDown.isTrue();
+        check(backoffTimerA.isActive).isFalse();
+        check(backoffTimerB.isActive).isTrue();
+        checkNotNotified();
+
+        // When `backoffTimerB` ends, `fetchOlderCoolingDown` gets reset.
+        async.elapse(const Duration(seconds: 1));
+        check(model).fetchOlderCoolingDown.isFalse();
+        check(backoffTimerA.isActive).isFalse();
+        check(backoffTimerB.isActive).isFalse();
+        checkNotifiedOnce();
+      }));


This new test looks good!

gnprice · 2024-12-11T22:33:35Z

lib/api/backoff.dart

+    final duration = debugDuration ?? _maxDuration(const Duration(microseconds: 1),
+                                                   bound * Random().nextDouble());


nit: line too long (and in particular the value of microseconds is past 80 columns, just)

PIG208 · 2024-12-11T22:44:31Z

Updated the PR. Thanks for the review!

gnprice · 2024-12-11T23:07:05Z

Thanks! Looks good — rebased, and will watch CI.

This already holds for the existing callers. Updating end markers should only happen after the initial fetch. During the initial fetch, we have a separate loading indicator and no end markers. Signed-off-by: Zixuan James Li <[email protected]>

This will be used for testing. Signed-off-by: Zixuan James Li <[email protected]>

This approach is different from how a BackoffMachine is typically used, because the message list doesn't send and retry requests in a loop; its caller retries rapidly on scroll changes, and we want to ignore the excessive requests. Fixes: zulip#945 Signed-off-by: Zixuan James Li <[email protected]>

PIG208 force-pushed the pr-storm branch 4 times, most recently from 567be0d to 4bf2482 Compare November 6, 2024 23:38

PIG208 added the maintainer review PR ready for review by Zulip maintainers label Nov 6, 2024

PIG208 force-pushed the pr-storm branch 2 times, most recently from 8b14db4 to 62e334c Compare November 7, 2024 22:41

PIG208 assigned chrisbobbe Nov 7, 2024

PIG208 requested a review from chrisbobbe November 7, 2024 22:55

chrisbobbe reviewed Nov 12, 2024

PIG208 force-pushed the pr-storm branch from 62e334c to 5068b5e Compare November 12, 2024 19:56

PIG208 force-pushed the pr-storm branch from 5068b5e to 71e0376 Compare November 12, 2024 22:55

PIG208 force-pushed the pr-storm branch from 71e0376 to 22a2c8d Compare November 13, 2024 00:29

chrisbobbe reviewed Dec 3, 2024

PIG208 force-pushed the pr-storm branch from 22a2c8d to 2bc878d Compare December 6, 2024 00:32

PIG208 requested a review from chrisbobbe December 6, 2024 00:32

chrisbobbe reviewed Dec 10, 2024

PIG208 force-pushed the pr-storm branch from 2bc878d to 81b4667 Compare December 10, 2024 01:46

chrisbobbe added integration review Added by maintainers when PR may be ready for integration and removed maintainer review PR ready for review by Zulip maintainers labels Dec 10, 2024

chrisbobbe assigned gnprice and unassigned chrisbobbe Dec 10, 2024

chrisbobbe requested a review from gnprice December 10, 2024 03:09

PIG208 force-pushed the pr-storm branch from 81b4667 to b903cef Compare December 10, 2024 05:28

gnprice reviewed Dec 10, 2024

chrisbobbe mentioned this pull request Dec 10, 2024

msglist: Show when fetch-older isn't succeeding #1126

Open

PIG208 force-pushed the pr-storm branch from b903cef to 09e5490 Compare December 10, 2024 21:47

PIG208 requested a review from gnprice December 10, 2024 21:49

gnprice reviewed Dec 11, 2024

PIG208 force-pushed the pr-storm branch from 09e5490 to eff21e5 Compare December 11, 2024 22:43

gnprice force-pushed the pr-storm branch from eff21e5 to 381370e Compare December 11, 2024 23:06

PIG208 added 3 commits December 11, 2024 15:19

backoff: Support overriding backoff duration

fd0aa31

This will be used for testing. Signed-off-by: Zixuan James Li <[email protected]>

gnprice force-pushed the pr-storm branch from 381370e to a7689da Compare December 11, 2024 23:20

gnprice merged commit a7689da into zulip:main Dec 11, 2024

PIG208 deleted the pr-storm branch December 11, 2024 23:30
