-
Notifications
You must be signed in to change notification settings - Fork 214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
msglist: Throttle fetchOlder retries #1050
Conversation
567be0d
to
4bf2482
Compare
(Rerunning CI.) |
8b14db4
to
62e334c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Good to get this bug fixed. Comments below.
Also, this change breaks an invariant we had before: whenever the top of the message list is scrolled into view, we would show a "start marker", which was either a loading indicator or some text saying there aren't any older messages to load.
We currently only show the loading indicator when fetchingOlder
is true, and that's false when we're in the new "cooldown" period after a failed fetch-older request. How about also showing the loading indicator during the cooldown period? When a request is failing over and over, the cooldown period will quickly become multiple seconds long, with fetchingOlder
flickering to true
in between for perhaps milliseconds at a time; possibly not long enough for a frame.
Then, either here or as a followup, we could give the UI an "error"/"problem" state that's distinct from the loading state. If the request has failed, and especially if it's failed several times, that's an important sign that maybe the user should stop expecting it to succeed. Maybe MessageListLoadingItem
could get another param like bool problem
, for which we pass true
when the BackoffMachine.waitsCompleted
exceeds a certain value, like 4. And in that case make the loading indicator look like
or something, instead of just
.
lib/model/message_list.dart
Outdated
bool _waitBeforeRetry = false; | ||
BackoffMachine? _backoffMachine; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can find a more fitting name than _waitBeforeRetry
. I think that name might make it harder to see that no code is actually scheduling a retry after waiting for anything. It's also not clear from the name that it's about fetching the next batch of older messages, i.e., fetchOlder
.
What if we:
- Move this up near
fetchingOlder
, which is related and has a similar role - Make this public, with a name like
fetchOlderCoolingDown
- Give this a dartdoc and expand on
fetchingOlder
's dartdoc to make it clearer that they have similar roles
So, for example:
--- lib/model/message_list.dart
+++ lib/model/message_list.dart
@@ -92,9 +92,30 @@ mixin _MessageSequence {
bool _haveOldest = false;
/// Whether we are currently fetching the next batch of older messages.
+ ///
+ /// When this is true, [fetchOlder] is a no-op.
+ /// That method is called frequently by Flutter's scrolling logic,
+ /// and this field helps us avoid spamming the same request just to get
+ /// the same response each time.
+ ///
+ /// See also [fetchOlderCoolingDown].
bool get fetchingOlder => _fetchingOlder;
bool _fetchingOlder = false;
+ /// Whether [fetchOlder] had a request error recently.
+ ///
+ /// When this is true, [fetchOlder] is a no-op.
+ /// That method is called frequently by Flutter's scrolling logic,
+ /// and this field mitigates spamming the same request and getting
+ /// the same error each time.
+ ///
+ /// "Recently" is decided by a [BackoffMachine] that resets
+ /// when a [fetchOlder] request succeeds.
+ ///
+ /// See also [fetchingOlder].
+ bool get fetchOlderCoolingDown => _fetchOlderCoolingDown;
+ bool _fetchOlderCoolingDown = false;
+
/// The parsed message contents, as a list parallel to [messages].
///
/// The i'th element is the result of parsing the i'th element of [messages].
lib/model/message_list.dart
Outdated
@@ -528,6 +545,7 @@ class MessageListView with ChangeNotifier, _MessageSequence { | |||
|
|||
_insertAllMessages(0, fetchedMessages); | |||
_haveOldest = result.foundOldest; | |||
_backoffMachine = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also do this in _reset
?
This comment was marked as outdated.
This comment was marked as outdated.
The PR has been updated with the proposed changes, and additionally, a check for I have also moved the backoff machine variable, renamed to The loading indicator change might be out of scope for this PR, and we should work on that as a follow-up. |
Looks like CI is failing, could you take a look? |
Updated the PR. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Small comments below.
lib/model/message_list.dart
Outdated
// ignore: control_flow_in_finally | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When disabling a lint rule (especially one that says it's meant to prevent bugs or unexpected behavior), please explain in a code comment why it's OK.
test/model/message_list_test.dart
Outdated
checkNotified(count: 2); | ||
check(model).fetchOlderCoolingDown.isTrue(); | ||
|
||
connection.takeRequests(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
msglist: Throttle fetchOlder retries
This approach is different from how a BackoffMachine is typically used,
because the message list doesn't send and retry requests in a loop; its
caller retries rapidly on scroll changes, and we want to ignore the
excessive requests.
The test drops irrelevant requests with `connection.takeRequests`
without checking, as we are only interested in verifying that no request
was sent.
Fixes: https://github.com/zulip/zulip-flutter/issues/945
Signed-off-by: Zixuan James Li <[email protected]>
The paragraph about the tests is a pretty small implementation detail that I think doesn't need a special mention in the commit message. On the other hand, it seems maybe helpful as an implementation comment directly on the code it's about.
test/model/message_list_test.dart
Outdated
await store.handleEvent(eg.updateMessageEventMoveTo( | ||
origTopic: movedMessages[0].topic, | ||
origStreamId: otherStream.streamId, | ||
newMessages: movedMessages, | ||
)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We currently respond to this event by calling _reset
, but there's a comment suggesting we might stop doing that in some cases:
void _messagesMovedIntoNarrow() {
// If there are some messages we don't have in [MessageStore], and they
// occur later than the messages we have here, then we just have to
// re-fetch from scratch. That's always valid, so just do that always.
// TODO in cases where we do have data to do better, do better.
_reset();
notifyListeners();
fetchInitial();
}
The test is meant to exercise the _reset
case (as it says in its description); can we make it fail if it doesn't actually exercise that? Maybe:
// check that _reset was called
check(model).fetched.isFalse();
Thanks for the review! This has been updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! One comment below.
lib/model/message_list.dart
Outdated
// We need the finally block always clean up regardless of errors | ||
// occured in the try block, and returning early here is necessary | ||
// if such cleanup must be skipped, as the fetch is considered stale. | ||
// ignore: control_flow_in_finally | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (this.generation != generation) {
// We need the finally block always clean up regardless of errors
// occured in the try block, and returning early here is necessary
// if such cleanup must be skipped, as the fetch is considered stale.
// ignore: control_flow_in_finally
return;
This explanation doesn't make sense to me. The cleanup "always" needs to be done…but there are times when it "must be skipped"? Your reasoning might be correct but I don't understand it yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a reminder, the task is to explain why the problems flagged by control_flow_in_finally
either don't exist or are acceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about:
// This lint rule is more helpful for flagging confusing uses of
// control flow keywords, such as `break` in a block nested in a loop,
// or `return` a value in a finally block when a value would have been
// returned in the `try` block. Returning early to skip the cleanup
// is none of these.
// ignore: control_flow_in_finally
I was also tempted to refactor the if block into if (this.generation == generation)
instead of having an early return, but the code will become more nested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh, if (this.generation == generation)
might actually be nicer. I guess it does basically the same thing, but without making the reader think through a lint rule and whether it's helpful here or not. I'd like to here @gnprice's thoughts; I'll mark for his review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. The version in main
uses if (this.generation == generation)
here instead of our usual early return, and I think this lint rule was the main reason I went for that.
One extra layer of nesting is pretty tolerable, particularly as the indentation size is just 2 spaces.
Marking for Greg's review; Greg, please see in particular a question above: #1050 (comment) |
lib/model/message_list.dart
Outdated
@@ -528,9 +567,21 @@ class MessageListView with ChangeNotifier, _MessageSequence { | |||
|
|||
_insertAllMessages(0, fetchedMessages); | |||
_haveOldest = result.foundOldest; | |||
_fetchOlderCooldownBackoffMachine = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: let's put this line that ends a backoff series right next to the logic that starts a backoff — as an else
to the if (hasFetchError)
.
I believe that's equivalent to this way. And that way it's somewhat easier to see how this whole cooldown state machine operates.
lib/model/message_list.dart
Outdated
final startMarker = switch ((fetchingOlder, haveOldest, fetchOlderCoolingDown)) { | ||
(true, _, _) => const MessageListLoadingItem(MessageListDirection.older), | ||
(_, true, _) => const MessageListHistoryStartItem(), | ||
(_, _, true) => const MessageListLoadingItem(MessageListDirection.older), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the fact these two fetchOlder-related conditions produce the same result isn't a coincidence; so let's handle them together.
… Oh hmm but is the point that the order of these cases is significant? Is it possible to have haveOldest
and fetchOlderCoolingDown
both true?
If that is possible then I guess let's make that explicit. In the existing version in main, the assert(!(haveOldest && fetchingOlder))
is there precisely to reassure the reader that the order of the existing two cases doesn't matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe in fact it's the case in this revision that haveOldest
and fetchOlderCoolingDown
can't both be true. Which is good — it seems like that'd be a confused state to be in.
So let's (a) assert that, and (b) combine the two fetchOlder-related cases here. Could have a line like
final effectiveFetchingOlder = fetchingOlder || fetchOlderCoolingDown;
and then the same switch
cases as in main.
@@ -1793,7 +1869,7 @@ void checkInvariants(MessageListView model) { | |||
if (model.haveOldest) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apropos of my previous comment #1050 (comment) : there's a new invariant, so let's have this checkInvariants
function check that invariant.
_fetchingOlder = false; | ||
if (hasFetchError) { | ||
assert(!fetchOlderCoolingDown); | ||
_fetchOlderCoolingDown = true; | ||
unawaited((_fetchOlderCooldownBackoffMachine ??= BackoffMachine()) | ||
.wait().then((_) { | ||
if (this.generation != generation) return; | ||
_fetchOlderCoolingDown = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or perhaps in fact we can avoid introducing more flags in the first place. How about we fuse the new flag with the old one — or put another way, instead of adding a new flag, we adjust the semantics of the existing fetchingOlder
? Like this:
_fetchingOlder = false; | |
if (hasFetchError) { | |
assert(!fetchOlderCoolingDown); | |
_fetchOlderCoolingDown = true; | |
unawaited((_fetchOlderCooldownBackoffMachine ??= BackoffMachine()) | |
.wait().then((_) { | |
if (this.generation != generation) return; | |
_fetchOlderCoolingDown = false; | |
if (!hasFetchError) { | |
_fetchingOlder = false; | |
} else { | |
unawaited((_fetchOlderCooldownBackoffMachine ??= BackoffMachine()) | |
.wait().then((_) { | |
if (this.generation != generation) return; | |
_fetchingOlder = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks to you both! Here's the rest of a full review.
test/model/message_list_test.dart
Outdated
checkNotified(count: 2); | ||
check(model).fetchOlderCoolingDown.isTrue(); | ||
|
||
// Drop irrelevant requests with without checking, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
// Drop irrelevant requests with without checking, | |
// Drop irrelevant requests without checking, |
test/model/message_list_test.dart
Outdated
// Drop irrelevant requests with without checking, | ||
// as we are only interested in verifying if any request was sent. | ||
connection.takeRequests(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line and its comment are a bit puzzling. We're "interested in verifying if any request was sent" — but does this verify that? It looks like it just discards that information.
test/model/message_list_test.dart
Outdated
// The first backoff is expected to be short enough to complete. | ||
async.elapse(const Duration(seconds: 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// The first backoff is expected to be short enough to complete. | |
async.elapse(const Duration(seconds: 1)); | |
// Wait long enough that a first backoff is sure to finish. | |
async.elapse(const Duration(seconds: 1)); |
"short enough to complete" sounded to me like saying it's possible to complete it. But the point here is that this specific line definitely does complete it.
test/model/message_list_test.dart
Outdated
check(connection.lastRequest).isNotNull(); | ||
})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, this is really the line that comment is most about.
It's fine to not inspect the details of the request. Let's include a check on model.messages.length
, though — that gives the reader of the test a nice quick legible confirmation that we ended up getting all the messages, and therefore must have made the usual request and handled the response in the usual way.
test/model/message_list_test.dart
Outdated
connection.prepare(httpStatus: 400, json: { | ||
'result': 'error', 'code': 'BAD_REQUEST', 'msg': 'Bad request'}); | ||
await check(model.fetchOlder()).throws<ZulipApiException>(); | ||
final backoffTimer = async.pendingTimers.single; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, clever!
test/model/message_list_test.dart
Outdated
// check that _reset was caleed | ||
check(model).fetched.isFalse(); | ||
check(model).fetchOlderCoolingDown.isFalse(); | ||
check(async.pendingTimers).contains(backoffTimer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check(async.pendingTimers).contains(backoffTimer); | |
check(backoffTimer.isActive()).isTrue(); |
test/model/message_list_test.dart
Outdated
async.elapse(const Duration(seconds: 1)); | ||
check(model).fetchOlderCoolingDown.isFalse(); | ||
check(async.pendingTimers).not((x) => x.contains(backoffTimer)); | ||
checkNotNotified(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When writing a test case, ideally we think through what scenarios would really stress the logic, and in particular would show clearly buggy symptoms if we had various plausible bugs.
So for example in the existing test just above this one, we let the fetch complete, and then check the model has no messages in its list:
await fetchFuture;
checkHasMessages([]);
checkNotNotified();
(and then call checkNotNotified
, mainly to ensure that checkInvariants
gets called). For that test case that's good because if the fetchOlder
call finished and then carried on, forgetting to check generation
, then the symptom would be that it puts the fetched messages into the list.
Here, the current revision of this test checks that fetchOlderCoolingDown
is still false. But that's not very informative, because although false is the value that flag had before this step, it's also exactly what the code under test would set the flag to if it did suffer from the race.
So to make a version of this test that really stresses the logic as it should, we should arrange a scenario where by the time the backoff completes, the flag has been set to true. Then we get to check here that it's still true — which confirms we don't have the bug where the code cheerfully sets it to false heedless of the new generation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made a new test, "fetchOlder backoff A starts, _reset, move fetch finishes, fetchOlder backoff B starts, fetchOlder backoff A ends", for this. This also needed a refactor to make it easier to test with BackoffMachine
, by offering a way to override the wait duration.
test/model/message_list_test.dart
Outdated
@@ -985,6 +1022,45 @@ void main() { | |||
checkNotifiedOnce(); | |||
})); | |||
|
|||
test('fetchOlder backoff start, _reset, fetchOlder backoff ends, move fetch finishes', () => awaitFakeAsync((async) async { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: let's put this after both the existing "fetchOlder, _reset" test cases, instead of between the two of them. Those two are very similar and I feel like I want to read them together, comparing them.
(Filed #1126 for this.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the revision! All looks good now except three small comments.
_fetchOlderCoolingDown = true; | ||
unawaited((_fetchOlderCooldownBackoffMachine ??= BackoffMachine()) | ||
.wait().then((_) { | ||
if (this.generation != generation) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
if (this.generation != generation) return; | |
if (this.generation > generation) return; |
(matching the similar checks elsewhere)
await model.fetchOlder(); | ||
checkNotified(count: 2); | ||
check(connection.takeRequests()).single; | ||
})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bump #1050 (comment)
// When `backoffTimerA` ends, `fetchOlderCoolingDown` remains `true` | ||
// because the backoff was from a previous generation. | ||
async.elapse(const Duration(seconds: 1)); | ||
check(model).fetchOlderCoolingDown.isTrue(); | ||
check(backoffTimerA.isActive).isFalse(); | ||
check(backoffTimerB.isActive).isTrue(); | ||
checkNotNotified(); | ||
|
||
// When `backoffTimerB` ends, `fetchOlderCoolingDown` gets reset. | ||
async.elapse(const Duration(seconds: 1)); | ||
check(model).fetchOlderCoolingDown.isFalse(); | ||
check(backoffTimerA.isActive).isFalse(); | ||
check(backoffTimerB.isActive).isFalse(); | ||
checkNotifiedOnce(); | ||
})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new test looks good!
final duration = debugDuration ?? _maxDuration(const Duration(microseconds: 1), | ||
bound * Random().nextDouble()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: line too long (and in particular the value of microseconds
is past 80 columns, just)
Updated the PR. Thanks for the review! |
Thanks! Looks good — rebased, and will watch CI. |
This already holds for the existing callers. Updating end markers should only happen after the initial fetch. During the initial fetch, we have a separate loading indicator and no end markers. Signed-off-by: Zixuan James Li <[email protected]>
This will be used for testing. Signed-off-by: Zixuan James Li <[email protected]>
This approach is different from how a BackoffMachine is typically used, because the message list doesn't send and retry requests in a loop; its caller retries rapidly on scroll changes, and we want to ignore the excessive requests. Fixes: zulip#945 Signed-off-by: Zixuan James Li <[email protected]>
This approach is different from how a BackoffMachine is typically used,
because the message list doesn't send and retry requests in a loop; its
caller retries rapidly on scroll changes, and we want to ignore the
excessive requests.
The test drops irrelevant requests with
connection.takeRequests
without checking, as we are only interested in verifying that no request
was sent.
Fixes: #945