Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backoff: reset on successful event poll #564

Closed
wants to merge 2 commits into from

Conversation

chrisirhc
Copy link
Contributor

@chrisirhc chrisirhc commented Mar 13, 2024

Fixes: #554

@chrisbobbe
Copy link
Collaborator

I've just edited the issue description to reference the issue it looks like this is about. Please correct it if it's wrong, but if it's right, please add a "Fixes" line to the relevant commit message too, once it's ready to be a not-draft. 🙂

@chrisirhc chrisirhc force-pushed the chua/reset-backoff branch from b954b8d to 246b7b3 Compare March 27, 2024 01:53
@chrisirhc
Copy link
Contributor Author

Ah yes, I created this but forgot about it.
I saw a different way of approaching this by reseting the reference to backoffMachine as null instead of maintaining this reset state over at:

BackoffMachine? backoffMachine;
while (true) {
try {
return await registerQueue(connection);
} catch (e) {
assert(debugLog('Error fetching initial snapshot: $e\n'
'Backing off, then will retry…'));
// TODO tell user if initial-fetch errors persist, or look non-transient
await (backoffMachine ??= BackoffMachine()).wait();
assert(debugLog('… Backoff wait complete, retrying initial fetch.'));
}
}

I could've gone down that approach as well, to keep the codebase consistent. Let me know if there's an opinion on this.

@chrisirhc chrisirhc marked this pull request as ready for review March 27, 2024 01:56
Copy link
Member

@gnprice gnprice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @chrisirhc for this PR! I'm back from vacation this week (and the rush of GSoC applicants is over), and I'm starting to catch up now.

One substantive comment below.

Also a commit-message nit: the summary line after the colon should be in sentence case. So e.g. "store: Reset backoff on success", rather than "store: reset …".

Comment on lines +63 to +64
final durationReset1 = await measureWait(backoffMachine.wait());
check(durationReset1).isLessThan(duration2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test will flake — it'll fail about 25% of the time. For example:

$ time flutter test test/api/backoff_test.dart 
00:01 +1 -1: BackoffMachine.reset [E]                                            
  Expected: a Duration that:
    is less than <0:00:00.023591>
  Actual: <0:00:00.049613>
  Which: is not less than <0:00:00.023591>
  package:checks/src/checks.dart 85:9             check.<fn>
  package:checks/src/checks.dart 708:12           _TestContext.expect
  package:checks/src/extensions/core.dart 167:13  ComparableChecks.isLessThan
  test/api/backoff_test.dart 64:27                main.<fn>

The trouble is that the behavior of wait is randomized, and the range of possible durations for the second wait (or even the Nth wait for large N) overlaps with the range for the first wait.

That randomization is why the existing test for that method, above, runs 100 trials and then makes a fairly loose check on the results; the parameters are chosen, using some math that's in the comments, to ensure the flake rate is below one in a billion.

(I picked that threshold so that even in a large codebase, even with lots of randomized test cases using that threshold, the total flake rate due to randomization would be negligible compared to other sources of flakiness like races, other bugs, and infra issues.)

We totally could write a working test for this reset method by using the same strategy. But maybe the complexity of that is a good reason to skip the reset method and go with the other approach you mentioned at #564 (comment) , of just resetting a BackoffMachine? local to null.

@gnprice gnprice added the completion candidate PR with reviews that may unblock merging label May 14, 2024
@chrisbobbe
Copy link
Collaborator

Gentle bump to @chrisirhc to see if you're still interested in working on this. 🙂

@chrisirhc
Copy link
Contributor Author

I'll likely get back to this in a few weeks.
Based on the review comment, the change may end up being smaller/straightforward. So if anyone else wants to go ahead and do it before then, I won't be opposed. 🙂

@gnprice
Copy link
Member

gnprice commented Aug 10, 2024

We ended up implementing the approach from #564 (comment), as #871. Thanks again @chrisirhc!

@gnprice gnprice closed this Aug 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
completion candidate PR with reviews that may unblock merging
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reset backoff interval after successful connection
3 participants