Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deferred retry of hardware_info handling when device not found #317

Merged
merged 7 commits into from
May 16, 2024

Conversation

timcowlishaw
Copy link
Contributor

@timcowlishaw timcowlishaw commented Apr 22, 2024

Since we deployed the devices refactor, a slightly subtle bug (#314) has emerged: if a device sends the hardware_info packet before the user has completed registration, the device is not found, the info is not saved, and therefore the device information in the UI is incomplete. This (hopefully, I like to test on staging with a real device) fixes this by relying on Sidekiq's built in error handling functionality to retry the hardware_info message (with backoff) in the event that no device corresponds to the given key. Sidekiq defaults to 25 retries with backoff (corresponding to about 3 weeks of elapsed time between the first and the last try), which should be sufficient: https://docs.gitlab.com/ee/development/sidekiq/.

@timcowlishaw timcowlishaw requested a review from oscgonfer April 22, 2024 17:51
Since we deployed the devices refactor, a slightly subtle bug has
emerged: if a device sends the hardware_info packet before the user has
completed registration, the device is not found, the info is not saved,
and therefore the device information in the UI is incomplete. This
(hopefully, I like to test on staging with a real device) fixes this by
relying on Sidekiq's built in error handling functionality to retry the
hardware_info message (with backoff) in the event that no device
corresponds to the given key. Sidekiq defaults to 25 retries with
backoff (corresponding to about 3 weeks of elapsed time between the
first and the last try), which should be sufficient:
https://docs.gitlab.com/ee/development/sidekiq/.
@oscgonfer
Copy link
Contributor

oscgonfer commented Apr 26, 2024

  • Retry not only on hw_info, but also on data payloads, BUT, let's keep an eye on the queue size
  • Check if we have leverage on the retry interval (30s - 10 times, 60s - 10 times...)

@timcowlishaw timcowlishaw marked this pull request as ready for review May 6, 2024 04:52
@timcowlishaw
Copy link
Contributor Author

@oscgonfer changes above addressed, I think this probably needs another test on staging (we can look tomorrow, then it should be ready to go!)

@timcowlishaw
Copy link
Contributor Author

I've tweaked the retry logic to make sure the timeout works with ActiveJob: we'll need to test on staging on thursday though. I haven't deployed for now for a couple of reasons:

  1. There are real users relying on the staging forwarding mechanism, and i can't deploy without making a breaking API change to that (the auth stuff we discussed)

  2. The queue is full of retries from the bridge (as we suspected would happen) and i think we should clear the queue before deploying with the new logic to make sure they don't all retry at once!

@timcowlishaw timcowlishaw merged commit 049b241 into master May 16, 2024
2 checks passed
@timcowlishaw timcowlishaw deleted the bugfix/reparse_hardware_info_on_null_device branch May 16, 2024 15:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants