Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rommellayco make stale seconds configurable #630

Conversation

RommelLayco
Copy link
Contributor

@RommelLayco RommelLayco commented Oct 21, 2024

About this change - What it does

Implement a exponential backoff when stalling on backup download

Why this way

@RommelLayco RommelLayco marked this pull request as draft October 21, 2024 10:17
@RommelLayco RommelLayco force-pushed the RommelLayco-make-stale-seconds-configurable branch from d44ec96 to 0c0e98f Compare October 21, 2024 10:20
@RommelLayco RommelLayco marked this pull request as ready for review October 21, 2024 10:20
@RommelLayco RommelLayco requested review from tkren and facetoe October 21, 2024 10:20
@RommelLayco RommelLayco force-pushed the RommelLayco-make-stale-seconds-configurable branch 2 times, most recently from 80dc11b to f9b5498 Compare October 21, 2024 13:49
@RommelLayco RommelLayco requested review from rikonen and a team October 21, 2024 13:49
@RommelLayco RommelLayco force-pushed the RommelLayco-make-stale-seconds-configurable branch 3 times, most recently from e570f33 to 5775163 Compare October 28, 2024 16:53
Copy link

@ezotrank ezotrank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm but added one tiny comment

pghoard/restore.py Outdated Show resolved Hide resolved
@tkren
Copy link
Contributor

tkren commented Oct 29, 2024

There is Aiven-Open/rohmu#194 and #631 which I think probably already solve most of our problems.

@RommelLayco
Copy link
Contributor Author

closing cause other PR solve this problem better

@RommelLayco
Copy link
Contributor Author

i gonna re open cause we could still get timeout errors and it is better to retry with longer timeouts

@RommelLayco RommelLayco reopened this Oct 29, 2024
@orange-kao
Copy link
Contributor

The conflict is caused by the merge of #631. After fixing conflicts, I have tested this in dev env and LGTM.

self.max_stale_seconds can double as many times as the self.stall_max_retries allows, and it can become quite large. I assume that if us (or other pghoard users) use the restore progress for monitoring and alerting purpose, it should not be a problem?

@RommelLayco RommelLayco force-pushed the RommelLayco-make-stale-seconds-configurable branch 2 times, most recently from 743c14b to ecef14b Compare November 4, 2024 14:26
@orange-kao
Copy link
Contributor

Thanks for fixing the conflict. I assume it should be self.max_stale_seconds = min(self.max_stale_seconds * 2, 480) ?
(min, not max. Otherwise self.max_stale_seconds will jump from 120 to 480, and then 960...)

When timing out downloading a backup, on next retry, retry with a longer
timeout.
@RommelLayco RommelLayco force-pushed the RommelLayco-make-stale-seconds-configurable branch from ecef14b to 8aae740 Compare November 8, 2024 09:53
Copy link
Contributor

@orange-kao orange-kao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have test this in dev env and it works as expected. Thanks Rommel.

@alexole alexole merged commit 91a70a3 into Aiven-Open:main Nov 11, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants