Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better retry for base backup restore #631

Merged

Conversation

orange-kao
Copy link
Contributor

About this change - What it does

Current implementation use fixed retry (6) for base backup restore. However, on a large (2~4 TiB) database, there are more data to be transferred and will take longer. As a result, it is more likely to hit the limit.

This commit allows a dynamic number of retry, depends on the size of the base backup.

Why this way

This seems to be the minimum change to avoid starting over base backup download when restoring a large database. I acknowledge that there are other and/or better ways to implement this.

Copy link

@giacomo-alzetta-aiven giacomo-alzetta-aiven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New mandatory parameter is missing in the tests. Either default it to STALL_MIN_RETRIES or update the tests.

@orange-kao orange-kao force-pushed the orange-retry-based-on-backup-size branch 3 times, most recently from dcbc4d6 to 0685c9c Compare October 28, 2024 23:42
Current implementation use fixed retry (6) for base backup restore.
However, on a large (2~4 TiB) database, there are more data to be
transferred and will take longer. As a result, it is more likely to hit
the limit.

This commit allows a dynamic number of retry, depends on the size of the
base backup.
@orange-kao orange-kao force-pushed the orange-retry-based-on-backup-size branch from 0685c9c to 4869d84 Compare October 29, 2024 03:59
@orange-kao
Copy link
Contributor Author

I have force-pushed to fix make lint fmt.

However, test failed on test/test_webserver.py::TestWebServer::test_requesting_basebackup[16] but I cannot reproduce it locally. I have force pushed again to restart the test.

While running tests locally test/test_walreceiver.py::TestWalReceiver::test_walreceiver[16-None] failed with AssertionError: Expected at least 4 xlog uploads, got 1 but I cannot trigger it again.

@orange-kao
Copy link
Contributor Author

Test passed, ready for review.

Copy link

@giacomo-alzetta-aiven giacomo-alzetta-aiven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@RommelLayco RommelLayco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@alexole alexole merged commit 5e22c85 into Aiven-Open:main Oct 30, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants