Refactor progress-based basebackup metrics #622

Merged: facetoe merged 1 commit into main from sebinsunny-refactor-pg-basebackup-metric on Jun 7, 2024


sebinsunny (Contributor)

This PR refactors the basebackup monitoring introduced in PR #615. Previously, we reset the basebackup progress file whenever a new basebackup request was made, so some stalls that spanned a pghoard restart went undetected. Now, the progress file is reset only when a backup succeeds, and we also record in the file the total bytes uploaded by the previous basebackup attempt. If a basebackup is retried because of a pghoard restart or a failed backup request, we check whether progress has been made: if the bytes uploaded have not exceeded the total recorded for the previous attempt, we emit a stalled metric. This PR also adds logging of upload progress for each file and for the snapshot stages of a basebackup operation.
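The stall-detection rule described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not pghoard's actual code: the class `BasebackupProgress`, its methods, the JSON file layout, the chunk names, and the choice to keep the highest byte count reached by any unfinished attempt are all assumptions made for the example. It also assumes a caller polls `is_stalled()` periodically and turns the result into the stalled metric.

```python
import json
import logging
import os
import tempfile

log = logging.getLogger("basebackup_progress_sketch")

EMPTY_STATE = {"total_bytes_uploaded": 0, "previous_total_bytes": 0}


class BasebackupProgress:
    """Hypothetical progress tracker; state persists in a JSON file across restarts."""

    def __init__(self, path):
        self.path = path
        self.state = self._load()

    def _load(self):
        try:
            with open(self.path, encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return dict(EMPTY_STATE)

    def _save(self):
        # Write a temp file, then rename, so a crash never leaves a half-written file.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.state, f)
        os.replace(tmp_path, self.path)

    def start_attempt(self):
        # Called for every request (first try, retry, or after a restart).
        # The file is NOT reset here: remember how far unfinished attempts got.
        self.state["previous_total_bytes"] = max(
            self.state["previous_total_bytes"], self.state["total_bytes_uploaded"]
        )
        self.state["total_bytes_uploaded"] = 0
        self._save()

    def add_uploaded(self, name, nbytes):
        # Record per-file upload progress and log it.
        self.state["total_bytes_uploaded"] += nbytes
        self._save()
        log.info("Uploaded %s (%d bytes), attempt total %d bytes",
                 name, nbytes, self.state["total_bytes_uploaded"])

    def is_stalled(self):
        # Stalled: an earlier attempt exists and this one has not passed it yet.
        previous = self.state["previous_total_bytes"]
        return previous > 0 and self.state["total_bytes_uploaded"] <= previous

    def mark_success(self):
        # Only a successful backup clears the recorded progress.
        self.state = dict(EMPTY_STATE)
        self._save()


# Usage: an attempt is interrupted (e.g. pghoard restarts), then retried.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "basebackup_progress.json")
    first = BasebackupProgress(path)
    first.start_attempt()
    first.add_uploaded("chunk-1", 1000)

    retry = BasebackupProgress(path)  # a new process reads the persisted state
    retry.start_attempt()
    retry.add_uploaded("chunk-1", 400)
    print(retry.is_stalled())  # True: 400 <= 1000
    retry.add_uploaded("chunk-2", 700)
    print(retry.is_stalled())  # False: 1100 > 1000
    retry.mark_success()
```

Keeping the state in a file rather than in memory is what lets a retry after a restart compare itself against the interrupted attempt. Clearing it only in `mark_success()` mirrors the rule that the progress file is reset only after a successful backup.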

[SRE-7476]


sebinsunny force-pushed the sebinsunny-refactor-pg-basebackup-metric branch 3 times, most recently from b9cf496 to 75a935c (May 31, 2024 05:27)
Review comment on pghoard/transfer.py (outdated, resolved)
sebinsunny force-pushed the sebinsunny-refactor-pg-basebackup-metric branch 2 times, most recently from 2ec6393 to 7eb1fda (June 5, 2024 02:13)
sebinsunny requested a review from facetoe (June 5, 2024 13:44)
sebinsunny force-pushed the sebinsunny-refactor-pg-basebackup-metric branch from 7eb1fda to 56098ae (June 7, 2024 01:30)
codecov-commenter commented Jun 7, 2024

Codecov Report

Attention: Patch coverage is 84.21053% with 3 lines in your changes missing coverage. Please review.

Project coverage is 90.80%. Comparing base (5505b86) to head (56098ae).
Report is 11 commits behind head on main.

Additional details and impacted files


@@            Coverage Diff             @@
##             main     #622      +/-   ##
==========================================
- Coverage   91.01%   90.80%   -0.21%     
==========================================
  Files          31       31              
  Lines        4917     4968      +51     
==========================================
+ Hits         4475     4511      +36     
- Misses        442      457      +15     
Files Coverage Δ
pghoard/basebackup/base.py 92.25% <100.00%> (ø)
pghoard/basebackup/delta.py 90.87% <0.00%> (-0.34%) ⬇️
pghoard/transfer.py 94.51% <87.50%> (-1.28%) ⬇️

... and 4 files with indirect coverage changes

sebinsunny force-pushed the sebinsunny-refactor-pg-basebackup-metric branch from 56098ae to 649d80e (June 7, 2024 03:18)
facetoe merged commit d28945a into main (June 7, 2024)
7 checks passed
facetoe deleted the sebinsunny-refactor-pg-basebackup-metric branch (June 7, 2024 04:33)