Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What about 6 retries and not 3? #526

Merged
merged 3 commits into from
Oct 30, 2024
Merged

What about 6 retries and not 3? #526

merged 3 commits into from
Oct 30, 2024

Conversation

johnml1135
Copy link
Collaborator

@johnml1135 johnml1135 commented Oct 29, 2024

#524 There are some failures from ClearML erroring out.


This change is Reviewable

@johnml1135 johnml1135 requested a review from ddaspit October 29, 2024 21:53
@codecov-commenter
Copy link

codecov-commenter commented Oct 29, 2024

Codecov Report

Attention: Patch coverage is 13.04348% with 20 lines in your changes missing coverage. Please review.

Project coverage is 56.91%. Comparing base (b6b9ae4) to head (d992401).

Files with missing lines Patch % Lines
....Shared/Configuration/IMachineBuilderExtensions.cs 0.00% 19 Missing ⚠️
...Webhooks/Configuration/IServalBuilderExtensions.cs 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #526      +/-   ##
==========================================
- Coverage   56.97%   56.91%   -0.06%     
==========================================
  Files         299      299              
  Lines       15686    15706      +20     
  Branches     2165     2168       +3     
==========================================
+ Hits         8937     8939       +2     
- Misses       6101     6119      +18     
  Partials      648      648              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 3 files at r1, all commit messages.
Reviewable status: 1 of 3 files reviewed, 1 unresolved discussion (waiting on @ddaspit and @johnml1135)


src/Machine/src/Serval.Machine.Shared/Configuration/IMachineBuilderExtensions.cs line 165 at r1 (raw file):

                b.WaitAndRetryAsync(
                    6,
                    retryAttempt => TimeSpan.FromSeconds(Math.Pow(1.5, retryAttempt)),

If we're getting hit with TooManyRequests as it is, will this really improve the issue? I understand the hope, but is this the best change? We've definitely seen fewer of these errors lately than we were before. Would maybe upping the timespan rather than upping the number of attempts be a good idea? @ddaspit, thoughts?

@ddaspit ddaspit requested a review from Enkidu93 October 30, 2024 17:56
Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93)


src/Machine/src/Serval.Machine.Shared/Configuration/IMachineBuilderExtensions.cs line 165 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

If we're getting hit with TooManyRequests as it is, will this really improve the issue? I understand the hope, but is this the best change? We've definitely seen fewer of these errors lately than we were before. Would maybe upping the timespan rather than upping the number of attempts be a good idea? @ddaspit, thoughts?

I agree with Eli. If we are still encountering this issue occasionally, we probably want to tweak the polling timeout.

@johnml1135 johnml1135 force-pushed the add_polly_for_clearml branch from e9746ff to 9b5d2c9 Compare October 30, 2024 18:03
@johnml1135
Copy link
Collaborator Author

src/Machine/src/Serval.Machine.Shared/Configuration/IMachineBuilderExtensions.cs line 165 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I agree with Eli. If we are still encountering this issue occasionally, we probably want to tweak the polling timeout.

I have seen errors on E2E tests - It caused 10 "TooManyRequests" errors on QA in the past day.
I reverted to the first few having the same timing and upped the overall timing to 56 seconds.

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 3 files reviewed, 2 unresolved discussions (waiting on @ddaspit and @johnml1135)


src/Machine/src/Serval.Machine.Shared/Configuration/IMachineBuilderExtensions.cs line 165 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

I have seen errors on E2E tests - It caused 10 "TooManyRequests" errors on QA in the past day.
I reverted to the first few having the same timing and upped the overall timing to 56 seconds.

@johnml1135, I was think more like increasing the rate of backoff. What about 3, 9, 27 (sum 39)? Or 2.25^1, 2.25^2, 2.25^3, 2.25^4 (still under 1 minute)?


src/Serval/src/Serval.Webhooks/Configuration/IServalBuilderExtensions.cs line 9 at r2 (raw file):

        builder
            .Services.AddHttpClient<WebhookJob>()
            // Add retry policy; fail after approx. 1.5 + 2.25 ... 1.5^6 ~= 31 seconds

Is this meant to be here?

@johnml1135
Copy link
Collaborator Author

src/Serval/src/Serval.Webhooks/Configuration/IServalBuilderExtensions.cs line 9 at r2 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Is this meant to be here?

Should have been removed before.

@johnml1135
Copy link
Collaborator Author

src/Machine/src/Serval.Machine.Shared/Configuration/IMachineBuilderExtensions.cs line 165 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

@johnml1135, I was think more like increasing the rate of backoff. What about 3, 9, 27 (sum 39)? Or 2.25^1, 2.25^2, 2.25^3, 2.25^4 (still under 1 minute)?

For high workload, a linear back off is often recommended. Updated to linear back off.

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 3 files reviewed, 1 unresolved discussion (waiting on @ddaspit and @johnml1135)


src/Machine/src/Serval.Machine.Shared/Configuration/IMachineBuilderExtensions.cs line 165 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

For high workload, a linear back off is often recommended. Updated to linear back off.

Ok

@johnml1135
Copy link
Collaborator Author

src/Machine/src/Serval.Machine.Shared/Configuration/IMachineBuilderExtensions.cs line 165 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Ok

Done.

@johnml1135 johnml1135 merged commit 22f612d into main Oct 30, 2024
2 of 3 checks passed
@johnml1135 johnml1135 deleted the add_polly_for_clearml branch October 30, 2024 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants