Clarify how export retry should be implemented #4138

brettmc · 2024-07-10T10:35:18Z

What are you trying to achieve?

I am trying to define exporter retry config in file-based configuration: open-telemetry/opentelemetry-configuration#97

The spec says that the client SHOULD implement an exponential backoff strategy between retries, but doesn't say anything further.

The consequence of this is that multiple SIGs have implemented an exponential backoff retry strategy, in numerous ways, and using a variety of inputs. Ultimately, I think it means we cannot have a common way to configure the retry strategy of an exporter, because we don't have a common set of inputs.

This is particularly problematic for automatic configuration via file-based configuration, which is trying to unify and extend SDK configuration beyond what is possible with environment variables. It's also problematic for environment variable-based configuration, although less so because SIGs are able to implement their own variables using the language-specific OTEL_<LANG>_FEATURE.

What I would like to see is that we decide whether a retry strategy should be configurable, and if so work out if it's possible to agree a common set of parameters to control the strategy, so that we can expose to users a mechanism to control the behaviour.

Alternatively, since we do have [timeout configuration](https://opentelemetry.io/docs/languages/sdk-configuration/otlp-exporter/#timeout-configuration), do we make that (i.e., the total time spent) the only input variable, and allow language implementations to choose their own non-configurable implementation, provided they observe that one deadline?
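To illustrate the "timeout is the only input" option, here is a minimal Python sketch. Everything in it is an assumption for discussion: `export_with_deadline`, the `send` callable, and the fixed backoff constants are hypothetical and not taken from any SDK. The point is only that the caller supplies one deadline and the backoff details stay internal.

```python
import time


def export_with_deadline(send, timeout_s, sleep=time.sleep, clock=time.monotonic):
    """Retry `send` until it succeeds or `timeout_s` seconds have elapsed.

    `send` is a hypothetical callable that returns True on success.
    The backoff constants are implementation details, not configuration.
    """
    deadline = clock() + timeout_s
    delay = 0.25  # hypothetical fixed initial delay, in seconds
    while True:
        if send():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False  # the deadline is the only limit the user controls
        # Never sleep past the deadline.
        sleep(min(delay, remaining))
        delay = min(delay * 2, 4.0)  # hypothetical fixed cap
```

Jitter is left out to keep the sketch short; an implementation would be free to add it without changing the user-facing input.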

Related issues:

Additional context.

Some examples of the inputs to retry strategy from different SIGs:

Go:

Enabled bool
InitialInterval time.Duration
MaxInterval time.Duration
MaxElapsedTime time.Duration

PHP:

initial_delay
max_retries

Python:

max_value (for an exponentially increasing delay time)

Java:

maxAttempts
initialBackoff
maxBackoff
backoffMultiplier

Java mentions specifically in docs that there is "currently no way to customize [the retry policy]": https://github.com/open-telemetry/opentelemetry-java/blob/9543a3451851d82f2f9d8e1b0cd78e2cc133b1a5/sdk-extensions/autoconfigure/README.md#otlp-exporter-retry
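To make the comparison concrete, here is a rough Python sketch of how one possible common parameter set could drive the delay schedule. It borrows the Java-style names listed above (initial backoff, max backoff, multiplier, max attempts) purely as an assumption; `backoff_schedule` is a hypothetical helper, and the values passed to it are examples, not any SIG's defaults.

```python
def backoff_schedule(initial_backoff, max_backoff, multiplier, max_attempts):
    """Return the delays (in seconds) waited between attempts.

    With `max_attempts` total attempts there are `max_attempts - 1` waits.
    Each wait grows by `multiplier` and is capped at `max_backoff`.
    Jitter is omitted here for readability.
    """
    delays = []
    delay = initial_backoff
    for _ in range(max_attempts - 1):
        delays.append(min(delay, max_backoff))
        delay *= multiplier
    return delays


# Example values only: 1s initial, 5s cap, doubling, 5 attempts.
print(backoff_schedule(1.0, 5.0, 2.0, 5))  # [1.0, 2.0, 4.0, 5.0]
```

An input like Go's MaxElapsedTime does not map onto this directly, because it bounds total time rather than the number of attempts, which is part of why a shared configuration surface is hard to define.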


carlosalberto · 2024-07-19T16:35:34Z

IIRC we left that like that to allow SIGs to implement whatever made sense in the language itself, but we should definitely try to (at least) standardize a few params, such as maxBackoff and enabled (or strategy=exponential_retry|none, to support future strategies?)

brettmc · 2024-07-20T04:18:03Z

I really like the idea of a retry policy being an interface, so that it can be expanded in future with either official or contrib/3rd party policies. Perhaps the spec begins with both exponential_retry and none implementations of a retry policy? I feel that having a none policy implementation is a cleaner interface than each policy needing to implement its own disabled option.
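A rough Python sketch of what I mean, just to make the shape concrete. All names here (`RetryPolicy`, `NoRetry`, `ExponentialRetry`, `next_delay`, `export`) and the default values are hypothetical and not part of any spec or SDK.

```python
import time
from abc import ABC, abstractmethod
from typing import Optional


class RetryPolicy(ABC):
    @abstractmethod
    def next_delay(self, attempt: int) -> Optional[float]:
        """Seconds to wait after failed attempt `attempt` (1-based), or None to stop."""


class NoRetry(RetryPolicy):
    """The `none` policy: give up after the first failure."""

    def next_delay(self, attempt: int) -> Optional[float]:
        return None


class ExponentialRetry(RetryPolicy):
    """The `exponential_retry` policy, with illustrative default values."""

    def __init__(self, initial_backoff=1.0, max_backoff=5.0, multiplier=2.0, max_attempts=5):
        self.initial_backoff = initial_backoff
        self.max_backoff = max_backoff
        self.multiplier = multiplier
        self.max_attempts = max_attempts

    def next_delay(self, attempt: int) -> Optional[float]:
        if attempt >= self.max_attempts:
            return None
        delay = self.initial_backoff * self.multiplier ** (attempt - 1)
        return min(delay, self.max_backoff)


def export(send, policy: RetryPolicy, sleep=time.sleep) -> bool:
    """Call the hypothetical `send` until it succeeds or the policy says stop."""
    attempt = 0
    while True:
        attempt += 1
        if send():
            return True
        delay = policy.next_delay(attempt)
        if delay is None:
            return False
        sleep(delay)
```

With this shape the exporter never checks an "enabled" flag; disabling retry is just selecting `NoRetry`, and a contrib policy only has to implement `next_delay`.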

jack-berg · 2024-08-07T15:55:53Z

Also related to #1742

jack-berg · 2024-08-14T18:46:47Z

We discussed this in the 8/7/24 and 8/14/24 TC meetings. I wrote a document summarizing a number of somewhat overlapping OTLP retry issues, and sketching out some proposals on how to fix them.

There was consensus on a number of things:

- we should fix the consistency / contradictions in "Clarify and improve the OTLP exporter requirements of a retry strategy" (#3639).
- it would be good to make the set of retryable status codes configurable ("[exporter/otlphttp] 500 should be retryable or not" #3314, "Retryable HTTP Statuses should be configurable in OTLP clients" #3876)
- it would be good to accommodate the users who require retry w/ persistent storage ("Allow Optional Retries Using Persistent Store for OTLP Exporter" #3645), although doing so is not trivial and needs sponsorship
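For the configurable status codes point, a minimal Python sketch of what that could look like. The names `DEFAULT_RETRYABLE` and `should_retry` are hypothetical, and the default set below is only illustrative; the authoritative list of retryable codes should be taken from the OTLP specification.

```python
# Illustrative default only; check the OTLP specification for the real list.
DEFAULT_RETRYABLE = frozenset({429, 502, 503, 504})


def should_retry(status_code, retryable=DEFAULT_RETRYABLE):
    """Return True if an HTTP response with `status_code` should be retried."""
    return status_code in retryable


# A user who wants 500 treated as retryable extends the set via configuration.
custom = DEFAULT_RETRYABLE | {500}
print(should_retry(500))          # False
print(should_retry(500, custom))  # True
```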

However, on this specific issue, it's not clear that the diverging implementations of the retry backoff algorithm are enough of a problem to justify the time investment needed to standardize. Most users are probably fine with the default backoff algorithm and won't touch whatever configuration options each implementation provides. Adding "community feedback" label to this issue to solicit more info about whether the current situation is problematic enough to address.

brettmc added the spec:trace Related to the specification/trace directory label Jul 10, 2024

danielgblanco added the triage:deciding:tc-inbox Needs attention from the TC in order to move forward label Jul 15, 2024

jack-berg mentioned this issue Jul 19, 2024

add exporter retry configuration open-telemetry/opentelemetry-configuration#97

Open

brettmc mentioned this issue Jul 20, 2024

Add explicit option for SDK otlp exporter to disable retry #4148

Closed

jack-berg added triage:deciding:community-feedback Open to community discussion. If the community can provide sufficient reasoning, it may be accepted and removed triage:deciding:tc-inbox Needs attention from the TC in order to move forward labels Aug 14, 2024

This was referenced Aug 14, 2024

Retryable HTTP Statuses should be configurable in OTLP clients #3876

Open

Clarify and improve the OTLP exporter requirements of a retry strategy #3639

Open

Clarification on exporter timeout config #2346

Open

github-actions bot added the triage:followup Needs follow up during triage label Sep 19, 2024

mtwo removed the triage:followup Needs follow up during triage label Oct 1, 2024
