telemetry data is being lost #3018

Open
mistermoe opened this issue Oct 7, 2024 · 2 comments

@mistermoe
Collaborator

Repro

  1. Run just otel-stream
  2. Run just otel-dev
  3. Send 2-3 calls to echo.Echo
  4. Press ctrl+c before the flush interval (5s) elapses
  5. Notice that metrics are missing

Potential Fix

AFAICT, Shutdown isn't being called on the otel exporters or on any of the providers in the observability client. As a result, telemetry that has been collected but not yet flushed gets dropped when the process shuts down.

Each otel provider and exporter has a Shutdown method (e.g. docs for metrics exporter's shutdown). We just need to call Shutdown on all of them. It might make sense to surface an observability.Shutdown method that takes care of calling Shutdown on all of the underlying exporters and providers.

I'm guessing we have to call Shutdown on all of them rather than just the exporter because, from what I can tell, the providers flush to the exporter's internal cache, and the exporter then flushes to wherever it has been configured to export to.

@github-actions github-actions bot added the triage Issue needs triaging label Oct 7, 2024
@ftl-robot ftl-robot mentioned this issue Oct 7, 2024
@wesbillman
Collaborator

This makes me think it might be nice to have some Shutdown functions in the controller and runner so we can clean up and shut down gracefully. I'm guessing these could be implemented as counterparts to the Start functions we have for each now. @alecthomas is there a typical way this should be handled?

@safeer
Contributor

safeer commented Oct 7, 2024

Generalize to context cancellation on shutdown. Use the waitgroup in the controller?

@safeer safeer added next Work that will be be picked up next and removed triage Issue needs triaging labels Oct 7, 2024
@github-actions github-actions bot removed the next Work that will be be picked up next label Oct 7, 2024

4 participants