Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

otel: change the go-runtime SDK to use a constant string (or configurable?) #2026

Open
1 of 2 tasks
deniseli opened this issue Jul 9, 2024 · 5 comments
Open
1 of 2 tasks
Assignees

Comments

@deniseli
Copy link
Contributor

deniseli commented Jul 9, 2024

THIS IS LOOOOWWWW PRIORITY! (because it's both not in use and opt-in)

In go-runtime/ftl/observability/metrics.go, we use the verb name as the metric name, which would result in a seeing a quite high number of metrics. This is not ideal; we don't want FTL to have a reputation for being expensive to instrument logging for.

As an alternative, users are free to invoke the otel SDK directly in their code, and the global otel configuration/setup we've done will apply.

  • Delete the file for now
  • Redesign it later
@github-actions github-actions bot added the triage Issue needs triaging label Jul 9, 2024
@deniseli deniseli changed the title otel: Change the go-runtime SDK to use a constant string (or configurable?) otel: change the go-runtime SDK to use a constant string (or configurable?) Jul 9, 2024
@ftl-robot ftl-robot mentioned this issue Jul 9, 2024
@deniseli deniseli mentioned this issue Jul 9, 2024
15 tasks
@gak gak added next Work that will be be picked up next and removed triage Issue needs triaging labels Jul 11, 2024
@wesbillman
Copy link
Collaborator

@deniseli maybe we should just remove this file for now since it's not being used. We can re-introduce something we're more happy with at that time? I think we're also using the verb name for meter vs. a counter/etc. metric so we're probably ok there. anyway.

@github-actions github-actions bot removed the next Work that will be be picked up next label Jul 26, 2024
@deniseli
Copy link
Contributor Author

@wesbillman yeah agreed, getting rid of the file for now seems like the right call.

deniseli added a commit that referenced this issue Jul 26, 2024
Part 1 of #2026

This deletes the whole observability package from the Go SDK, since no
one is using it yet. We'll replace it with an interface that better
aligns with our internal otel usage patterns once that's more mature.

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@deniseli
Copy link
Contributor Author

deniseli commented Aug 7, 2024

Current thoughts: it'd be nice to make the go-runtime SDK observability interface more declarative than what otel forces you into. We can simply wrap otel for now and give the users better functions for interfacing with it. For any niche features that we can't support, there's always the backdoor of invoking otel directly in the user code.

Design Thoughts Part 1: Attributes

It'd be nice to pass attributes via struct (so the types are built-in) instead directly using the attribute package that otel provides. Example:

Suppose you're a user, and one of your verbs calls out to a somewhat flaky external API. You want to instrument some metrics on your verb with attributes recording the details of that API call:

type MyVerbAttrs struct {
    failureMode ftl.Option[string] `attr:"failure_mode"`
    retryCount  int                `attr:"my_api_call.retry_count"`
}

func(...) (...) {
    ...
    ftl.RecordCount(ctx, ftl.Metric(...), MyVerbAttrs{...})
    ...
}

The Go SDK could use reflect to produce:

attrs := []attribute.KeyValue{
    attribute.String("ftl.user_defined.failure_mode", "my_failure_mode_1"), // only present when option.ok is true
    attribute.Int64("ftl.user_defined.my_api_call.retry_count.bucket", 0),
}

@deniseli
Copy link
Contributor Author

deniseli commented Aug 7, 2024

Design Thoughts Part 2: Metric instantiation

Meters and metrics don't have to be instantiated at initialization time. In fact, the user code will be cleaner if they don't need to think about that. The SDK could just provide a set of functions to create and manage metrics:

ftl.CounterMetric[T](signalName string, unit, description ftl.Option[string]) (metric.Int64Counter, error)

  • The first time it's called with a particular signalName, call meter.Int64Counter(...) (or whatever appropriate metric type for T).
    • If err == nil, then store that counter (i.e. the metric) in a package-level registry keyed by the signal name.
    • If err != nil, then store the error in another package-level registry keyed by the signal name. return the error.
  • On subsequent calls, look up the metric by name in the registries.
    • If the metric was created with different parameters (e.g. T, unit, or description differ from the registered value), then return an inconsistency error.
    • Otherwise, if the metric is found in the metric registry, then return that with a nil error.
    • Otherwise, if the metric is found in the error registry, then return that error again.

To support all the main signal types (Counter, Histogram, and Gauge), we'll need to flesh out this interface with some more functions. Options:

  • Set up interface wrappers to support a single ftl.Metric[T, U](...) function where T is the data type (e.g. int) and U is the counter type (e.g. counter, histo, gauge)
  • Set up simpler interface wrappers to support separate functions for each counter type: e.g. ftl.CounterMetric[T](...)
  • Thin wrappers around each supported metric type without generics: e.g. ftl.IntCounterMetric(...)

Note: Leaving UpDownCounter out for now because if any up or down update is missed, then the observed value would be over or under, which could lead to confusion. Safer to funnel users into using Histo or Gauge.

The meter itself could be instantiated and held as a singleton at the package scope.

@deniseli
Copy link
Contributor Author

deniseli commented Aug 8, 2024

note from standup: this should also build in constraints on attribute cardinality since we get charged for that by datadog

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants