Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Audit Compliance GDI specification v1.6.0: Configuration #978

Closed
pellared opened this issue Nov 22, 2024 · 10 comments
Closed

Audit Compliance GDI specification v1.6.0: Configuration #978

pellared opened this issue Nov 22, 2024 · 10 comments
Assignees

Comments

@pellared
Copy link
Contributor

pellared commented Nov 22, 2024

Review compliance against https://github.com/signalfx/gdi-specification/blob/v1.6.0/specification/configuration.md

@pellared
Copy link
Contributor Author

pellared commented Nov 22, 2024

One or more configuration variables may be needed to properly configure GDI repositories. Components that can be configured with environment variables MUST support configuration of these variables using environment variables.

✅ Signals can be configured via env vars (https://github.com/signalfx/splunk-otel-js/blob/main/docs/advanced-config.md)

Any component that cannot be configured with environment variables MUST support configuration of these variables using an alternate method.

✅ We have programmatic options which take precedence over env vars.

Any component MAY support configuration of these variables by additional methods.

✅ We have programmatic options.

GDI repositories MUST adopt stable and SHOULD adopt experimental configuration
variables in the OpenTelemetry Specification before proposing variables to the GDI specification.

https://github.com/signalfx/splunk-otel-js/blob/main/docs/advanced-config.md (imho I don't think anyone really cares whether we deem these experimental or not)

If a new configuration variable is needed by a GDI repository it MUST be brought to the GDI specification as a GitHub issue. The GDI specification maintainers SHOULD consider introducing needed configuration variables to the OpenTelemetry repository before approving Splunk-specific configuration variables.

❌ We have a few env vars that are JS specific and would pollute the GDI spec (e.g. SPLUNK_NEXTJS_FIX_ENABLED)

If a GDI repository requires an immediate configuration variable that is not available in the OpenTelemetry specification and not available in the GDI specification, the repository MAY introduce a repository-specific configuration variable until the GDI specification maintainers make a decision. Any repository-specific configuration variable defined SHOULD be prefixed with SPLUNK_ and MUST NOT be marked as stable unless added to the GDI specification. Upon resolution by the GDI specification, the GDI repository MUST change the repository-specific configuration variable by the repository's next minor release. This change MAY result in a breaking change so caution should be exhibited when considering repository-specific configuration variables.

✅ Our extra env vars are prefixed with SPLUNK_.

Splunk-specific configuration variables defined in the GDI specification MUST be prefixed with SPLUNK_. Furthermore, configuration specific to Splunk Observability Cloud MUST be prefixed with SPLUNK_OBSERVABILITY_ and to Splunk Enterprise or Splunk Cloud MUST be prefixed with SPLUNK_PLATFORM_.

✅ Our extra env vars are prefixed with SPLUNK_.

If a Splunk-specific configuration variable is declared as stable in the GDI specification and later the OpenTelemetry specification declares a similar variable as stable, the GDI specification MUST adopt the OpenTelemetry configuration variable and SHOULD mark the GDI specification configuration variable as deprecated by the next minor release. In addition to defining
Splunk-specification configuration variables, the GDI specification MAY require specific OpenTelemetry configuration variables be supported. If it does, the GDI specification MAY require certain values be supported including a specific default value.

Whenever a configuration variable changes its name, a stable GDI repository (version >= 1.0) MUST support both old and new names until the next major release is done. The old configuration variable MUST NOT be removed in a minor release. GDI repositories that are not yet stable (version < 1.0) SHOULD follow this rule, but they are not required to. When it is detected that a user uses the old configuration variable a warning SHOULD be logged: the warning SHOULD state that the old variable is deprecated, the new one should be used instead, and that the old one will be removed in the next major release (not stable GDI repositories MAY remove deprecated features in any future release).

✅ We are dropping SPLUNK_METRICS_ENDPOINT in this major release.

Installation and configuration MUST optimize for customer experience and time-to-value. Installation and configuration MAY provide a mechanism for advanced or custom settings. As an example, the default installation for instrumentation libraries should provide everything needed to configure W3C and B3 as well as OTLP and Jaeger Thrift even though the default configuration is set to W3C and OTLP. Installing all of these dependencies by default may result in a large package, but makes it easy for users to switch settings via configuration. An advanced installation process can be provided where the user chooses what to install limiting the configuration options.

✅ JS distro provides customization either via programmatic config or via env vars.

For all use-cases that support environment variables (e.g. applications and serverless), it MUST be possible to configure an Instrumentation Library instance using the following environment variables:

Name Default Description
SPLUNK_ACCESS_TOKEN Access token added to exported data. [1]
SPLUNK_PROFILER_CALL_STACK_INTERVAL 10000 Interval at which call stacks are sampled (in ms) [5]
SPLUNK_PROFILER_ENABLED false Whether CPU profiling is enabled. [2] [5]
SPLUNK_PROFILER_LOGS_ENDPOINT * Where profiling data is sent. Defaults to the value in OTLP_EXPORTER_OTLP_ENDPOINT [5]
SPLUNK_PROFILER_MEMORY_ENABLED false Whether memory profiling is enabled. [2] [6]
SPLUNK_REALM none Which realm to send exported data. [3]
SPLUNK_TRACE_RESPONSE_HEADER_ENABLED true Whether Server-Timing header is added to HTTP responses. [4]
  • [1]: Not user required if another system performs the authentication. For example, instrumentation libraries SHOULD send data to a locally running agent. The agent MAY define the access token that is used. If the instrumentation library is configured to send data directly to a Splunk back-end then this variable MUST be defined. Even when sending to an agent, instrumentation libraries MAY define this option (to support access_token_passthrough). This environment variable MUST work for otlp and jaeger-thrift-splunk exporters.
  • [2]: The instrumentation library SHOULD NOT allow changing the setting at runtime, and the initial setting SHOULD be used for the entire lifespan of the application run. An instrumentation library whose profiling capability is deactivated MUST NOT introduce additional profiling-based overhead. It also MUST NOT emit profiling-based data.
  • [3]: By default, instrumentation libraries are configured to send to a local collector (see OTEL_TRACES_EXPORTER below). If SPLUNK_REALM is set to anything besides none then the OTEL_EXPORTER_*_ENDPOINT is set to an endpoint based on the defined realm. If both SPLUNK_REALM and OTEL_EXPORTER_*_ENDPOINT are set then OTEL_EXPORTER_*_ENDPOINT takes precedence.
  • [4]: If stitching of RUM spans and APM spans is desired then this parameter MUST be set to true.
  • [5]: Applies only to instrumentation libraries with CPU profiling capabilities.
  • [6]: Applies only to instrumentation libraries with memory profiling capabilities.

In addition to Splunk-specific environment variables, the following [OpenTelemetry environment
variables](https://github.com/open-telemetry/opentelemetry specification/blob/f228a83e652e5cd3ba96b9f780b704ee7a7daa4c/specification/sdk-environment-variables.md) are required.

  • OTEL_SERVICE_NAME
    • Users MUST define a name for the service they are instrumenting. The service name can either be defined using the OTEL_SERVICE_NAME or OTEL_RESOURCE_ATTRIBUTES environment variable, or directly in their code. If the user fails to define a service name the distribution MUST log a warning. The warning message MUST clearly describe how to set the service name or link to relevant documentation. E.g.

      The service.name resource attribute is not set. Your service is unnamed and will be difficult to identify.
      Set your service name using the OTEL_SERVICE_NAME or OTEL_RESOURCE_ATTRIBUTES environment variable.
      E.g. `OTEL_SERVICE_NAME="<YOUR_SERVICE_NAME_HERE>"`

https://github.com/signalfx/splunk-otel-js/blob/main/src/start.ts#L107

  • OTEL_PROPAGATORS
    • Distribution MUST default to "tracecontext,baggage"
    • Distribution MUST support and document how to switch to b3multi

https://github.com/signalfx/splunk-otel-js/blob/main/src/tracing/options.ts#L282

  • Span Collection Limits
    • Distribution MUST default to 1000 for OTEL_SPAN_LINK_COUNT_LIMIT (not OpenTelemetry default)
    • Distribution MUST default to 12000 for OTEL_ATTRIBUTE_VALUE_LENGTH_LIMIT (not OpenTelemetry default)
    • Distribution MUST default to unlimited for all others (not OpenTelemetry default)

https://github.com/signalfx/splunk-otel-js/blob/main/src/tracing/options.ts#L53

  • OTEL_TRACES_EXPORTER
    • Non-RUM distribution MUST default to otlp using grpc or http/protobuf transport protocol.

    • Non-RUM distribution MAY offer jaeger-thrift-splunk that defaults to http://127.0.0.1:9080/v1/trace.
      NOTE: jaeger-thrift-splunk is deprecated. If the user selects jaeger-thrift-splunk, distributions MUST log a deprecation warning and suggest an alternate method. For example:

      jaeger-thrift-splunk trace exporter is deprecated and may be removed in a future major release. Use the default 
      OTLP exporter instead, or set the SPLUNK_REALM and SPLUNK_ACCESS_TOKEN environment variables to send 
      telemetry directly to Splunk Observability Cloud.

https://github.com/signalfx/splunk-otel-js/blob/main/src/tracing/options.ts#L215

In addition to environment variables, other ways of defining configuration also exist:

  • Java System Properties: These properties MUST match the environment variables converting to lower case and replacing underscores with hyphens or periods. For example: system property splunk.trace-response-header.enabled is equivalent to environment variable SPLUNK_TRACE_RESPONSE_HEADER_ENABLED.

✅ not applicable

@pellared
Copy link
Contributor Author

pellared commented Nov 22, 2024

❌ We have a few env vars that are JS specific and would pollute the GDI spec (e.g. SPLUNK_NEXTJS_FIX_ENABLED)

What do you think of following a rule similar to https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/configuration/sdk-environment-variables.md#language-specific-environment-variables:

Repository-specific environment variables are formed using the following convention: SPLUNK_{COMPONENT}_{FEATURE}

I think if we have such pattern then it would be easy to distinguish repo-specific features for more general features. We could also simplify the process defined in https://github.com/signalfx/gdi-specification/blob/main/specification/configuration.md to not require any actions in GDI spec for SPLUNK_{COMPONENT}_{FEATURE} env vars. E.g. SPLUNK_NEXTJS_FIX_ENABLED could be renamed to SPLUNK_JS_NEXTJS_FIX_ENABLED.

@seemk, thoughts?

@seemk
Copy link
Contributor

seemk commented Nov 25, 2024

Hmm, I don't really want to change the existing ones, plus as an end user when you're configuring a JS service, I don't think the JS part adds much

@pellared
Copy link
Contributor Author

pellared commented Nov 25, 2024

Hmm, I don't really want to change the existing ones, plus as an end user when you're configuring a JS service, I don't think the JS part adds much

OK. I also see that Java also does more or less the same: https://github.com/signalfx/splunk-otel-java/blob/main/docs/advanced-config.md. I think we should loosen the requirement below:

If a new configuration variable is needed by a GDI repository it MUST be brought to the GDI specification as a GitHub issue.

EDIT: I created signalfx/gdi-specification#326

@pellared
Copy link
Contributor Author

pellared commented Nov 25, 2024

❌ We have a few env vars that are JS specific and would pollute the GDI spec (e.g. SPLUNK_NEXTJS_FIX_ENABLED)

I decided to reopen.

There is a lot of not compliant env vars documented here : https://github.com/signalfx/splunk-otel-js/blob/main/docs/advanced-config.md. There are also more (not documented) here: https://github.com/signalfx/splunk-otel-js/blob/main/src/utils.ts

There is also an issue signalfx/gdi-specification#326 for handling repo-specific env vars.

We should look if we need env vars like SPLUNK_RUNTIME_METRICS_ENABLED or not have them at all (e.g. why one would like to disable the runtime metrics?). Moreover, for instance instead of introducing SPLUNK_TRACING_ENABLED the user could simply OTEL_TRACES_EXPORTER=none.

The most worrying is the SPLUNK_TRACE_RESPONSE_HEADER_ENABLED SPLUNK_REDIS_INCLUDE_COMMAND_ARGS which is defined as Stable.

@pellared pellared reopened this Nov 25, 2024
@seemk
Copy link
Contributor

seemk commented Nov 26, 2024

❌ We have a few env vars that are JS specific and would pollute the GDI spec (e.g. SPLUNK_NEXTJS_FIX_ENABLED)

I decided to reopen.

There is a lot of not compliant env vars documented here : https://github.com/signalfx/splunk-otel-js/blob/main/docs/advanced-config.md. There are also more (not documented) here: https://github.com/signalfx/splunk-otel-js/blob/main/src/utils.ts

There is also an issue signalfx/gdi-specification#326 for handling repo-specific env vars.

We should look if we need env vars like SPLUNK_RUNTIME_METRICS_ENABLED or not have them at all (e.g. why one would like to disable the runtime metrics?). Moreover, for instance instead of introducing SPLUNK_TRACING_ENABLED the user could simply OTEL_TRACES_EXPORTER=none.

The most worrying is the SPLUNK_TRACE_RESPONSE_HEADER_ENABLED which is defined as Stable.

SPLUNK_TRACE_RESPONSE_HEADER_ENABLED exists in the spec: https://github.com/signalfx/gdi-specification/blob/main/specification/configuration.md#instrumentation-libraries

@seemk
Copy link
Contributor

seemk commented Nov 26, 2024

OTEL_TRACES_EXPORTER=none only means that nothing gets exported (but metrics are still collected, libraries are instrumented), however disabling tracing (for debugging or perf comparisons) means that the code is not monkeypatched.

@pellared
Copy link
Contributor Author

SPLUNK_TRACE_RESPONSE_HEADER_ENABLED exists in the spec: https://github.com/signalfx/gdi-specification/blob/main/specification/configuration.md#instrumentation-libraries

Copy-paste error. I meant SPLUNK_REDIS_INCLUDE_COMMAND_ARGS 😅

@pellared
Copy link
Contributor Author

OTEL_TRACES_EXPORTER=none only means that nothing gets exported (but metrics are still collected, libraries are instrumented), however disabling tracing (for debugging or perf comparisons) means that the code is not monkeypatched.

In Go distro we are not setting up the Trace SDK when OTEL_TRACES_EXPORTER=none. This was good enough since a few years.

AFAIK, one of the reasons behind v3 of splunk-otel-js is to make a cleanup in distro configuration.
CC @akubik-splunk

@pellared
Copy link
Contributor Author

There is an agreement to keep the additional env vars: signalfx/gdi-specification#326 (comment).

Closing 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants