
[OTel] Additional components for the Elastic Distro #5092

Closed
5 tasks
ChrsMark opened this issue Jul 9, 2024 · 16 comments · Fixed by #5242
Assignees
Labels
enhancement New feature or request

Comments

@ChrsMark
Member

ChrsMark commented Jul 9, 2024

Describe the enhancement:

Kubernetes is one of the most used environments according to https://opentelemetry.io/blog/2024/otel-collector-survey/#deployment-scale-and-environment, so it makes sense to consult what the community distro for k8s includes.
We can evaluate the list of the k8s-distro components and consider including them as well in the Elastic distro: https://github.com/open-telemetry/opentelemetry-collector-releases/blob/main/distributions/otelcol-k8s/manifest.yaml
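For reference, a collector distro manifest lists the components compiled into the binary. The sketch below is illustrative only; the component names and versions are examples, not the contents of the actual upstream otelcol-k8s manifest:

```yaml
# Illustrative sketch of a collector distro manifest (builder format);
# component sets and versions are assumptions, not the upstream file.
dist:
  name: otelcol-k8s
  description: OpenTelemetry Collector for Kubernetes
  output_path: ./_build
receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.104.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/kubeletstatsreceiver v0.104.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/k8sclusterreceiver v0.104.0
processors:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sattributesprocessor v0.104.0
exporters:
  - gomod: go.opentelemetry.io/collector/exporter/otlpexporter v0.104.0
```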

/cc @elastic/otel-devs

Describe a specific use case for the enhancement or feature:

What is the definition of done?

Taken from #5092 (comment):

@ycombinator ycombinator added the enhancement New feature or request label Jul 9, 2024
@ycombinator
Contributor

@strawgate @AlexanderWert What is the priority for this issue? Which release do we want to target it to?

@AlexanderWert
Member

We can evaluate the list of the k8s-distro components and consider including them as well in the Elastic distro

Looking at the list of components it seems like there are still many components that sound less relevant for our distro (like spanmetricsconnector, servicegraphconnector, etc.). As you wrote, I think we need to evaluate and come up with our own list inspired by that list.

@strawgate @AlexanderWert What is the priority for this issue? Which release do we want to target it to?

I think 8.15.2 would be reasonable for that, since that's when we plan to have a broader k8s integration for OTel.

@ycombinator
Contributor

Looking at the list of components it seems like there are still many components that sound less relevant for our distro (like spanmetricsconnector, servicegraphconnector, etc.). As you wrote, I think we need to evaluate and come up with our own list inspired by that list.

What would the criteria be for deciding whether to include a component from the list in our distro vs. not include it?

@ChrsMark
Member Author

ChrsMark commented Jul 12, 2024

What would the criteria be for deciding whether to include a component from the list in our distro vs. not include it?

I would consider as required those that are defined by default in the upstream's Helm Charts: https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/values.yaml

With this, users that already use the upstream charts could switch to the Elastic distro just by setting the image properly:

helm install my-opentelemetry-collector open-telemetry/opentelemetry-collector \
  --set mode=daemonset \
  --set image.repository="docker.elastic.co/beats/elastic-agent" \
  --set image.tag="8.15.0-SNAPSHOT" \
  --set command.name="/usr/share/elastic-agent/elastic-agent" \
  --set command.extraArgs="{otel, -c, --config=etc/elastic-agent/otel.yaml}"

The above gives me the following in the failed Collector Pod's logs:

Starting in otel mode
1 error occurred:
	* failed to get config: cannot unmarshal the configuration: decoding failed due to the following error(s):

error decoding 'receivers': unknown type: "prometheus" for id: "prometheus" (valid values: [otlp filelog kubeletstats k8s_cluster hostmetrics httpcheck])
error decoding 'processors': unknown type: "memory_limiter" for id: "memory_limiter" (valid values: [transform filter k8sattributes elasticinframetrics resourcedetection batch resource attributes])
error decoding 'extensions': unknown type: "health_check" for id: "health_check" (valid values: [file_storage memory_limiter])
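The three unknown types line up with components that the upstream chart's default config wires in. Roughly, the chart's rendered config includes something like the following (a sketch inferred from the error messages, not the chart's actual values.yaml):

```yaml
# Sketch of the chart-default components the Elastic distro was missing;
# the prometheus scrape target below is an assumption for illustration.
extensions:
  health_check: {}
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: opentelemetry-collector
          static_configs:
            - targets: ["${env:MY_POD_IP}:8888"]
service:
  extensions: [health_check]
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [memory_limiter, batch]
```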

A working PoC: https://github.com/elastic/opentelemetry-dev/pull/331

@rogercoll
Contributor

rogercoll commented Jul 23, 2024

As defined in the collector best practices docs, the memory_limiter processor should also be included to efficiently manage the resource pipelines: https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-around-resource-utilization
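For reference, the memory_limiter processor is configured with a hard and a soft limit, and the best-practices docs recommend placing it first in each pipeline. A minimal sketch (the limit values are placeholders, to be tuned per deployment):

```yaml
# Minimal memory_limiter sketch; limit values are illustrative.
processors:
  memory_limiter:
    check_interval: 1s   # how often memory usage is measured
    limit_mib: 400       # hard limit: data is refused above this
    spike_limit_mib: 100 # soft limit is limit_mib - spike_limit_mib
  batch: {}
service:
  pipelines:
    logs:
      processors: [memory_limiter, batch]  # memory_limiter goes first
```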

@ycombinator
Contributor

ycombinator commented Jul 23, 2024

As defined in the collector best practices docs, the memory_limiter processor should also be included to efficiently manage the resource pipelines: https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-around-resource-utilization

In #4689, we explicitly chose the memory_limiter extension over the processor because the extension doesn't drop events but instead puts backpressure on the receivers. Do you think we should revert that decision and go with the processor again? cc: @michalpristas

[UPDATE] Chatting with @cmacknz off-issue, there's no reason not to include both the extension and the processor, if we need the processor for our collector distro to work on k8s, so will keep the memory_limiter processor in the list of additional components.

@rogercoll
Contributor

rogercoll commented Jul 24, 2024

@ycombinator I was checking the extension, and from the docs it seems that in the future it is meant to replace the processor:

The memory limiter extension is used to prevent out of memory situations on the collector. The extension will potentially replace the Memory Limiter Processor.

But the extension is currently a no-op:

The extension is under development and does nothing.

I suggested adding the processor because the upstream Helm Chart requires it. Opening an issue to remove the dependency in Helm open-telemetry/opentelemetry-helm-charts#1272

@michalpristas
Contributor

Let's also monitor the size of the binary; the difference between the core collector and the full contrib collector is a bit over 300 MB.
@cmacknz do we find this concerning?

@cmacknz
Member

cmacknz commented Jul 24, 2024

Including the collector has to make the agent larger, and we need more functionality than what the core collector has, so this is really unavoidable. Be mindful of what components we include to make sure they have a purpose (e.g. we probably don't need everything in contrib), but there's not much else we can do in the short term.

@rogercoll
Contributor

From the Helm deployment perspective, only the health_check extension is required in the values configuration. Other components (jaeger, zipkin, etc.) can be removed from the sample configuration by setting their value to null: https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-collector#basic-top-level-configuration

Meanwhile, the health_check requirement can also be skipped in the Helm deployment by using a custom configmap (demo example)
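Per the chart documentation linked above, a default component can be dropped by overriding it to null in the values. An illustrative values.yaml sketch (the exact set of defaults to null out depends on the chart version):

```yaml
# values.yaml (sketch): remove chart defaults not shipped in the Elastic distro
config:
  receivers:
    jaeger: null
    zipkin: null
  service:
    pipelines:
      traces:
        receivers: [otlp]  # drop jaeger/zipkin from the pipeline too
```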

@ChrsMark
Member Author

ChrsMark commented Aug 1, 2024

FYI a working example can be found at https://github.com/elastic/opentelemetry-dev/pull/331, which uses #5126.

@ycombinator
Contributor

I'm trying to verify that the changes made in #5242 are sufficient to resolve this issue here.

If I try the helm install ... command mentioned by @ChrsMark in #5092 (comment) with the elastic-agent-8.16.0-SNAPSHOT image, I'm getting an error in the pod logs:

$ helm install my-opentelemetry-collector open-telemetry/opentelemetry-collector \
  --set mode=daemonset \
  --set image.repository="docker.elastic.co/beats/elastic-agent" \
  --set image.tag="8.16.0-SNAPSHOT" \
  --set command.name="/usr/share/elastic-agent/elastic-agent" \
  --set command.extraArgs="{otel, -c, --config=etc/elastic-agent/otel.yaml}"
$ kubectl logs -p my-opentelemetry-collector-agent-5nf44

Starting in otel mode
failed to get config: cannot unmarshal the configuration: decoding failed due to the following error(s):

error decoding 'exporters': unknown type: "logging" for id: "logging" (valid values: [otlp debug file elasticsearch otlphttp])

@ChrsMark could you help me understand where the logging exporter in the error message is coming from? Also, are you sure the config path in the --set command.extraArgs should be etc/elastic-agent/otel.yaml and not /etc/elastic-agent/otel.yml?
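For context, the logging exporter was deprecated upstream in favor of the debug exporter, so newer distros no longer include it. A config that references it can be migrated along these lines (illustrative sketch):

```yaml
# Sketch: replace the deprecated 'logging' exporter with 'debug'
exporters:
  debug:
    verbosity: detailed
service:
  pipelines:
    logs:
      exporters: [debug]
```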

@andrzej-stencel
Contributor

andrzej-stencel commented Aug 26, 2024

@ycombinator I believe it's a combination of a couple of issues. To fix this, do the following two things:

  1. Make sure you have an up-to-date OpenTelemetry Collector chart on your machine. You can check the current version with helm search repo and update with helm repo update open-telemetry.

Here's the output I had on my machine:

$ helm search repo
NAME                                   	CHART VERSION	APP VERSION	DESCRIPTION                                       
open-telemetry/opentelemetry-collector 	0.92.0       	0.101.0    	OpenTelemetry Collector Helm chart for Kubernetes 
open-telemetry/opentelemetry-demo      	0.30.6       	1.9.0      	opentelemetry demo helm chart                     
open-telemetry/opentelemetry-ebpf      	0.1.0        	v0.10.0    	OpenTelemetry eBPF Helm chart for Kubernetes      
open-telemetry/opentelemetry-kube-stack	0.0.5        	0.98.0     	OpenTelemetry Quickstart chart for Kubernetes. ...
open-telemetry/opentelemetry-operator  	0.60.0       	0.100.1    	OpenTelemetry Operator Helm chart for Kubernetes  
$ helm repo update open-telemetry                        
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "open-telemetry" chart repository
Update Complete. ⎈Happy Helming!⎈
$ helm search repo                                                                      
NAME                                   	CHART VERSION	APP VERSION	DESCRIPTION                                       
open-telemetry/opentelemetry-collector 	0.102.1      	0.107.0    	OpenTelemetry Collector Helm chart for Kubernetes 
open-telemetry/opentelemetry-demo      	0.32.6       	1.11.1     	opentelemetry demo helm chart                     
open-telemetry/opentelemetry-ebpf      	0.1.1        	v0.10.2    	OpenTelemetry eBPF Helm chart for Kubernetes      
open-telemetry/opentelemetry-kube-stack	0.2.0        	0.107.0    	OpenTelemetry Quickstart chart for Kubernetes. ...
open-telemetry/opentelemetry-operator  	0.68.1       	0.107.0    	OpenTelemetry Operator Helm chart for Kubernetes
  2. Use the corrected helm install command. Here's the command that worked for me to install the OpenTelemetry Collector Helm chart with the latest version of the Elastic Agent 8.16.0 snapshot image:
helm install elastic-otelcol open-telemetry/opentelemetry-collector --version=0.102.1 \
  --set mode=daemonset \
  --set image.repository="docker.elastic.co/beats/elastic-agent" \
  --set image.tag="8.16.0-SNAPSHOT" \
  --set image.pullPolicy=Always \
  --set command.name="/usr/share/elastic-agent/elastic-agent" \
  --set command.extraArgs="{otel, --config=otel.yml}"

I have introduced the following changes compared to the command from your comment:

a. Added the --version option to the helm install command to make sure I'm using the version of the chart I think I'm using. This way, the command will fail for a user who does not have the latest version of the chart downloaded.

b. Added --set image.pullPolicy=Always to make sure I'm using the latest version of the docker.elastic.co/beats/elastic-agent:8.16.0-SNAPSHOT image. If I'm not mistaken, the tag 8.16.0-SNAPSHOT is not stable, i.e. it points to a different digest of the image over time. An alternative to setting the pullPolicy to Always is to specify the image digest instead of the tag with --set image.digest.

c. Corrected the path to the otel.yml config file bundled with the image.

@andrzej-stencel
Contributor

One problem I'm seeing when running the chart with this configuration is that the Health Check extension is not used in the otel.yml file, which results in Kubernetes restarting the collector container and eventually resulting in a CrashLoopBackOff. We should add the health_check extension to service::extensions in the otel.yml file that's bundled with the image.
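The fix amounts to wiring the extension into the service section of the bundled config; a minimal sketch (port 13133 is the extension's default, which the chart's probes expect):

```yaml
# Sketch: enable the health_check extension in the bundled otel.yml
extensions:
  health_check:
    endpoint: 0.0.0.0:13133  # probed by the chart's liveness/readiness checks
service:
  extensions: [health_check]
```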

@ycombinator
Contributor

Thanks for your guidance, @andrzej-stencel! Indeed, I had a (way) outdated version of the Helm chart!

$ helm search repo open-telemetry
open-telemetry/opentelemetry-collector	0.78.0       	0.92.0     	OpenTelemetry Collector Helm chart for Kubernetes
open-telemetry/opentelemetry-demo     	0.27.0       	1.7.0      	opentelemetry demo helm chart
open-telemetry/opentelemetry-ebpf     	0.1.0        	v0.10.0    	OpenTelemetry eBPF Helm chart for Kubernetes
open-telemetry/opentelemetry-operator 	0.45.2       	0.91.0     	OpenTelemetry Operator Helm chart for Kubernetes

And thanks for the fixes to the helm install command as well. ❤️

@ycombinator
Contributor

One problem I'm seeing when running the chart with this configuration is that the Health Check extension is not used in the otel.yml file, which results in Kubernetes restarting the collector container and eventually resulting in a CrashLoopBackOff. We should add the health_check extension to service::extensions in the otel.yml file that's bundled with the image.

Thanks. Created #5369 to try and address this problem.

7 participants