Add health_check extension to bundled otel.yml file #5369

Merged
ycombinator merged 5 commits into elastic:main on Sep 3, 2024

Conversation

ycombinator
Contributor

What does this PR do?

This PR adds the health_check extension to the otel.yml file that's bundled with the Elastic Agent package.

Why is it important?

Without this addition, if a user tries to install the open-telemetry/opentelemetry-collector Helm chart with Elastic's distribution of the OTel Collector (i.e. Elastic Agent), the installation eventually fails. Specifically, as noted by @andrzej-stencel in #5092 (comment):

One problem I'm seeing when running the chart with this configuration is that the Health Check extension is not used in the otel.yml file, which results in Kubernetes restarting the collector container and eventually resulting in a CrashLoopBackOff. We should add the health_check extension to service::extensions in the otel.yml file that's bundled with the image.
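
To make the requested change concrete, the fragment below sketches how the health_check extension is typically declared and then enabled in a collector configuration. It is a minimal illustration, not the full bundled otel.yml: the endpoint value is the one proposed in this PR, and the remaining sections of the file (receivers, processors, exporters, pipelines) are assumed to exist and are omitted.

```yaml
# Sketch only: receivers, processors, exporters, and pipelines are omitted.
extensions:
  health_check:
    # Address the health check HTTP server listens on (value proposed in this PR).
    endpoint: "localhost:13133"

service:
  # An extension is only started if it is also listed here.
  # Any extensions already listed in the real file should be kept alongside it.
  extensions: [health_check]
```

Declaring the extension under extensions is not enough on its own; it also has to be referenced in service::extensions for the collector to start it.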

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Disruptive User Impact

If users are already running something that's listening on localhost:13133, there will be a port conflict.
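
If that port is already taken, one possible workaround is to run the collector with a configuration that moves the health check listener elsewhere. The sketch below assumes port 13134 is free; that number is an arbitrary placeholder, not a value from this PR, and anything that probes the collector (for example, Kubernetes liveness and readiness probes) would need to be pointed at the same port.

```yaml
extensions:
  health_check:
    # 13134 is a hypothetical free port chosen for illustration only.
    endpoint: "localhost:13134"
```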

How to test this PR locally

See #5092 (comment) and #5092 (comment)

@ycombinator added the bug, backport-8.15, and opentelemetry labels Aug 28, 2024
@ycombinator requested a review from a team as a code owner August 28, 2024 00:28
@@ -21,6 +21,9 @@ extensions:
     check_interval: 1s
     limit_mib: 700
     spike_limit_mib: 180
+  health_check:
+    endpoint: "localhost:13133"
ycombinator (Contributor, Author) commented:

I chose this value based on the documentation in https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/extension/healthcheckextension/README.md#health-check. I'm happy to change it to another value if others have a better suggestion.

@ycombinator added the Team:Elastic-Agent-Control-Plane label Aug 28, 2024
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@ycombinator requested a review from rogercoll August 28, 2024 06:28
@rogercoll
Contributor

Sounds good to me on adding the health_check extension in the bundled otel.yml. 👍

Without this addition, if a user tries to install the open-telemetry/opentelemetry-collector Helm chart with Elastic's distribution of the OTel Collector (i.e. Elastic Agent), the installation eventually fails.

I think this will only happen if the user tries to use the otel.yml file, as it overrides the configuration set by the Chart:

$ kubectl describe pod elastic-otelcol-opentelemetry-collector-agent-d9s4k
...
    Command:
      //usr/share/elastic-agent/elastic-agent
    Args:
      --config=/conf/relay.yaml
      otel
      --config=otel.yml

Note that the Chart added another default config --config=/conf/relay.yaml (which contains the health_check extension). A workaround is to not use the otel.yml config and rely on the one configured by the Chart:

helm install elastic-otelcol open-telemetry/opentelemetry-collector \
  --version=0.102.1 \
  --set mode=daemonset \
  --set image.repository="docker.elastic.co/beats/elastic-agent" \
  --set image.tag="8.16.0-SNAPSHOT" \
  --set image.pullPolicy=Always \
  --set command.name="/usr/share/elastic-agent/elastic-agent" \
  --set command.extraArgs="{otel}"

The recommended approach to providing a custom configuration for the collector is to manually create a ConfigMap that holds the configuration. There is a PR upstream that will allow overriding the default configuration with a config option: open-telemetry/opentelemetry-helm-charts#1301
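
As a rough illustration of the ConfigMap approach, the manifest below wraps a small collector configuration that includes the health_check extension. Everything specific here is an assumption made for the example: the ConfigMap name, the data key, the choice of the otlp receiver and debug exporter, and the 0.0.0.0 bind address. How the chart is told to mount and use this ConfigMap depends on the chart version and is not shown.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  # Hypothetical name; pick whatever matches the deployment.
  name: elastic-otelcol-config
data:
  # Hypothetical key; it must match the file name the collector is pointed at.
  relay.yaml: |
    extensions:
      health_check:
        # Binding to all interfaces is an assumption so that probes from
        # outside the container's loopback interface can reach the endpoint.
        endpoint: "0.0.0.0:13133"
    receivers:
      otlp:
        protocols:
          grpc:
    exporters:
      debug:
    service:
      # The extension must be listed here to be started.
      extensions: [health_check]
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [debug]
```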


Quality Gate passed

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarQube

@ycombinator enabled auto-merge (squash) August 30, 2024 21:10
Contributor

@blakerouse left a comment

Looks good, thanks for updating the path.

@ycombinator merged commit 445fa44 into elastic:main Sep 3, 2024
9 checks passed
mergify bot pushed a commit that referenced this pull request Sep 3, 2024
* Add health_check extension to bundled otel.yml file

* Adding CHANGELOG

* Add extension to pipeline

Co-authored-by: Roger Coll <[email protected]>

* Change path to health check endpoint

* Revert change to health check endpoint path

---------

Co-authored-by: Roger Coll <[email protected]>
(cherry picked from commit 445fa44)
ycombinator added a commit that referenced this pull request Sep 3, 2024
* Add health_check extension to bundled otel.yml file

* Adding CHANGELOG

* Add extension to pipeline

Co-authored-by: Roger Coll <[email protected]>

* Change path to health check endpoint

* Revert change to health check endpoint path

---------

Co-authored-by: Roger Coll <[email protected]>
(cherry picked from commit 445fa44)

Co-authored-by: Shaunak Kashyap <[email protected]>
Labels
backport-8.15: Automated backport to the 8.15 branch with mergify
bug: Something isn't working
opentelemetry: Related to the Elastic Distribution of the OpenTelemetry Collector
Team:Elastic-Agent-Control-Plane: Label for the Agent Control Plane team

4 participants