OTL-3190 Additional fixes for operator install order issues using Helm hooks #1559

Open: jvoravong wants to merge 2 commits into main from otl-3190

@jvoravong (Contributor) commented Dec 5, 2024

Description
Helm install failures occur in rare cases due to deployment order issues:

  • Instrumentation Object: May fail because the Operator and its webhook are not ready.
  • Cert-Manager Certificate: May fail because Cert-Manager is not yet initialized.

Proposed Solution
Add Helm post-install hooks that run readiness checks and deploy resources in the correct order. Cert-Manager and Operator readiness is validated before the Instrumentation is deployed. Each hook is assigned a hook weight so that the related operations complete sequentially.
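
As a rough illustration of what a readiness check of this kind does, the sketch below polls a probe until it reports ready or a timeout elapses. The function name `wait_until_ready`, the `probe` callable, and the default timeout and interval are hypothetical and not taken from this PR; how the chart's hooks actually perform the check is not shown here.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `probe` until it returns True or `timeout_s` elapses.

    `probe` is a hypothetical stand-in for a real check, such as
    asking the cluster whether a webhook deployment is available.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True  # dependency is ready, later steps may proceed
        if clock() >= deadline:
            return False  # gave up, the caller should fail the install step
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting. A caller would treat a `False` result as a failed check and stop before deploying the dependent resource.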

Installation Order

  1. Default Installation Phase: Deploy Cert-Manager, Operator, and Operator CRDs.
  2. Cert-Manager Readiness Check: Post-install hook (weight 1) verifies Cert-Manager is operational.
  3. Install Issuer: Post-install hook (weight 2) deploys the Cert-Manager Issuer.
  4. Install Certificate: Post-install hook (weight 3) deploys the Cert-Manager Certificate.
  5. Operator Readiness Checks: Post-install hooks (weight 4) validate that the Operator and its webhook are ready.
  6. Install Instrumentation: Post-install hook (weight 5) deploys Instrumentation resources.
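
The ordering above relies on Helm running hooks of the same type in ascending weight order. The sketch below models that with a plain sort. The hook names are hypothetical, only the weights mirror the list, and the tie-break is simplified to the hook name (Helm also considers resource kind).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hook:
    name: str    # hypothetical hook name, not taken from the chart
    weight: int  # value of the hook weight from the list above


# Deliberately listed out of order to show that only the weight matters.
HOOKS = [
    Hook("install-instrumentation", 5),
    Hook("cert-manager-readiness-check", 1),
    Hook("operator-readiness-check", 4),
    Hook("install-certificate", 3),
    Hook("install-issuer", 2),
]


def execution_order(hooks):
    """Return hook names in ascending weight order, ties broken by name."""
    return [h.name for h in sorted(hooks, key=lambda h: (h.weight, h.name))]


if __name__ == "__main__":
    for step, name in enumerate(execution_order(HOOKS), start=1):
        print(step, name)
```

Because each step has a distinct weight, the order is fully determined regardless of how the templates are laid out in the chart.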

Benefits

  • Ensures dependencies are deployed in the correct order.
  • Mitigates Helm installation errors reported from the field.

Cons

  • Using Helm hooks in this way, and this extensively, is considered an anti-pattern in Helm.

Callouts

  • Future Refactoring: We may refactor Cert-Manager usage and CRD installation in the near future, potentially eliminating the need for post-install hooks. While this PR provides an effective interim solution, it may be short-lived within this project.

Testing

  • Verified with local and lab Kubernetes clusters.

Related alternative solution: #1561

@jvoravong force-pushed the otl-3190 branch 2 times, most recently from cdf30e0 to d3ca22a, December 5, 2024 22:40
…ntation opentelemetry.io/v1alpha1 are installed too early
@jvoravong marked this pull request as ready for review December 5, 2024 23:13
@jvoravong requested review from a team as code owners December 5, 2024 23:13
@jvoravong changed the title from "Additional fixes for operator install order issues" to "OTL-3190 Additional fixes for operator install order issues" Dec 6, 2024
@jvoravong changed the title from "OTL-3190 Additional fixes for operator install order issues" to "OTL-3190 Additional fixes for operator install order issues using Helm hooks" Dec 6, 2024
@jvoravong (Contributor, Author) commented:

This ticket can be closed in favor of #1586
