Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(helm): Split ingester HPA when zoneAwareReplication is enabled #14565

Merged

Conversation

Kellen275
Copy link
Contributor

@Kellen275 Kellen275 commented Oct 21, 2024

What this PR does / why we need it:
When running a distributed Loki with zone aware replication. The ingester HPA is not configured to properly locate the 3 zoned ingester StatefulSets. This MR adds zoned HPAs similar to how the existing ingester headless service and statefulsets are handled.

Which issue(s) this PR fixes:
Fixes #14021

Special notes for your reviewer:
This MR simply copies the existing hpa.yaml into 3 zoned yamls, and updates 3 lines

  1. Line 2, to verify zoneAwareReplication is enabled
  2. metadata.name to append the -zone-{a|b|c} suffix
  3. spec.scaleTargetRef.name to append the -zone-{a|b|c} suffix

We also update the hpa.yaml to not deploy when zoneAwareReplication is enabled (following the same pattern seen in service-ingester-headless.yaml

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@Kellen275 Kellen275 requested a review from a team as a code owner October 21, 2024 22:12
@CLAassistant
Copy link

CLAassistant commented Oct 21, 2024

CLA assistant check
All committers have signed the CLA.

This comment has been minimized.

Copy link
Collaborator

@trevorwhitney trevorwhitney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@Kellen275 Kellen275 force-pushed the kgsappington/distributed-ingester-hpa branch from 38a6d88 to 6565092 Compare October 23, 2024 15:58

This comment has been minimized.

@Kellen275 Kellen275 force-pushed the kgsappington/distributed-ingester-hpa branch from 6565092 to ada5f94 Compare October 24, 2024 22:48

This comment has been minimized.

@Kellen275 Kellen275 force-pushed the kgsappington/distributed-ingester-hpa branch from ada5f94 to e5b57af Compare October 25, 2024 14:02

This comment has been minimized.

@Kellen275 Kellen275 force-pushed the kgsappington/distributed-ingester-hpa branch from e5b57af to 533fb08 Compare October 25, 2024 20:56

This comment has been minimized.

@Kellen275 Kellen275 force-pushed the kgsappington/distributed-ingester-hpa branch from 533fb08 to af7b83a Compare October 29, 2024 20:05

This comment has been minimized.

@trevorwhitney
Copy link
Collaborator

Looks like helm-ci check is failing? This is good to go as soon as that is passing.

@Kellen275 Kellen275 force-pushed the kgsappington/distributed-ingester-hpa branch from af7b83a to 5084657 Compare October 31, 2024 22:45

This comment has been minimized.

@Kellen275
Copy link
Contributor Author

@trevorwhitney From what I can tell, the CI fails once the branch is out of date. But when I rebase I'm not seeing any failures. I sense I'm missing something simple here, sorry 😅

@Kellen275
Copy link
Contributor Author

Kellen275 commented Nov 2, 2024

Error: INSTALLATION FAILED: template: kube-prometheus-stack/templates/prometheus/prometheus.yaml:262:11: 
executing "kube-prometheus-stack/templates/prometheus/prometheus.yaml" at <ne .Values.prometheus.
prometheusSpec.scrapeConfigNamespaceSelector nil>: error calling ne: uncomparable type map[string]interface
 {}: map[]

Line in question: https://github.com/prometheus-community/helm-charts/blob/kube-prometheus-stack-65.5.1/charts/kube-prometheus-stack/templates/prometheus/prometheus.yaml#L262

Looking at other open PRs, it appears that ~all helm chart changes are failing with the same error

@trevorwhitney
Copy link
Collaborator

@Kellen275 yeah, it's breaking everything helm related at the moment, including releases

@trevorwhitney
Copy link
Collaborator

I've created an issue, this is affecting multiple PRs and all helm releases #14799

@Kellen275 Kellen275 force-pushed the kgsappington/distributed-ingester-hpa branch from 5084657 to 518c0ef Compare November 13, 2024 13:11

This comment has been minimized.

@Kellen275 Kellen275 force-pushed the kgsappington/distributed-ingester-hpa branch from 518c0ef to 13c867a Compare November 14, 2024 19:21

This comment has been minimized.

2 similar comments

This comment has been minimized.

This comment has been minimized.

@trevorwhitney
Copy link
Collaborator

The helm linting failure seems to be related to the submodule in the operator folder, and submitting a PR from a fork. I've asked for some help looking into it.

@trevorwhitney
Copy link
Collaborator

the helm linting here has been fixed if you rebase with main

@Kellen275 Kellen275 force-pushed the kgsappington/distributed-ingester-hpa branch from 148e612 to a5a09df Compare November 25, 2024 11:33

This comment has been minimized.

@Kellen275 Kellen275 force-pushed the kgsappington/distributed-ingester-hpa branch from a5a09df to 04329fe Compare November 25, 2024 14:13

This comment has been minimized.

@Kellen275 Kellen275 force-pushed the kgsappington/distributed-ingester-hpa branch from 04329fe to a3ffc78 Compare November 27, 2024 13:13
Copy link
Contributor

Kubernetes Manifest Diff Summary

Scenario: default-single-binary-values (Added: 0, Modified: 0, Removed: 0)

Summary:

  • Added: 0

  • Modified: 0

  • Removed: 0

Added Files

No added files

Modified Files

No modified files

Removed Files

No removed files

Scenario: default-values (Added: 0, Modified: 0, Removed: 0)

Summary:

  • Added: 0

  • Modified: 0

  • Removed: 0

Added Files

No added files

Modified Files

No modified files

Removed Files

No removed files

Scenario: ingress-values (Added: 0, Modified: 0, Removed: 0)

Summary:

  • Added: 0

  • Modified: 0

  • Removed: 0

Added Files

No added files

Modified Files

No modified files

Removed Files

No removed files

Scenario: legacy-monitoring-values (Added: 0, Modified: 0, Removed: 0)

Summary:

  • Added: 0

  • Modified: 0

  • Removed: 0

Added Files

No added files

Modified Files

No modified files

Removed Files

No removed files

Scenario: simple-scalable-aws-kube-irsa-values (Added: 0, Modified: 0, Removed: 0)

Summary:

  • Added: 0

  • Modified: 0

  • Removed: 0

Added Files

No added files

Modified Files

No modified files

Removed Files

No removed files

@trevorwhitney trevorwhitney merged commit 80e46f7 into grafana:main Nov 27, 2024
61 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HPA for Loki-indexer in a failed state when zoneAwareReplication is set to enabled
4 participants