Scaled down resources to use less. Fixed prometheus-adapter issues. Added documentation on why #7

Merged 3 commits on Apr 22, 2024

31 changes: 30 additions & 1 deletion NOTES.md
@@ -209,7 +209,34 @@ CRDs are meant to be the powerhouse of Kubernetes. To make something Cloud/Kuber
Helm ignores this feature and instead focuses on trying to template out all components. It leaves you working with the Kubernetes primitives: Pods, Services, Secrets. Those are the basics, but they aren't the full capabilities of the framework; they are really just the surface, and Helm's workflows steer people away from those more advanced and powerful capabilities.

## Prometheus-Adapter has a bug in it, out the gate:
* https://github.com/kubernetes-sigs/prometheus-adapter/issues/385
* https://github.com/kubernetes-sigs/prometheus-adapter/issues/398

Actually, it was this comment buried way down the thread that pointed me toward what the issue could be: https://github.com/kubernetes-sigs/prometheus-adapter/issues/398#issuecomment-1443580236

Basically, depending on how you installed Prometheus, you may not be providing a `node` label on `node_cpu_seconds_total`, a metric that the default configuration of prometheus-adapter depends on.

There are multiple ways to fix this, depending on whether you would like to relabel the metric in Prometheus or point prometheus-adapter at whatever the correct label is on your install. I chose to fix it within Prometheus, as the label is helpful to have anyway, and the Prometheus UI lets you debug and prove whether the fix actually worked.

Basically, within the Prometheus chart values you need to add the following relabeling rule under the `prometheus-node-exporter` subchart:
```yaml
prometheus-node-exporter:
monitor:
relabelings:
- sourceLabels: [__meta_kubernetes_pod_node_name]
separator: ;
regex: ^(.*)$
targetLabel: node
replacement: $1
action: replace
```
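Applying it is just a normal values update on whichever chart you installed Prometheus with. As a rough sketch, assuming kube-prometheus-stack with a release named `prometheus` in a `monitoring` namespace (release name, namespace, and values file path are all assumptions, adjust to your install):

```bash
# Re-apply the chart with the updated values file
# (release name, namespace, and values file path are assumptions)
helm upgrade prometheus prometheus-community/kube-prometheus-stack \
  -n monitoring -f prometheus-values.yaml
```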

Once you've applied the change, verify the label exists by searching for `node_cpu_seconds_total` in the Prometheus UI. You should see the new `node` label on the returned series.
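If you'd rather check from the command line than the UI, the Prometheus HTTP API works just as well. A minimal sketch, assuming the operator-created `prometheus-operated` service in a `monitoring` namespace (both names are assumptions):

```bash
# Port-forward Prometheus locally (service and namespace names are assumptions)
kubectl -n monitoring port-forward svc/prometheus-operated 9090:9090 &

# Query the series and list the distinct values of the "node" label
curl -s 'http://localhost:9090/api/v1/query?query=node_cpu_seconds_total' \
  | grep -o '"node":"[^"]*"' | sort -u
```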

Now, with the default setup of prometheus-adapter, you should be able to successfully pull node usage with kubectl:
```bash
kubectl top nodes
```
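If `kubectl top nodes` still errors out or shows no data, it's worth confirming the adapter is actually registered as the resource metrics API before digging further. A quick check, assuming prometheus-adapter is the thing serving `metrics.k8s.io` on your cluster:

```bash
# The APIService should exist and show AVAILABLE=True
kubectl get apiservices | grep metrics.k8s.io

# Hitting the resource metrics API directly should return per-node usage
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
```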

## S3 external storage documentation and secure configuration of keys are basically all out of date, scattered around, or broken!
The Grafana docs are complete shit. I'd heard as much on multiple forums already, but this is my first experience where it's truly shown its colors. In order to get proper cloud storage set up, I've had to jump between a bunch of forums, blindly guess through a whole bunch of possibilities, and then stumble onto a makeshift combination of a couple of options to get everything working.
@@ -307,6 +334,8 @@ To get more verbose output, also pass these arguments in the `extraArgs` section
```
Again, `--log.level=debug` and `--print-config-stderr` are pretty useless until you get your `aws.s3` configuration correct. You'll be stuck with generic errors until you get that sorted

**Note:** There is no typo in `-config.expand-env=true`; it really is prefixed with only one dash. Don't ask me why.
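Once those flags are set, Loki dumps its resolved configuration to stderr at startup, so the pod logs are the quickest place to confirm the S3 values actually expanded. A rough sketch, assuming a deployment named `loki` in a `loki` namespace (both names are assumptions):

```bash
# -print-config-stderr writes the resolved config to stderr on startup,
# which ends up in the pod logs; grep out the s3/storage section
kubectl -n loki logs deployment/loki | grep -i -A 5 's3'
```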


## Bonus Garbage
Oh, also: a whole bunch of these docs talk about using boltdb_shipper. That thing is deprecated! (https://grafana.com/docs/loki/latest/configure/storage/#boltdb-deprecated) There is a newer store, TSDB (https://grafana.com/docs/loki/latest/configure/storage/#tsdb-recommended), but man... documentation? Where is it? Nobody appears to be using it yet either.
(prometheus-adapter rules configuration; file path not shown in this view. Removed lines are marked with "-".)
@@ -171,8 +171,6 @@ rules:
resource: namespace
pod:
resource: pod
-          instance:
-            resource: node
containerLabel: container
memory:
containerQuery: |
@@ -187,8 +185,6 @@
)
resources:
overrides:
-          instance:
-            resource: node
node:
resource: node
namespace:
(Prometheus chart values; file path not shown in this view. Removed lines are marked with "-", added lines with "+".)
@@ -2141,13 +2141,13 @@ prometheus-node-exporter:
## RelabelConfigs to apply to samples before scraping
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#relabelconfig
##
-      relabelings: []
-      # - sourceLabels: [__meta_kubernetes_pod_node_name]
-      #   separator: ;
-      #   regex: ^(.*)$
-      #   targetLabel: nodename
-      #   replacement: $1
-      #   action: replace
+      relabelings:
+        - sourceLabels: [__meta_kubernetes_pod_node_name]
+          separator: ;
+          regex: ^(.*)$
+          targetLabel: node
+          replacement: $1
+          action: replace
rbac:
## If true, create PSPs for node-exporter
##
6 changes: 3 additions & 3 deletions modules/k8config/modules/promtail/res/promtail-values.yaml
@@ -48,12 +48,12 @@ daemonset:
deployment:
# -- Deploys Promtail as a Deployment
enabled: true
-  replicaCount: 3
+  replicaCount: 1
autoscaling:
# -- Creates a HorizontalPodAutoscaler for the deployment
enabled: true
-    minReplicas: 3
-    maxReplicas: 10
+    minReplicas: 1
+    maxReplicas: 3
targetCPUUtilizationPercentage: 80
targetMemoryUtilizationPercentage:
# behavior: {}