From 59a2afaed9f089f7f20412147e144f1160c62899 Mon Sep 17 00:00:00 2001
From: Michel Hollands <42814411+MichelHollands@users.noreply.github.com>
Date: Fri, 3 Nov 2023 10:22:16 +0000
Subject: [PATCH] Update changed metrics in docs (#11109)

**What this PR does / why we need it**:
After the default metrics namespace is changed to "loki" these metrics mentioned in the docs have to change as well.

Should be merged at the same time as https://github.com/grafana/loki/pull/11110

**Checklist**
- [X] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](https://github.com/grafana/loki/commit/d10549e3ece02120974929894ee333d07755d213)
- [ ] If the change is deprecating or removing a configuration option, update the `deprecated-config.yaml` and `deleted-config.yaml` files respectively in the `tools/deprecated-config-checker` directory. [Example PR](https://github.com/grafana/loki/pull/10840/commits/0d4416a4b03739583349934b96f272fb4f685d15)

Signed-off-by: Michel Hollands
---
 docs/sources/operations/autoscaling_queriers.md    | 10 +++++-----
 docs/sources/operations/observability.md           |  2 +-
 docs/sources/operations/scalability.md             |  2 +-
 docs/sources/operations/shuffle-sharding/_index.md |  4 ++--
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/docs/sources/operations/autoscaling_queriers.md b/docs/sources/operations/autoscaling_queriers.md
index 06ae330f971b4..52bcc60e00a0f 100644
--- a/docs/sources/operations/autoscaling_queriers.md
+++ b/docs/sources/operations/autoscaling_queriers.md
@@ -27,14 +27,14 @@ Because queriers pull queries from the query-scheduler queue and process them on
 - The scheduler queue size.
 - The queries running in the queriers.
 
-The query-scheduler exposes the `cortex_query_scheduler_inflight_requests` metric.
+The query-scheduler exposes the `loki_query_scheduler_inflight_requests` metric.
 It tracks the sum of queued queries plus the number of queries currently running in the querier workers.
 The following query is useful to scale queriers based on the inflight requests.
 
 ```promql
 sum(
   max_over_time(
-    cortex_query_scheduler_inflight_requests{namespace="loki-cluster", quantile=""}[]
+    loki_query_scheduler_inflight_requests{namespace="loki-cluster", quantile=""}[]
   )
 )
 ```
@@ -66,7 +66,7 @@ So if we use 6 workers per querier, we will use the following query:
 ```promql
 clamp_min(ceil(
   avg(
-    avg_over_time(cortex_query_scheduler_inflight_requests{namespace="loki-cluster", quantile="0.75"}[7d])
+    avg_over_time(loki_query_scheduler_inflight_requests{namespace="loki-cluster", quantile="0.75"}[7d])
   ) / scalar(floor(vector(6 * 0.75)))
 ), 1)
 ```
@@ -79,7 +79,7 @@ The resulting query becomes:
 ```promql
 ceil(
   max(
-    max_over_time(cortex_query_scheduler_inflight_requests{namespace="loki-cluster", quantile="0.5"}[7d])
+    max_over_time(loki_query_scheduler_inflight_requests{namespace="loki-cluster", quantile="0.5"}[7d])
   ) / 6
 )
 ```
@@ -111,7 +111,7 @@ spec:
   triggers:
   - metadata:
       metricName: querier_autoscaling_metric
-      query: sum(max_over_time(cortex_query_scheduler_inflight_requests{namespace="loki-cluster", quantile="0.75"}[2m]))
+      query: sum(max_over_time(loki_query_scheduler_inflight_requests{namespace="loki-cluster", quantile="0.75"}[2m]))
       serverAddress: http://prometheus.default:9090/prometheus
       threshold: "4"
     type: prometheus
diff --git a/docs/sources/operations/observability.md b/docs/sources/operations/observability.md
index 58336dd4f7647..8f617bcf869dc 100644
--- a/docs/sources/operations/observability.md
+++ b/docs/sources/operations/observability.md
@@ -33,7 +33,7 @@ The Loki Ingesters expose the following metrics:
 
 | Metric Name                                | Metric Type | Description                                             |
 | ------------------------------------------ | ----------- | ------------------------------------------------------- |
-| `cortex_ingester_flush_queue_length`       | Gauge       | The total number of series pending in the flush queue.  |
+| `loki_ingester_flush_queue_length`         | Gauge       | The total number of series pending in the flush queue.  |
 | `loki_chunk_store_index_entries_per_chunk` | Histogram   | Number of index entries written to storage per chunk.   |
 | `loki_ingester_memory_chunks`              | Gauge       | The total number of chunks in memory.                   |
 | `loki_ingester_memory_streams`             | Gauge       | The total number of streams in memory.                  |
diff --git a/docs/sources/operations/scalability.md b/docs/sources/operations/scalability.md
index 2de1278238546..ff8f1d06a0385 100644
--- a/docs/sources/operations/scalability.md
+++ b/docs/sources/operations/scalability.md
@@ -36,7 +36,7 @@ which informed the implementation._
 By default, the `ruler` component embeds a query engine to evaluate rules. This generally works fine,
 except when rules are complex or have to process a large amount of data regularly.
 Poor performance of the `ruler` manifests as recording rules metrics
-with gaps or missed alerts. This situation can be detected by alerting on the `cortex_prometheus_rule_group_iterations_missed_total` metric
+with gaps or missed alerts. This situation can be detected by alerting on the `loki_prometheus_rule_group_iterations_missed_total` metric
 when it has a non-zero value.
 A solution to this problem is to externalize rule evaluation from the `ruler` process.
 The `ruler` embedded query engine
diff --git a/docs/sources/operations/shuffle-sharding/_index.md b/docs/sources/operations/shuffle-sharding/_index.md
index 4c80e48485ba5..3002b774ee542 100644
--- a/docs/sources/operations/shuffle-sharding/_index.md
+++ b/docs/sources/operations/shuffle-sharding/_index.md
@@ -86,9 +86,9 @@ The maximum number of queriers can be overridden on a per-tenant basis in the li
 
 These metrics reveal information relevant to shuffle sharding:
 
-- the overall query-scheduler queue duration, `cortex_query_scheduler_queue_duration_seconds_*`
+- the overall query-scheduler queue duration, `loki_query_scheduler_queue_duration_seconds_*`
 
-- the query-scheduler queue length per tenant, `cortex_query_scheduler_queue_length`
+- the query-scheduler queue length per tenant, `loki_query_scheduler_queue_length`
 
 - the query-scheduler queue duration per tenant can be found with this query:
 ```
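
Note for operators applying this rename: dashboards and autoscaling triggers that still reference the old `cortex_` names stop matching once the metrics namespace changes. The sketch below is illustrative only and not part of this patch; it shows one way to write a transitional query that matches either metric name during rollout, reusing the `namespace` label, quantile, and window from the docs above as assumptions.

```promql
# Illustrative sketch: inflight-requests sum that works whether the old
# (cortex_) or new (loki_) metric name is being scraped during the rollout.
sum(
  max_over_time(
    (
        loki_query_scheduler_inflight_requests{namespace="loki-cluster", quantile="0.75"}
      or
        cortex_query_scheduler_inflight_requests{namespace="loki-cluster", quantile="0.75"}
    )[2m:]
  )
)
```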