*: match naming convention in metrics and documentation (thanos-io#3309)
* pkg/compact: Fix compactor name in comments

Rename compact to compactor

Signed-off-by: Mathis Raguin <[email protected]>

* pkg/compact: Rename metrics to thanos_compact_*

Some metrics used to be under thanos_compactor_*

Signed-off-by: Mathis Raguin <[email protected]>

* rule: match metrics name with convention

Signed-off-by: Mathis Raguin <[email protected]>

* rule: match documentation with naming convention

Signed-off-by: Mathis Raguin <[email protected]>

* query: match metrics name with convention

Signed-off-by: Mathis Raguin <[email protected]>

* query: match documentation with naming convention

Signed-off-by: Mathis Raguin <[email protected]>

* query: fix naming convention in examples

Signed-off-by: Mathis Raguin <[email protected]>

* CONTRIBUTING: update naming convention with new query frontend component

Signed-off-by: Mathis Raguin <[email protected]>

* update changelog for concerned metrics

Signed-off-by: Mathis Raguin <[email protected]>

* compact: match naming convention for compactor

tests were moved since the PR was created.

Signed-off-by: Mathis Raguin <[email protected]>
Sayrus authored Nov 12, 2020
1 parent 90579ba commit 6889ae8
Showing 22 changed files with 101 additions and 100 deletions.
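
Because the rename is breaking, alert rules and dashboards that still reference the old prefixes stop matching after an upgrade. The following is a minimal sketch of how such expressions could be migrated, based only on the three prefix changes named in the CHANGELOG entry of this commit. `MigrateExpr` and `legacyPrefixes` are hypothetical names for illustration and are not part of Thanos.

```go
package main

import (
	"fmt"
	"strings"
)

// legacyPrefixes maps the old metric prefixes to the ones introduced by
// this commit. The trailing underscore keeps already-migrated names such
// as thanos_compact_halted untouched.
var legacyPrefixes = strings.NewReplacer(
	"thanos_compactor_", "thanos_compact_",
	"thanos_querier_", "thanos_query_",
	"thanos_ruler_", "thanos_rule_",
)

// MigrateExpr is a hypothetical helper: it returns expr with every old
// metric prefix replaced by its new form. It works on plain text, so it
// would also rewrite a label value that happens to contain an old prefix.
func MigrateExpr(expr string) string {
	return legacyPrefixes.Replace(expr)
}

func main() {
	old := `thanos_compactor_halted{job=~"thanos-compact.*"} == 1`
	fmt.Println(MigrateExpr(old))
}
```

Job selectors such as `thanos-compact.*` use hyphens and are left alone, which matches the alert changes in `examples/alerts/alerts.md`.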
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -44,6 +44,7 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
- [#2906](https://github.com/thanos-io/thanos/pull/2906) Tools: Refactor Bucket replicate execution. Removed all `thanos_replicate_origin_.*` metrics.
- `thanos_replicate_origin_meta_loads_total` can be replaced by `blocks_meta_synced{state="loaded"}`.
- `thanos_replicate_origin_partial_meta_reads_total` can be replaced by `blocks_meta_synced{state="failed"}`.
- [#3309](https://github.com/thanos-io/thanos/pull/3309) Compact: *breaking :warning:* Rename metrics to match naming convention. This includes metrics starting with `thanos_compactor` to `thanos_compact`, `thanos_querier` to `thanos_query` and `thanos_ruler` to `thanos_rule`.

## [v0.16.0](https://github.com/thanos-io/thanos/releases) - 2020.10.26

4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -48,7 +48,7 @@ In the code and documentation prefer non-offensive terminology, for example:

Thanos is a distributed system composed with several services and CLI tools as listed [here](/cmd/thanos).

When we refer to them as technical reference we use verb form: `store`, `compact`, `rule`, `query`. This includes:
When we refer to them as technical reference we use verb form: `store`, `compact`, `rule`, `query`, `query_frontend`. This includes:

* Code
* Metrics
@@ -59,7 +59,7 @@ When we refer to them as technical reference we use verb form: `store`, `compact
* Package names
* Log messages, traces

However, when speaking about those or explaining we use `actor` noun form: `store gateway, compactor, ruler, querier`. This includes areas like:
However, when speaking about those or explaining we use `actor` noun form: `store gateway`, `compactor`, `ruler`, `querier`, `query frontend`. This includes areas like:

* Public communication
* Documentation
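
The convention in this hunk (verb form for technical references, actor noun form for prose) can be sketched as a small lookup table. This is an illustration only: the `component` type and the `MetricPrefix` helper are hypothetical, and the `thanos_<verb>_` prefix pattern is an assumption inferred from the metrics renamed in this commit, not a rule stated for every existing metric.

```go
package main

import "fmt"

// component pairs the verb form (code, metrics, package names, logs)
// with the actor noun form (documentation, public communication).
type component struct {
	Verb string // technical reference, e.g. "compact"
	Noun string // prose name, e.g. "compactor"
}

// components mirrors the two lists in CONTRIBUTING.md after this change.
var components = []component{
	{Verb: "store", Noun: "store gateway"},
	{Verb: "compact", Noun: "compactor"},
	{Verb: "rule", Noun: "ruler"},
	{Verb: "query", Noun: "querier"},
	{Verb: "query_frontend", Noun: "query frontend"},
}

// MetricPrefix is a hypothetical helper that builds the metric prefix
// implied by the convention: "thanos_" + verb form + "_".
func MetricPrefix(c component) string {
	return "thanos_" + c.Verb + "_"
}

func main() {
	// Print each component's verb form, assumed metric prefix, and prose name.
	for _, c := range components {
		fmt.Printf("%-16s %-24s Thanos %s\n", c.Verb, MetricPrefix(c), c.Noun)
	}
}
```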
14 changes: 7 additions & 7 deletions cmd/thanos/compact.go
@@ -101,32 +101,32 @@ func runCompact(
) error {
deleteDelay := time.Duration(conf.deleteDelay)
halted := promauto.With(reg).NewGauge(prometheus.GaugeOpts{
Name: "thanos_compactor_halted",
Name: "thanos_compact_halted",
Help: "Set to 1 if the compactor halted due to an unexpected error.",
})
halted.Set(0)
retried := promauto.With(reg).NewCounter(prometheus.CounterOpts{
Name: "thanos_compactor_retries_total",
Name: "thanos_compact_retries_total",
Help: "Total number of retries after retriable compactor error.",
})
iterations := promauto.With(reg).NewCounter(prometheus.CounterOpts{
Name: "thanos_compactor_iterations_total",
Name: "thanos_compact_iterations_total",
Help: "Total number of iterations that were executed successfully.",
})
partialUploadDeleteAttempts := promauto.With(reg).NewCounter(prometheus.CounterOpts{
Name: "thanos_compactor_aborted_partial_uploads_deletion_attempts_total",
Name: "thanos_compact_aborted_partial_uploads_deletion_attempts_total",
Help: "Total number of started deletions of blocks that are assumed aborted and only partially uploaded.",
})
blocksCleaned := promauto.With(reg).NewCounter(prometheus.CounterOpts{
Name: "thanos_compactor_blocks_cleaned_total",
Name: "thanos_compact_blocks_cleaned_total",
Help: "Total number of blocks deleted in compactor.",
})
blockCleanupFailures := promauto.With(reg).NewCounter(prometheus.CounterOpts{
Name: "thanos_compactor_block_cleanup_failures_total",
Name: "thanos_compact_block_cleanup_failures_total",
Help: "Failures encountered while deleting blocks in compactor.",
})
blocksMarked := promauto.With(reg).NewCounterVec(prometheus.CounterOpts{
Name: "thanos_compactor_blocks_marked_total",
Name: "thanos_compact_blocks_marked_total",
Help: "Total number of blocks marked in compactor.",
}, []string{"marker"})
blocksMarked.WithLabelValues(metadata.NoCompactMarkFilename)
4 changes: 2 additions & 2 deletions cmd/thanos/query.go
@@ -267,7 +267,7 @@ func runQuery(
fileSDCache := cache.New()
dnsStoreProvider := dns.NewProvider(
logger,
extprom.WrapRegistererWithPrefix("thanos_querier_store_apis_", reg),
extprom.WrapRegistererWithPrefix("thanos_query_store_apis_", reg),
dns.ResolverType(dnsSDResolver),
)

@@ -279,7 +279,7 @@ func runQuery(

dnsRuleProvider := dns.NewProvider(
logger,
extprom.WrapRegistererWithPrefix("thanos_querier_rule_apis_", reg),
extprom.WrapRegistererWithPrefix("thanos_query_rule_apis_", reg),
dns.ResolverType(dnsSDResolver),
)

6 changes: 3 additions & 3 deletions cmd/thanos/rule.go
@@ -219,7 +219,7 @@ func registerRule(app *extkingpin.App) {
})
}

// RuleMetrics defines thanos rule metrics.
// RuleMetrics defines Thanos Ruler metrics.
type RuleMetrics struct {
configSuccess prometheus.Gauge
configSuccessTime prometheus.Gauge
@@ -341,7 +341,7 @@ func runRule(

queryProvider := dns.NewProvider(
logger,
extprom.WrapRegistererWithPrefix("thanos_ruler_query_apis_", reg),
extprom.WrapRegistererWithPrefix("thanos_rule_query_apis_", reg),
dns.ResolverType(dnsSDResolver),
)
var queryClients []*http_util.Client
@@ -404,7 +404,7 @@ func runRule(

amProvider := dns.NewProvider(
logger,
extprom.WrapRegistererWithPrefix("thanos_ruler_alertmanagers_", reg),
extprom.WrapRegistererWithPrefix("thanos_rule_alertmanagers_", reg),
dns.ResolverType(dnsSDResolver),
)
var alertmgrs []*alert.Alertmanager
4 changes: 2 additions & 2 deletions docs/components/tools.md
@@ -623,9 +623,9 @@ Flags:

The `tools rules-check` subcommand contains tools for validation of Prometheus rules.

This is allowing to check the rules with the same validation as is used by the Thanos rule node.
This is allowing to check the rules with the same validation as is used by the Thanos Ruler node.

NOTE: The check is equivalent to the `promtool check rules` with addition of Thanos rule extended rules file syntax,
NOTE: The check is equivalent to the `promtool check rules` with addition of Thanos Ruler extended rules file syntax,
which includes `partial_response_strategy` field which `promtool` does not allow.

If the check fails the command fails with exit code `1`, otherwise `0`.
8 changes: 4 additions & 4 deletions docs/operating/reverse-proxy.md
@@ -20,7 +20,7 @@ Let's look into some example scenarios. All examples are using nginx as a revers

### Serving Thanos on a subdomain

Serving a Thanos component on the root of a subdomain is pretty straight-forward. Let's say you want to run Thanos Query behind a nginx reverse proxy, accessible on domain `thanos.example.com`. A basic nginx configuration would look like this:
Serving a Thanos component on the root of a subdomain is pretty straight-forward. Let's say you want to run Thanos Querier behind a nginx reverse proxy, accessible on domain `thanos.example.com`. A basic nginx configuration would look like this:

```
http {
@@ -37,11 +37,11 @@ http {

### Serving Thanos on a sub-path

Things become a little tricky when you want to serve Thanos on a sub-path. Let's say, you want to run Thanos Query behind an nginx server, accessible on the URL `http://example.com/thanos`. The Thanos web UI depends on it being accessed on the same URL as Thanos itself is listening. This is because the UI needs to know the URL from where to load static assets and what URL to use in links or redirects. If Thanos is behind a reverse proxy, particularly one where Thanos is not at the root, this doesn't work so well.
Things become a little tricky when you want to serve Thanos on a sub-path. Let's say, you want to run Thanos Querier behind an nginx server, accessible on the URL `http://example.com/thanos`. The Thanos web UI depends on it being accessed on the same URL as Thanos itself is listening. This is because the UI needs to know the URL from where to load static assets and what URL to use in links or redirects. If Thanos is behind a reverse proxy, particularly one where Thanos is not at the root, this doesn't work so well.

To tackle this problem, Thanos provides a flag `--web.external-prefix`.

Let's say we have Thanos Query running on the usual port, we need nginx running with the following configuration:
Let's say we have Thanos Querier running on the usual port, we need nginx running with the following configuration:

```
http {
@@ -56,7 +56,7 @@ http {
}
```

With this configuration, you can access Thanos Query on `http://example.com/thanos`. Notice that because we are using `http://localhost:10902/thanos/` as the reverse proxy target, every request path will be prefixed with `/thanos`. To make this work we need to run Thanos Query like this:
With this configuration, you can access Thanos Querier on `http://example.com/thanos`. Notice that because we are using `http://localhost:10902/thanos/` as the reverse proxy target, every request path will be prefixed with `/thanos`. To make this work we need to run Thanos Querier like this:

```
thanos query --web.external-prefix="thanos"
12 changes: 6 additions & 6 deletions docs/quick-tutorial.md
@@ -118,13 +118,13 @@ Now that we have setup the Sidecar for one or more Prometheus instances, we want
The Query component is stateless and horizontally scalable and can be deployed with any number of replicas. Once connected to the Sidecars, it automatically detects which Prometheus servers need to be contacted for a given PromQL query.
Query also implements Prometheus's official HTTP API and can thus be used with external tools such as Grafana. It also serves a derivative of Prometheus's UI for ad-hoc querying and stores status.
Thanos Querier also implements Prometheus's official HTTP API and can thus be used with external tools such as Grafana. It also serves a derivative of Prometheus's UI for ad-hoc querying and stores status.
Below, we will set up a Query to connect to our Sidecars, and expose its HTTP UI.
Below, we will set up a Thanos Querier to connect to our Sidecars, and expose its HTTP UI.
```bash
thanos query \
--http-address 0.0.0.0:19192 \ # HTTP Endpoint for Query UI
--http-address 0.0.0.0:19192 \ # HTTP Endpoint for Thanos Querier UI
--store 1.2.3.4:19090 \ # Static gRPC Store API Address for the query node to query
--store 1.2.3.5:19090 \ # Also repeatable
--store dnssrv+_grpc._tcp.thanos-store.monitoring.svc # Supports DNS A & SRV records
@@ -149,7 +149,7 @@ global:

In a Kubernetes stateful deployment, the replica label can also be the pod name.

Reload your Prometheus instances, and then, in Query, we will define `replica` as the label we want to enable deduplication to occur on:
Reload your Prometheus instances, and then, in Thanos Querier, we will define `replica` as the label we want to enable deduplication to occur on:

```bash
thanos query \
@@ -170,11 +170,11 @@ The only required communication between nodes is for Thanos Querier to be able t
The metadata includes the information about time windows and external labels for each node.

There are various ways to tell query component about the StoreAPIs it should query data from. The simplest way is to use a static list of well known addresses to query.
These are repeatable so can add as many endpoint as needed. You can put DNS domain prefixed by `dns+` or `dnssrv+` to have Thanos Query do an `A` or `SRV` lookup to get all required IPs to communicate with.
These are repeatable so can add as many endpoint as needed. You can put DNS domain prefixed by `dns+` or `dnssrv+` to have Thanos Querier do an `A` or `SRV` lookup to get all required IPs to communicate with.

```bash
thanos query \
--http-address 0.0.0.0:19192 \ # Endpoint for Query UI
--http-address 0.0.0.0:19192 \ # Endpoint for Thanos Querier UI
--grpc-address 0.0.0.0:19092 \ # gRPC endpoint for Store API
--store 1.2.3.4:19090 \ # Static gRPC Store API Address for the query node to query
--store 1.2.3.5:19090 \ # Also repeatable
26 changes: 13 additions & 13 deletions docs/service-discovery.md
@@ -10,9 +10,9 @@ Service Discovery (SD) is a vital part of several Thanos components. It allows T

SD is currently used in the following places within Thanos:

* `Thanos Query` needs to know about [StoreAPI](https://github.com/thanos-io/thanos/blob/d3fb337da94d11c78151504b1fccb1d7e036f394/pkg/store/storepb/rpc.proto#L14) servers in order to query metrics from them.
* `Thanos Rule` needs to know about `QueryAPI` servers in order to evaluate recording and alerting rules.
* `Thanos Rule` needs to know about `Alertmanagers` HA replicas in order to send alerts.
* `Thanos Querier` needs to know about [StoreAPI](https://github.com/thanos-io/thanos/blob/d3fb337da94d11c78151504b1fccb1d7e036f394/pkg/store/storepb/rpc.proto#L14) servers in order to query metrics from them.
* `Thanos Ruler` needs to know about `QueryAPI` servers in order to evaluate recording and alerting rules.
* `Thanos Ruler` needs to know about `Alertmanagers` HA replicas in order to send alerts.

There are currently several ways to configure SD, described below in more detail:

@@ -24,15 +24,15 @@ There are currently several ways to configure SD, described below in more detail

The simplest way to tell a component about a peer is to use a static flag.

### Thanos Query
### Thanos Querier

The repeatable flag `--store=<store>` can be used to specify a `StoreAPI` that `Thanos Query` should use.
The repeatable flag `--store=<store>` can be used to specify a `StoreAPI` that `Thanos Querier` should use.

### Thanos Rule
### Thanos Ruler

`Thanos Rule` supports the configuration of `QueryAPI` endpoints using YAML with the `--query.config=<content>` and `--query.config-file=<path>` flags in the `static_configs` section.
`Thanos Ruler` supports the configuration of `QueryAPI` endpoints using YAML with the `--query.config=<content>` and `--query.config-file=<path>` flags in the `static_configs` section.

`Thanos Rule` also supports the configuration of Alertmanager endpoints using YAML with the `--alertmanagers.config=<content>` and `--alertmanagers.config-file=<path>` flags in the `static_configs` section.
`Thanos Ruler` also supports the configuration of Alertmanager endpoints using YAML with the `--alertmanagers.config=<content>` and `--alertmanagers.config-file=<path>` flags in the `static_configs` section.

## File Service Discovery

@@ -62,18 +62,18 @@ Both YAML and JSON files can be used. The format of the files is as follows:
As a fallback, the file contents are periodically re-read at an interval that can be set using a flag specific to the component as shown below.
The default value for all File SD re-read intervals is 5 minutes.
### Thanos Query
### Thanos Querier
The repeatable flag `--store.sd-files=<path>` can be used to specify the path to files that contain addresses of `StoreAPI` servers.
The `<path>` can be a glob pattern so you can specify several files using a single flag.

The flag `--store.sd-interval=<5m>` can be used to change the fallback re-read interval from the default 5 minutes.

### Thanos Rule
### Thanos Ruler

`Thanos Rule` supports the configuration of `QueryAPI` endpoints using YAML with the `--query.config=<content>` and `--query.config-file=<path>` flags in the `file_sd_configs` section.
`Thanos Ruler` supports the configuration of `QueryAPI` endpoints using YAML with the `--query.config=<content>` and `--query.config-file=<path>` flags in the `file_sd_configs` section.

`Thanos Rule` also supports the configuration of Alertmanager endpoints using YAML with the `--alertmanagers.config=<content>` and `--alertmanagers.config-file=<path>` flags in the `file_sd_configs` section.
`Thanos Ruler` also supports the configuration of Alertmanager endpoints using YAML with the `--alertmanagers.config=<content>` and `--alertmanagers.config-file=<path>` flags in the `file_sd_configs` section.

## DNS Service Discovery

@@ -109,7 +109,7 @@ This configuration will instruct Thanos to discover all endpoints within the `th
```
The default interval between DNS lookups is 30s. This interval can be changed using the `store.sd-dns-interval` flag for `StoreAPI`
configuration in `Thanos Query`, or `query.sd-dns-interval` for `QueryAPI` configuration in `Thanos Rule`.
configuration in `Thanos Querier`, or `query.sd-dns-interval` for `QueryAPI` configuration in `Thanos Ruler`.
## Other
16 changes: 8 additions & 8 deletions examples/alerts/alerts.md
@@ -21,7 +21,7 @@ rules:
annotations:
description: Thanos Compact {{$labels.job}} has failed to run and now is halted.
summary: Thanos Compact has failed to run ans is now halted.
expr: thanos_compactor_halted{job=~"thanos-compact.*"} == 1
expr: thanos_compact_halted{job=~"thanos-compact.*"} == 1
for: 5m
labels:
severity: warning
@@ -67,7 +67,7 @@ rules:
## Ruler
For Thanos ruler we run some alerts in local Prometheus, to make sure that Thanos Rule is working:
For Thanos Ruler we run some alerts in local Prometheus, to make sure that Thanos Ruler is working:
[embedmd]:# (../tmp/thanos-rule.rules.yaml yaml)
```yaml
@@ -162,9 +162,9 @@ rules:
summary: Thanos Rule is having high number of DNS failures.
expr: |
(
sum by (job) (rate(thanos_ruler_query_apis_dns_failures_total{job=~"thanos-rule.*"}[5m]))
sum by (job) (rate(thanos_rule_query_apis_dns_failures_total{job=~"thanos-rule.*"}[5m]))
/
sum by (job) (rate(thanos_ruler_query_apis_dns_lookups_total{job=~"thanos-rule.*"}[5m]))
sum by (job) (rate(thanos_rule_query_apis_dns_lookups_total{job=~"thanos-rule.*"}[5m]))
* 100 > 1
)
for: 15m
@@ -177,9 +177,9 @@ rules:
summary: Thanos Rule is having high number of DNS failures.
expr: |
(
sum by (job) (rate(thanos_ruler_alertmanagers_dns_failures_total{job=~"thanos-rule.*"}[5m]))
sum by (job) (rate(thanos_rule_alertmanagers_dns_failures_total{job=~"thanos-rule.*"}[5m]))
/
sum by (job) (rate(thanos_ruler_alertmanagers_dns_lookups_total{job=~"thanos-rule.*"}[5m]))
sum by (job) (rate(thanos_rule_alertmanagers_dns_lookups_total{job=~"thanos-rule.*"}[5m]))
* 100 > 1
)
for: 15m
@@ -374,9 +374,9 @@ rules:
summary: Thanos Query is having high number of DNS failures.
expr: |
(
sum by (job) (rate(thanos_querier_store_apis_dns_failures_total{job=~"thanos-query.*"}[5m]))
sum by (job) (rate(thanos_query_store_apis_dns_failures_total{job=~"thanos-query.*"}[5m]))
/
sum by (job) (rate(thanos_querier_store_apis_dns_lookups_total{job=~"thanos-query.*"}[5m]))
sum by (job) (rate(thanos_query_store_apis_dns_lookups_total{job=~"thanos-query.*"}[5m]))
) * 100 > 1
for: 15m
labels: