Skip to content

Commit

Permalink
docs: Update components.md #fix some typos (grafana#14612)
Browse files Browse the repository at this point in the history
Co-authored-by: Christian Haudum <[email protected]>
Co-authored-by: J Stickler <[email protected]>
  • Loading branch information
3 people authored Nov 27, 2024
1 parent defe3af commit a412009
Showing 1 changed file with 11 additions and 11 deletions.
22 changes: 11 additions & 11 deletions docs/sources/get-started/components.md
Original file line number Diff line number Diff line change
Expand Up @@ -40,11 +40,11 @@ and to ensure that it is within the configured tenant (or global) limits. Each v
is then sent to `n` [ingesters](#ingester) in parallel, where `n` is the [replication factor](#replication-factor) for data.
The distributor determines the ingesters to which it sends a stream to using [consistent hashing](#hashing).

It is important that a load balancer sits in front of the distributor in order to properly balance incoming traffic to them.
In Kubernetes the service load balancer provides this service.
A load balancer must sit in front of the distributor to properly balance incoming traffic to them.
In Kubernetes, the service load balancer provides this service.

The distributor is a stateless component. This makes it easy to scale and offload as much work as possible from the ingesters, which are the most critical component on the write path.
The ability to independently scale these validation operations mean that Loki can also protect itself against denial of service attacks that could otherwise overload the ingesters.
The ability to independently scale these validation operations means that Loki can also protect itself against denial of service attacks that could otherwise overload the ingesters.
It also allows us to fan-out writes according to the [replication factor](#replication-factor).

### Validation
Expand All @@ -53,11 +53,11 @@ The first step the distributor takes is to ensure that all incoming data is acco

### Preprocessing

Currently the only way the distributor mutates incoming data is by normalizing labels. What this means is making `{foo="bar", bazz="buzz"}` equivalent to `{bazz="buzz", foo="bar"}`, or in other words, sorting the labels. This allows Loki to cache and hash them deterministically.
Currently, the only way the distributor mutates incoming data is by normalizing labels. What this means is making `{foo="bar", bazz="buzz"}` equivalent to `{bazz="buzz", foo="bar"}`, or in other words, sorting the labels. This allows Loki to cache and hash them deterministically.

### Rate limiting

The distributor can also rate limit incoming logs based on the maximum data ingest rate per tenant. It does this by checking a per-tenant limit and dividing it by the current number of distributors. This allows the rate limit to be specified per tenant at the cluster level and enables us to scale the distributors up or down and have the per-distributor limit adjust accordingly. For instance, say we have 10 distributors and tenant A has a 10MB rate limit. Each distributor will allow up to 1MB/s before limiting. Now, say another large tenant joins the cluster and we need to spin up 10 more distributors. The now 20 distributors will adjust their rate limits for tenant A to `(10MB / 20 distributors) = 500KB/s`. This is how global limits allow much simpler and safer operation of the Loki cluster.
The distributor can also rate-limit incoming logs based on the maximum data ingest rate per tenant. It does this by checking a per-tenant limit and dividing it by the current number of distributors. This allows the rate limit to be specified per tenant at the cluster level and enables us to scale the distributors up or down and have the per-distributor limit adjust accordingly. For instance, say we have 10 distributors and tenant A has a 10MB rate limit. Each distributor will allow up to 1MB/s before limiting. Now, say another large tenant joins the cluster and we need to spin up 10 more distributors. The now 20 distributors will adjust their rate limits for tenant A to `(10MB / 20 distributors) = 500KB/s`. This is how global limits allow much simpler and safer operation of the Loki cluster.

{{< admonition type="note" >}}
The distributor uses the `ring` component under the hood to register itself amongst its peers and get the total number of active distributors. This is a different "key" than the ingesters use in the ring and comes from the distributor's own [ring configuration](https://grafana.com/docs/loki/<LOKI_VERSION>/configure/#distributor).
Expand All @@ -69,13 +69,13 @@ Once the distributor has performed all of its validation duties, it forwards dat

#### Replication factor

In order to mitigate the chance of _losing_ data on any single ingester, the distributor will forward writes to a _replication factor_ of them. Generally, the replication factor is `3`. Replication allows for ingester restarts and rollouts without failing writes and adds additional protection from data loss for some scenarios. Loosely, for each label set (called a _stream_) that is pushed to a distributor, it will hash the labels and use the resulting value to look up `replication_factor` ingesters in the `ring` (which is a subcomponent that exposes a [distributed hash table](https://en.wikipedia.org/wiki/Distributed_hash_table)). It will then try to write the same data to all of them. This will generate an error if less than a _quorum_ of writes succeed. A quorum is defined as `floor( replication_factor / 2 ) + 1`. So, for our `replication_factor` of `3`, we require that two writes succeed. If less than two writes succeed, the distributor returns an error and the write operation will be retried.
In order to mitigate the chance of _losing_ data on any single ingester, the distributor will forward writes to a _replication factor_ of them. Generally, the replication factor is `3`. Replication allows for ingester restarts and rollouts without failing writes and adds additional protection from data loss for some scenarios. Loosely, for each label set (called a _stream_) that is pushed to a distributor, it will hash the labels and use the resulting value to look up `replication_factor` ingesters in the `ring` (which is a subcomponent that exposes a [distributed hash table](https://en.wikipedia.org/wiki/Distributed_hash_table)). It will then try to write the same data to all of them. This will generate an error if less than a _quorum_ of writes succeeds. A quorum is defined as `floor( replication_factor / 2 ) + 1`. So, for our `replication_factor` of `3`, we require that two writes succeed. If less than two writes succeed, the distributor returns an error and the write operation will be retried.

{{< admonition type="caution" >}}
If a write is acknowledged by 2 out of 3 ingesters, we can tolerate the loss of one ingester but not two, as this would result in data loss.
{{< /admonition >}}

The replication factor is not the only thing that prevents data loss, though, and its main purpose is to allow writes to continue uninterrupted during rollouts and restarts. The [ingester component](#ingester) now includes a [write ahead log](https://en.wikipedia.org/wiki/Write-ahead_logging) (WAL) which persists incoming writes to disk to ensure they are not lost as long as the disk isn't corrupted. The complementary nature of replication factor and WAL ensures data isn't lost unless there are significant failures in both mechanisms (that is, multiple ingesters die and lose/corrupt their disks).
The replication factor is not the only thing that prevents data loss, though, and its main purpose is to allow writes to continue uninterrupted during rollouts and restarts. The [ingester component](#ingester) now includes a [write ahead log](https://en.wikipedia.org/wiki/Write-ahead_logging) (WAL) which persists incoming writes to disk to ensures they are not lost as long as the disk isn't corrupted. The complementary nature of the replication factor and WAL ensures data isn't lost unless there are significant failures in both mechanisms (that is, multiple ingesters die and lose/corrupt their disks).

### Hashing

Expand All @@ -102,7 +102,7 @@ value is larger than the hash of the stream. When the replication factor is
larger than 1, the next subsequent tokens (clockwise in the ring) that belong to
different ingesters will also be included in the result.

The effect of this hash set up is that each token that an ingester owns is
The effect of this hash setup is that each token that an ingester owns is
responsible for a range of hashes. If there are three tokens with values 0, 25,
and 50, then a hash of 3 would be given to the ingester that owns the token 25;
the ingester owning token 25 is responsible for the hash range of 1-25.
Expand Down Expand Up @@ -133,7 +133,7 @@ the hash ring. Each ingester has a state of either `PENDING`, `JOINING`,
another ingester that is `LEAVING`. This only applies for legacy deployment modes.

{{< admonition type="note" >}}
Handoff is deprecated behavior mainly used in stateless deployments of ingesters, which is discouraged. Instead, it's recommended using a stateful deployment model together with the [write ahead log]({{< relref "../operations/storage/wal" >}}).
Handoff is a deprecated behavior mainly used in stateless deployments of ingesters, which is discouraged. Instead, it's recommended using a stateful deployment model together with the [write ahead log]({{< relref "../operations/storage/wal" >}}).
{{< /admonition >}}

1. `JOINING` is an Ingester's state when it is currently inserting its tokens
Expand Down Expand Up @@ -263,9 +263,9 @@ The query frontend supports caching metric query results and reuses them on subs

The query frontend also supports caching of log queries in form of a negative cache.
This means that instead of caching the log results for quantized time ranges, Loki only caches empty results for quantized time ranges.
This is more efficient than caching actual results, because log queries are limited (usually 1000 results)
This is more efficient than caching actual results because log queries are limited (usually 1000 results)
and if you have a query over a long time range that matches only a few lines, and you only cache actual results,
you'd still need to process a lot of data additionally to the data from the results cache in order to verify that nothing else matches.
you'd still need to process a lot of data in addition to the data from the results cache in order to verify that nothing else matches.

#### Index stats queries

Expand Down

0 comments on commit a412009

Please sign in to comment.