docs: Clarifying info about structured metadata, blooms (#15058) #15061

Merged 1 commit on Nov 21, 2024
docs/sources/get-started/labels/structured-metadata.md
4 changes: 2 additions & 2 deletions

@@ -23,8 +23,8 @@ You should only use structured metadata in the following situations:

- If you are ingesting data in OpenTelemetry format, using Grafana Alloy or an OpenTelemetry Collector. Structured metadata was designed to support native ingestion of OpenTelemetry data.
- If you have high cardinality metadata that should not be used as a label and does not exist in the log line. Some examples might include `process_id` or `thread_id` or Kubernetes pod names.
-- If you are using [Explore Logs](https://grafana.com/docs/grafana-cloud/visualizations/simplified-exploration/logs/) to visualize and explore your Loki logs.
-- If you are a large-scale customer, who is ingesting more than 75TB of logs a month and are using [Bloom filters](https://grafana.com/docs/loki/<LOKI_VERSION>/operations/bloom-filters/)
+- If you are using [Explore Logs](https://grafana.com/docs/grafana-cloud/visualizations/simplified-exploration/logs/) to visualize and explore your Loki logs. You must set `discover_log_levels` and `allow_structured_metadata` to `true` in your Loki configuration.
+- If you are a large-scale customer who is ingesting more than 75TB of logs a month and using [Bloom filters](https://grafana.com/docs/loki/<LOKI_VERSION>/operations/bloom-filters/) (Experimental). Starting in [Loki 3.3](https://grafana.com/docs/loki/<LOKI_VERSION>/release-notes/v3-3/), Bloom filters utilize structured metadata.

We do not recommend extracting information that already exists in your log lines and putting it into structured metadata.
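
For reference, a minimal sketch of what the requirement mentioned above could look like in a Loki configuration, assuming both options sit under `limits_config` (verify the exact placement and defaults against the configuration reference for your Loki version):

```yaml
limits_config:
  # Allow structured metadata to be attached to ingested log lines.
  allow_structured_metadata: true
  # Detect log levels at ingest time and store them as structured metadata.
  discover_log_levels: true
```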

docs/sources/operations/bloom-filters.md
15 changes: 8 additions & 7 deletions

@@ -12,9 +12,10 @@ aliases:

# Bloom filters (Experimental)

-{{% admonition type="warning" %}}
-This feature is an [experimental feature](/docs/release-life-cycle/). Engineering and on-call support is not available. No SLA is provided.
-{{% /admonition %}}
+{{< admonition type="warning" >}}
+This feature is an [experimental feature](/docs/release-life-cycle/). Engineering and on-call support is not available. No SLA is provided.
+Note that this feature is intended for users who are ingesting more than 75TB of logs a month, as it is designed to accelerate queries against large volumes of logs.
+{{< /admonition >}}

Loki leverages [bloom filters](https://en.wikipedia.org/wiki/Bloom_filter) to speed up queries by reducing the amount of data Loki needs to load from the store and iterate through.
Loki is often used to run "needle in a haystack" queries; these are queries where a large number of log lines are searched, but only a few log lines match the query.
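
For illustration, a hypothetical needle-in-a-haystack query of this kind, filtering on a high-cardinality structured metadata field (the stream selector and `trace_id` value are made-up examples):

```logql
{cluster="prod", namespace="checkout"} | trace_id="6a9f2c41d0e84b7f"
```

Only the few log lines carrying that exact `trace_id` match, which is the access pattern bloom filters are designed to accelerate.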
@@ -110,7 +111,7 @@ overrides:
period: 40d
```

-### Sizing and configuration
+### Planner and Builder sizing and configuration

The single planner instance runs the planning phase for bloom blocks for each tenant in the given interval and puts the created tasks to an internal task queue.
Builders process tasks sequentially by pulling them from the queue. The amount of builder replicas required to complete all pending tasks before the next planning iteration depends on the value of `-bloom-build.planner.bloom_split_series_keyspace_by`, the number of tenants, and the log volume of the streams.
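
As a rough sketch of how this knob might be tuned, assuming the split factor determines how many tasks the planner creates per tenant (the value shown is illustrative, not a recommendation):

```yaml
-bloom-build.planner.bloom_split_series_keyspace_by=1024 // more splits -> more, smaller tasks, so more builder replicas can work in parallel per planning interval
```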
@@ -131,7 +132,7 @@ The sharding of the data is performed on the client side using DNS discovery of
You can find all the configuration options for this component in the Configure section for the [Bloom Gateways][bloom-gateway-cfg].
Refer to the [Enable bloom filters](#enable-bloom-filters) section above for a configuration snippet enabling this feature.

-### Sizing and configuration
+### Gateway sizing and configuration

Bloom Gateways use their local file system as a Least Recently Used (LRU) cache for blooms that are downloaded from object storage.
The size of the blooms depend on the ingest volume and number of unique structured metadata key-value pairs, as well as on build settings of the blooms, namely false-positive-rate.
@@ -140,7 +141,7 @@ With default settings, bloom filters make up <1% of the raw structured metadata
Since reading blooms depends heavily on disk IOPS, Bloom Gateways should make use of multiple, locally attached SSD disks (NVMe) to increase I/O throughput.
Multiple directories on different disk mounts can be specified using the `-bloom.shipper.working-directory` [setting][storage-config-cfg] when using a comma separated list of mount points, for example:

-```
+```yaml
-bloom.shipper.working-directory="/mnt/data0,/mnt/data1,/mnt/data2,/mnt/data3"
```

@@ -150,7 +151,7 @@ The product of three settings control the maximum amount of bloom data in memory

Example, assuming 4 CPU cores:

-```
+```yaml
-bloom-gateway.worker-concurrency=4 // 1x NUM_CORES
-bloom-gateway.block-query-concurrency=8 // 2x NUM_CORES
-bloom.max-query-page-size=64MiB
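// With these example values, at most 4 x 8 x 64MiB = 2048MiB of bloom data is held in memory at once.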