From bd60454f0274424e66cd4fa067efcff8eec1fbc9 Mon Sep 17 00:00:00 2001
From: J Stickler
Date: Thu, 21 Nov 2024 16:42:54 -0500
Subject: [PATCH] docs: Clarifying info about structured metadata, blooms
 (#15058)

(cherry picked from commit e866c2ff570b1ea0c39abae4b31a519b307b6376)
---
 .../get-started/labels/structured-metadata.md |  4 ++--
 docs/sources/operations/bloom-filters.md      | 15 ++++++++-------
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/docs/sources/get-started/labels/structured-metadata.md b/docs/sources/get-started/labels/structured-metadata.md
index 506d64eb567a3..f51bda6291201 100644
--- a/docs/sources/get-started/labels/structured-metadata.md
+++ b/docs/sources/get-started/labels/structured-metadata.md
@@ -23,8 +23,8 @@ You should only use structured metadata in the following situations:
 
 - If you are ingesting data in OpenTelemetry format, using Grafana Alloy or an OpenTelemetry Collector. Structured metadata was designed to support native ingestion of OpenTelemetry data.
 - If you have high cardinality metadata that should not be used as a label and does not exist in the log line. Some examples might include `process_id` or `thread_id` or Kubernetes pod names.
-- If you are using [Explore Logs](https://grafana.com/docs/grafana-cloud/visualizations/simplified-exploration/logs/) to visualize and explore your Loki logs.
-- If you are a large-scale customer, who is ingesting more than 75TB of logs a month and are using [Bloom filters](https://grafana.com/docs/loki//operations/bloom-filters/)
+- If you are using [Explore Logs](https://grafana.com/docs/grafana-cloud/visualizations/simplified-exploration/logs/) to visualize and explore your Loki logs. You must set `discover_log_levels` and `allow_structured_metadata` to `true` in your Loki configuration.
+- If you are a large-scale customer who is ingesting more than 75TB of logs a month and is using [Bloom filters](https://grafana.com/docs/loki//operations/bloom-filters/) (Experimental). Starting in [Loki 3.3](https://grafana.com/docs/loki//release-notes/v3-3/), Bloom filters use structured metadata.
 
 We do not recommend extracting information that already exists in your log lines and putting it into structured metadata.
 
diff --git a/docs/sources/operations/bloom-filters.md b/docs/sources/operations/bloom-filters.md
index a62e67efbc900..4704adc8e1d99 100644
--- a/docs/sources/operations/bloom-filters.md
+++ b/docs/sources/operations/bloom-filters.md
@@ -12,9 +12,10 @@ aliases:
 
 # Bloom filters (Experimental)
 
-{{% admonition type="warning" %}}
-This feature is an [experimental feature](/docs/release-life-cycle/). Engineering and on-call support is not available. No SLA is provided.
-{{% /admonition %}}
+{{< admonition type="warning" >}}
+This feature is an [experimental feature](/docs/release-life-cycle/). Engineering and on-call support is not available. No SLA is provided.
+This feature is intended for users who ingest more than 75TB of logs a month, as it is designed to accelerate queries against large volumes of logs.
+{{< /admonition >}}
 
 Loki leverages [bloom filters](https://en.wikipedia.org/wiki/Bloom_filter) to speed up queries by reducing the amount of data Loki needs to load from the store and iterate through. Loki is often used to run "needle in a haystack" queries; these are queries where a large number of log lines are searched, but only a few log lines match the query.
 
@@ -110,7 +111,7 @@ overrides:
       period: 40d
 ```
 
-### Sizing and configuration
+### Planner and Builder sizing and configuration
 
 The single planner instance runs the planning phase for bloom blocks for each tenant in the given interval and puts the created tasks to an internal task queue.
 Builders process tasks sequentially by pulling them from the queue. The amount of builder replicas required to complete all pending tasks before the next planning iteration depends on the value of `-bloom-build.planner.bloom_split_series_keyspace_by`, the number of tenants, and the log volume of the streams.
@@ -131,7 +132,7 @@ The sharding of the data is performed on the client side using DNS discovery of
 You can find all the configuration options for this component in the Configure section for the [Bloom Gateways][bloom-gateway-cfg].
 Refer to the [Enable bloom filters](#enable-bloom-filters) section above for a configuration snippet enabling this feature.
 
-### Sizing and configuration
+### Gateway sizing and configuration
 
 Bloom Gateways use their local file system as a Least Recently Used (LRU) cache for blooms that are downloaded from object storage.
 The size of the blooms depend on the ingest volume and number of unique structured metadata key-value pairs, as well as on build settings of the blooms, namely false-positive-rate.
@@ -140,7 +141,7 @@ With default settings, bloom filters make up <1% of the raw structured metadata
 Since reading blooms depends heavily on disk IOPS, Bloom Gateways should make use of multiple, locally attached SSD disks (NVMe) to increase I/O throughput.
 Multiple directories on different disk mounts can be specified using the `-bloom.shipper.working-directory` [setting][storage-config-cfg] when using a comma separated list of mount points, for example:
 
-```
+```yaml
 -bloom.shipper.working-directory="/mnt/data0,/mnt/data1,/mnt/data2,/mnt/data3"
 ```
 
@@ -150,7 +151,7 @@ The product of three settings control the maximum amount of bloom data in memory
 
 Example, assuming 4 CPU cores:
 
-```
+```yaml
 -bloom-gateway.worker-concurrency=4         // 1x NUM_CORES
 -bloom-gateway.block-query-concurrency=8    // 2x NUM_CORES
 -bloom.max-query-page-size=64MiB
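To make the Explore Logs requirement in the first hunk concrete, here is a minimal sketch of the corresponding configuration, assuming a typical Loki 3.x YAML file where both options live under `limits_config`. The option names come from the patch text; the placement, values, and comments are illustrative, and defaults may differ between Loki versions.

```yaml
# Sketch only -- an assumed excerpt of a Loki 3.x configuration file.
limits_config:
  # Both settings must be true for Explore Logs, per the updated docs.
  allow_structured_metadata: true  # accept and store structured metadata at ingest
  discover_log_levels: true        # detect log levels and attach them as structured metadata
```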
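Likewise, spelling out the memory bound from the final hunk: multiplying the three example settings gives the worst-case volume of bloom pages processed in memory at any one time. This is a back-of-the-envelope figure under the stated example values; actual usage also depends on cache behavior and real page sizes.

```
4 workers x 8 blocks per worker x 64MiB max page size = 2048MiB (2GiB)
```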