Commit

Manual backport for #11843
JStickler committed Jan 31, 2024
1 parent d32d62a commit 350edd2
Showing 1 changed file with 28 additions and 15 deletions.
43 changes: 28 additions & 15 deletions docs/sources/get-started/labels/structured-metadata.md
@@ -1,7 +1,7 @@
---
menuTitle: Structured metadata
title: What is structured metadata
description: Attaching metadata to logs.
description: Describes how to enable structured metadata for logs and how to query using structured metadata to filter log lines.
---
# What is structured metadata

@@ -13,44 +13,57 @@ Structured metadata is an experimental feature and is subject to change in futur
Structured metadata was added to chunk format V4, which is used if the schema version is greater than or equal to `13`. (See [Schema Config]({{< relref "../../storage#schema-config" >}}) for more details about schema versions.)
{{% /admonition %}}

One of the powerful features of Loki is parsing logs at query time to extract metadata and build labels out of it.
However, the parsing of logs at query time comes with a cost which can be significantly high for, as an example,
large json blobs or a poorly written query using complex regex patterns.
Selecting proper, low cardinality labels is critical to operating and querying Loki effectively. Some metadata, especially infrastructure-related metadata, can be difficult to embed in log lines and is too high cardinality to store effectively as indexed labels, because doing so would reduce the performance of the index.

In addition, the data extracted from logs at query time is usually high cardinality, which can’t be stored
in the index as it would increase the cardinality too much, and therefore reduce the performance of the index.

Structured metadata is a way to attach metadata to logs without indexing them. Examples of useful metadata are
trace IDs, user IDs, and any other label that is often used in queries but has high cardinality and is expensive
Structured metadata is a way to attach metadata to logs without indexing them or including them in the log line content itself. Examples of useful metadata are
Kubernetes pod names, process IDs, or any other label that is often used in queries but has high cardinality and is expensive
to extract at query time.

Structured metadata can also be used to query commonly needed metadata from log lines without needing to apply a parser at query time. Parsing large JSON blobs or running a poorly written query that uses complex regex patterns, for example, comes with a high performance cost. Examples of useful metadata include trace IDs or user IDs.


## Attaching structured metadata to log lines

You have the option to attach structured metadata to log lines in the push payload along with each log line and the timestamp.
For more information on how to push logs to Loki via the HTTP endpoint, refer to the [HTTP API documentation]({{< relref "../../reference/api#push-log-entries-to-loki" >}}).
For more information on how to push logs to Loki via the HTTP endpoint, refer to the [HTTP API documentation]({{< relref "../../reference/api#ingest-logs" >}}).

Check failure on line 28 in docs/sources/get-started/labels/structured-metadata.md

GitHub Actions / doc-validator

[doc-validator] reported by reviewdog 🐶 The anchor 'ingest-logs' does not exist in the file 'docs/sources/reference/api.md' (code: anchor-does-not-exist). Replace the anchor with one of the available anchors. Available anchors are: "grafana-loki-http-api, microservices-mode, matrix-vector-and-streams, timestamp-formats, query-loki, examples, query-loki-over-a-range-of-time, step-versus-interval, examples-1, list-labels-within-a-range-of-time, examples-2, list-label-values-within-a-range-of-time, examples-3, stream-log-messages, push-log-entries-to-loki, examples-4, identify-ready-loki-instance, change-log-level-at-runtime, flush-in-memory-chunks-to-backing-store, tell-ingester-to-release-all-resources-on-next-sigterm, flush-in-memory-chunks-and-shut-down, display-distributor-consistent-hash-ring-status, return-exposed-prometheus-metrics, list-current-configuration, list-running-services, list-build-information, format-query, list-series, examples-5, index-stats, volume, statistics, ruler, ruler-ring-status, list-rule-groups, example-response, get-rule-groups-by-namespace, example-response-1, get-rule-group, set-rule-group, example-request, delete-rule-group, delete-namespace, list-rules, list-alerts, compactor, compactor-ring-status, request-log-deletion, examples-6, list-log-deletion-requests, examples-7, request-cancellation-of-a-delete-request, examples-8, deprecated-endpoints, get-apipromtail, get-apipromquery, examples-9, get-apipromlabelnamevalues, examples-10, get-apipromlabel, examples-11, post-apiprompush, examples-12, post-ingesterflush_shutdown"

Alternatively, you can use the Grafana Agent or Promtail to extract and attach structured metadata to your log lines.
See the [Promtail: Structured metadata stage]({{< relref "../../send-data/promtail/stages/structured_metadata" >}}) for more information.

With Loki version 1.2.0, support for structured metadata has been added to the Logstash output plugin. For more information, see [logstash]({{< relref "../../send-data/logstash/_index.md" >}}).
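
As a rough illustration of the push payload described above, the following Python sketch builds a JSON body in which each entry carries a timestamp, a log line, and an optional structured metadata object. The helper name `build_push_payload`, the sample log line, and the fixed timestamp are assumptions made for this example; the `trace_id` and `user_id` values are the ones used elsewhere on this page.

```python
import json
import time


def build_push_payload(labels, line, structured_metadata, timestamp_ns=None):
    """Build a JSON-serializable push body holding a single log entry.

    Hypothetical helper for illustration; it is not part of any Loki client.
    """
    if timestamp_ns is None:
        timestamp_ns = time.time_ns()
    # Each entry is [timestamp, log line, structured metadata]. The timestamp
    # is a string of nanoseconds since the Unix epoch, and metadata names and
    # values are sent as strings.
    metadata = {str(k): str(v) for k, v in structured_metadata.items()}
    entry = [str(timestamp_ns), line, metadata]
    return {"streams": [{"stream": labels, "values": [entry]}]}


payload = build_push_payload(
    labels={"job": "example"},
    line="GET /checkout 200",  # sample log line, assumed for the example
    structured_metadata={"trace_id": "0242ac120002", "user_id": "superUser123"},
    timestamp_ns=1706700000000000000,  # fixed value so the output is stable
)
body = json.dumps(payload)
print(body)
```

The resulting `body` would be sent as `application/json` in a POST request to the push endpoint (`/loki/api/v1/push`). Only the indexed labels go in `stream`; the high cardinality values travel with the individual log line.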


{{% admonition type="warning" %}}
There are defaults for how much structured metadata can be attached per log line.
```
# Maximum size accepted for structured metadata per log line.
# CLI flag: -limits.max-structured-metadata-size
[max_structured_metadata_size: <int> | default = 64KB]
# Maximum number of structured metadata entries per log line.
# CLI flag: -limits.max-structured-metadata-entries-count
[max_structured_metadata_entries_count: <int> | default = 128]
```
{{% /admonition %}}
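
A client could check metadata against these defaults before pushing. The sketch below is only an approximation: it assumes that `64KB` means 64 * 1024 bytes and that the size of an entry is the combined UTF-8 length of its name and value, neither of which is confirmed by this page. The function names are hypothetical.

```python
# Defaults taken from the limits shown above.
MAX_SIZE_BYTES = 64 * 1024  # assumption: 64KB is interpreted as 64 * 1024 bytes
MAX_ENTRIES = 128


def metadata_size(metadata):
    # Assumption: size is the total UTF-8 length of all names and values.
    return sum(
        len(name.encode("utf-8")) + len(value.encode("utf-8"))
        for name, value in metadata.items()
    )


def check_limits(metadata, max_size=MAX_SIZE_BYTES, max_entries=MAX_ENTRIES):
    """Return a list of problems; an empty list means the metadata fits."""
    problems = []
    if len(metadata) > max_entries:
        problems.append(f"{len(metadata)} entries exceeds the limit of {max_entries}")
    size = metadata_size(metadata)
    if size > max_size:
        problems.append(f"{size} bytes exceeds the limit of {max_size}")
    return problems


print(check_limits({"pod": "myservice-abc1234-56789"}))
print(check_limits({f"key{i}": "v" for i in range(129)}))
```

If the limits have been overridden in your Loki configuration, pass the configured values as `max_size` and `max_entries` instead of relying on the defaults.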

## Querying structured metadata

Structured metadata is extracted automatically for each returned log line and added to the labels returned for the query.
You can use labels of structured metadata to filter log lines using a [label filter expression]({{< relref "../../query/log_queries#label-filter-expression" >}}).

For example, if you have a label `trace_id` attached to some of your log lines as structured metadata, you can filter log lines using:
For example, if you have a label `pod` attached to some of your log lines as structured metadata, you can filter log lines using:

```logql
{job="example"} | trace_id="0242ac120002"
{job="example"} | pod="myservice-abc1234-56789"
```

Of course, you can filter by multiple labels of structured metadata at the same time:

```logql
{job="example"} | trace_id="0242ac120002" | user_id="superUser123"
{job="example"} | pod="myservice-abc1234-56789" | trace_id="0242ac120002"
```
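
When queries like the ones above are generated by a program, the label filters can be chained from a mapping of names to values. This is a minimal sketch: `build_logql` is a hypothetical helper, it produces only equality filters, and it assumes that escaping backslashes and double quotes is enough for the values you pass in.

```python
def build_logql(selector, filters):
    """Append one equality label filter per entry to a stream selector."""
    parts = [selector]
    for name, value in filters.items():
        # Escape backslashes first, then double quotes, so the value stays
        # inside its quoted string.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{name}="{escaped}"')
    return " | ".join(parts)


query = build_logql(
    '{job="example"}',
    {"pod": "myservice-abc1234-56789", "trace_id": "0242ac120002"},
)
print(query)
# prints: {job="example"} | pod="myservice-abc1234-56789" | trace_id="0242ac120002"
```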

Note that since structured metadata is extracted automatically to the results labels, some metric queries might return
an error like `maximum of series (50000) reached for a single query`. You can use the [Keep]({{< relref "../../query/log_queries#keep-labels-expression" >}}) and [Drop]({{< relref "../../query/log_queries#drop-labels-expression" >}}) stages to filter out labels that you don't need.
Note that since structured metadata is extracted automatically to the results labels, some metric queries might return an error like `maximum of series (50000) reached for a single query`. You can use the [Keep]({{< relref "../../query/log_queries#keep-labels-expression" >}}) and [Drop]({{< relref "../../query/log_queries#drop-labels-expression" >}}) stages to filter out labels that you don't need.
For example:

```logql