
pattern-ingester msg="unknown log level while observing stream" level=informational severity=informational (messages from promtail syslog) #14213

Closed
candlerb opened this issue Sep 21, 2024 · 9 comments · Fixed by #14255

Comments

@candlerb
Contributor

candlerb commented Sep 21, 2024

(I'm not sure whether this is user error, but I am raising it in bug format anyway, just to capture the details.)

Describe the bug
I am trying out loki 3.2.0 and attempting to use the explore logs / pattern ingester functionality.

I am getting a very large volume of these messages logged to loki's stdout, which I wasn't getting before:

Sep 21 20:07:22 loki loki-linux-amd64[5197]: level=warn ts=2024-09-21T20:07:22.76097582Z caller=instance.go:274 component=pattern-ingester msg="unknown log level while observing stream" level=informational stream="{app=\"loki-linux-amd64\", facility=\"daemon\", host=\"loki\", ip=\"127.0.0.1\", service_name=\"loki-linux-amd64\", severity=\"informational\"}"
Sep 21 20:07:22 loki loki-linux-amd64[5197]: level=warn ts=2024-09-21T20:07:22.760986596Z caller=instance.go:274 component=pattern-ingester msg="unknown log level while observing stream" level=informational stream="{app=\"loki-linux-amd64\", facility=\"daemon\", host=\"loki\", ip=\"127.0.0.1\", service_name=\"loki-linux-amd64\", severity=\"informational\"}"
Sep 21 20:07:22 loki loki-linux-amd64[5197]: level=warn ts=2024-09-21T20:07:22.760997349Z caller=instance.go:274 component=pattern-ingester msg="unknown log level while observing stream" level=informational stream="{app=\"loki-linux-amd64\", facility=\"daemon\", host=\"loki\", ip=\"127.0.0.1\", service_name=\"loki-linux-amd64\", severity=\"informational\"}"
Sep 21 20:07:22 loki loki-linux-amd64[5197]: level=warn ts=2024-09-21T20:07:22.761009328Z caller=instance.go:274 component=pattern-ingester msg="unknown log level while observing stream" level=informational stream="{app=\"loki-linux-amd64\", facility=\"daemon\", host=\"loki\", ip=\"127.0.0.1\", service_name=\"loki-linux-amd64\", severity=\"informational\"}"
Sep 21 20:07:22 loki loki-linux-amd64[5197]: level=warn ts=2024-09-21T20:07:22.761020505Z caller=instance.go:274 component=pattern-ingester msg="unknown log level while observing stream" level=informational stream="{app=\"loki-linux-amd64\", facility=\"daemon\", host=\"loki\", ip=\"127.0.0.1\", service_name=\"loki-linux-amd64\", severity=\"informational\"}"

I have promtail (2.9.1) receiving syslog messages and forwarding them to loki, and I believe this is where severity="informational" is coming from - see syslog_message.go

However, the fact that these messages have {app="loki-linux-amd64",service_name="loki-linux-amd64",severity="informational"} suggests that either there is a loop (loki is eating its own messages), or loki is generating them with severity="informational" even though the same line says level=warn earlier on. I don't know enough about the provenance of these labels to diagnose further.

It seems to me that:

  1. pattern-ingester should accept the severity levels which promtail generates when parsing syslog priority, including "informational" and "warning". Possibly relates to Improve Level detection #12645. (You could argue that promtail should generate "info" rather than "informational", but that would be a backwards-incompatible change; a possible client-side relabel workaround is sketched after this list);
  2. pattern-ingester could be more forgiving about the severity/level names it receives; and/or
  3. pattern-ingester could rate-limit the warnings it sends about unknown log levels; and/or
  4. pattern-ingester's warning log messages could be clearer, as they currently give little information about the original message which triggered the warning. (However I can search using logcli query '{severity="informational",app!="loki-linux-amd64"}' --since=1h --limit=10)
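
As a client-side workaround (sketch only, untested), my promtail relabel_configs (shown below under "To Reproduce") could normalise the syslog keywords to the short names the pattern ingester expects, by adding rules like these after the existing severity rule. Note that this changes the label values that get stored, so existing severity="informational" queries would need updating:

      # Hypothetical extra relabel rules: overwrite the severity label with the
      # short form when the syslog keyword matches; other severities pass through.
      - source_labels: [__syslog_message_severity]
        regex: informational
        target_label: severity
        replacement: info
      - source_labels: [__syslog_message_severity]
        regex: warning
        target_label: severity
        replacement: warn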

I note that the Grafana dashboard Explore Logs plugin (which is the whole point of this exercise) appears to be happy with "informational" as a level:

[screenshot: Explore Logs showing "informational" among the detected levels]

To Reproduce

  • Ubuntu 18.04 container (I only just found out it was that old - yes I need to upgrade it)
  • loki-linux-amd64 3.2.0
  • promtail-linux-amd64 2.9.1

/etc/loki/loki.yaml

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: debug
  grpc_server_max_concurrent_streams: 1000

common:
  instance_addr: 127.0.0.1
  path_prefix: /var/lib/loki
  storage:
    filesystem:
      chunks_directory: /var/lib/loki/chunks
      rules_directory: /var/lib/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

ingester_rf1:
  enabled: false

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h
    - from: 2023-09-15
      store: tsdb
      object_store: filesystem
      schema: v12
      index:
        period: 24h
        prefix: index_
    - from: 2024-09-21
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

pattern_ingester:
  enabled: true
  metric_aggregation:
    enabled: true
    loki_address: localhost:3100

ruler:
  alertmanager_url: http://prometheus:9093/alertmanager

frontend:
  encoding: protobuf

compactor:
  retention_enabled: true
  delete_request_store: filesystem

limits_config:
  retention_period: 91d
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  # https://notes.nicolevanderhoeven.com/How+to+enable+Explore+Logs+for+Loki
  allow_structured_metadata: true
  volume_enabled: true

# By default, Loki will send anonymous, but uniquely-identifiable usage and configuration
# analytics to Grafana Labs. These statistics are sent to https://stats.grafana.org/
#
# Statistics help us better understand how Loki is used, and they show us performance
# levels for most users. This helps us prioritize features and documentation.
# For more information on what's sent, look at
# https://github.com/grafana/loki/blob/main/pkg/analytics/stats.go
# Refer to the buildReport method to see what goes into a report.
#
# If you would like to disable reporting, uncomment the following lines:
analytics:
  reporting_enabled: false

/etc/systemd/system/loki.service

[Unit]
Description=Loki logging server
Documentation=https://grafana.com/oss/loki/
After=network-online.target

[Service]
User=syslog
Group=syslog
Restart=on-failure
RestartSec=60
WorkingDirectory=/var/lib/loki
EnvironmentFile=/etc/default/loki
ExecStart=/opt/loki/loki-linux-amd64 $OPTIONS

[Install]
WantedBy=multi-user.target

/etc/default/loki

OPTIONS='-config.file=/etc/loki/loki.yaml -log.level=warn'

/etc/loki/promtail.yaml

server:
  http_listen_address: 127.0.0.1
  http_listen_port: 9080
  grpc_listen_port: 0

# How to find the loki server(s)
clients:
  - url: http://127.0.0.1:3100/loki/api/v1/push

# Logs to read and forward
scrape_configs:
  - job_name: syslog
    syslog:
      listen_address: 127.0.0.1:5140
    relabel_configs:
      - source_labels: [__syslog_message_severity]
        target_label: severity
      - source_labels: [__syslog_message_facility]
        target_label: facility
      - source_labels: [__syslog_message_hostname]
        regex: '([^/]*)'
        target_label: host
      - source_labels: [__syslog_message_hostname]
        regex: '(.*)/(.*)'
        replacement: '$1'
        target_label: host
      - source_labels: [__syslog_message_hostname]
        regex: '(.*)/(.*)'
        replacement: '$2'
        target_label: ip
      - source_labels: [__syslog_message_app_name]
        target_label: app

/etc/systemd/system/promtail.service

[Unit]
Description=Promtail
Documentation=https://github.com/grafana/loki/tree/master/docs/clients/promtail
After=network-online.target
# Fix shutdown delays: if loki is running on the same host,
# Loki should start first, and shutdown after.
After=loki.service

[Service]
User=syslog
Group=syslog
Restart=on-failure
RestartSec=5
WorkingDirectory=/var/log
EnvironmentFile=/etc/default/promtail
ExecStart=/opt/loki/promtail-linux-amd64 $OPTIONS

[Install]
WantedBy=multi-user.target

/etc/default/promtail

OPTIONS='-config.file=/etc/loki/promtail.yaml'

/etc/rsyslog.d/25-promtail.conf

module(load="mmutf8fix")
action(type="mmutf8fix" replacementChar="?")

template(name="Custom_SyslogProtocol23Format" type="string"
     string="<%PRI%>1 %TIMESTAMP:::date-rfc3339% %HOSTNAME%/%FROMHOST-IP% %APP-NAME% %PROCID% %MSGID% %STRUCTURED-DATA% %msg%\n")

*.*     action(type="omfwd" protocol="tcp"
               target="127.0.0.1" port="5140"
               Template="Custom_SyslogProtocol23Format"
               TCP_Framing="octet-counted")

Expected behavior
Messages to be processed successfully

Environment:

  • Infrastructure: incus container
  • Deployment tool: just running the release binaries directly
# /opt/loki/loki-linux-amd64 --version
loki, version 3.2.0 (branch: k218, revision: 659f5421)
  build user:       root@003ce357cdf4
  build date:       2024-09-18T16:21:52Z
  go version:       go1.22.6
  platform:         linux/amd64
  tags:             netgo
@candlerb
Contributor Author

I found this in the source code at pkg/pattern/instance.go:

                if _, ok := streamMetrics[lvl]; !ok {
                        level.Warn(i.logger).Log(
                                "msg", "unknown log level while observing stream",
                                "level", lvl,
                                "stream", stream,
                        )

                        lvl = constants.LogLevelUnknown
                }

Clearly, large numbers of these messages can be generated. Logging them at debug level would be one simple fix. Another would be to keep a map of unknown levels and send a warning only once for each one seen (with an overall size limit on the map).
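
For illustration, a minimal sketch of that second idea, using hypothetical names (levelWarner, warnOnce); this is not code from the Loki codebase:

package pattern

import (
    "sync"

    "github.com/go-kit/log"
    "github.com/go-kit/log/level"
)

// levelWarner logs the "unknown log level" warning at most once per distinct
// level value, and caps how many distinct values it remembers.
type levelWarner struct {
    mu      sync.Mutex
    seen    map[string]struct{}
    maxSize int
    logger  log.Logger
}

func newLevelWarner(logger log.Logger, maxSize int) *levelWarner {
    return &levelWarner{
        seen:    make(map[string]struct{}),
        maxSize: maxSize,
        logger:  logger,
    }
}

func (w *levelWarner) warnOnce(lvl, stream string) {
    w.mu.Lock()
    defer w.mu.Unlock()
    if _, ok := w.seen[lvl]; ok {
        return // already warned about this level value
    }
    if len(w.seen) < w.maxSize {
        // Remember the value so we warn only once; after the cap is hit,
        // further unknown values still warn but are not remembered.
        w.seen[lvl] = struct{}{}
    }
    level.Warn(w.logger).Log(
        "msg", "unknown log level while observing stream",
        "level", lvl,
        "stream", stream,
    )
}

The observe path in instance.go would then call warnOnce instead of logging unconditionally.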

@candlerb
Contributor Author

candlerb commented Sep 21, 2024

According to Grafana, the volume of logs generated per minute keeps growing... I will need to turn the pattern ingester off. (EDIT: it's happy after doing that.)

[screenshot: Grafana graph of log volume per minute increasing]

@svennergr
Contributor

@trevorwhitney is there a reason why we would only support those configured log levels? https://github.com/grafana/loki/blob/913e9f93477b5b811fbcf44d0e750f600c9ded69/pkg/util/constants/levels.go

@trevorwhitney
Collaborator

The reason for hardcoding the levels we support was performance.

We read the level from structured metadata to avoid re-parsing / re-detecting the log level. Reading this code, I made the mistaken assumption that we'd only ever populate structured metadata with one of these constant levels; in the case where we get it from a label, it looks like it could be anything.

The performance gain was just to pre-allocate the map, so I can change that.
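
Roughly the direction that implies, as a sketch only (hypothetical names, and not necessarily what the eventual fix does): keep pre-allocated entries for the known constants, but create an entry on demand for anything else instead of warning on every line:

package pattern

// levelMetrics is a stand-in for whatever per-level state the ingester keeps.
type levelMetrics struct {
    lines int64
}

func observe(byLevel map[string]*levelMetrics, lvl string) {
    m, ok := byLevel[lvl]
    if !ok {
        // e.g. severity="informational" arriving via a label from promtail's syslog target:
        // create the entry lazily rather than warning per line.
        m = &levelMetrics{}
        byLevel[lvl] = m
    }
    m.lines++
}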

@trevorwhitney
Collaborator

This should fix it: #14255

@candlerb
Contributor Author

Thank you.

On my side, I realised I should also ensure that there's no chance loki can ever ingest its own logs. I'm thinking of running loki and promtail in separate containers, so that rsyslog in the promtail container never receives any journald or /dev/log messages from loki.

@carlopalacio

I've been keeping track of this bug; has the fix already been merged into the Loki 3.2.1 public Docker image?

@StefanSa

@carlopalacio
Apparently not; I have the same problem with the Docker image v3.2.1.

@trevorwhitney
Collaborator

trevorwhitney commented Nov 18, 2024

It will definitely be in 3.3: #14750, which is planned to go out tomorrow.
