add container parser blogpost
Signed-off-by: ChrsMark <[email protected]>
ChrsMark committed May 15, 2024
1 parent 6e35a94 commit d08fbe5
Showing 1 changed file with 151 additions and 0 deletions.
151 changes: 151 additions & 0 deletions content/en/blog/2024/otel-collector-container-log-parser/index.md
@@ -0,0 +1,151 @@
---
title: Introducing the new container log parser for OpenTelemetry Collector
linkTitle: OTel Collector container log parser
date: 2024-05-16
# prettier-ignore
cSpell:ignore:
author: '[Christos Markou](https://github.com/ChrsMark) (Elastic)'

---

The filelog receiver is one of the most commonly used components of the OpenTelemetry Collector, as indicated
by the most recent [survey](https://opentelemetry.io/blog/2024/otel-collector-survey/#otel-components-usage).
Unsurprisingly, the same survey shows that [Kubernetes is the leading platform for Collector deployment (80.6%)](https://opentelemetry.io/blog/2024/otel-collector-survey/#deployment-scale-and-environment).
Taken together, these two facts highlight the importance of seamless log collection in Kubernetes environments.

Currently, the [filelog receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.100.0/receiver/filelogreceiver/README.md)
is capable of parsing container logs from Kubernetes Pods, but it requires
[extensive configuration](https://github.com/open-telemetry/opentelemetry-helm-charts/blob/aaa70bde1bf8bf15fc411282468ac6d2d07f772d/charts/opentelemetry-collector/templates/_config.tpl#L206-L282)
to properly parse logs according to various container runtime formats. This configuration complexity can be
mitigated by using the corresponding [helm chart preset](https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-collector#configuration-for-kubernetes-container-logs).
However, despite having the preset, it can still be challenging for users to maintain and
troubleshoot such advanced configurations.


The community has raised the issue of [improving the Kubernetes Logs Collection Experience](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/25251)
in the past. One step towards achieving this would be to provide a simplified and robust option
for parsing container logs without the need for manual specification or maintenance of the implementation details.
With the proposal and implementation of the new [container parser](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31959),
all these implementation details are encapsulated and handled within the parser's implementation.
Combined with the ability to cover the implementation with unit tests and various fail-over logic, this is
a significant improvement in container log parsing.


## What container logs look like

First, let's quickly recall the different container log formats found in the wild:

1) Docker container logs
`{"log":"INFO: This is a docker log line","stream":"stdout","time":"2024-03-30T08:31:20.545192187Z"}`

2) cri-o logs
`2024-04-13T07:59:37.505201169-05:00 stdout F This is a cri-o log line!`

3) containerd logs
`2024-04-22T10:27:25.813799277Z stdout F This is an awesome containerd log line!`

Note that the cri-o and containerd log formats are quite similar (both follow the CRI logging format), but
with a small difference in the timestamp format.

Consequently, to properly handle these three formats we need three different routes of stanza operators,
as shown in the [container parser operator issue](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31959).
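
To make the distinction concrete, here is a minimal Python sketch of how a single raw line could be classified. It is not the parser's actual code: the `detect_format` helper and the `CRI_LINE` pattern are hypothetical, and the rule used (a leading `{` for Docker, a trailing `Z` in the timestamp for containerd, a numeric UTC offset for cri-o) is an assumption based only on the sample lines above.

```python
import json
import re

# Assumed shape of a CRI line: <timestamp> <stream> <P|F> <message>
CRI_LINE = re.compile(
    r"^(?P<time>\S+) (?P<stream>stdout|stderr) (?P<flag>[PF]) ?(?P<log>.*)$"
)


def detect_format(line: str) -> str:
    """Return 'docker', 'containerd', 'crio' or 'unknown' for one raw log line."""
    if line.startswith("{"):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            return "unknown"
        # The Docker sample above carries the keys log, stream and time.
        if isinstance(record, dict) and record.keys() >= {"log", "stream", "time"}:
            return "docker"
        return "unknown"
    match = CRI_LINE.match(line)
    if match is None:
        return "unknown"
    # Assumption taken from the samples: containerd timestamps end in "Z",
    # while cri-o timestamps carry an offset such as "-05:00".
    return "containerd" if match.group("time").endswith("Z") else "crio"
```

With hand-written stanza routes, each branch of a function like this corresponds to a separate chain of operators that has to be configured and kept in sync.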

In addition, the CRI format can emit partial logs, which we want to combine into a single entry first:

```
2024-04-06T00:17:10.113242941Z stdout P This is a very very long line th
2024-04-06T00:17:10.113242941Z stdout P at is really really long and spa
2024-04-06T00:17:10.113242941Z stdout F ns across multiple log entries
```
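
As a rough illustration of what recombining means here, the following Python sketch joins consecutive partial (`P`) chunks until a final (`F`) chunk arrives. The `recombine_cri` helper is hypothetical and deliberately simplified: it assumes lines arrive in order and does not keep separate buffers per stream or per file, which a real implementation would need to do.

```python
from typing import Iterable, Iterator, List


def recombine_cri(lines: Iterable[str]) -> Iterator[str]:
    """Yield complete messages, joining partial (P) chunks up to the final (F) one."""
    buffer: List[str] = []
    for line in lines:
        # Split into timestamp, stream, tag and (optionally) the message.
        parts = line.split(" ", 3)
        if len(parts) < 3:
            continue  # skip lines that do not look like CRI logs
        tag = parts[2]
        buffer.append(parts[3] if len(parts) == 4 else "")
        if tag == "F":
            yield "".join(buffer)
            buffer.clear()
```

Feeding the three lines above into `recombine_cri` yields a single message whose chunks are concatenated without any separator, because the runtime splits the line at arbitrary positions, even in the middle of a word.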

Ideally, we want our parser to be capable of automatically detecting the format at runtime and properly parsing
the log lines. As we will see later, the container parser does that for us.


## Attribute handling

Last but not least, container log files follow a specific naming pattern from which we can extract useful
metadata during parsing. For example, from `/var/log/pods/kube-system_kube-scheduler-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d3/kube-scheduler/1.log`,
we can extract the namespace, the name and UID of the pod, and the name of the container.

After extracting this metadata, we need to store it properly using the appropriate attributes following
the [Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/resource/k8s/). This handling can also be
encapsulated within the parser's implementation, eliminating the need for users to define it manually.
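
The extraction itself can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `POD_LOG_PATH` pattern is inferred from the example path above (`<namespace>_<pod name>_<pod UID>/<container name>/<restart count>.log`), the `metadata_from_path` helper is hypothetical, and the parser's real implementation may differ.

```python
import re
from typing import Dict

# Pattern assumed from the example path:
# /var/log/pods/<namespace>_<pod name>_<pod UID>/<container name>/<restart count>.log
POD_LOG_PATH = re.compile(
    r"^/var/log/pods/"
    r"(?P<namespace>[^_/]+)_(?P<pod_name>[^_/]+)_(?P<uid>[^/]+)/"
    r"(?P<container_name>[^/]+)/"
    r"(?P<restart_count>\d+)\.log$"
)


def metadata_from_path(path: str) -> Dict[str, str]:
    """Map the parts of a pod log file path to Kubernetes attribute names."""
    match = POD_LOG_PATH.match(path)
    if match is None:
        return {}  # not a pod log path; nothing to extract
    return {
        "k8s.namespace.name": match["namespace"],
        "k8s.pod.name": match["pod_name"],
        "k8s.pod.uid": match["uid"],
        "k8s.container.name": match["container_name"],
        "k8s.container.restart_count": match["restart_count"],
    }
```

Splitting on `_` is safe in this sketch because Kubernetes namespace and pod names cannot contain underscores.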


## Using the new container parser

With all of this in mind, the container parser can be configured like this:

```yaml
receivers:
  filelog:
    include_file_path: true
    include:
      - /var/log/pods/*/*/*.log
    operators:
      - id: container-parser
        type: container
```
That configuration is more than enough to properly parse the log line and extract all the useful Kubernetes metadata.

A log line `{"log":"INFO: This is a docker log line","stream":"stdout","time":"2024-03-30T08:31:20.545192187Z"}` that
is written to `/var/log/pods/kube-system_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log`
will produce a log entry like the following:

```json
{
  "timestamp": "2024-03-30 08:31:20.545192187 +0000 UTC",
  "body": "INFO: This is a docker log line",
  "attributes": {
    "time": "2024-03-30T08:31:20.545192187Z",
    "log.iostream": "stdout",
    "k8s.pod.name": "kube-controller-kind-control-plane",
    "k8s.pod.uid": "49cc7c1fd3702c40b2686ea7486091d6",
    "k8s.container.name": "kube-controller",
    "k8s.container.restart_count": "1",
    "k8s.namespace.name": "kube-system",
    "log.file.path": "/var/log/pods/kube-system_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log"
  }
}
```

Notice that we don't have to define the format. The parser automatically detects the format and
parses the logs accordingly. Even partial logs that the cri-o or containerd runtimes can produce are
recombined properly without any special configuration.

This is really handy because, as users, we don't need to care about specifying the format or maintaining
different configurations for different environments.


## Implementation details

Most of the code for this parser operator was written from scratch, but we were able to reuse
the recombine operator internally for parsing partial logs. This required some small refactoring,
but it gave us the opportunity to reuse an existing and well-tested component.

Additionally, during the discussions around the implementation of this feature, a question popped up:
*Why implement this as an operator and not as a processor?*

One basic reason is that the order of the log records arriving at processors is not guaranteed.
However, handling partial logs correctly depends on that order, which is why implementing the parser
as an operator was the way to go for now.
Moreover, at the moment [it is suggested](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/32080#issuecomment-2035301178)
to do as much work as possible during collection, and having robust parsing capabilities allows that.

More information about the implementation discussions can be found in the respective
[GitHub issue](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31959) and
its related/linked PR.


## Conclusion: container log parsing is now easier with the filelog receiver

Eager to learn more about the container parser? Visit the official [documentation](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/docs/operators/container.md),
and if you give it a try, let us know what you think.
Don't hesitate to reach out to us in the official CNCF [Slack workspace](https://slack.cncf.io/),
specifically in the `#otel-collector` channel.


P.S. Kudos to [Daniel Jaglowski](https://github.com/djaglowski) for reviewing the parser's implementation and
providing valuable feedback!
