[LOI-346] Milvus Logs #19331

Open
wants to merge 12 commits into
base: master
Changes from 4 commits
38 changes: 38 additions & 0 deletions milvus/README.md
@@ -34,6 +34,42 @@ For containerized environments, see the [Autodiscovery Integration Templates][3]
<!-- xxz tab xxx -->
<!-- xxz tabs xxx -->

#### Logs

The Milvus integration can collect logs from the Milvus pods or containers.

<!-- xxx tabs xxx -->
<!-- xxx tab "Host" xxx -->

Use this configuration to collect logs from Milvus standalone containers.

1. Collecting logs is disabled by default in the Datadog Agent. Enable it in your `datadog.yaml` file:

```yaml
logs_enabled: true
```

2. Uncomment and edit the logs configuration block in your `milvus.d/conf.yaml` file. Here's an example:

```yaml
logs:
  - type: docker
    source: milvus
    service: milvus
```

<!-- xxz tab xxx -->
<!-- xxx tab "Kubernetes" xxx -->

Use this configuration to collect logs from a Milvus Kubernetes cluster.

Collecting logs is disabled by default in the Datadog Agent. To enable it, see [Kubernetes Log Collection][10].

Then, set Log Integrations as pod annotations. This can also be configured with a file, a configmap, or a key-value store. For more information, see the configuration section of [Kubernetes Log Collection][11].

<!-- xxz tab xxx -->
<!-- xxz tabs xxx -->

### Validation

[Run the Agent's status subcommand][6] and look for `milvus` under the Checks section.
@@ -66,3 +102,5 @@ Need help? Contact [Datadog support][9].
[7]: https://github.com/DataDog/integrations-core/blob/master/milvus/metadata.csv
[8]: https://github.com/DataDog/integrations-core/blob/master/milvus/assets/service_checks.json
[9]: https://docs.datadoghq.com/help/
[10]: https://docs.datadoghq.com/agent/kubernetes/log/#setup
[11]: https://docs.datadoghq.com/agent/kubernetes/log/#configuration
52 changes: 52 additions & 0 deletions milvus/assets/logs/milvus.yaml
@@ -0,0 +1,52 @@
id: milvus
metric_id: milvus
backend_only: false
facets: null
pipeline:
  type: pipeline
  name: Milvus
  enabled: true
  filter:
    query: source:milvus
  processors:
    - type: grok-parser
      name: Grok Parser
      enabled: true
      source: message
      samples:
        - '[2024/11/27 14:07:51.849 +00:00] [INFO] [datacoord/handler.go:341]
          ["channel seek position set from channel checkpoint meta"]
          [channel=by-dev-rootcoord-dml_2_453764875273209568v0]
          [posTs=454221223538458625] [posTime=2024/11/27 14:07:39.421 +00:00]'
        - '[2024/11/27 14:07:01.849 +00:00] [INFO] [datacoord/services.go:833]
          ["datacoord append channelInfo in GetRecoveryInfo"]
          [traceID=ed216b196edf0589f281c4ad800f6565]
          [collectionID=453764875273209568] [partitionIDs="[]"]
          [channel=by-dev-rootcoord-dml_2_453764875273209568v0] ["# of unflushed
          segments"=0] ["# of flushed segments"=1] ["# of dropped segments"=0]
          ["# of indexed segments"=0] ["# of l0 segments"=0]'
        - '[2024/11/27 14:06:51.852 +00:00] [INFO] [datacoord/services.go:818]
          ["get recovery info request received"]
          [traceID=54cda8d3229d00982db785351a12ea7a]
          [collectionID=453764875273212700] [partitionIDs="[]"]'
      grok:
        supportRules: message_rule
          \[?(")%{data:message.body}?(")]\s+%{data:message.details:array("[]","]

  1. Could you provide more context about what this part of the expression is meant to achieve: \[?(")%{data:message.body}?(")]? Unless I misread it, a few things confuse me:
  • \[? makes the opening bracket optional.
  • Why are the parentheses around " needed in (")?
  • If the opening bracket is optional, should the closing one be optional as well?

Or is the goal to make the quotes optional in the message.body block (in which case we might need to adjust the expression slightly)?

  2. I am also wondering whether we could make \[?(")%{data:message.body}?(")] more efficient by replacing data with the following regex matcher: %{regex("[^]]*"):message.body} (i.e., matching up to the delimiter ]).

Contributor Author

Thanks for the feedback!

  1. The ?(") is intended to make the quotes optional. I see that it is incorrect; I'll change it.
  2. Since the quotes need to be optional, I'm leaning towards this instead:
    message_rule \["?%{data:message.body}"?](\s+%{data:message.details:array("[]","] [")})?
    Here is an example log that this change would cover:
    [2024/11/18 15:15:45.120 +00:00] [INFO] [roles/roles.go:282] [setupPrometheusHTTPServer]

No problem, I think your latest changes make sense.

Could you just replace %{data:message.body} as suggested? Having multiple data matchers in a grok expression tends to be quite costly to evaluate, with possible side effects if evaluation takes too long (i.e., we might have to abort parsing the expression). Replacing it with %{regex("[^]]*"):message.body} should alleviate the issue somewhat.
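
To make the trade-off discussed in this thread concrete, here is a minimal sketch that uses Python's `re` module as a stand-in for the grok engine. That stand-in is an assumption: grok matchers are not Python regexes, so `data` is approximated by a lazy `.*?` and the suggested `regex("[^]]*")` by the character class `[^\]]*`. The names `extract_body`, `LAZY_BODY`, `CLASS_BODY`, and `CLASS_BODY_NO_QUOTE` are hypothetical, and the third pattern is not something proposed in the thread. The sketch only shows how each body pattern treats the optional quotes.

```python
import re

# Python `re` stand-ins for the grok fragments discussed above (assumption:
# grok's `data` behaves like a lazy `.*?`; the real engine may differ).

# Author's revised rule: \["?%{data:message.body}"?]
LAZY_BODY = re.compile(r'\["?(?P<body>.*?)"?\]')

# Reviewer's suggestion: %{regex("[^]]*"):message.body}
CLASS_BODY = re.compile(r'\["?(?P<body>[^\]]*)"?\]')

# Hypothetical variant (not proposed in the thread): also exclude the quote,
# so the optional closing quote is left for `"?` to consume.
CLASS_BODY_NO_QUOTE = re.compile(r'\["?(?P<body>[^\]"]*)"?\]')


def extract_body(pattern, block):
    """Return the captured body of a single bracketed block, or None."""
    match = pattern.fullmatch(block)
    return match.group("body") if match else None


# One quoted body from the pipeline samples, one unquoted body from the thread.
QUOTED = '["get recovery info request received"]'
UNQUOTED = '[setupPrometheusHTTPServer]'

for name, pattern in [
    ("lazy", LAZY_BODY),
    ("class", CLASS_BODY),
    ("class-no-quote", CLASS_BODY_NO_QUOTE),
]:
    print(name, repr(extract_body(pattern, QUOTED)),
          repr(extract_body(pattern, UNQUOTED)))
```

In this approximation the greedy character class keeps the closing quote inside the body, because the optional `"?` that follows it is allowed to match nothing. Whether the grok engine behaves the same way is worth confirming against the pipeline tests. Excluding the quote from the class is one possible adjustment, at the cost of rejecting bodies that themselves contain a quote.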

          [")}
        matchRules: rule1 \[%{date("yyyy/MM/dd HH:mm:ss.SSS
          ZZ"):date}]\s+\[%{word:level}\]\s+\[%{word:service}/%{notSpace:file}:%{word:lineno}\]\s+%{message_rule}
    - type: status-remapper
      name: Define `level` as the official status of the log
      enabled: true
      sources:
        - level
    - type: message-remapper
      name: Define `message.body` as the official message of the log
      enabled: true
      sources:
        - message.body
    - type: date-remapper
      name: Define `date` as the official date of the log
      enabled: true
      sources:
        - date
65 changes: 65 additions & 0 deletions milvus/assets/logs/milvus_tests.yaml
@@ -0,0 +1,65 @@
id: milvus
tests:
  -
    sample: "[2024/11/27 14:07:51.849 +00:00] [INFO] [datacoord/handler.go:341] [\"channel seek position set from channel checkpoint meta\"] [channel=by-dev-rootcoord-dml_2_453764875273209568v0] [posTs=454221223538458625] [posTime=2024/11/27 14:07:39.421 +00:00]"
    result:
      custom:
        date: 1732716471849
        file: "handler.go"
        level: "INFO"
        lineno: "341"
        message:
          details:
            - "channel=by-dev-rootcoord-dml_2_453764875273209568v0"
            - "posTs=454221223538458625"
            - "posTime=2024/11/27 14:07:39.421 +00:00"
        service: "datacoord"
      message: "channel seek position set from channel checkpoint meta"
      status: "info"
      tags:
        - "source:LOGS_SOURCE"
      timestamp: 1732716471849
  -
    sample: "[2024/11/27 14:07:01.849 +00:00] [INFO] [datacoord/services.go:833] [\"datacoord append channelInfo in GetRecoveryInfo\"] [traceID=ed216b196edf0589f281c4ad800f6565] [collectionID=453764875273209568] [partitionIDs=\"[]\"] [channel=by-dev-rootcoord-dml_2_453764875273209568v0] [\"# of unflushed segments\"=0] [\"# of flushed segments\"=1] [\"# of dropped segments\"=0] [\"# of indexed segments\"=0] [\"# of l0 segments\"=0]"
    result:
      custom:
        date: 1732716421849
        file: "services.go"
        level: "INFO"
        lineno: "833"
        message:
          details:
            - "traceID=ed216b196edf0589f281c4ad800f6565"
            - "collectionID=453764875273209568"
            - "partitionIDs=\"[]\""
            - "channel=by-dev-rootcoord-dml_2_453764875273209568v0"
            - "\"# of unflushed segments\"=0"
            - "\"# of flushed segments\"=1"
            - "\"# of dropped segments\"=0"
            - "\"# of indexed segments\"=0"
            - "\"# of l0 segments\"=0"
        service: "datacoord"
      message: "datacoord append channelInfo in GetRecoveryInfo"
      status: "info"
      tags:
        - "source:LOGS_SOURCE"
      timestamp: 1732716421849
  -
    sample: "[2024/11/27 14:06:51.852 +00:00] [INFO] [datacoord/services.go:818] [\"get recovery info request received\"] [traceID=54cda8d3229d00982db785351a12ea7a] [collectionID=453764875273212700] [partitionIDs=\"[]\"]"
    result:
      custom:
        date: 1732716411852
        file: "services.go"
        level: "INFO"
        lineno: "818"
        message:
          details:
            - "traceID=54cda8d3229d00982db785351a12ea7a"
            - "collectionID=453764875273212700"
            - "partitionIDs=\"[]\""
        service: "datacoord"
      message: "get recovery info request received"
      status: "info"
      tags:
        - "source:LOGS_SOURCE"
      timestamp: 1732716411852
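
The expected values in these tests can be sanity-checked outside the pipeline. The sketch below approximates `rule1` and the optional-quote form of `message_rule` from the review thread using Python's `re` and `datetime`. This is an assumption rather than the grok engine itself: `word` is modeled as `\w+`, `notSpace` as `\S+`, `data` as a lazy match, and the `array("[]","] [")` filter as a split on `] [`. The names `RULE1`, `EPOCH`, `SAMPLE`, and `parse_line` are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Rough Python equivalent of rule1 + message_rule (an approximation only).
RULE1 = re.compile(
    r'\[(?P<date>[^\]]+)\]\s+'                                # [2024/11/27 14:07:51.849 +00:00]
    r'\[(?P<level>\w+)\]\s+'                                  # [INFO]
    r'\[(?P<service>\w+)/(?P<file>\S+):(?P<lineno>\w+)\]\s+'  # [datacoord/handler.go:341]
    r'\["?(?P<body>.*?)"?\]'                                  # ["message body"] or [body]
    r'(?:\s+\[(?P<details>.*)\])?$'                           # optional [k=v] [k=v] ...
)


def parse_line(line):
    """Parse one Milvus log line into a dict, or return None if it does not match."""
    match = RULE1.match(line)
    if match is None:
        return None
    parsed_date = datetime.strptime(match.group("date"), "%Y/%m/%d %H:%M:%S.%f %z")
    details = match.group("details")
    return {
        # Integer milliseconds since the Unix epoch, without float rounding.
        "date": (parsed_date - EPOCH) // timedelta(milliseconds=1),
        "level": match.group("level"),
        "service": match.group("service"),
        "file": match.group("file"),
        "lineno": match.group("lineno"),
        "message": match.group("body"),
        # Mirrors array("[]", "] ["): split the bracketed blocks on "] [".
        "details": details.split("] [") if details else [],
    }


# First sample from milvus_tests.yaml.
SAMPLE = (
    '[2024/11/27 14:07:51.849 +00:00] [INFO] [datacoord/handler.go:341] '
    '["channel seek position set from channel checkpoint meta"] '
    '[channel=by-dev-rootcoord-dml_2_453764875273209568v0] '
    '[posTs=454221223538458625] [posTime=2024/11/27 14:07:39.421 +00:00]'
)

print(parse_line(SAMPLE))
```

For the first sample, the computed `date` agrees with the `date` and `timestamp` values expected by the test, and the `details` list has the same three entries.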