Grouping Top N queries documentation #8173

Merged
331 changes: 331 additions & 0 deletions _observing-your-data/query-insights/grouping-top-n-queries.md
@@ -0,0 +1,331 @@
---
layout: default
title: Grouping top N queries
parent: Query insights
nav_order: 20
---

# Grouping top N queries
**Introduced 2.17**
{: .label .label-purple }

Monitoring the [top N queries]({{site.url}}{{site.baseurl}}/observing-your-data/query-insights/top-n-queries/) helps identify the most resource-intensive queries based on latency, CPU, and memory usage over a specified time window. However, if a single computationally expensive query is executed multiple times, it can occupy all top N query slots, potentially preventing other expensive queries from appearing in the list. To address this issue, you can group similar queries, gaining insight into various high-impact query groups.

Line above: FYI, no hyphen in a compound adjective if the first word is an adverb ending in -ly

Starting with OpenSearch version 2.17, top N queries can be grouped by `similarity`, with additional grouping options planned for future releases.

## Grouping queries by similarity

Grouping queries by `similarity` organizes them based on the query structure, stripping out everything except the core query operations.

For example, the following query:

```json
{
  "query": {
    "bool": {
      "must": [
        { "exists": { "field": "field1" } }
      ],
      "query_string": {
        "query": "search query"
      }
    }
  }
}
```

Has the following corresponding query structure:

```c
bool
  must
    exists
  query_string
```

When queries share the same query structure, they are grouped together, ensuring that all similar queries belong to the same group.
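
The following Python sketch illustrates the idea of reducing a query to its structure and deriving a group key from it. It is not the Query Insights implementation: the `QUERY_OPERATIONS` set, the `query_structure` and `group_key` helpers, and the use of SHA-256 are all assumptions made for illustration.

```python
import hashlib

# Hypothetical, incomplete set of names treated as "core query operations".
QUERY_OPERATIONS = {
    "bool", "must", "must_not", "should", "filter",
    "exists", "match", "match_all", "query_string", "range", "term",
}


def query_structure(node, depth=0):
    """Return the nested operation names as indented lines."""
    lines = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key in QUERY_OPERATIONS:
                lines.append("  " * depth + key)
                lines.extend(query_structure(value, depth + 1))
            else:
                # Field names, values, and options are not emitted, but their
                # children are still searched for nested operations.
                lines.extend(query_structure(value, depth))
    elif isinstance(node, list):
        for item in node:
            lines.extend(query_structure(item, depth))
    return lines


def group_key(query):
    """Hash the structure so that structurally identical queries share a key."""
    structure = "\n".join(query_structure(query))
    return hashlib.sha256(structure.encode("utf-8")).hexdigest()


first = {"query": {"bool": {"must": [{"exists": {"field": "field1"}}],
                            "query_string": {"query": "search query"}}}}
second = {"query": {"bool": {"must": [{"exists": {"field": "other_field"}}],
                             "query_string": {"query": "different text"}}}}

print("\n".join(query_structure(first)))
print(group_key(first) == group_key(second))
```

In this sketch, the two queries differ only in field names and search text, so they produce the same structure and the same key.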


## Aggregate metrics per group

In addition to retrieving latency, CPU, and memory metrics for individual top N queries, you can obtain aggregate statistics for the top N query groups. For each query group, the response includes the following statistics:

- The total latency, CPU usage, or memory usage (depending on the configured metric type)
- The total query count

Using these statistics, you can calculate the average latency, CPU usage, or memory usage for each query group.
The response also includes one example query from the query group.
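
As a minimal sketch of that calculation, the following Python function divides a group's total by its query count. It assumes the `number` and `count` field names used in the example response in this document and that `number` holds the group total; the `group_average` helper is hypothetical.

```python
def group_average(measurement):
    """Return the per-query average for one measurement object.

    Assumes `number` is the group total and `count` is the number of
    queries in the group.
    """
    count = measurement["count"]
    if count == 0:
        return 0.0
    return measurement["number"] / count


latency = {"number": 160, "count": 10, "aggregationType": "AVERAGE"}
print(group_average(latency))
```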

## Configuring query grouping

Before you enable query grouping, you must enable top N query monitoring for a metric type of your choice. For more information, see [Configuring top N query monitoring]({{site.url}}{{site.baseurl}}/observing-your-data/query-insights/top-n-queries/#configuring-top-n-query-monitoring).

To configure grouping for top N queries, follow these steps:

### Step 1: Enable top N query monitoring

Ensure that top N query monitoring is enabled for at least one of the metrics: latency, CPU, or memory. For more information, see [Configuring top N query monitoring]({{site.url}}{{site.baseurl}}/observing-your-data/query-insights/top-n-queries/#configuring-top-n-query-monitoring).

For example, to enable top N query monitoring by latency with default settings, send the following request:

```json
PUT _cluster/settings
{
  "persistent" : {
    "search.insights.top_queries.latency.enabled" : true
  }
}
```
{% include copy-curl.html %}

### Step 2: Configure query grouping

Set the desired grouping method by updating the following cluster setting:

```json
PUT _cluster/settings
{
  "persistent" : {
    "search.insights.top_queries.group_by" : "similarity"
  }
}
```
{% include copy-curl.html %}

The default value for the `group_by` setting is `none`, which disables grouping. As of OpenSearch 2.17, the supported values for `group_by` are `similarity` and `none`.

### Step 3 (Optional): Set a limit on the number of query groups to monitor

Optionally, you can set a limit on the number of query groups to monitor. Queries that are already part of the top N query list (the most resource-intensive queries) will not count toward that limit. Essentially, the maximum applies only to other query groups, and the top N queries are tracked separately. This helps manage the tracking of query groups based on workload and query window size.

To limit tracking to 100 query groups, send the following request:

```json
PUT _cluster/settings
{
  "persistent" : {
    "search.insights.top_queries.max_groups_excluding_topn" : 100
  }
}
```
{% include copy-curl.html %}

The default value for `max_groups_excluding_topn` is `100`, and you can set it to any value between `0` and `10000`, inclusive.
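
If you generate these requests from a script, it can help to validate the values before sending them. The following Python sketch builds the request body for the two grouping settings and rejects values outside the ranges stated above. The `grouping_settings` helper and its constants are hypothetical; only the setting names and allowed values come from this page.

```python
import json

ALLOWED_GROUP_BY = {"similarity", "none"}  # supported as of OpenSearch 2.17
MAX_GROUPS_LIMIT = 10000                   # documented upper bound, inclusive


def grouping_settings(group_by="similarity", max_groups=100):
    """Build a `PUT _cluster/settings` body for query grouping."""
    if group_by not in ALLOWED_GROUP_BY:
        raise ValueError(f"group_by must be one of {sorted(ALLOWED_GROUP_BY)}")
    if isinstance(max_groups, bool) or not isinstance(max_groups, int):
        raise ValueError("max_groups must be an integer")
    if not 0 <= max_groups <= MAX_GROUPS_LIMIT:
        raise ValueError(f"max_groups must be between 0 and {MAX_GROUPS_LIMIT}")
    return {
        "persistent": {
            "search.insights.top_queries.group_by": group_by,
            "search.insights.top_queries.max_groups_excluding_topn": max_groups,
        }
    }


print(json.dumps(grouping_settings(max_groups=500), indent=2))
```

The printed JSON can be used as the body of a `PUT _cluster/settings` request.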

## Monitoring query groups

To view the top N query groups, send the following request:

```json
GET /_insights/top_queries
```
{% include copy-curl.html %}

The response contains the top N query groups:

<details open markdown="block">
<summary>
Response
</summary>
{: .text-delta}

```json
{
  "top_queries": [
    {
      "timestamp": 1725495127359,
      "source": {
        "query": {
          "match_all": {
            "boost": 1.0
          }
        }
      },
      "phase_latency_map": {
        "expand": 0,
        "query": 55,
        "fetch": 3
      },
      "total_shards": 1,
      "node_id": "ZbINz1KFS1OPeFmN-n5rdg",
      "query_hashcode": "b4c4f69290df756021ca6276be5cbb75",
      "task_resource_usages": [
        {
          "action": "indices:data/read/search[phase/query]",
          "taskId": 30,
          "parentTaskId": 29,
          "nodeId": "ZbINz1KFS1OPeFmN-n5rdg",
          "taskResourceUsage": {
            "cpu_time_in_nanos": 33249000,
            "memory_in_bytes": 2896848
          }
        },
        {
          "action": "indices:data/read/search",
          "taskId": 29,
          "parentTaskId": -1,
          "nodeId": "ZbINz1KFS1OPeFmN-n5rdg",
          "taskResourceUsage": {
            "cpu_time_in_nanos": 3151000,
            "memory_in_bytes": 133936
          }
        }
      ],
      "indices": [
        "my_index"
      ],
      "labels": {},
      "search_type": "query_then_fetch",
      "measurements": {
        "latency": {
          "number": 160,
          "count": 10,
          "aggregationType": "AVERAGE"
        }
      }
    },
    {
      "timestamp": 1725495135160,
      "source": {
        "query": {
          "term": {
            "content": {
              "value": "first",
              "boost": 1.0
            }
          }
        }
      },
      "phase_latency_map": {
        "expand": 0,
        "query": 18,
        "fetch": 0
      },
      "total_shards": 1,
      "node_id": "ZbINz1KFS1OPeFmN-n5rdg",
      "query_hashcode": "c3620cc3d4df30fb3f95aeb2167289a4",
      "task_resource_usages": [
        {
          "action": "indices:data/read/search[phase/query]",
          "taskId": 50,
          "parentTaskId": 49,
          "nodeId": "ZbINz1KFS1OPeFmN-n5rdg",
          "taskResourceUsage": {
            "cpu_time_in_nanos": 10188000,
            "memory_in_bytes": 288136
          }
        },
        {
          "action": "indices:data/read/search",
          "taskId": 49,
          "parentTaskId": -1,
          "nodeId": "ZbINz1KFS1OPeFmN-n5rdg",
          "taskResourceUsage": {
            "cpu_time_in_nanos": 262000,
            "memory_in_bytes": 3216
          }
        }
      ],
      "indices": [
        "my_index"
      ],
      "labels": {},
      "search_type": "query_then_fetch",
      "measurements": {
        "latency": {
          "number": 109,
          "count": 7,
          "aggregationType": "AVERAGE"
        }
      }
    },
    {
      "timestamp": 1725495139766,
      "source": {
        "query": {
          "match": {
            "content": {
              "query": "first",
              "operator": "OR",
              "prefix_length": 0,
              "max_expansions": 50,
              "fuzzy_transpositions": true,
              "lenient": false,
              "zero_terms_query": "NONE",
              "auto_generate_synonyms_phrase_query": true,
              "boost": 1.0
            }
          }
        }
      },
      "phase_latency_map": {
        "expand": 0,
        "query": 15,
        "fetch": 0
      },
      "total_shards": 1,
      "node_id": "ZbINz1KFS1OPeFmN-n5rdg",
      "query_hashcode": "484eaabecd13db65216b9e2ff5eee999",
      "task_resource_usages": [
        {
          "action": "indices:data/read/search[phase/query]",
          "taskId": 64,
          "parentTaskId": 63,
          "nodeId": "ZbINz1KFS1OPeFmN-n5rdg",
          "taskResourceUsage": {
            "cpu_time_in_nanos": 12161000,
            "memory_in_bytes": 473456
          }
        },
        {
          "action": "indices:data/read/search",
          "taskId": 63,
          "parentTaskId": -1,
          "nodeId": "ZbINz1KFS1OPeFmN-n5rdg",
          "taskResourceUsage": {
            "cpu_time_in_nanos": 293000,
            "memory_in_bytes": 3216
          }
        }
      ],
      "indices": [
        "my_index"
      ],
      "labels": {},
      "search_type": "query_then_fetch",
      "measurements": {
        "latency": {
          "number": 43,
          "count": 3,
          "aggregationType": "AVERAGE"
        }
      }
    }
  ]
}
```

</details>
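
When post-processing this response, you may want a compact summary per entry. The following Python sketch totals the per-task CPU and memory usage and the phase latencies for one `top_queries` entry. The `summarize_group` helper is hypothetical, and the sample entry is a trimmed copy of the first entry in the preceding response. Note that these fields describe the example query stored for the group, not the group as a whole.

```python
def summarize_group(entry):
    """Total the per-task resource usage recorded for one `top_queries` entry."""
    usages = [
        task["taskResourceUsage"]
        for task in entry.get("task_resource_usages", [])
    ]
    return {
        "query_hashcode": entry["query_hashcode"],
        "cpu_time_in_nanos": sum(u["cpu_time_in_nanos"] for u in usages),
        "memory_in_bytes": sum(u["memory_in_bytes"] for u in usages),
        # Sum of the time spent in the expand, query, and fetch phases.
        "phase_latency_ms": sum(entry["phase_latency_map"].values()),
    }


# Trimmed copy of the first entry in the example response.
entry = {
    "query_hashcode": "b4c4f69290df756021ca6276be5cbb75",
    "phase_latency_map": {"expand": 0, "query": 55, "fetch": 3},
    "task_resource_usages": [
        {"taskId": 30, "taskResourceUsage": {
            "cpu_time_in_nanos": 33249000, "memory_in_bytes": 2896848}},
        {"taskId": 29, "taskResourceUsage": {
            "cpu_time_in_nanos": 3151000, "memory_in_bytes": 133936}},
    ],
}

print(summarize_group(entry))
```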

## Response fields

The response includes the following fields.

Field | Data type | Description
:--- |:---| :---
`top_queries` | Array | The list of top query groups.
`top_queries.timestamp` | Integer | The execution timestamp for the first query in the query group.
`top_queries.source` | Object | The first query in the query group.
`top_queries.phase_latency_map` | Object | The phase latency map for the first query in the query group. The map includes the amount of time, in milliseconds, that the query spent in the `expand`, `query`, and `fetch` phases.
`top_queries.total_shards` | Integer | The number of shards on which the first query was executed.
`top_queries.node_id` | String | The node ID of the node that coordinated the execution of the first query in the query group.
`top_queries.query_hashcode` | String | The hash code that uniquely identifies the query group. This is essentially the hash of the [query structure](#grouping-queries-by-similarity).
`top_queries.task_resource_usages` | Array of objects | The resource usage breakdown for the various tasks for the first query in the query group.
`top_queries.indices` | Array | The indexes that the first query in the query group searches.
`top_queries.labels` | Object | Used to label the top query.
`top_queries.search_type` | String | The search request execution type (`query_then_fetch` or `dfs_query_then_fetch`). For more information, see the `search_type` parameter in the [Search API documentation]({{site.url}}{{site.baseurl}}/api-reference/search/#url-parameters).
`top_queries.measurements` | Object | The aggregate measurements for the query group.
`top_queries.measurements.latency` | Object | The aggregate latency measurements for the query group.
`top_queries.measurements.latency.number` | Integer | The total latency for the query group.
`top_queries.measurements.latency.count` | Integer | The number of queries in the query group.
`top_queries.measurements.latency.aggregationType` | String | The aggregation type for the current entry. If grouping by similarity is enabled, then `aggregationType` is `AVERAGE`. If not enabled, then `aggregationType` is `NONE`.
3 changes: 3 additions & 0 deletions _observing-your-data/query-insights/index.md
@@ -7,6 +7,8 @@ has_toc: false
---

# Query insights
**Introduced 2.12**
{: .label .label-purple }

To monitor and analyze the search queries within your OpenSearch cluster, you can obtain query insights. With minimal performance impact, query insights features aim to provide comprehensive insights into search query execution, enabling you to better understand search query characteristics, patterns, and system behavior during query execution stages. Query insights facilitate enhanced detection, diagnosis, and prevention of query performance issues, ultimately improving query processing performance, user experience, and overall system resilience.

@@ -36,4 +38,5 @@ For information about installing plugins, see [Installing plugins]({{site.url}}{
You can obtain the following information using Query Insights:

- [Top N queries]({{site.url}}{{site.baseurl}}/observing-your-data/query-insights/top-n-queries/)
- [Grouping top N queries]({{site.url}}{{site.baseurl}}/observing-your-data/query-insights/grouping-top-n-queries/)
- [Query metrics]({{site.url}}{{site.baseurl}}/observing-your-data/query-insights/query-metrics/)
4 changes: 3 additions & 1 deletion _observing-your-data/query-insights/query-metrics.md
@@ -2,10 +2,12 @@
layout: default
title: Query metrics
parent: Query insights
nav_order: 20
nav_order: 30
---

# Query metrics
**Introduced 2.16**
{: .label .label-purple }

Key query [metrics](#metrics), such as aggregation types, query types, latency, and resource usage per query type, are captured along the search path by using the OpenTelemetry (OTel) instrumentation framework. The telemetry data can be consumed using OTel metrics [exporters]({{site.url}}{{site.baseurl}}/observing-your-data/trace/distributed-tracing/#exporters).
