
Add debug information for knn queries #2289

Open
shatejas opened this issue Nov 26, 2024 · 3 comments

@shatejas
Collaborator

shatejas commented Nov 26, 2024

Is your feature request related to a problem?

KNN queries have complex execution paths, which makes it difficult to figure out the query execution flow and therefore to debug a query. KNNQuery currently has only limited debug logs; I would like more useful debug logs that make the query execution understandable.

I want meaningful logs that help me understand:

  1. Which crucial query components were executed: filtering, exact search, ANN search, rescoring, inner hits
  2. Useful metrics, e.g., filter result size and fine-grained profiling information

This should be added without impacting latencies.
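To make the "without impacting latencies" requirement concrete, here is a minimal sketch of how per-component timings could be collected and emitted as a single debug line. All class and method names here are illustrative, not the actual KNNQuery API; in the real plugin the summary call would be guarded with `logger.isDebugEnabled()` so no string building happens when debug logging is off.

```java
// Hypothetical sketch: accumulate per-phase timings for a k-NN query and
// render them as one debug log line. Names are illustrative only.
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

public class KnnQueryDebugTimer {
    // Insertion-ordered so phases appear in execution order in the log line.
    private final Map<String, Long> phaseNanos = new LinkedHashMap<>();

    // Time one phase (e.g. "filtering", "ann_search", "rescoring") and record it.
    public <T> T time(String phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            phaseNanos.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    // Render one summary line; callers should guard with logger.isDebugEnabled()
    // so this string is never built when debug logging is disabled.
    public String summary() {
        StringBuilder sb = new StringBuilder("knn query phases:");
        phaseNanos.forEach((phase, nanos) ->
            sb.append(' ').append(phase).append('=').append(nanos / 1_000_000).append("ms"));
        return sb.toString();
    }

    public static void main(String[] args) {
        KnnQueryDebugTimer timer = new KnnQueryDebugTimer();
        int filteredDocs = timer.time("filtering", () -> 42); // stand-in for filter evaluation
        timer.time("ann_search", () -> null);                 // stand-in for graph search
        System.out.println("filtered=" + filteredDocs);
        System.out.println(timer.summary());
    }
}
```

The point of the wrapper is that the timing bookkeeping is a couple of `System.nanoTime()` calls per phase, while the expensive part (string formatting) is deferred to the guarded `summary()` call.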

What solution would you like?

There are a couple of existing solutions that can be leveraged:

  1. OpenSearch logs: adding debug logs will let users leverage slow logs and the existing log viewing mechanisms in OpenSearch.
  2. Debug information in the profile breakdown: there is a hook in the profile breakdown that can be leveraged to carry debug information. This needs changes in OpenSearch core, since there is currently no way to use this hook outside of aggregation queries. It can be useful because the information appears directly in the profile breakdown and doesn't require additional greps through logs.

Both solutions can be used together. Either way, benchmarking should be done to make sure latencies aren't impacted.
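For option 2, a rough sketch of what the debug payload attached to a profile breakdown entry might look like. The map keys and the `build` helper are hypothetical; the actual core hook would define its own types and field names.

```java
// Hypothetical shape of k-NN debug info for a profile breakdown entry.
// Field names are illustrative, not an actual OpenSearch core contract.
import java.util.LinkedHashMap;
import java.util.Map;

public class KnnProfileDebugInfo {
    public static Map<String, Object> build(long filteredDocCount,
                                            boolean exactSearchUsed,
                                            long annVisitedNodes) {
        Map<String, Object> debug = new LinkedHashMap<>();
        debug.put("filter_result_size", filteredDocCount); // size of the filter result set
        debug.put("exact_search_used", exactSearchUsed);   // whether exact search replaced ANN
        debug.put("ann_visited_nodes", annVisitedNodes);   // graph nodes visited during ANN search
        return debug;
    }

    public static void main(String[] args) {
        // Example values only; in practice these come from query execution.
        System.out.println(build(1500, false, 4096));
    }
}
```

Because the payload is a plain map, it could be serialized into the profile response alongside the existing breakdown timings without changing the response structure for queries that don't request profiling.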

@navneet1v
Collaborator

@shatejas won't the explain API be more useful here to know why a particular result came in the response?

Also, is this GH issue different from #2286?

@shatejas
Collaborator Author

> @shatejas won't the explain API be more useful here to know why a particular result came in the response?

@navneet1v I looked at https://opensearch.org/docs/latest/api-reference/explain/. From what I understood, it explains how the score was calculated. That can be extremely useful, but the ask here is more about giving users the code execution flow through logs [mostly debug] along with timing/profiling information, which won't be in the query breakdown. Putting a lot of information in the breakdown can be overwhelming, to the point where it creates ambiguity if the internal workings aren't known.

Let me know if I am missing something related to explain

> Also, is this GH issue different from #2286?

So if we end up doing option 2, this can be an extension of #2286; otherwise it is not, and can be treated independently.

@shatejas shatejas added good first issue Good for newcomers and removed good first issue Good for newcomers labels Nov 27, 2024
@navneet1v
Collaborator

> So if we end up doing option 2, this can be an extension of #2286; otherwise it is not, and can be treated independently.

In that case I would recommend having one GitHub issue rather than two. What we want, at the end of the day, is a mechanism to profile the vector query, which is exactly what the first GH issue, #1985, was talking about. I think we should collate all these ideas in one place so that when we implement the feature all the discussions are together. I'll leave it up to you how you want to do this.
