
Add debug information for knn queries #2289

Open
shatejas opened this issue Nov 26, 2024 · 3 comments

@shatejas
Collaborator

shatejas commented Nov 26, 2024

Is your feature request related to a problem?

KNN queries have complex execution paths, which makes it difficult to figure out the query execution flow and therefore to debug a query. KNNQuery currently has only limited debug logs; I would like more useful debug logs that make the query execution understandable.

I want meaningful logs that help me understand:

  1. Which crucial query components were executed: filtering, exact search, ANN search, rescoring, inner hits
  2. Useful metrics, e.g., filter result size and fine-grained profiling information

This should be added without impacting latencies.
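To make the "without impacting latencies" requirement concrete, here is a minimal sketch of how per-component timings could be collected and emitted as a single debug line. All class and method names here are illustrative, not the actual KNNQuery API; in the real plugin the summary call would be guarded with `logger.isDebugEnabled()` so no string building happens when debug logging is off.

```java
// Hypothetical sketch: accumulate per-phase timings for a k-NN query and
// render them as one debug log line. Names are illustrative only.
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

public class KnnQueryDebugTimer {
    // Insertion-ordered so phases appear in execution order in the log line.
    private final Map<String, Long> phaseNanos = new LinkedHashMap<>();

    // Time one phase (e.g. "filtering", "ann_search", "rescoring") and record it.
    public <T> T time(String phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            phaseNanos.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    // Render one summary line; callers should guard with logger.isDebugEnabled()
    // so this string is never built when debug logging is disabled.
    public String summary() {
        StringBuilder sb = new StringBuilder("knn query phases:");
        phaseNanos.forEach((phase, nanos) ->
            sb.append(' ').append(phase).append('=').append(nanos / 1_000_000).append("ms"));
        return sb.toString();
    }

    public static void main(String[] args) {
        KnnQueryDebugTimer timer = new KnnQueryDebugTimer();
        int filteredDocs = timer.time("filtering", () -> 42); // stand-in for filter evaluation
        timer.time("ann_search", () -> null);                 // stand-in for graph search
        System.out.println("filtered=" + filteredDocs);
        System.out.println(timer.summary());
    }
}
```

The point of the wrapper is that the timing bookkeeping is a couple of `System.nanoTime()` calls per phase, while the expensive part (string formatting) is deferred to the guarded `summary()` call.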

What solution would you like?

There are a couple of existing solutions that can be leveraged:

  1. OpenSearch logs: adding debug logs will let users leverage slow logs and the existing log viewing mechanisms in OpenSearch.
  2. Debug information in the profile breakdown: there is a hook in the profile breakdown that can be leveraged to carry debug information. This needs changes in OpenSearch core, since there is currently no way to use this hook outside of aggregation queries. It can be useful because the information appears directly in the profile breakdown and doesn't require additional greps through logs.

Both solutions can be used together. Either way, benchmarking should be done to make sure latencies aren't impacted.
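For option 2, a rough sketch of what the debug payload attached to a profile breakdown entry might look like. The map keys and the `build` helper are hypothetical; the actual core hook would define its own types and field names.

```java
// Hypothetical shape of k-NN debug info for a profile breakdown entry.
// Field names are illustrative, not an actual OpenSearch core contract.
import java.util.LinkedHashMap;
import java.util.Map;

public class KnnProfileDebugInfo {
    public static Map<String, Object> build(long filteredDocCount,
                                            boolean exactSearchUsed,
                                            long annVisitedNodes) {
        Map<String, Object> debug = new LinkedHashMap<>();
        debug.put("filter_result_size", filteredDocCount); // size of the filter result set
        debug.put("exact_search_used", exactSearchUsed);   // whether exact search replaced ANN
        debug.put("ann_visited_nodes", annVisitedNodes);   // graph nodes visited during ANN search
        return debug;
    }

    public static void main(String[] args) {
        // Example values only; in practice these come from query execution.
        System.out.println(build(1500, false, 4096));
    }
}
```

Because the payload is a plain map, it could be serialized into the profile response alongside the existing breakdown timings without changing the response structure for queries that don't request profiling.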

@navneet1v
Collaborator

@shatejas won't the explain API be more useful here to know why a particular result came in the response?

Also, is this GH issue different from #2286?

@shatejas
Collaborator Author

> @shatejas won't the explain API be more useful here to know why a particular result came in the response?

@navneet1v I looked at https://opensearch.org/docs/latest/api-reference/explain/. From what I understood, it explains how the score was calculated. That can be extremely useful, but the ask here is more about giving users the code execution flow through logs [mostly debug] along with timing/profiling information, which won't be in the query breakdown. Putting a lot of information in the breakdown can be overwhelming, to the point where it creates ambiguity if the internal workings aren't known.

Let me know if I am missing something related to explain

> Also, is this GH issue different from #2286?

So if we end up doing option 2, this can be an extension of #2286; otherwise it is not, and can be treated independently.

@shatejas shatejas added good first issue Good for newcomers and removed good first issue Good for newcomers labels Nov 27, 2024
@navneet1v
Collaborator

> So if we end up doing option 2, this can be an extension of #2286; otherwise it is not, and can be treated independently.

In that case I would recommend having one GitHub issue rather than two. What we want, at the end of the day, is a mechanism to profile the vector query, which is exactly what the first GH issue, #1985, was talking about. I think we should collate all these ideas in one place so that when we implement the feature all the discussions are together. I'll leave it up to you how you want to do this.
