[FEATURE] Provide way of defining methods for score normalization and combination in scope of Hybrid search #228
Comments
For weights, have you considered this format:
@austintlee I think with such a format you need a way to map between each sub-query and its key name. For example, my query may look something like this:
We need to map each of the 4 sub-queries to its weight. For instance, the key could be the query type, but I see a few problems with such an approach: which key to take for nested queries like
I didn't realize this feature aspires to implement a generic hybrid search. I was under the impression that it simply combines a BM25 search and a k-NN search, which is why I thought you'd always have two weights that add up to 1.0. Don't the weights need to sum to 1? It looks like in the current implementation, you assign a weight of 1.0 to sub-queries that are not matched to the weights specified in the query. In other words, if you have 2 weights in the input and 4 sub-queries, the 3rd and 4th sub-queries seem to get a weight of 1.0?
Yes, the weights need to sum to 1. We didn't add this check at the start; it needs to be added. @austintlee This query clause that we are building is not specific to k-NN or BM25. The new query clause is intended to be used for any Also, if you look closely, you will see that a k-NN query can be created from different query clauses, like neural or any other clause added in the future. So, at least in the code, there is no way to tell what is k-NN and what is BM25. This approach helps solve that problem as well. :)
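The weight check discussed in this thread can be sketched roughly as follows. This is only an illustration in Python, not the plugin's code: the function names `validate_weights` and `weighted_combine` are hypothetical, weights are assumed to be matched to sub-queries by position, and a document missing from a sub-query is assumed to contribute a score of 0 for it.

```python
import math
from typing import Dict, List, Sequence


def validate_weights(weights: Sequence[float], num_sub_queries: int) -> None:
    """Reject weight lists that do not line up with the sub-queries.

    Assumption: one weight per sub-query, matched by position.
    """
    if len(weights) != num_sub_queries:
        raise ValueError(
            f"expected {num_sub_queries} weights, got {len(weights)}"
        )
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-6):
        raise ValueError(f"weights must sum to 1.0, got {sum(weights)}")


def weighted_combine(
    per_query_scores: List[Dict[str, float]], weights: Sequence[float]
) -> Dict[str, float]:
    """Combine already-normalized scores as a weighted sum per document.

    Assumption: a document absent from a sub-query scores 0.0 there.
    """
    validate_weights(weights, len(per_query_scores))
    doc_ids = set().union(*per_query_scores)
    return {
        doc: sum(w * scores.get(doc, 0.0)
                 for scores, w in zip(per_query_scores, weights))
        for doc in doc_ids
    }


# Two sub-queries with normalized scores, weighted 0.3 / 0.7.
result = weighted_combine([{"a": 1.0, "b": 0.0}, {"a": 0.5}], [0.3, 0.7])
```

With a check like this, the case raised above (2 weights for 4 sub-queries) fails fast instead of silently defaulting the remaining weights to 1.0.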
Description
For the Normalization and Score Combination feature, we need an actual processing unit that will process the scores collected during the Query phase of Hybrid search. We also need an approach for defining different techniques for score normalization and combination.
Solution
The solution we are proposing is to create a new implementation of a Search phase result processor. This processor will be set up as part of a search pipeline and called between the Query and Fetch phases. More details on such processors can be found in the corresponding core PR.
The processor will support a predefined set of techniques for normalization and combination. The exact techniques are defined using the search pipeline API, and the resulting pipeline must then be referenced from the _search call. We start from min-max for normalization and arithmetic mean for combination. Processor definition may look something like this:
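To make the two starting techniques concrete, here is a minimal Python sketch of min-max normalization followed by an arithmetic-mean combination. It only illustrates the arithmetic and is neither the processor's implementation nor its pipeline definition. The function names and sample scores are hypothetical, and two behaviors are assumptions: when all scores of a sub-query are equal, every document gets 1.0, and a document missing from a sub-query counts as 0.0 in the mean.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one sub-query's scores to [0, 1]: (s - min) / (max - min)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: identical scores all map to 1.0 (avoids division by zero).
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def arithmetic_mean_combine(
    per_query_scores: List[Dict[str, float]]
) -> Dict[str, float]:
    """Average each document's normalized scores over all sub-queries.

    Assumption: a document absent from a sub-query scores 0.0 there.
    """
    n = len(per_query_scores)
    doc_ids = set().union(*per_query_scores)
    return {
        doc: sum(scores.get(doc, 0.0) for scores in per_query_scores) / n
        for doc in doc_ids
    }


# Hypothetical raw scores from two sub-queries on very different scales.
bm25_scores = {"d1": 12.0, "d2": 7.0, "d3": 2.0}
knn_scores = {"d1": 0.2, "d2": 0.9}

normalized = [min_max_normalize(bm25_scores), min_max_normalize(knn_scores)]
combined = arithmetic_mean_combine(normalized)
# combined: d1 -> 0.5, d2 -> 0.75, d3 -> 0.0
```

Normalizing first is what makes the mean meaningful: without it, the unbounded BM25-style scores would dominate the k-NN scores.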
Tasks
Reference Links