[FEATURE] Provide way of defining methods for score normalization and combination in scope of Hybrid search #228

martin-gaievski · 2023-07-19T19:25:59Z

Description

For the normalization and score combination feature, we need a processing unit that handles the scores collected during the query phase of hybrid search. We also need a way to define different techniques for score normalization and combination.

Solution

The solution we are proposing is to create a new implementation of a search phase results processor. This processor will be set up as part of a search pipeline and called between the query and fetch phases. More details on such processors can be found in the corresponding core PR.

The processor will support a predefined set of techniques for normalization and combination. The exact techniques are defined using the search pipeline API, and the pipeline must then be referenced from the _search call. We will start with min-max for normalization and arithmetic mean for combination.

Processor definition may look something like this:

{
    "description": "Post processor for hybrid search",
    "phase_results_processors": [
        {
            "normalization-processor": {
                "normalization": {
                    "technique": "MIN_MAX"
                },
                "combination": {
                    "technique": "ARITHMETIC_MEAN",
                    "parameters": {
                        "weights": [
                            0.4, 0.7
                        ]
                    }
                }
            }
        }
    ]
}
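
To make the two initial techniques concrete, here is a minimal Python sketch of min-max normalization followed by a weighted arithmetic mean. It is illustrative only and is not the processor's implementation: the function names `min_max_normalize` and `weighted_arithmetic_mean` are hypothetical, and both the handling of a zero score range and the choice to skip sub-queries that did not return a document are assumptions.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one sub-query's scores to the [0, 1] range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: when every score is identical, map all of them to 1.0.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_arithmetic_mean(
    per_query: List[Dict[str, float]], weights: List[float]
) -> Dict[str, float]:
    """Combine normalized scores per document: sum(w_i * s_i) / sum(w_i)."""
    combined: Dict[str, float] = {}
    docs = set().union(*per_query)
    for doc in docs:
        num = 0.0
        den = 0.0
        for scores, w in zip(per_query, weights):
            # Assumption: a sub-query that did not return the document
            # is left out of both the numerator and the denominator.
            if doc in scores:
                num += w * scores[doc]
                den += w
        combined[doc] = num / den if den else 0.0
    return combined


# Hypothetical raw scores from two sub-queries on different scales.
bm25_scores = {"d1": 2.0, "d2": 6.0, "d3": 4.0}
knn_scores = {"d1": 0.9, "d2": 0.5}

normalized = [min_max_normalize(bm25_scores), min_max_normalize(knn_scores)]
final_scores = weighted_arithmetic_mean(normalized, [0.4, 0.6])
```

The weights here are positional: the first weight applies to the first sub-query's scores, the second to the second, and so on.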

Tasks

Implementation of a Search phase result processor
Testing

Reference Links


austintlee · 2023-08-16T15:18:12Z

For weights, have you considered this format:

"weights": {
    "knn": 0.4,
    "bm25": 0.6
}

martin-gaievski · 2023-08-16T15:49:10Z

For weights, have you considered this format:
"weights": {
    "knn": 0.4,
    "bm25": 0.6
}

@austintlee I think with such a format you need a way to map between the exact sub-query and the key name. For example, my query may look something like this:

    "query": {
        "hybrid": {
            "queries": [
                {
                    "neural": {}
                },
                {
                    "match": {}
                },
                {
                    "match": {}
                },
                {
                    "bool": {
                        "should": [
                            {
                                "nested": {
                                    "path": "quest",
                                    "query": {
                                        "knn": {}
                                    }
                                }
                            }
                        ]
                    }
                }
            ]
        }
    }

We need to map each of the 4 sub-queries to its weight. The key could be, for instance, the query type, but I see a few problems with that approach: which key to use for nested queries like bool [match], and what to do if we need different weights for different sub-queries of the same type.
Do you have something in mind for the mapping?

austintlee · 2023-08-17T22:51:20Z

I didn't realize this feature aspires to implement a generic hybrid search. I was under the impression that it simply combines a BM25 search and a KNN search which is why I thought you'd always have two weights that add up to 1.0.

Don't the weights need to sum to 1? It looks like in the current implementation, you assign a weight of 1.0 to sub-queries that are not matched to the weights specified in the query. In other words, if you have 2 weights in the input and 4 sub-queries, the 3rd and 4th sub-queries seem to get a weight of 1.0?
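
The behavior described above (weights matched to sub-queries by position, with unmatched sub-queries defaulting to 1.0) can be sketched as follows. This is a minimal sketch based only on the description in this comment, not on the actual code; `resolve_weights` is a hypothetical name, and rejecting a weights list longer than the number of sub-queries is an assumption.

```python
from typing import List, Sequence


def resolve_weights(
    weights: Sequence[float], num_sub_queries: int, default: float = 1.0
) -> List[float]:
    """Pair weights with sub-queries by position, padding any missing ones."""
    if len(weights) > num_sub_queries:
        # Assumption: more weights than sub-queries is treated as an error.
        raise ValueError("more weights than sub-queries")
    # Sub-queries without an explicit weight fall back to the default.
    return list(weights) + [default] * (num_sub_queries - len(weights))


# 2 weights supplied for 4 sub-queries: the 3rd and 4th get the default.
effective = resolve_weights([0.4, 0.6], 4)
```

With this padding the effective weights no longer sum to 1, which is the concern raised in the question.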

navneet1v · 2023-08-28T17:46:15Z

Don't the weights need to sum to 1?

Yes, the weights need to sum to 1. We didn't add this check at the start; it needs to be added.

@austintlee The query clause that we are building is not specific to k-NN or BM25. The new query clause is intended to be used for any number n of queries (where n <= 5) that produce scores on different scales.

Also, if you look closely you will see that a k-NN query can be created from different query clauses, like neural or any other clause in the future. So at least the code has no way to know what is k-NN and what is BM25. So this helps solve that problem as well. :)
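
A minimal sketch of the two constraints mentioned in this comment (weights summing to 1 and at most 5 sub-queries) might look like the following. The name `validate_weights`, the tolerance value, and the rejection of negative weights are assumptions for illustration and do not describe the check that was eventually added.

```python
import math
from typing import Sequence

MAX_SUB_QUERIES = 5  # limit stated in this thread (n <= 5)


def validate_weights(weights: Sequence[float], num_sub_queries: int) -> None:
    """Raise ValueError if the configuration breaks either constraint."""
    if num_sub_queries > MAX_SUB_QUERIES:
        raise ValueError(
            f"at most {MAX_SUB_QUERIES} sub-queries are supported, "
            f"got {num_sub_queries}"
        )
    if any(w < 0 for w in weights):
        # Assumption: negative weights are not meaningful here.
        raise ValueError("weights must be non-negative")
    # Assumption: compare with a small tolerance to absorb float rounding.
    if weights and not math.isclose(sum(weights), 1.0, abs_tol=1e-6):
        raise ValueError(f"weights must sum to 1.0, got {sum(weights)}")


validate_weights([0.4, 0.6], 2)  # passes silently
```

Under this check, weights of 0.4 and 0.7 would be rejected because they sum to 1.1.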

martin-gaievski added the Enhancements and Features labels Jul 19, 2023

github-actions bot added the untriaged label Jul 19, 2023

martin-gaievski mentioned this issue Jul 19, 2023

Adding search processor for score normalization and combination #227

Merged

5 tasks

martin-gaievski removed the untriaged label Jul 19, 2023

This was referenced Jul 27, 2023

Adding "weights" param for combination technique #234

Closed

Adding weights param for combination technique #235

Merged

Adding L2 norm technique #236

Merged

Add harmonic mean combination #238

Merged

Add geometric mean normalization for scores #239

Merged

navneet1v mentioned this issue Aug 7, 2023

[META] Score Combination and Normalization for Semantics Search. Score Normalization for k-NN and BM25 #123

Closed

11 tasks

martin-gaievski mentioned this issue Aug 9, 2023

Changed feature flag name for hybrid search #247

Merged

2 tasks

martin-gaievski closed this as completed Oct 21, 2024
