Discussion on combination of result sets from different query types #4557
Would it make sense to extend rescoring to run at a higher level than the shard (i.e., at merge time)?
In general, I think it would be similar to a rescore at the coordinator level. The workflow I have in mind would:
I think extending rescore might be confusing from a user perspective (i.e., when does it happen: at the shard or at the coordinator?).
Yes, this can be seen as a possible solution, and I think we should do some brainstorming around how we want to do this. But from a requirements standpoint, what Jack mentioned in the comment captures the exact requirements of the above ask. Whether we want to call it rescore or some other name is something we can debate.
Take a look at this paper from the Pinecone team: https://arxiv.org/abs/2210.11934. Convex combination shows promise as a simple, effective way to combine the two hybrid scores (normalize each, then regress to find a linear scaling factor between the two distributions). Distributed normalization is of course the challenge; shard-level might be workable. I'd suggest starting with simply allowing the scores from each query type to be returned and combined as the developer sees fit. For example, convex combination could be implemented with two script_score queries inside a dis_max with tie_breaker=1, where each script_score applies a precomputed normalization function and boost for its query type.
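A sketch of what that could look like in query DSL, assuming hypothetical index/field names, weights, and normalization bounds (minScore/maxScore would come from offline analysis of each score distribution). With tie_breaker set to 1.0, dis_max sums the clause scores, which yields the convex combination:

```json
GET /my-index/_search
{
  "query": {
    "dis_max": {
      "tie_breaker": 1.0,
      "queries": [
        {
          "script_score": {
            "query": { "multi_match": { "query": "search terms", "fields": ["title", "body"] } },
            "script": {
              "source": "params.w * Math.max(0.0, (_score - params.minScore) / (params.maxScore - params.minScore))",
              "params": { "w": 0.4, "minScore": 0.0, "maxScore": 20.0 }
            }
          }
        },
        {
          "script_score": {
            "query": { "knn": { "embedding_field": { "vector": [0.1, 0.2, 0.3], "k": 100 } } },
            "script": {
              "source": "params.w * _score",
              "params": { "w": 0.6 }
            }
          }
        }
      ]
    }
  }
}
```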
What will be the order of execution of the BM25 (maybe multi-match) query and the k-NN query if we use a hybrid search query?
We have not decided yet. I am working on building a proposal around this and will update this issue once I have done some research. At a high level, I was thinking of running the queries in parallel, so that we can reduce the latency, and then normalizing the scores before we combine them. But this is still a very early thought; we will need to see what works best.
@aishwaryabajaj-54 do we see a difference if the queries are performed in a different order? As per my understanding, if we normalize the scores of each query and then combine them, it doesn't matter in which order the queries are executed. Please let us know if you think otherwise and whether there is a case where this assumption doesn't hold true.
@dblock |
Thanks @navneet1v. Since we're talking about additional phases in query and fetch, cc-ing @nknize, who has been thinking about refactoring aggregation, which I believe is either on the same path or at least related.
I think Elasticsearch has a solution for this: https://www.elastic.co/guide/en/elasticsearch/reference/master/knn-search.html#_combine_approximate_knn_with_other_features. Maybe this could be replicated in OpenSearch? I've created a feature request with the same recommendation: opensearch-project/k-NN#717. I realised that the query below simply retrieves based on keyword matching and then re-ranks based on vector similarity.
I've tried to adapt the recommended answer from https://stackoverflow.com/questions/67701323/full-text-and-knn-vector-hybrid-search-for-elastic on a test index, and it seems to work for me. Any thoughts on this solution? Perhaps latency concerns?
I have responded on the feature request; let's use that issue to talk more about it. Coming back to the problem statement of this particular GitHub issue: all the current score combinations work at the shard level. We don't want to combine the scores at the shard level, but at the coordinator node level. Also, the scores generated by keyword match and k-NN are not on the same scale: k-NN scores can be between 0 and 1, while BM-25 scores can be floating-point values > 1. Quoting from the description:
Yeah, that makes sense. One way could be to use a sigmoid function on the BM-25 score to map it between 0 and 1. The parameters of the function could be hyperparameters that the user selects; see the example below.
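A minimal sketch of that idea as a script_score query, assuming hypothetical steepness (k) and midpoint (x0) hyperparameters and placeholder field names:

```json
GET /my-index/_search
{
  "query": {
    "script_score": {
      "query": { "multi_match": { "query": "search terms", "fields": ["title", "body"] } },
      "script": {
        "source": "1.0 / (1.0 + Math.exp(-params.k * (_score - params.x0)))",
        "params": { "k": 0.5, "x0": 10.0 }
      }
    }
  }
}
```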
We want to use something similar to get results from a vector field along with exact match. Currently the scores received from both queries are not on the same scale, as mentioned in the discussion above. Following is the kind of query we expect to work to achieve this.
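A hypothetical sketch of such a query (field names, the vector values, and k are placeholders, not the actual query from this comment):

```json
GET /my-index/_search
{
  "size": 10,
  "query": {
    "bool": {
      "should": [
        { "multi_match": { "query": "search terms", "fields": ["title", "description"] } },
        { "knn": { "embedding_field": { "vector": [0.1, 0.2, 0.3], "k": 10 } } }
      ]
    }
  }
}
```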
Will the queries above run in parallel or in series? The search speed of BM25 is relatively slow.
Hi @ankitas3, the speed of BM-25 depends on many factors. I would recommend running the multi-match query separately to find out whether BM-25 is really running slow; if yes, please try to optimize the BM-25 query.
@navneet1v, in continuation of what @ankitas3 is doing: is it possible to execute the queries in parallel, or maybe specify the execution order, to reduce the latency?
@navneet1v I tried running BM25 separately, and it is slow. Can you share any references for improving its speed?
Right now there is no way to execute them in parallel in a single query. You can look at the _msearch API, but I am not sure if that is what you are looking for; it works something like the _bulk API. For improving the speed of the query, I would recommend cutting a separate GitHub issue, as this one is not about improving BM-25 speed. In the new issue, please make sure to provide details like the OpenSearch version, operating system, index shards, number of nodes, etc. On that issue we can talk about what you can do to reduce the latency. @ankitas3
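For reference, a minimal _msearch request that issues both queries in one round trip (index, field names, and the vector are placeholders). Note that the responses come back as two independent result sets, which the client must combine itself:

```json
POST /_msearch
{ "index": "my-index" }
{ "query": { "multi_match": { "query": "search terms", "fields": ["title", "body"] } } }
{ "index": "my-index" }
{ "query": { "knn": { "embedding_field": { "vector": [0.1, 0.2, 0.3], "k": 10 } } } }
```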
@navneet1v, I have tried _msearch and it has its own cons. Can we consider the speed and relevance benchmarking of parallel execution and score normalization for both queries (BM-25 and ANN) in the same GH issue?
@aishwaryabajaj-54 can you please elaborate? What is your ask here, and what is it related to?
I am closing this issue, as the problem defined above is being taken care of via opensearch-project/neural-search#123. Please +1 that issue to support the work. I will keep it updated and use it to track progress.
Recently, we have received interest from k-NN in combining results from text-matching queries with k-NN queries (ref). So I wanted to start a discussion about result combination and the problems with doing it when scores are computed at the shard level.
From my understanding, the only way to currently combine scores from different queries is through a boolean query. In a boolean query, multiple query clauses are provided in the should or must section, and the scores are combined via addition. On top of this, with function scoring, it is possible to further manipulate the score of the results before the addition of scores.
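As an illustration of this current mechanism (index and field names are placeholders), a bool query sums the scores of its should clauses, and a function_score wrapping one clause scales that clause's contribution before the addition:

```json
GET /my-index/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "search terms" } },
        {
          "function_score": {
            "query": { "match": { "body": "search terms" } },
            "script_score": {
              "script": { "source": "_score * 0.5" }
            }
          }
        }
      ]
    }
  }
}
```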
That being said, I see a few limitations with this approach.
First, it is difficult to combine result sets whose scores are produced via different scoring methods. In order to effectively combine results, the different queries' scores would need to be put on the same scale. By this, I mean that a score would need to meet two requirements: (1) indicate the relative relevance between its document and the other documents scored in the query, and (2) be comparable with the relative relevance of results from other queries. For example, for k-NN the score range may be 0-1, while BM25 scores may range from 0 to Float.MAX_VALUE (I think). With this, it would be difficult to figure out an effective way to weight each result appropriately. One way to do this would be to normalize the scores before combining them. Normalization might be possible through rescoring, but this would happen at the shard level, which could cause problems when results are combined. For instance, if one shard has better results than another, normalization may skew the importance so that the top results from the latter shard appear better than those from the former.
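For instance, min-max normalization would map each query's scores onto [0, 1] before combination, using the minimum and maximum observed in that query's result set:

$$s' = \frac{s - s_{\min}}{s_{\max} - s_{\min}}$$

where $s$ is a document's raw score from one query. The catch described above is that $s_{\min}$ and $s_{\max}$ would ideally be global values rather than per-shard ones.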
Second, it is not possible to consider global hits for re-ranking. Because scores are assigned at the shard level, any rescoring has to be done at the shard level. I see a problem with this in two cases: first, for score combination, if an index has a significant number of shards, a user may want only the top results to be combined, instead of combining the results of all shards in the index and then aggregating at the coordinator; second, in the future, a user may have an expensive model that they want to run the results through to re-rank them, and they don't want to run it for each shard in the index.
That being said, my preliminary thought on creating a solution for this would be some kind of search-and-merge API that might look like the sketch below:
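A rough sketch of such a request (the endpoint name, the technique and weights keys, and all field values are purely illustrative, not an existing API):

```json
POST /my-index/_search_and_merge
{
  "queries": [
    { "multi_match": { "query": "search terms", "fields": ["title", "body"] } },
    { "knn": { "embedding_field": { "vector": [0.1, 0.2, 0.3], "k": 10 } } }
  ],
  "merge": {
    "technique": "normalize_and_sum",
    "weights": [0.4, 0.6]
  }
}
```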
Here, queries would be a list of queries to be executed independently, and merge would contain the logic to combine the results at the coordinator level.
All that being said, I want to see what people think of doing this. Is there a way to accomplish this with the current search interface?