
[BUG] Different score translations for ann search and exact search #2319

Open
wdongyu opened this issue Dec 11, 2024 · 2 comments
Labels: bug (Something isn't working)

wdongyu commented Dec 11, 2024

What is the bug?
After introducing the setting index.knn.advanced.approximate_threshold in #2188, we may encounter a scenario where two segments exist in a single shard. Suppose segment_1 includes a graph and segment_2 does not, and both contain the same vector X.

When we search with a query vector Q, we get two different scores for the same vector X, because the score translations for ANN search and exact search are slightly different. For example, with the cosine metric the ANN search score is 1 / (2 - cos(Q, X)), but the exact search score is (1 + cos(Q, X)) / 2.
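The mismatch is easy to demonstrate offline. The following Python sketch (not code from the k-NN plugin, just the two formulas above) applies both translations to the vectors used in the reproduction steps below:

```python
import math

def cosine(q, x):
    """Plain cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(q, x))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_x = math.sqrt(sum(a * a for a in x))
    return dot / (norm_q * norm_x)

def ann_score(cos_sim):
    # Translation applied on the ANN (graph) path: 1 / (2 - cos(Q, X))
    return 1.0 / (2.0 - cos_sim)

def exact_score(cos_sim):
    # Translation applied on the exact-search path: (1 + cos(Q, X)) / 2
    return (1.0 + cos_sim) / 2.0

cos_sim = cosine([1, 2], [0.6, 0.8])
print(exact_score(cos_sim))  # ~0.9919, matches doc 1's score below
print(ann_score(cos_sim))    # ~0.9841, matches doc 2's score below
                             # (the engine computes in float32, so the last
                             # digits differ slightly from double precision)
```

Both translations are monotonically increasing in cos(Q, X), so ranking within one path is unaffected, but the absolute scores differ whenever the two paths are mixed in one shard.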

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Create an index with nmslib and cosine distance, and set index.knn.advanced.approximate_threshold to -1 so that it never builds a graph:
PUT test_nmslib_cosine
{
  "settings": {
    "index.knn": true,
    "index.knn.advanced.approximate_threshold": "-1",
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "target_field": {
        "type": "knn_vector",
        "dimension": 2,
        "method": {
          "engine": "nmslib",
          "space_type": "cosinesimil",
          "name": "hnsw"
        }
      }
    }
  }
}
  2. Ingest a doc with the vector [0.6, 0.8]:
POST test_nmslib_cosine/_doc/1?refresh
{
  "target_field": [0.6, 0.8] 
}
  3. Update index.knn.advanced.approximate_threshold to 1 so that it always builds a graph:
PUT test_nmslib_cosine/_settings
{
  "index.knn.advanced.approximate_threshold": "1"
}
  4. Ingest another doc with the vector [0.6, 0.8]:
POST test_nmslib_cosine/_doc/2?refresh
{
  "target_field": [0.6, 0.8] 
}
  5. Search the data:
POST test_nmslib_cosine/_search
{
  "query": {
    "knn": {
      "target_field": {
        "vector": [1, 2],
        "k": 10
      }
    }
  }
}
  6. Get the result:
"hits": [
      {
        "_index": "test_nmslib_cosine",
        "_id": "1",
        "_score": 0.99193496,    // -> score = (1 + cos([1, 2], [0.6, 0.8])) / 2
        "_source": {
          "target_field": [
            0.6,
            0.8
          ]
        }
      },
      {
        "_index": "test_nmslib_cosine",
        "_id": "2",
        "_score": 0.98412585,  // -> score = 1 / (2 - cos([1, 2], [0.6, 0.8]))
        "_source": {
          "target_field": [
            0.6,
            0.8
          ]
        }
      }
    ]

What is the expected behavior?
The same query and data vectors should yield the same score, regardless of whether the segment is searched via ANN or exact search.
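Because both translations are invertible functions of cos(Q, X), a score from one path can always be rescaled to the other. The sketch below is a hypothetical client-side workaround (not a proposed fix in the plugin itself): invert the ANN translation to recover the cosine similarity, then re-apply the exact-search translation.

```python
def ann_to_exact(ann_score):
    # Invert the ANN translation s = 1 / (2 - cos) to get cos = 2 - 1/s,
    # then apply the exact-search translation (1 + cos) / 2.
    cos_sim = 2.0 - 1.0 / ann_score
    return (1.0 + cos_sim) / 2.0

# Doc 2's ANN score from the result above, rescaled to the exact-search scale:
print(ann_to_exact(0.98412585))  # ~0.99193, matching doc 1's score
```

A real fix would instead apply one consistent translation inside the plugin on both search paths.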

What is your host/environment?

  • OS: Any OS
  • Version 2.18.0


@wdongyu wdongyu added bug Something isn't working untriaged labels Dec 11, 2024
@wdongyu wdongyu changed the title [BUG] Difference score translations for ann search and exact search [BUG] Different score translations for ann search and exact search Dec 11, 2024
@vamshin
Member

vamshin commented Dec 11, 2024

@wdongyu good catch! If we make the score calculation consistent, that should fix the problem.

@navneet1v
Collaborator

@wdongyu thanks for reporting this issue. I think this happens because exact search uses the Lucene-based score translation, while the native libraries use a different one. @VijayanB please take a look at this, and let's ensure we use a consistent score calculation.

Projects
Status: Backlog (Hot)

4 participants