Updating the K-NN Filters documentation due to recent enhancements in… (#4987)

* Updating the K-NN Filters documentation due to recent enhancements in Efficient Filters

Signed-off-by: Navneet Verma <[email protected]>

* Fixed the review comments.

Signed-off-by: Navneet Verma <[email protected]>

* Apply suggestions from code review

Co-authored-by: kolchfa-aws <[email protected]>
Signed-off-by: Navneet Verma <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Navneet Verma <[email protected]>

* Update settings.md

Signed-off-by: Navneet Verma <[email protected]>

* Update _search-plugins/knn/settings.md

Signed-off-by: kolchfa-aws <[email protected]>

---------

Signed-off-by: Navneet Verma <[email protected]>
Signed-off-by: Navneet Verma <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Co-authored-by: kolchfa-aws <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
3 people authored Sep 22, 2023
1 parent 59ea279 commit 3a23cc6
Showing 3 changed files with 16 additions and 12 deletions.
23 changes: 13 additions & 10 deletions _search-plugins/knn/filter-search-knn.md
@@ -13,7 +13,7 @@ To refine k-NN results, you can filter a k-NN search using one of the following

- [Efficient k-NN filtering](#efficient-k-nn-filtering): This approach applies filtering _during_ the k-NN search, as opposed to before or after the k-NN search, which ensures that `k` results are returned (if there are at least `k` results in total). This approach is supported by the following engines:
- Lucene engine with a Hierarchical Navigable Small World (HNSW) algorithm (k-NN plugin versions 2.4 and later)
- - Faiss engine with an HNSW algorithm (k-NN plugin versions 2.9 or later)
+ - Faiss engine with an HNSW algorithm (k-NN plugin versions 2.9 and later) or IVF algorithm (k-NN plugin versions 2.10 and later)

- [Post-filtering](#post-filtering): Because it is performed after the k-NN search, this approach may return significantly fewer than `k` results for a restrictive filter. You can use the following two filtering strategies for this approach:
- [Boolean post-filter](#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/) search and then applies a filter to the results. The two query parts are executed independently, and then the results are combined based on the query operator (`should`, `must`, and so on) provided in the query.
@@ -25,7 +25,7 @@ The following table summarizes the preceding filtering use cases.

Filter | When the filter is applied | Type of search | Supported engines and methods | Where to place the `filter` clause
:--- | :--- | :--- | :---
- Efficient k-NN filtering | During search (a hybrid of pre- and post-filtering) | Approximate | - `lucene` (`hnsw`) <br> - `faiss` (`hnsw`) | Inside the k-NN query clause.
+ Efficient k-NN filtering | During search (a hybrid of pre- and post-filtering) | Approximate | - `lucene` (`hnsw`) <br> - `faiss` (`hnsw`, `ivf`) | Inside the k-NN query clause.
Boolean filter | After search (post-filtering) | Approximate | - `lucene`<br> - `nmslib`<br> - `faiss` | Outside the k-NN query clause. Must be a leaf clause.
The `post_filter` parameter | After search (post-filtering) | Approximate | - `lucene`<br> - `nmslib`<br> - `faiss` | Outside the k-NN query clause.
Scoring script filter | Before search (pre-filtering) | Exact | N/A | Inside the script score query clause.
@@ -42,12 +42,12 @@ Once you've estimated the number of documents in your index, the restrictiveness

| Number of documents in an index | Percentage of documents the filter returns | k | Filtering method to use for higher recall | Filtering method to use for lower latency |
| :-- | :-- | :-- | :-- | :-- |
- | 10M | 2.5 | 100 | Scoring script | Scoring script |
- | 10M | 38 | 100 | Efficient k-NN filtering | Boolean filter |
- | 10M | 80 | 100 | Scoring script | Efficient k-NN filtering |
- | 1M | 2.5 | 100 | Efficient k-NN filtering | Scoring script |
- | 1M | 38 | 100 | Efficient k-NN filtering | Efficient k-NN filtering/scoring script |
- | 1M | 80 | 100 | Efficient k-NN filtering | Boolean filter |
+ | 10M | 2.5 | 100 | Efficient k-NN filtering/Scoring script | Scoring script |
+ | 10M | 38 | 100 | Efficient k-NN filtering | Efficient k-NN filtering |
+ | 10M | 80 | 100 | Efficient k-NN filtering | Efficient k-NN filtering |
+ | 1M | 2.5 | 100 | Efficient k-NN filtering/Scoring script | Scoring script |
+ | 1M | 38 | 100 | Efficient k-NN filtering | Efficient k-NN filtering |
+ | 1M | 80 | 100 | Efficient k-NN filtering | Efficient k-NN filtering |

## Efficient k-NN filtering

@@ -261,13 +261,16 @@ For more ways to construct a filter, see [Constructing a filter](#constructing-a

### Faiss k-NN filter implementation

- Starting with k-NN plugin version 2.9, you can use `faiss` filters for k-NN searches.
+ For k-NN searches, you can use `faiss` filters with an HNSW algorithm (k-NN plugin versions 2.9 and later) or IVF algorithm (k-NN plugin versions 2.10 and later).

When you specify a Faiss filter for a k-NN search, the Faiss algorithm decides whether to perform an exact k-NN search with pre-filtering or an approximate search with modified post-filtering. The algorithm uses the following variables:

- N: The number of documents in the index.
- P: The number of documents in the document subset after the filter is applied (P <= N).
- k: The maximum number of vectors to return in the response.
+ - R: The number of results returned after performing the filtered approximate nearest neighbor search.
+ - FT (filtered threshold): An index-level threshold defined in the [`knn.advanced.filtered_exact_search_threshold` setting]({{site.url}}{{site.baseurl}}/search-plugins/knn/settings/) that specifies to switch to exact search.
+ - MDC (max distance computations): The maximum number of distance computations allowed in exact search if `FT` (filtered threshold) is not set. This value cannot be changed.

The following flow chart outlines the Faiss algorithm.

@@ -699,4 +702,4 @@ POST /hotels-index/_search
}
}
```
- {% include copy-curl.html %}
+ {% include copy-curl.html %}
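
The "Where to place the `filter` clause" column of the summary table in this file can be illustrated with a short sketch. The Python helpers below only build request bodies as dictionaries. The helper names, the `location` and `rating` field names, and the sample values are hypothetical and chosen for illustration, and the body shapes assume the standard k-NN query DSL rather than anything shown in this diff.

```python
from typing import Any, Dict, List

Body = Dict[str, Any]


def knn_clause(field: str, vector: List[float], k: int) -> Body:
    """Build a bare k-NN query clause for one vector field (hypothetical helper)."""
    return {"knn": {field: {"vector": vector, "k": k}}}


def efficient_filter_body(field: str, vector: List[float], k: int, flt: Body) -> Body:
    """Efficient k-NN filtering: the filter sits INSIDE the k-NN query clause."""
    clause = knn_clause(field, vector, k)
    clause["knn"][field]["filter"] = flt
    return {"size": k, "query": clause}


def boolean_filter_body(field: str, vector: List[float], k: int, flt: Body) -> Body:
    """Boolean filter: the filter sits OUTSIDE the k-NN clause, in a bool query."""
    return {
        "size": k,
        "query": {"bool": {"filter": flt, "must": [knn_clause(field, vector, k)]}},
    }


def post_filter_body(field: str, vector: List[float], k: int, flt: Body) -> Body:
    """post_filter parameter: the filter is a top-level sibling of the query."""
    return {"size": k, "query": knn_clause(field, vector, k), "post_filter": flt}


# Hypothetical filter and query vector, for illustration only.
rating_filter = {"range": {"rating": {"gte": 8}}}
efficient = efficient_filter_body("location", [5.0, 4.0], 3, rating_filter)
boolean = boolean_filter_body("location", [5.0, 4.0], 3, rating_filter)
post = post_filter_body("location", [5.0, 4.0], 3, rating_filter)
```

Only `efficient` carries the filter inside the `knn` clause, so it is the only one of the three bodies that would be evaluated during the search rather than after it.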
5 changes: 3 additions & 2 deletions _search-plugins/knn/settings.md
@@ -21,5 +21,6 @@ Setting | Default | Description
`knn.memory.circuit_breaker.limit` | 50% | The native memory limit for native library indexes. At the default value, if a machine has 100 GB of memory and the JVM uses 32 GB, the k-NN plugin uses 50% of the remaining 68 GB (34 GB). If memory usage exceeds this value, k-NN removes the least recently used native library indexes.
`knn.memory.circuit_breaker.enabled` | true | Whether to enable the k-NN memory circuit breaker.
`knn.plugin.enabled`| true | Enables or disables the k-NN plugin.
- `knn.model.index.number_of_shards`| 1 | Number of shards to use for the model system index, the OpenSearch index that stores the models used for Approximate k-NN Search.
- `knn.model.index.number_of_replicas`| 1 | Number of replica shards to use for the model system index. Generally, in a multi-node cluster, this should be at least 1 to increase stability.
+ `knn.model.index.number_of_shards`| 1 | The number of shards to use for the model system index, the OpenSearch index that stores the models used for Approximate Nearest Neighbor (ANN) search.
+ `knn.model.index.number_of_replicas`| 1 | The number of replica shards to use for the model system index. Generally, in a multi-node cluster, this should be at least 1 to increase stability.
+ `knn.advanced.filtered_exact_search_threshold`| null | The threshold value for the filtered IDs that is used to switch to exact search during filtered ANN search. If the number of filtered IDs in a segment is less than this setting's value, exact search will be performed on the filtered IDs.
Binary file modified images/faiss-algorithm.jpg
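
Because the updated flow chart is a binary image and is not rendered here, the following Python sketch gives one possible reading of the decision described by the variables P, k, R, FT, and MDC in the Faiss filter section. Only the FT comparison is taken from the `knn.advanced.filtered_exact_search_threshold` description. The `P <= k` shortcut, the direct comparison of P against MDC, the fallback when R is too small, and the placeholder MDC value are all assumptions, and the actual plugin logic may differ.

```python
from typing import Optional

# Placeholder only: the documentation says MDC cannot be changed
# but does not state its value in this diff.
ASSUMED_MDC = 10_000


def choose_search(p: int, k: int, ft: Optional[int], mdc: int = ASSUMED_MDC) -> str:
    """Pick 'exact' or 'approximate' for one segment (assumed logic).

    p   -- number of documents left after the filter is applied
    k   -- maximum number of vectors to return
    ft  -- filtered threshold, or None when the setting is unset
    mdc -- maximum distance computations allowed when ft is unset
    """
    if p <= k:
        # Assumption: with at most k candidates, scoring all of them is cheap.
        return "exact"
    if ft is not None:
        # From the settings description: fewer filtered IDs than the
        # threshold means exact search on the filtered IDs.
        return "exact" if p < ft else "approximate"
    # Assumption: with no threshold set, MDC caps the cost of exact search.
    return "exact" if p <= mdc else "approximate"


def needs_exact_fallback(r: int, k: int, p: int) -> bool:
    """Assumed fallback: the approximate search returned fewer results (r)
    than it could have, given k and the p documents that pass the filter."""
    return r < min(k, p)
```

For example, under these assumptions `choose_search(5000, 100, ft=1000)` selects the approximate path, while lowering the filter's output to 500 documents would select exact search.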