Add IT and BWC tests with Indices containing both Vector and Non Vector documents #2284

Open
navneet1v opened this issue Nov 21, 2024 · 0 comments
Labels
Infrastructure Changes to infrastructure, testing, CI/CD, pipelines, etc. Maintenance Add support for new versions of OpenSearch/Dashboards from upstream

Comments

Collaborator

navneet1v commented Nov 21, 2024

Description

Currently, all the ITs and BWC tests in the k-NN plugin use indices with vector fields in which every document contains the vector field. In production, however, the documents in a k-NN index do not necessarily contain the vector field, or all of the vector fields, defined for the index. Because tests for these cases are missing, we were not able to catch the issues fixed in these PRs:

  1. NPE during disk-based vector search when a segment does not contain a vector field. Ref: [BUG] NPE while calling ANN search when deleted docs or a segment with no vector field present in the index #2277
  2. The feature that releases memory when an index is closed introduced a bug: if a segment has a knn_vector field but no docs containing that field, an index-out-of-bounds exception is thrown. This was fixed in Remove FileWatcher from KNN. #2182. Stack trace:
Caused by: NotSerializableExceptionWrapper[index_out_of_bounds_exception: Index 0 out of bounds for length 0]
    at jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:100)
    at jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:106)
    at jdk.internal.util.Preconditions.checkIndex(Preconditions.java:302)
    at java.util.Objects.checkIndex(Objects.java:385)
    at java.util.ArrayList.get(ArrayList.java:427)
    at org.opensearch.knn.index.codec.KNN80Codec.KNN80DocValuesProducer.<init>(KNN80DocValuesProducer.java:78)
    at org.opensearch.knn.index.codec.KNN80Codec.KNN80DocValuesFormat.fieldsProducer(KNN80DocValuesFormat.java:44)
    at org.apache.lucene.index.SegmentDocValues.newDocValuesProducer(SegmentDocValues.java:52)
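
The trace above ends in ArrayList.get on an empty list. As a rough illustration of that failure pattern only (this is not the plugin's code; the class and method names below are hypothetical), a minimal sketch of an unguarded versus a guarded first-element lookup:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for code that expects at least one per-field file in a segment.
public class EmptyFieldListDemo {

    // Unguarded access: throws IndexOutOfBoundsException when the list is empty,
    // which is the situation for a segment with the field mapped but no docs using it.
    public static String firstUnguarded(List<String> fieldFiles) {
        return fieldFiles.get(0);
    }

    // Guarded access: an empty list yields Optional.empty() instead of throwing.
    public static Optional<String> firstGuarded(List<String> fieldFiles) {
        return fieldFiles.isEmpty() ? Optional.empty() : Optional.of(fieldFiles.get(0));
    }

    public static void main(String[] args) {
        List<String> noFiles = new ArrayList<>();
        try {
            firstUnguarded(noFiles);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unguarded lookup failed: " + e.getMessage());
        }
        System.out.println("guarded lookup present: " + firstGuarded(noFiles).isPresent());
    }
}
```

A test index that contains a segment without any vector data is what exercises the empty-list path here.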

Proposal

To catch issues like these at PR time, we should add tests (both BWC and ITs) for all 3 engines and for disk-based vector search. For issue 1, I added an integration test along with the fix: testCompressionIndexWithNonVectorFieldsSegment_whenValid_ThenSucceed(). We need to do something similar for BWC and the other engines.

Tests to be added

  1. A BWC test for all versions in which an index has 10 docs, of which 9 contain vector fields and 1 has no vector field. Ingestion should be done such that the document with no vector field gets its own segment (see the Proposal section).
  2. Similar to the BWC test, we should have ITs that cover the following scenarios for an index created as in item 1:
    1. Vector search with Faiss
    2. Filter tests with Faiss
    3. Disk-based vector search with default compression
    4. Lucene engine tests
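
The ingestion order described in item 1 could look roughly like the sketch below. It only builds the sequence of REST calls (index the vector docs, flush, index the non-vector doc, flush) and does not send them. The class name, the title field, and the 2-dimensional vectors are assumptions for illustration; it also assumes the index already exists with a single shard and a knn_vector field of dimension 2, and that no merge happens between the two flushes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: plans REST calls so the non-vector doc lands in its own segment.
public class MixedSegmentIngestPlan {

    /** One REST call: HTTP method, endpoint, and JSON body (empty when none). */
    public static final class Step {
        public final String method;
        public final String endpoint;
        public final String body;

        Step(String method, String endpoint, String body) {
            this.method = method;
            this.endpoint = endpoint;
            this.body = body;
        }
    }

    public static List<Step> build(String index, String vectorField, int vectorDocs) {
        List<Step> steps = new ArrayList<>();
        // 1. Index the docs that carry the vector field.
        for (int id = 0; id < vectorDocs; id++) {
            String body = "{\"" + vectorField + "\":[" + id + ".0," + (id + 1) + ".0],"
                + "\"title\":\"doc-" + id + "\"}";
            steps.add(new Step("PUT", "/" + index + "/_doc/" + id, body));
        }
        // 2. Flush so the vector docs are written out before the next doc arrives.
        steps.add(new Step("POST", "/" + index + "/_flush", ""));
        // 3. Index one doc without the vector field, then flush again.
        steps.add(new Step("PUT", "/" + index + "/_doc/" + vectorDocs, "{\"title\":\"no-vector\"}"));
        steps.add(new Step("POST", "/" + index + "/_flush", ""));
        return steps;
    }

    public static void main(String[] args) {
        for (Step s : build("mixed-index", "my_vector", 9)) {
            System.out.println(s.method + " " + s.endpoint + " " + s.body);
        }
    }
}
```

An IT or BWC test would replay these steps through its own REST client and then run the searches listed above against the resulting index.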

Please suggest more tests if there are any.

@navneet1v navneet1v added Infrastructure Changes to infrastructure, testing, CI/CD, pipelines, etc. and removed untriaged labels Nov 21, 2024
@navneet1v navneet1v moved this from Backlog to 2.19.0 in Vector Search RoadMap Nov 21, 2024
@navneet1v navneet1v added the Maintenance Add support for new versions of OpenSearch/Dashboards from upstream label Nov 21, 2024