Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi vector support for Faiss HNSW - approximate search only #1371

Merged

Conversation

heemin32
Copy link
Collaborator

@heemin32 heemin32 commented Jan 3, 2024

Description

Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields.

Note: This is only for approximate search only. Exact search case will be handled in a follow up PR.

Issues Resolved

#1065

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@navneet1v
Copy link
Collaborator

navneet1v commented Jan 3, 2024

How we are handling the exact search use case here? I can only see that we have made changes for ANN search. But we also do exact search based on various conditions.

@heemin32
Copy link
Collaborator Author

heemin32 commented Jan 3, 2024

How we are handling the exact search use case here?

Update: Missed to handle the exact search case. Will update the PR.

Hmm. I thought that exact search has no issue with multi vector because it searches every data anyway. Let me verify it again though.

@heemin32 heemin32 changed the title Multi vector support for Faiss HNSW Multi vector support for Faiss HNSW - approximate search only Jan 3, 2024
@jmazanec15
Copy link
Member

@heemin32 Can you look into failing CI?

@heemin32
Copy link
Collaborator Author

heemin32 commented Jan 4, 2024

@heemin32 Can you look into failing CI?

Not related with my change. Will retry.

@jmazanec15
Copy link
Member

@jmazanec15 jmazanec15 self-requested a review January 4, 2024 21:47
@heemin32
Copy link
Collaborator Author

heemin32 commented Jan 4, 2024

@heemin32 actually, this looks like it might be related: https://github.com/opensearch-project/k-NN/actions/runs/7414868454/job/20176854501?pr=1371

Oops. Unused code checked in. Removed it.

Copy link
Member

@jmazanec15 jmazanec15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve, you may want to rebase branch with main to avoid flaky tests

Copy link

codecov bot commented Jan 4, 2024

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (e64f762) 85.12% compared to head (4c05bd6) 85.18%.

Files Patch % Lines
...java/org/opensearch/knn/index/query/KNNWeight.java 93.33% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@                    Coverage Diff                     @@
##             feature/multi-vector    #1371      +/-   ##
==========================================================
+ Coverage                   85.12%   85.18%   +0.05%     
- Complexity                   1258     1261       +3     
==========================================================
  Files                         163      163              
  Lines                        5110     5115       +5     
  Branches                      479      479              
==========================================================
+ Hits                         4350     4357       +7     
+ Misses                        555      554       -1     
+ Partials                      205      204       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@heemin32 heemin32 force-pushed the parent-filter branch 2 times, most recently from 7474efc to 4d2e517 Compare January 4, 2024 23:19
Copy link
Collaborator

@navneet1v navneet1v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try rebasing the feature branch and your forked repo with main branch as there are some conflicting changes being added to main branch.

jni/include/knn_extension/faiss/utils/Heap.h Show resolved Hide resolved
jni/src/knn_extension/faiss/utils/BitSet.cpp Outdated Show resolved Hide resolved
src/test/java/org/opensearch/knn/index/NestedSearchIT.java Outdated Show resolved Hide resolved
@heemin32
Copy link
Collaborator Author

heemin32 commented Jan 5, 2024

Try rebasing the feature branch and your forked repo with main branch as there are some conflicting changes being added to main branch.

Will do

@heemin32 heemin32 force-pushed the feature/multi-vector branch from df6d1fa to e64f762 Compare January 5, 2024 18:10
@heemin32 heemin32 force-pushed the parent-filter branch 2 times, most recently from 803e5b2 to d1ea68a Compare January 5, 2024 18:29
Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields.

Signed-off-by: Heemin Kim <[email protected]>
@heemin32 heemin32 merged commit 0abed23 into opensearch-project:feature/multi-vector Jan 5, 2024
48 checks passed
@heemin32 heemin32 deleted the parent-filter branch January 5, 2024 19:58
heemin32 added a commit to heemin32/k-NN that referenced this pull request Jan 16, 2024
Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields.

Signed-off-by: Heemin Kim <[email protected]>
heemin32 added a commit that referenced this pull request Jan 16, 2024
Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields.

Signed-off-by: Heemin Kim <[email protected]>
heemin32 added a commit that referenced this pull request Jan 19, 2024
* Add patch to support multi vector in faiss (#1358)

Signed-off-by: Heemin Kim <[email protected]>

* Initialize id_map as null (#1363)

Signed-off-by: Heemin Kim <[email protected]>

* Add support of multi vector in jni (#1364)

Signed-off-by: Heemin Kim <[email protected]>

* Multi vector support for Faiss HNSW (#1371)

Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields.

Signed-off-by: Heemin Kim <[email protected]>

* Add data generation script for nested field (#1388)

Signed-off-by: Heemin Kim <[email protected]>

* Add perf test for nested field (#1394)

Signed-off-by: Heemin Kim <[email protected]>

---------

Signed-off-by: Heemin Kim <[email protected]>
heemin32 added a commit that referenced this pull request Jan 19, 2024
Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields.

Signed-off-by: Heemin Kim <[email protected]>
heemin32 added a commit to heemin32/k-NN that referenced this pull request Jan 19, 2024
* Add patch to support multi vector in faiss (opensearch-project#1358)

Signed-off-by: Heemin Kim <[email protected]>

* Initialize id_map as null (opensearch-project#1363)

Signed-off-by: Heemin Kim <[email protected]>

* Add support of multi vector in jni (opensearch-project#1364)

Signed-off-by: Heemin Kim <[email protected]>

* Multi vector support for Faiss HNSW (opensearch-project#1371)

Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields.

Signed-off-by: Heemin Kim <[email protected]>

* Add data generation script for nested field (opensearch-project#1388)

Signed-off-by: Heemin Kim <[email protected]>

* Add perf test for nested field (opensearch-project#1394)

Signed-off-by: Heemin Kim <[email protected]>

---------

Signed-off-by: Heemin Kim <[email protected]>
(cherry picked from commit 709b448)
heemin32 added a commit to heemin32/k-NN that referenced this pull request Jan 19, 2024
* Add patch to support multi vector in faiss (opensearch-project#1358)

Signed-off-by: Heemin Kim <[email protected]>

* Initialize id_map as null (opensearch-project#1363)

Signed-off-by: Heemin Kim <[email protected]>

* Add support of multi vector in jni (opensearch-project#1364)

Signed-off-by: Heemin Kim <[email protected]>

* Multi vector support for Faiss HNSW (opensearch-project#1371)

Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields.

Signed-off-by: Heemin Kim <[email protected]>

* Add data generation script for nested field (opensearch-project#1388)

Signed-off-by: Heemin Kim <[email protected]>

* Add perf test for nested field (opensearch-project#1394)

Signed-off-by: Heemin Kim <[email protected]>

---------

Signed-off-by: Heemin Kim <[email protected]>
(cherry picked from commit 709b448)
heemin32 added a commit that referenced this pull request Jan 19, 2024
* Add patch to support multi vector in faiss (#1358)

Signed-off-by: Heemin Kim <[email protected]>

* Initialize id_map as null (#1363)

Signed-off-by: Heemin Kim <[email protected]>

* Add support of multi vector in jni (#1364)

Signed-off-by: Heemin Kim <[email protected]>

* Multi vector support for Faiss HNSW (#1371)

Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields.

Signed-off-by: Heemin Kim <[email protected]>

* Add data generation script for nested field (#1388)

Signed-off-by: Heemin Kim <[email protected]>

* Add perf test for nested field (#1394)

Signed-off-by: Heemin Kim <[email protected]>

---------

Signed-off-by: Heemin Kim <[email protected]>
(cherry picked from commit 709b448)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants