[FEATURE] Parallelize Exact Search for vector indices in a segment #2326

Open
navneet1v opened this issue Dec 11, 2024 · 4 comments

Comments

@navneet1v
Collaborator

navneet1v commented Dec 11, 2024

Description

Currently, during vector search at the segment level, there are multiple cases where we fall back from ANN search to exact search: for example, when the filter threshold is hit, or when there is no graph present on the segment because the graph creation threshold was not met. In all these cases we do exact search, but while doing so we serially iterate over the valid docIds, read the vectors, and then compute the scores.

Possible Optimizations

One possible optimization is to parallelize this work, which can reduce the latency.

Some ideas are:

  1. We can gather all the docIds in an array and then let multiple threads consume those docIds, each with its own VectorValues, to read the vectors and compute the scores in parallel.
  2. We can first read the vectors and docIds into the heap and then compute the scores in parallel.
  3. We can have a VectorValuesIterator that is thread safe, and each thread can read from the single iterator until it reaches NO_MORE_DOCS. Thanks to @heemin32 for suggesting this.

I think 1 and 3 should be better than 2, since with those we don't need to read all the vectors into the heap and put extra pressure on it. Looking for feedback here.

During implementation we can evaluate which one to choose.
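To make idea 1 a bit more concrete, here is a minimal, self-contained sketch of batching docIds and scoring the batches on a fixed thread pool. This is illustrative only: `readVector` and `scoreVector` are hypothetical placeholders for the per-thread vector read (each worker would use its own VectorValues) and the field's actual similarity function, not the plugin's real classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of idea 1: collect the matching docIds up front, split them into
// batches, and let a pool of threads score each batch independently.
public class BatchedExactSearchSketch {

    record DocScore(int docId, float score) {}

    static float[] readVector(int docId) {
        // Placeholder: in the real code each worker thread would read the
        // vector for docId from its own VectorValues instance.
        return new float[0];
    }

    static float scoreVector(float[] query, float[] vector) {
        // Placeholder similarity; the plugin would use the field's space type.
        float dot = 0f;
        for (int i = 0; i < vector.length; i++) {
            dot += query[i] * vector[i];
        }
        return dot;
    }

    static List<DocScore> scoreInParallel(int[] docIds, float[] query, int batchSize, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<DocScore>>> futures = new ArrayList<>();
            for (int start = 0; start < docIds.length; start += batchSize) {
                final int from = start;
                final int to = Math.min(start + batchSize, docIds.length);
                // Each batch becomes one task; the batch scores its slice of docIds.
                futures.add(pool.submit((Callable<List<DocScore>>) () -> {
                    List<DocScore> batch = new ArrayList<>(to - from);
                    for (int i = from; i < to; i++) {
                        int docId = docIds[i];
                        batch.add(new DocScore(docId, scoreVector(query, readVector(docId))));
                    }
                    return batch;
                }));
            }
            // Merge per-batch results back into a single list.
            List<DocScore> results = new ArrayList<>(docIds.length);
            for (Future<List<DocScore>> f : futures) {
                results.addAll(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

The main knob here would be the batch size: larger batches keep task submission overhead low, while smaller batches balance the work better when scoring cost varies across docIds.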

A similar thing can be done for the script score based exact search, but that will need some more thought. Will try to add that too.

@heemin32
Collaborator

We could have a custom thread safe DocIdSetIterator and let multiple threads share the iterator for processing.

@navneet1v
Collaborator Author

> We could have a custom thread safe DocIdSetIterator and let multiple threads share the iterator for processing.

Yes we can, but we need a way to tell each thread from which docId to which docId it needs to do the scoring. Hence I was thinking that if we just accumulate docIds in batches and keep handing them over to threads for scoring while we iterate over the docIds, that might be easier to implement. But I like the suggestion.

@heemin32
Collaborator

> Yes we can, but we need a way to tell each thread from which docId to which docId it needs to do the scoring. Hence I was thinking that if we just accumulate docIds in batches and keep handing them over to threads for scoring while we iterate over the docIds, that might be easier to implement. But I like the suggestion.

Each thread will process whatever doc id is available next from the iterator until it meets NO_MORE_DOCS.

@navneet1v
Collaborator Author

navneet1v commented Dec 11, 2024

> Each thread will process whatever doc id is available next from the iterator until it meets NO_MORE_DOCS.

Ahhh, now I know what you are saying. This is pretty interesting. I think we can build such an iterator with some locks and synchronized blocks. I think this will be easier to build and validate. Thanks @heemin32 for the suggestion. Maybe there are other ways to optimize the whole code, but I like what you are thinking.
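For illustration only, a minimal sketch of what such a shared iterator could look like, assuming Lucene's DocIdSetIterator as the underlying source. The `scorer` and `collector` in the worker loop are hypothetical placeholders, not the plugin's actual classes.

```java
import java.io.IOException;
import org.apache.lucene.search.DocIdSetIterator;

// Sketch of the shared-iterator idea: a thin synchronized wrapper so several
// scoring threads can pull docIds from one underlying DocIdSetIterator until
// it returns NO_MORE_DOCS. Only the hand-off of the next docId is serialized;
// reading the vector and computing the score happen outside the lock,
// concurrently in each thread.
public class SharedDocIdIterator {

    private final DocIdSetIterator delegate;

    public SharedDocIdIterator(DocIdSetIterator delegate) {
        this.delegate = delegate;
    }

    public synchronized int nextDoc() throws IOException {
        return delegate.nextDoc();
    }
}

// Each worker thread would then run a loop like this, with its own vector
// reader; `scorer` and `collector` are hypothetical placeholders:
//
//   int docId;
//   while ((docId = sharedIterator.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
//       float score = scorer.score(docId);    // read vector + compute similarity
//       collector.collect(docId, score);      // thread-safe result collection
//   }
```

Since the lock is held only for the nextDoc() hand-off, contention should stay low as long as scoring a single vector is noticeably more expensive than advancing the iterator.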

Projects
Status: Backlog
Development

No branches or pull requests

2 participants