-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement parallel execution of sub-queries for hybrid search #781
Implement parallel execution of sub-queries for hybrid search #781
Conversation
…arch-project#749) Implement parallel execution of sub-queries for hybrid search Add new thread pool to schedule tasks that are related to hybrid query execution Register executor builders with Plugin Use Lucene's Task Executor to execute and collect results Parallelize Query re-write Parallelize score supplier creation Parallelize build hybrid scores Signed-off-by: Vijayan Balasubramanian <[email protected]>
…ject#779) This parallelization is not adding any value after comparing the benchmarks with and without this optimization. Hence removing it. Signed-off-by: Vijayan Balasubramanian <[email protected]>
Client ( large dataset )
Client Configuration ( large dataset )
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please fix my comment, otherwise code looks good, great job Vijay
src/main/java/org/opensearch/neuralsearch/executors/HybridQueryExecutor.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Vijayan Balasubramanian <[email protected]>
* Implement parallel execution of sub-queries for hybrid search (#749) Add new thread pool to schedule tasks that are related to hybrid query execution Register executor builders with Plugin Use Lucene's Task Executor to execute and collect results Parallelize Query re-write Parallelize score supplier creation Parallelize build hybrid scores Signed-off-by: Vijayan Balasubramanian <[email protected]> (cherry picked from commit 76090de)
…arch-project#781) * Implement parallel execution of sub-queries for hybrid search (opensearch-project#749) Add new thread pool to schedule tasks that are related to hybrid query execution Register executor builders with Plugin Use Lucene's Task Executor to execute and collect results Parallelize Query re-write Parallelize score supplier creation Parallelize build hybrid scores Signed-off-by: Vijayan Balasubramanian <[email protected]>
… search (#749) (#786) * Implement parallel execution of sub-queries for hybrid search (#781) * Implement parallel execution of sub-queries for hybrid search (#749) Add new thread pool to schedule tasks that are related to hybrid query execution Register executor builders with Plugin Use Lucene's Task Executor to execute and collect results Parallelize Query re-write Parallelize score supplier creation Parallelize build hybrid scores Signed-off-by: Vijayan Balasubramanian <[email protected]> * Update package name in 2.15 which is different from main Signed-off-by: Vijayan Balasubramanian <[email protected]> --------- Signed-off-by: Vijayan Balasubramanian <[email protected]>
*/ | ||
public static void initialize(ThreadPool threadPool) { | ||
if (threadPool == null) { | ||
throw new IllegalArgumentException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this should be an IllegalStateException since the threadPool are with OS system instead of a parameter passed from customer.
boolean actuallyRewritten = rewrittenQuery != query; | ||
return new AbstractMap.SimpleEntry(rewrittenQuery, actuallyRewritten); | ||
} catch (IOException e) { | ||
throw new RuntimeException(e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems better to use IllegalStateException since it indicates this is an server internal exception, also should add error messages.
try { | ||
return weight.scorerSupplier(leafReaderContext); | ||
} catch (IOException e) { | ||
throw new RuntimeException(e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same above
private static final Integer HYBRID_QUERY_EXEC_THREAD_POOL_QUEUE_SIZE = 1000; | ||
private static final Integer MAX_THREAD_SIZE = 1000; | ||
private static final Integer MIN_THREAD_SIZE = 2; | ||
private static final Integer PROCESSOR_COUNT_MULTIPLIER = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to multiply the processor? It looks the tasks are all computational intensive, for such tasks more threads may even do harm to the performance, e.g. ForkjoinPool uses processor - 1
as the thread number, did we done testing on this?
Description
Add new thread pool to schedule tasks that are related to hybrid query execution
Register executor builders with Plugin
Use Lucene's Task Executor to execute and collect results
Parallelize Query re-write
Parallelize score supplier creation
Issues Resolved
#279
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.