Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement parallel execution of sub-queries for hybrid search #749
Implement parallel execution of sub-queries for hybrid search #749
Changes from 16 commits
08aa8c8
7f706ee
0bb9947
d90f99e
8ad4743
bb81c27
49383fc
ebd3b24
350a2b4
f08a37c
62be918
7701213
e061a1e
82dabae
f50357d
6d8c080
bc6b885
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is
HybridQueryExecutorCollector
thread safe?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no setters and only way to update result is by calling collect method. Do you have any particular concerns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since
collect()
could setresult
, ifcollect()
andgetResults()
are called asynchronously, technically, there could be race condition, right?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. Currently getResult() is called from collector manager, where the contract is that it can be called only after all collectors are finished collection. This was taken care in the implementation. I can add this note in collector as well. i am reluctant to add synchronize to result variable, since it will add additional latency for scenario that is not possible at this moment. Do you have any other suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a comment is a good intention. you won't be able to enforce it. You are relying on the hope that the code doesn't change in future to hit that race condition so there will always be a risk.
If you really want to enforce; your options here would be to synchronize or move away from
result collection
toreturning results
as pointed out in this threadThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the overhead of synchronization is way less than we thought, here are some synchronization benchmark articles I found,
One operation of "synchronized" is < 1 micro second, considering the time spending on actual query logic with intensive I/O, this is negligible and you have simple implementation with concurrent situation covered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shatejas @chishui Fair enough. I am changing to synchronized, however while benchmarking i am planning to compare with and without synchronization to see that it is not adding any latency to this block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to @VijayanB comment. I think the purpose of this whole PR is to reduce latency. Therefore, I think @VijayanB it would good if you compare the benchmark results, if you see there is little degradation also in the latency, then we should take a call on this tradeoff whether to keep synchronized or not.