-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add main classes for Query and basic unit tests #172
Add main classes for Query and basic unit tests #172
Conversation
3b9b049
to
5f07090
Compare
Signed-off-by: Martin Gaievski <[email protected]>
5f07090
to
b4f6717
Compare
Codecov Report
@@ Coverage Diff @@
## feature/normalization #172 +/- ##
===========================================================
- Coverage 89.55% 81.56% -8.00%
- Complexity 103 161 +58
===========================================================
Files 7 11 +4
Lines 316 537 +221
Branches 52 87 +35
===========================================================
+ Hits 283 438 +155
- Misses 16 68 +52
- Partials 17 31 +14
|
* Implementation fo Query interface for type "hybrid". It allows execution of multiple sub-queries and collect individual | ||
* scores for each sub-query. | ||
*/ | ||
public class HybridQuery extends Query implements Iterable<Query> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets add unit test for HybridQuery and hybrid scorer too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to add tests for Query, need more time to check is for Scorer it's possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added unit test for scorer
/** | ||
* Implementation fo Query interface for type "hybrid". It allows execution of multiple sub-queries and collect individual | ||
* scores for each sub-query. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets add @opensearch.internal tag on all these classes. These shouldn't be extended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opensearch.internal cannot be used, seems it's specific to core OpenSearch repo, and for plugins javadoc tasks throws
error: unknown tag: opensearch.internal
* @opensearch.internal
^
we can use final for classes for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is same here. opensearch.internal
is for OpenSearch core to indicate that the class is not public to plugins.
Signed-off-by: Martin Gaievski <[email protected]>
d6b43bb
to
247d6f4
Compare
In standard scorer.score implementation return sum of all sub-scores as one score for doc id. Fixed unit tests Signed-off-by: Martin Gaievski <[email protected]>
247d6f4
to
6bd9f8e
Compare
private final List<Query> subQueries; | ||
|
||
public HybridQuery(Collection<Query> subQueries) { | ||
Objects.requireNonNull(subQueries, "Collection of Queries must not be null"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we check empty list also here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, let me add this check. Such empty collect doesn't harm but also does not make sense to process such search request.
} | ||
|
||
if (subQueries.size() == 1) { | ||
return subQueries.iterator().next(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dont we want to call the rewrite if that query here? We are just returning the query iterator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also why use iterator here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, it's an artifact from POC, we need to call rewrite and wrap it into HybridQuery even for 1 sub-query, I'll do the change
public void visit(QueryVisitor queryVisitor) { | ||
QueryVisitor v = queryVisitor.getSubVisitor(BooleanClause.Occur.SHOULD, this); | ||
for (Query q : subQueries) { | ||
q.visit(v); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of this visitor can provide some details here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main purpose is to allow visitor pattern for Hybrid query clients in future. It's an abstract method in Query class, we need to provide some implementation. As for the visitor, I've found one example of visitor in core: In TopDocsCollectorContext we do have a visitor that is checking maxScore flag in each sub-query
We can throw UnsupportedOperation for now as I do not recall usage of visitor in other parts of normalization related code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked it, visitors are executed by IndexSearcher, so we need some sort of implementation. Leaving it as is for now.
static void writeQueries(StreamOutput out, List<? extends QueryBuilder> queries) throws IOException { | ||
out.writeVInt(queries.size()); | ||
for (QueryBuilder query : queries) { | ||
out.writeNamedWriteable(query); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not this one. Let me send you that.
} | ||
} | ||
|
||
static Collection<Query> toQueries(Collection<QueryBuilder> queryBuilders, QueryShardContext context) throws QueryShardException, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does this function is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We call it from doToQuery in order to get Query objects from QueryBuilder objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if that is the case why this is package static private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, not needed, changing to just "private"
|
||
private final HybridQuery queries; | ||
// The Weights for our subqueries, in 1-1 correspondence | ||
protected final ArrayList<Weight> weights; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why protected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack, can be private
float[] subScores; | ||
|
||
Map<Query, Integer> queryToIndex; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why they are not private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack, making private and final
src/main/java/org/opensearch/neuralsearch/query/HybridQueryScorer.java
Outdated
Show resolved
Hide resolved
820d543
to
5eb6bc3
Compare
Signed-off-by: Martin Gaievski <[email protected]>
5eb6bc3
to
cbd9552
Compare
Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
* @return | ||
* @throws IOException | ||
*/ | ||
public float[] hybridScores() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure how we are going to use it? Since there can be 2 query scores, the return of the float[] is for which query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is required for later stages, we'll call it from collector. Something like this call in POC code https://github.com/navneet1v/neural-search/blob/normalization-poc/src/main/java/org/opensearch/neuralsearch/search/CompoundTopScoreDocCollector.java#L74.
Array returned here is for all sub-queries, say we do have term1, neural1, term2, then array will be of length 3, all order will correspond to other of sub-queries parsed by query builder, e.g. hybridScores[0] -> term1, hybridScores[1] -> neural1, hybridScores[2] -> term2. For example, we return
for doc id "1": hybridScores: [2.4, 0.676, 0]
for doc id "2": hybridScores: [0.0, 0.576, 1.4]
in this case we have some score for neural1, as k-NN return some score for practically any doc. doc 1 doesn't match term query term2, and doc 2 doesn't match term1, so each has score 0.0 at corresponding index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hybridScores is our custom method, not required for Query phase. I put it here so we can test earlier if scorer is actually getting results for all sub-queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets make sure to add more UTs in follow up PRs to cover all the lines,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try stream as much as you can. Declarative style has less error prone.
/** | ||
* Implementation fo Query interface for type "hybrid". It allows execution of multiple sub-queries and collect individual | ||
* scores for each sub-query. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is same here. opensearch.internal
is for OpenSearch core to indicate that the class is not public to plugins.
src/main/java/org/opensearch/neuralsearch/query/HybridQuery.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/query/HybridQuery.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/query/HybridQuery.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/query/HybridQueryScorer.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/query/HybridQueryWeight.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/query/HybridQueryWeight.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/query/HybridQueryWeight.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try stream as much as you can. Declarative style has less error prone.
Signed-off-by: Martin Gaievski <[email protected]>
c8d945d
to
d170529
Compare
Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
8bae248
to
975d80f
Compare
* Add main classes for Query along with basic unit tests Signed-off-by: Martin Gaievski <[email protected]>
Description
Part of Normalization and Score Combination feature.
Adding new query, query builder, scorer and weight classes for new hybrid query. Includes basic unit tests, integ tests are coming in next PRs.
Issues Resolved
#175
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.