Add main classes for Query and basic unit tests #172

martin-gaievski · 2023-05-17T00:43:03Z

Description

Part of Normalization and Score Combination feature.
Adds the query, query builder, scorer, and weight classes for the new hybrid query. Includes basic unit tests; integration tests will follow in subsequent PRs.

Issues Resolved

#175

Check List

- New functionality includes testing.
  - All tests pass
- Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Martin Gaievski <[email protected]>

codecov · 2023-05-17T19:05:15Z

Codecov Report

Merging #172 (975d80f) into feature/normalization (5feefd5) will decrease coverage by 8.00%.
The diff coverage is 69.36%.

@@                     Coverage Diff                     @@
##             feature/normalization     #172      +/-   ##
===========================================================
- Coverage                    89.55%   81.56%   -8.00%     
- Complexity                     103      161      +58     
===========================================================
  Files                            7       11       +4     
  Lines                          316      537     +221     
  Branches                        52       87      +35     
===========================================================
+ Hits                           283      438     +155     
- Misses                          16       68      +52     
- Partials                        17       31      +14

Impacted Files	Coverage Δ
...g/opensearch/neuralsearch/plugin/NeuralSearch.java	`0.00% <0.00%> (ø)`
...ensearch/neuralsearch/query/HybridQueryWeight.java	`52.00% <52.00%> (ø)`
...nsearch/neuralsearch/query/HybridQueryBuilder.java	`65.04% <65.04%> (ø)`
...org/opensearch/neuralsearch/query/HybridQuery.java	`77.77% <77.77%> (ø)`
...ensearch/neuralsearch/query/HybridQueryScorer.java	`81.25% <81.25%> (ø)`

... and 1 file with indirect coverage changes

navneet1v · 2023-05-17T19:31:12Z

src/main/java/org/opensearch/neuralsearch/query/HybridQuery.java

+ * Implementation fo Query interface for type "hybrid". It allows execution of multiple sub-queries and collect individual
+ * scores for each sub-query.
+ */
+public class HybridQuery extends Query implements Iterable<Query> {


Let's add unit tests for HybridQuery and the hybrid scorer too.

We should be able to add tests for Query; I need more time to check whether it's possible for Scorer.

Added a unit test for the scorer.

navneet1v · 2023-05-17T19:31:42Z

src/main/java/org/opensearch/neuralsearch/query/HybridQuery.java

+/**
+ * Implementation fo Query interface for type "hybrid". It allows execution of multiple sub-queries and collect individual
+ * scores for each sub-query.
+ */


Let's add the @opensearch.internal tag to all these classes. They shouldn't be extended.

opensearch.internal cannot be used; it seems to be specific to the core OpenSearch repo, and for plugins the javadoc task throws

error: unknown tag: opensearch.internal * @opensearch.internal ^

We can make the classes final for now.

My understanding is the same. opensearch.internal is for OpenSearch core, to indicate that a class is not public to plugins.

navneet1v · 2023-05-20T02:38:52Z

src/main/java/org/opensearch/neuralsearch/query/HybridQuery.java

+    private final List<Query> subQueries;
+
+    public HybridQuery(Collection<Query> subQueries) {
+        Objects.requireNonNull(subQueries, "Collection of Queries must not be null");


Should we also check for an empty list here?

Yes, let me add this check. An empty collection does no harm, but it also makes no sense to process such a search request.
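A minimal sketch of the check discussed above, using a hypothetical generic `SubQueryHolder` in place of the real `HybridQuery` so that it runs without Lucene on the classpath. The exception type and the "must not be empty" message are assumptions; the thread does not show what was actually committed.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for HybridQuery; Q plays the role of Lucene's Query.
final class SubQueryHolder<Q> {
    private final List<Q> subQueries;

    SubQueryHolder(Collection<Q> subQueries) {
        Objects.requireNonNull(subQueries, "Collection of Queries must not be null");
        if (subQueries.isEmpty()) {
            // Assumed exception type and message; not taken from the PR.
            throw new IllegalArgumentException("Collection of Queries must not be empty");
        }
        // Defensive copy so later changes to the caller's collection are not visible here.
        this.subQueries = new ArrayList<>(subQueries);
    }

    List<Q> getSubQueries() {
        return Collections.unmodifiableList(subQueries);
    }
}
```

Failing fast in the constructor keeps every later stage (rewrite, weight, scorer) free of empty-collection special cases.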

navneet1v · 2023-05-20T02:40:29Z

src/main/java/org/opensearch/neuralsearch/query/HybridQuery.java

+        }
+
+        if (subQueries.size() == 1) {
+            return subQueries.iterator().next();


Don't we want to call rewrite on that query here? We are just returning the first element from the iterator.

Also, why use an iterator here?

Good catch, it's an artifact from the POC. We need to call rewrite and wrap the result in a HybridQuery even for a single sub-query. I'll make the change.
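The agreed behaviour (rewrite every sub-query and keep the hybrid wrapper even when there is only one) might look roughly like the sketch below. `RewritableQuery`, `RewritableLeaf`, and `RewritableHybrid` are hypothetical, simplified stand-ins for Lucene's `Query` and the PR's `HybridQuery`; the only contract borrowed from Lucene is that `rewrite` returns the same instance when nothing changed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-in for Lucene's Query.
interface RewritableQuery {
    // Returns `this` when the query is already in its simplest form.
    RewritableQuery rewrite();
}

// A leaf that rewrites to `target`, or to itself when target is null.
final class RewritableLeaf implements RewritableQuery {
    final String name;
    private final RewritableQuery target;

    RewritableLeaf(String name, RewritableQuery target) {
        this.name = name;
        this.target = target;
    }

    @Override
    public RewritableQuery rewrite() {
        return target == null ? this : target;
    }
}

final class RewritableHybrid implements RewritableQuery {
    private final List<RewritableQuery> subQueries;

    RewritableHybrid(List<RewritableQuery> subQueries) {
        this.subQueries = new ArrayList<>(subQueries);
    }

    List<RewritableQuery> getSubQueries() {
        return Collections.unmodifiableList(subQueries);
    }

    @Override
    public RewritableQuery rewrite() {
        boolean changed = false;
        List<RewritableQuery> rewritten = new ArrayList<>(subQueries.size());
        for (RewritableQuery subQuery : subQueries) {
            RewritableQuery result = subQuery.rewrite();
            changed |= (result != subQuery);
            rewritten.add(result);
        }
        // Stay wrapped even for a single sub-query, so per-sub-query scores remain available.
        return changed ? new RewritableHybrid(rewritten) : this;
    }
}
```

Returning the bare sub-query, as the POC did, would drop the hybrid scorer for single-clause requests; keeping the wrapper makes the one-clause and many-clause paths identical.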

navneet1v · 2023-05-20T02:47:27Z

src/main/java/org/opensearch/neuralsearch/query/HybridQuery.java

+    public void visit(QueryVisitor queryVisitor) {
+        QueryVisitor v = queryVisitor.getSubVisitor(BooleanClause.Occur.SHOULD, this);
+        for (Query q : subQueries) {
+            q.visit(v);
+        }
+    }


What is the purpose of this visitor? Can you provide some details here?

The main purpose is to support the visitor pattern for clients of the hybrid query in the future. It's an abstract method in the Query class, so we need to provide some implementation. As for usage, I've found one example of a visitor in core: in TopDocsCollectorContext there is a visitor that checks the maxScore flag in each sub-query.
We could throw UnsupportedOperationException for now, as I do not recall any usage of visitors in other parts of the normalization-related code.

Checked it: visitors are executed by IndexSearcher, so we need some sort of implementation. Leaving it as is for now.
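To illustrate what a client of `visit` could do, here is a small sketch of the pattern with hypothetical types (`SketchVisitor`, `VisitableQuery`, `VisitableLeaf`, `VisitableHybrid`, `LeafNameCollector`). It mirrors the shape of the quoted method, but it is not Lucene's `QueryVisitor` API; in particular, the `Occur` argument of the real `getSubVisitor` is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified counterpart of a query visitor.
abstract class SketchVisitor {
    // By default the same visitor is reused for the children of a compound query.
    SketchVisitor getSubVisitor(VisitableQuery parent) {
        return this;
    }

    abstract void visitLeaf(VisitableQuery leaf);
}

interface VisitableQuery {
    void visit(SketchVisitor visitor);
}

final class VisitableLeaf implements VisitableQuery {
    final String name;

    VisitableLeaf(String name) {
        this.name = name;
    }

    @Override
    public void visit(SketchVisitor visitor) {
        visitor.visitLeaf(this);
    }
}

final class VisitableHybrid implements VisitableQuery {
    private final List<VisitableQuery> subQueries;

    VisitableHybrid(List<VisitableQuery> subQueries) {
        this.subQueries = new ArrayList<>(subQueries);
    }

    @Override
    public void visit(SketchVisitor visitor) {
        // Ask for a sub-visitor once, then hand it to every sub-query in order.
        SketchVisitor subVisitor = visitor.getSubVisitor(this);
        for (VisitableQuery subQuery : subQueries) {
            subQuery.visit(subVisitor);
        }
    }
}

// Example client: collects the names of all leaf queries it is shown.
final class LeafNameCollector extends SketchVisitor {
    final List<String> names = new ArrayList<>();

    @Override
    void visitLeaf(VisitableQuery leaf) {
        if (leaf instanceof VisitableLeaf) {
            names.add(((VisitableLeaf) leaf).name);
        }
    }
}
```

A compound query that only delegates, as here, lets any visitor inspect every sub-query without knowing that the hybrid type exists.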

navneet1v · 2023-05-20T02:52:15Z

src/main/java/org/opensearch/neuralsearch/query/HybridQueryBuilder.java

+    static void writeQueries(StreamOutput out, List<? extends QueryBuilder> queries) throws IOException {
+        out.writeVInt(queries.size());
+        for (QueryBuilder query : queries) {
+            out.writeNamedWriteable(query);
+        }
+    }


Not this one. Let me send you that.
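For readers unfamiliar with the scheme in the quoted `writeQueries`, it is a count-prefixed list: write the number of elements, then each element, and read them back in the same order. The sketch below shows the round trip with plain `java.io` streams. `QueryListCodec` is hypothetical, strings stand in for named writeables, and a fixed-width `writeInt` replaces the variable-length `writeVInt`; it does not reproduce the OpenSearch `StreamOutput` API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical codec: strings stand in for serialized query builders.
final class QueryListCodec {
    static void writeQueries(DataOutputStream out, List<String> queries) throws IOException {
        out.writeInt(queries.size()); // element count first
        for (String query : queries) {
            out.writeUTF(query);      // then each element, in order
        }
    }

    static List<String> readQueries(DataInputStream in) throws IOException {
        int size = in.readInt();
        List<String> queries = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            queries.add(in.readUTF());
        }
        return queries;
    }

    // Convenience helper for trying the codec out in memory.
    static List<String> roundTrip(List<String> queries) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                writeQueries(out, queries);
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return readQueries(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the reader relies on the count and the element order, the write and read sides have to change together; that is one reason to prefer shared, generic read/write helpers over hand-rolled loops.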

navneet1v · 2023-05-20T02:52:51Z

src/main/java/org/opensearch/neuralsearch/query/HybridQueryBuilder.java

+        }
+    }
+
+    static Collection<Query> toQueries(Collection<QueryBuilder> queryBuilders, QueryShardContext context) throws QueryShardException,


Where is this function used?

We call it from doToQuery to get Query objects from QueryBuilder objects.

If that is the case, why is it static and package-private?

Right, not needed. Changing it to just "private".

navneet1v · 2023-05-22T06:29:00Z

src/main/java/org/opensearch/neuralsearch/query/HybridQueryWeight.java

+
+    private final HybridQuery queries;
+    // The Weights for our subqueries, in 1-1 correspondence
+    protected final ArrayList<Weight> weights;


Why protected?

Ack, it can be private.

navneet1v · 2023-05-22T15:14:57Z

src/main/java/org/opensearch/neuralsearch/query/HybridQueryScorer.java

+    float[] subScores;
+
+    Map<Query, Integer> queryToIndex;


Why are they not private?

Ack, making them private and final.

navneet1v · 2023-05-30T17:22:01Z

src/main/java/org/opensearch/neuralsearch/query/HybridQueryScorer.java

+     * @return
+     * @throws IOException
+     */
+    public float[] hybridScores() throws IOException {


I am not sure how we are going to use this. Since there can be two query scores, which query is the returned float[] for?

This is required for later stages; we'll call it from the collector, similar to this call in the POC code: https://github.com/navneet1v/neural-search/blob/normalization-poc/src/main/java/org/opensearch/neuralsearch/search/CompoundTopScoreDocCollector.java#L74.

The returned array covers all sub-queries. Say we have term1, neural1, and term2; then the array has length 3, and its order corresponds to the order of the sub-queries parsed by the query builder, e.g. hybridScores[0] -> term1, hybridScores[1] -> neural1, hybridScores[2] -> term2. For example, we return

for doc id "1": hybridScores: [2.4, 0.676, 0]
for doc id "2": hybridScores: [0.0, 0.576, 1.4]

In this case both docs have some score for neural1, as k-NN returns some score for practically any doc. Doc 1 doesn't match the term query term2, and doc 2 doesn't match term1, so each has a score of 0.0 at the corresponding index.

hybridScores is our custom method and is not required for the Query phase. I put it here so we can test early whether the scorer is actually getting results for all sub-queries.
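The index convention described above can be sketched as follows. `HybridScoreSketch` and its `Map`-based input are hypothetical simplifications: the real scorer pulls scores from Lucene sub-scorers rather than from a map. The `score` helper reflects the commit note that the standard `scorer.score` returns the sum of all sub-scores.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the per-document sub-score layout.
final class HybridScoreSketch {
    // One slot per sub-query, in the order the query builder parsed them.
    // Sub-queries that did not match the document keep a score of 0.0.
    static float[] hybridScores(List<String> subQueries, Map<String, Float> matchedScores) {
        float[] scores = new float[subQueries.size()];
        for (int i = 0; i < subQueries.size(); i++) {
            scores[i] = matchedScores.getOrDefault(subQueries.get(i), 0.0f);
        }
        return scores;
    }

    // Single combined score for the document: the sum of all sub-scores.
    static float score(float[] subScores) {
        float sum = 0.0f;
        for (float subScore : subScores) {
            sum += subScore;
        }
        return sum;
    }
}
```

With sub-queries `[term1, neural1, term2]` and doc 1 matching term1 (2.4) and neural1 (0.676), `hybridScores` yields `[2.4, 0.676, 0.0]`, which matches the first example row above.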

navneet1v

Let's make sure to add more UTs in follow-up PRs to cover all the lines.

heemin32

Try to use streams as much as you can. Declarative style is less error-prone.
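As an illustration of the suggestion, a builder-to-query conversion written with a stream might look like the sketch below. `StreamStyleSketch` and its generic `toQueries` are hypothetical; the conversion function is passed in as a plain `Function` so the example stays free of OpenSearch types.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper showing the declarative style for a map-and-collect step.
final class StreamStyleSketch {
    static <B, Q> List<Q> toQueries(List<B> builders, Function<B, Q> toQuery) {
        // No index variable and no manually managed result list: less room for off-by-one mistakes.
        return builders.stream().map(toQuery).collect(Collectors.toList());
    }
}
```

One caveat: a conversion that throws a checked exception does not fit `Function` directly, so it would need to be wrapped or handled inside the lambda.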

src/main/java/org/opensearch/neuralsearch/plugin/NeuralSearch.java

heemin32 · 2023-05-31T17:38:56Z

src/main/java/org/opensearch/neuralsearch/query/HybridQuery.java

+/**
+ * Implementation fo Query interface for type "hybrid". It allows execution of multiple sub-queries and collect individual
+ * scores for each sub-query.
+ */


My understanding is the same. opensearch.internal is for OpenSearch core, to indicate that a class is not public to plugins.

martin-gaievski force-pushed the feature/normalization-query branch 3 times, most recently from 3b9b049 to 5f07090 Compare May 17, 2023 18:54

Add main classes for Query along with basic unit tests

b4f6717

Signed-off-by: Martin Gaievski <[email protected]>

martin-gaievski force-pushed the feature/normalization-query branch from 5f07090 to b4f6717 Compare May 17, 2023 18:58

martin-gaievski changed the title ~~[Normalization] Add main classes for Query along with basic unit tests~~ [Normalization] Add main classes for Query and basic unit tests May 17, 2023

martin-gaievski changed the base branch from feature/normalization to main May 17, 2023 19:05

martin-gaievski changed the base branch from main to feature/normalization May 17, 2023 19:06

martin-gaievski changed the base branch from feature/normalization to main May 17, 2023 19:14

martin-gaievski changed the base branch from main to feature/normalization May 17, 2023 19:14

martin-gaievski added the Features and skip-changelog labels and removed the Features label May 17, 2023

martin-gaievski marked this pull request as ready for review May 17, 2023 19:17

martin-gaievski requested review from heemin32, navneet1v, VijayanB, vamshin, jmazanec15, naveentatikonda, junqiu-lei, sean-zheng-amazon, model-collapse, wujunshen, zane-neo, ylwu-amzn and jngz-es as code owners May 17, 2023 19:17

navneet1v reviewed May 17, 2023

Adding hybrid score collection and unit test

8624b19

Signed-off-by: Martin Gaievski <[email protected]>

martin-gaievski force-pushed the feature/normalization-query branch 4 times, most recently from d6b43bb to 247d6f4 Compare May 22, 2023 05:01

Implement scorer.score as sum of all sub-scores

6bd9f8e

In standard scorer.score implementation return sum of all sub-scores as one score for doc id. Fixed unit tests Signed-off-by: Martin Gaievski <[email protected]>

martin-gaievski force-pushed the feature/normalization-query branch from 247d6f4 to 6bd9f8e Compare May 22, 2023 05:25

navneet1v reviewed May 22, 2023

martin-gaievski force-pushed the feature/normalization-query branch from 820d543 to 5eb6bc3 Compare May 22, 2023 20:37

Add check for empty sub-queries, fix rewrite logic

cbd9552

Signed-off-by: Martin Gaievski <[email protected]>

martin-gaievski force-pushed the feature/normalization-query branch from 5eb6bc3 to cbd9552 Compare May 22, 2023 20:37

martin-gaievski added 3 commits May 22, 2023 14:13

Multiple minor refactorings

4d31256

Signed-off-by: Martin Gaievski <[email protected]>

Minor refactoring

700168b

Signed-off-by: Martin Gaievski <[email protected]>

Use more generic functions for stream read/write

40319b0

Signed-off-by: Martin Gaievski <[email protected]>

navneet1v reviewed May 30, 2023

martin-gaievski requested a review from navneet1v May 30, 2023 22:45

navneet1v approved these changes May 31, 2023

heemin32 reviewed May 31, 2023

Review comments

d170529

Signed-off-by: Martin Gaievski <[email protected]>

martin-gaievski force-pushed the feature/normalization-query branch from c8d945d to d170529 Compare May 31, 2023 22:08

martin-gaievski requested a review from heemin32 May 31, 2023 22:12

martin-gaievski added 2 commits May 31, 2023 20:45

More review comments

a68a318

Signed-off-by: Martin Gaievski <[email protected]>

Use streams more, create new iterator on every call

975d80f

Signed-off-by: Martin Gaievski <[email protected]>

martin-gaievski force-pushed the feature/normalization-query branch from 8bae248 to 975d80f Compare June 1, 2023 20:28

heemin32 approved these changes Jun 1, 2023

martin-gaievski merged commit 99ec3b4 into opensearch-project:feature/normalization Jun 1, 2023

martin-gaievski added a commit that referenced this pull request Aug 3, 2023

Add main classes for Query and basic unit tests (#172)

e81262c

* Add main classes for Query along with basic unit tests Signed-off-by: Martin Gaievski <[email protected]>

martin-gaievski mentioned this pull request Aug 3, 2023

Added Score Normalization and Combination feature #241

Merged

5 tasks
