-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Query phase searcher #204
Add Query phase searcher #204
Conversation
Codecov Report
@@ Coverage Diff @@
## feature/normalization #204 +/- ##
===========================================================
+ Coverage 84.57% 84.91% +0.33%
- Complexity 198 215 +17
===========================================================
Files 14 15 +1
Lines 629 676 +47
Branches 103 109 +6
===========================================================
+ Hits 532 574 +42
+ Misses 61 60 -1
- Partials 36 42 +6
|
Signed-off-by: Martin Gaievski <[email protected]>
db56ae4
to
3955bfc
Compare
@@ -189,6 +189,14 @@ public static HybridQueryBuilder fromXContent(XContentParser parser) throws IOEx | |||
} else { | |||
if (AbstractQueryBuilder.BOOST_FIELD.match(currentFieldName, parser.getDeprecationHandler())) { | |||
boost = parser.floatValue(); | |||
// regular boost functionality is not supported, user should use score normalization methods to manipulate with scores | |||
if (boost != DEFAULT_BOOST) { | |||
log.error(String.format(Locale.ROOT, "[%s] query does not support [%s]", NAME, BOOST_FIELD)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to do both logging and throwing exception. Throwing an exception should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idea was to log more info including actual value of boost the user has provided (due to security risk of mirroring user input to the output). I forgot to add boost value to the exception message, will do the change in next commit
src/main/java/org/opensearch/neuralsearch/query/HybridQueryBuilder.java
Outdated
Show resolved
Hide resolved
if (compoundScores == null) { | ||
return new ArrayList<>(); | ||
} | ||
topDocs = new ArrayList(compoundScores.length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
topDocs = new ArrayList(compoundScores.length); | |
List<TopDocs> topDocs = new ArrayList(compoundScores.length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
src/main/java/org/opensearch/neuralsearch/search/HybridTopScoreDocCollector.java
Outdated
Show resolved
Hide resolved
|
||
private Function<List<TopDocs>, TotalHits> totalHitsSupplier; | ||
private Function<List<TopDocs>, Float> maxScoreSupplier; | ||
protected SortAndFormats sortAndFormats; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this protected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
must be artifact from POC code, will change in next commit
src/test/java/org/opensearch/neuralsearch/common/BaseNeuralSearchIT.java
Show resolved
Hide resolved
src/test/java/org/opensearch/neuralsearch/query/HybridQueryBuilderTests.java
Show resolved
Hide resolved
|
||
Map<String, Object> total = getTotalHits(searchResponseAsMap1); | ||
assertNotNull(total.get("value")); | ||
assertEquals(3, total.get("value")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to have 3
as a variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
@@ -139,6 +141,22 @@ protected static Document getDocument(String fieldName, int docId, String fieldV | |||
return doc; | |||
} | |||
|
|||
protected static Document getTextAndVectorDocument( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where do we use this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not used, will remove it
when(mockQueryShardContext.fieldMapper(eq(TEXT_FIELD_NAME))).thenReturn(fieldType); | ||
|
||
Directory directory = newDirectory(); | ||
final IndexWriter w = new IndexWriter(directory, newIndexWriterConfig(new MockAnalyzer(random()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some are final and others are not final. There is no consistency. Could be better to not to use final at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
4941650
to
e0581b3
Compare
Signed-off-by: Martin Gaievski <[email protected]>
e0581b3
to
fd9c700
Compare
// regular boost functionality is not supported, user should use score normalization methods to manipulate with scores | ||
if (boost != DEFAULT_BOOST) { | ||
log.error( | ||
String.format(Locale.ROOT, "[%s] query does not support provided value %.4f for [%s]", NAME, boost, BOOST_FIELD) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just add the boost value in exception message itself and remove this logging?
String.format(Locale.ROOT, "[%s] query does not support provided value %.4f for [%s]", NAME, boost, BOOST_FIELD) | |
"[{}] query does not support provided value {} for [{}]", NAME, boost, BOOST_FIELD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack, as for the more detailed error message in exception, I rather not do it, got recommendation several times from security team not to put anything user provided to the user facing content, say if this exception is propagated to UI something like this can cause execution of malicious script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. However, we already validated the input is floatValue in this case right? There will be security risk only when we return not-validated customer input.
Also, there won't be much benefit of logging boost value here as well imo.
|
||
final IndexReader reader = searchContext.searcher().getIndexReader(); | ||
int totalNumDocs = Math.max(0, reader.numDocs()); | ||
if (searchContext.size() == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, you mean this line should be called no matter what?
final TopDocsCollectorContext topDocsFactory = createTopDocsCollectorContext(searchContext, hasFilterCollector);
collectors.addFirst(topDocsFactory);
return new TotalHits(maxTotalHits, relation); | ||
} | ||
|
||
private float getMaxScore(List<TopDocs> topDocs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private float getMaxScore(List<TopDocs> topDocs) { | |
private float getMaxScore(final List<TopDocs> topDocs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, you mean this line should be called no matter what?
Yes, createTopDocsCollectorContext is core method, they may mutate searchContext
. E.g. query result object is set in such a way, it's better keep this sequence similar to what it's in core
Signed-off-by: Martin Gaievski <[email protected]>
* Add query phase searcher and basic tests Signed-off-by: Martin Gaievski <[email protected]> --------- Signed-off-by: Martin Gaievski <[email protected]>
Description
Adding Query Phase Searcher required for Normalization and Score Combination feature.
Searcher is registered on plugin level, it calls previously created doc collector and collect docs and scores and updated Query Result in SearchContext. This allows search results to be visible in later phases for post processors.
Adding integ tests, it's not complete end to end as post processor is coming later, but it allows to execute Hybrid Query on cluster with data.
Issues Resolved
#194
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.