Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Query phase searcher #204

Conversation

martin-gaievski
Copy link
Member

@martin-gaievski martin-gaievski commented Jun 27, 2023

Description

Adding Query Phase Searcher required for Normalization and Score Combination feature.
Searcher is registered on plugin level, it calls previously created doc collector and collect docs and scores and updated Query Result in SearchContext. This allows search results to be visible in later phases for post processors.

Adding integ tests, it's not complete end to end as post processor is coming later, but it allows to execute Hybrid Query on cluster with data.

Issues Resolved

#194

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@codecov
Copy link

codecov bot commented Jun 27, 2023

Codecov Report

Merging #204 (e1d0695) into feature/normalization (2de63dd) will increase coverage by 0.33%.
The diff coverage is 74.60%.

@@                     Coverage Diff                     @@
##             feature/normalization     #204      +/-   ##
===========================================================
+ Coverage                    84.57%   84.91%   +0.33%     
- Complexity                     198      215      +17     
===========================================================
  Files                           14       15       +1     
  Lines                          629      676      +47     
  Branches                       103      109       +6     
===========================================================
+ Hits                           532      574      +42     
+ Misses                          61       60       -1     
- Partials                        36       42       +6     
Impacted Files Coverage Δ
...euralsearch/search/HybridTopScoreDocCollector.java 79.62% <60.00%> (+3.43%) ⬆️
...lsearch/search/query/HybridQueryPhaseSearcher.java 73.58% <73.58%> (ø)
...g/opensearch/neuralsearch/plugin/NeuralSearch.java 71.42% <100.00%> (+38.09%) ⬆️
...nsearch/neuralsearch/query/HybridQueryBuilder.java 72.64% <100.00%> (+2.73%) ⬆️
...pensearch/neuralsearch/search/CompoundTopDocs.java 100.00% <100.00%> (+5.55%) ⬆️

@martin-gaievski martin-gaievski force-pushed the feature/normalization-query-phase-searcher-query-searcher branch from db56ae4 to 3955bfc Compare June 28, 2023 01:22
@@ -189,6 +189,14 @@ public static HybridQueryBuilder fromXContent(XContentParser parser) throws IOEx
} else {
if (AbstractQueryBuilder.BOOST_FIELD.match(currentFieldName, parser.getDeprecationHandler())) {
boost = parser.floatValue();
// regular boost functionality is not supported, user should use score normalization methods to manipulate with scores
if (boost != DEFAULT_BOOST) {
log.error(String.format(Locale.ROOT, "[%s] query does not support [%s]", NAME, BOOST_FIELD));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to do both logging and throwing exception. Throwing an exception should be enough.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea was to log more info including actual value of boost the user has provided (due to security risk of mirroring user input to the output). I forgot to add boost value to the exception message, will do the change in next commit

if (compoundScores == null) {
return new ArrayList<>();
}
topDocs = new ArrayList(compoundScores.length);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
topDocs = new ArrayList(compoundScores.length);
List<TopDocs> topDocs = new ArrayList(compoundScores.length);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack


private Function<List<TopDocs>, TotalHits> totalHitsSupplier;
private Function<List<TopDocs>, Float> maxScoreSupplier;
protected SortAndFormats sortAndFormats;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this protected?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

must be artifact from POC code, will change in next commit


Map<String, Object> total = getTotalHits(searchResponseAsMap1);
assertNotNull(total.get("value"));
assertEquals(3, total.get("value"));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to have 3 as a variable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

@@ -139,6 +141,22 @@ protected static Document getDocument(String fieldName, int docId, String fieldV
return doc;
}

protected static Document getTextAndVectorDocument(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do we use this method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not used, will remove it

when(mockQueryShardContext.fieldMapper(eq(TEXT_FIELD_NAME))).thenReturn(fieldType);

Directory directory = newDirectory();
final IndexWriter w = new IndexWriter(directory, newIndexWriterConfig(new MockAnalyzer(random())));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some are final and others are not final. There is no consistency. Could be better to not to use final at all?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

@martin-gaievski martin-gaievski force-pushed the feature/normalization-query-phase-searcher-query-searcher branch from 4941650 to e0581b3 Compare June 28, 2023 20:06
@martin-gaievski martin-gaievski force-pushed the feature/normalization-query-phase-searcher-query-searcher branch from e0581b3 to fd9c700 Compare June 28, 2023 20:56
@martin-gaievski martin-gaievski requested a review from heemin32 June 28, 2023 21:11
// regular boost functionality is not supported, user should use score normalization methods to manipulate with scores
if (boost != DEFAULT_BOOST) {
log.error(
String.format(Locale.ROOT, "[%s] query does not support provided value %.4f for [%s]", NAME, boost, BOOST_FIELD)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just add the boost value in exception message itself and remove this logging?

Suggested change
String.format(Locale.ROOT, "[%s] query does not support provided value %.4f for [%s]", NAME, boost, BOOST_FIELD)
"[{}] query does not support provided value {} for [{}]", NAME, boost, BOOST_FIELD

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, as for the more detailed error message in exception, I rather not do it, got recommendation several times from security team not to put anything user provided to the user facing content, say if this exception is propagated to UI something like this can cause execution of malicious script.

Copy link
Collaborator

@heemin32 heemin32 Jun 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. However, we already validated the input is floatValue in this case right? There will be security risk only when we return not-validated customer input.

Also, there won't be much benefit of logging boost value here as well imo.


final IndexReader reader = searchContext.searcher().getIndexReader();
int totalNumDocs = Math.max(0, reader.numDocs());
if (searchContext.size() == 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, you mean this line should be called no matter what?

final TopDocsCollectorContext topDocsFactory = createTopDocsCollectorContext(searchContext, hasFilterCollector);
collectors.addFirst(topDocsFactory);

return new TotalHits(maxTotalHits, relation);
}

private float getMaxScore(List<TopDocs> topDocs) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private float getMaxScore(List<TopDocs> topDocs) {
private float getMaxScore(final List<TopDocs> topDocs) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, you mean this line should be called no matter what?
Yes, createTopDocsCollectorContext is core method, they may mutate searchContext. E.g. query result object is set in such a way, it's better keep this sequence similar to what it's in core

Signed-off-by: Martin Gaievski <[email protected]>
@martin-gaievski martin-gaievski merged commit 7dda0c5 into feature/normalization Jun 29, 2023
martin-gaievski added a commit that referenced this pull request Aug 3, 2023
* Add query phase searcher and basic tests

Signed-off-by: Martin Gaievski <[email protected]>

---------

Signed-off-by: Martin Gaievski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants