Add inner hits support to hybrid query #776

martin-gaievski · 2024-06-06T04:09:59Z

Description

Adding support for inner hits to hybrid query. This is a feature of OpenSearch that is available for other queries but was not supported by hybrid query.

Inner hits will be tracked the same way they are tracked for all other queries. They will contain details of inner hits for nested fields and for parent/child relationships between documents. The only catch is the score of the inner hit: these scores are the ones computed before normalization. Producing a normalized score is technically difficult because inner hits are processed in the Fetch phase, which occurs after the normalization processor has finished its work.
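To make the feature concrete, here is a minimal sketch of request bodies that could lead to an inner hits section in the response. The nested path `user` and the parent type `question` are taken from the examples in this description; the helper names, the matched fields, the query text, and the extra `match` sub-query are assumptions added for illustration only. Such a request would still need to be sent through a search pipeline that has a normalization processor configured.

```python
import json


def hybrid_query(sub_queries):
    """Wrap a list of sub-queries in a hybrid query clause."""
    return {"query": {"hybrid": {"queries": list(sub_queries)}}}


def nested_with_inner_hits(path, inner_query):
    """Nested query that requests the matching nested objects via inner_hits."""
    return {"nested": {"path": path, "query": inner_query, "inner_hits": {}}}


def has_parent_with_inner_hits(parent_type, inner_query):
    """has_parent query that requests the matching parent document via inner_hits."""
    return {
        "has_parent": {
            "parent_type": parent_type,
            "query": inner_query,
            "score": True,
            "inner_hits": {},
        }
    }


# Assumption: field names and query text are illustrative, not taken from the PR tests.
nested_request = hybrid_query([
    nested_with_inner_hits("user", {"match": {"user.firstname": "john"}}),
    {"match": {"doc_keyword": "workable"}},
])

parent_child_request = hybrid_query([
    has_parent_with_inner_hits("question", {"match": {"text": "question"}}),
])

print(json.dumps(nested_request, indent=2))
```

Leaving `inner_hits` as an empty object asks for the default inner hits section, named after the nested path or the parent type.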

The following are examples of responses that contain an inner hits section, first for a nested query and then for a parent/child query:

{
    "took": 79,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.540445,
        "hits": [
            {
                "_index": "index-test",
                "_id": "Sogqp48BjNYyAI8a4z9u",
                "_score": 1.540445,
                "_source": {
                    "doc_price": 100,
                    "doc_index": 4976,
                    "doc_location": {
                        "coordinates": [
                            [
                                -111.15,
                                45.12
                            ],
                            [
                                -109.83,
                                44.12
                            ]
                        ],
                        "type": "envelope"
                    },
                    "doc_location_2": "81.15, 44.12",
                    "doc_date": "02/03/2014",
                    "doc_point": {
                        "lon": 74.0,
                        "lat": 40.71
                    },
                    "id": "7ebe00c8-9858-11ee-b9d1-0242ac120002",
                    "doc_keyword": "workable",
                    "category": "permission",
                    "title": "Writing a list of random sentences is harder than I initially thought it would be.",
                    "user": {
                        "firstname": "john",
                        "age": 1,
                        "lastname": "black"
                    }
                },
                "inner_hits": {
                    "user": {
                        "hits": {
                            "total": {
                                "value": 1,
                                "relation": "eq"
                            },
                            "max_score": 1.540445,
                            "hits": [
                                {
                                    "_index": "index-test",
                                    "_id": "Sogqp48BjNYyAI8a4z9u",
                                    "_nested": {
                                        "field": "user",
                                        "offset": 0
                                    },
                                    "_score": 1.540445,
                                    "_source": {
                                        "firstname": "john",
                                        "age": 1,
                                        "lastname": "black"
                                    }
                                }
                            ]
                        }
                    }
                }
            }
        ]
    }
}

{
    "took": 134,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "index-test",
                "_id": "10",
                "_score": 1.0,
                "_routing": "1",
                "_source": {
                    "my_id": "10",
                    "text": "This is an answer",
                    "my_join_field": {
                        "name": "answer",
                        "parent": "5"
                    }
                },
                "inner_hits": {
                    "question": {
                        "hits": {
                            "total": {
                                "value": 1,
                                "relation": "eq"
                            },
                            "max_score": 1.2039728,
                            "hits": [
                                {
                                    "_index": "index-test",
                                    "_id": "5",
                                    "_score": 1.2039728,
                                    "_source": {
                                        "my_id": "5",
                                        "text": "This is a question",
                                        "my_join_field": "question"
                                    }
                                }
                            ]
                        }
                    }
                }
            },
            {
                "_index": "index-test",
                "_id": "11",
                "_score": 1.0,
                "_routing": "1",
                "_source": {
                    "my_id": "11",
                    "text": "This is second answer",
                    "my_join_field": {
                        "name": "answer",
                        "parent": "5"
                    }
                },
                "inner_hits": {
                    "question": {
                        "hits": {
                            "total": {
                                "value": 1,
                                "relation": "eq"
                            },
                            "max_score": 1.2039728,
                            "hits": [
                                {
                                    "_index": "index-test",
                                    "_id": "5",
                                    "_score": 1.2039728,
                                    "_source": {
                                        "my_id": "5",
                                        "text": "This is a question",
                                        "my_join_field": "question"
                                    }
                                }
                            ]
                        }
                    }
                }
            }
        ]
    }
}
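In the parent/child response, each top-level hit has `_score` 1.0 while its inner hit has `_score` 1.2039728. A small sketch like the one below can put the two scores side by side when inspecting a response. The helper name is hypothetical, and the response dictionary is a trimmed copy of the second example above, keeping only the fields the helper reads.

```python
def inner_hit_scores(response):
    """Return (doc_id, doc_score, inner_hits_name, inner_score) for every inner hit."""
    rows = []
    for hit in response["hits"]["hits"]:
        # Hits without an inner_hits section are skipped.
        for name, section in hit.get("inner_hits", {}).items():
            for inner in section["hits"]["hits"]:
                rows.append((hit["_id"], hit["_score"], name, inner["_score"]))
    return rows


def answer_hit(doc_id):
    """Build a trimmed top-level hit shaped like the ones in the example response."""
    return {
        "_id": doc_id,
        "_score": 1.0,
        "inner_hits": {
            "question": {"hits": {"hits": [{"_id": "5", "_score": 1.2039728}]}}
        },
    }


# Trimmed copy of the parent/child response shown above.
response = {"hits": {"hits": [answer_hit("10"), answer_hit("11")]}}

for row in inner_hit_scores(response):
    print(row)
```

Per the description, the inner hit score should be read as a score computed before normalization, so it is not directly comparable to the top-level `_score`.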

Issues Resolved

#718

Check List

New functionality includes testing.
- All tests pass
New functionality has been documented.
- New functionality has javadoc added
Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Martin Gaievski <[email protected]>

navneet1v · 2024-06-07T06:54:32Z

The only catch is the score of the inner hit—such scores will be before normalization. Having a normalized score is technically difficult because inner hits processing is done in the Fetch phase, which occurs after the normalization processor has finished its work.

@martin-gaievski do we have a path forward for resolving this technical challenge? Have we started a discussion about this with the Core team?

Also, I would like to take a step back here and question what is the meaning of normalized score for inner hits?

navneet1v · 2024-06-07T06:55:02Z

@martin-gaievski is this feature scoped for 2.15?

martin-gaievski · 2024-06-07T15:26:01Z

@martin-gaievski is this feature scoped for 2.15?

There is no hard requirement for the version

src/test/java/org/opensearch/neuralsearch/query/HybridQueryBuilderTests.java

src/test/java/org/opensearch/neuralsearch/query/HybridQueryIT.java

martin-gaievski · 2024-06-07T18:54:14Z

The only catch is the score of the inner hit—such scores will be before normalization. Having a normalized score is technically difficult because inner hits processing is done in the Fetch phase, which occurs after the normalization processor has finished its work.

@martin-gaievski do we have a path forward for resolving this technical challenge? Have we started a discussion about this with the Core team?

Also, I would like to take a step back here and question what is the meaning of normalized score for inner hits?

There is no clear path forward for now. I'll be working on summarizing the technical hurdles we have. The short list is:

- inner hits are collected in two steps, the query phase and the fetch phase
- there is a separate query and query phase builder for inner hits; this inner query builder calls the core TopDocsCollector to get scores
- hits are collected as part of the fetch phase, which is after our normalization processor has run, so we cannot manipulate the inner scores and normalize them as part of the existing processor
- there is no way in core today to skip score calculation in the fetch phase, so whatever we come up with in normalization will be overridden in the fetch phase
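The ordering problem in the list above can be illustrated with a toy model. This is not OpenSearch code: the function names are hypothetical, and min-max normalization is used only as a stand-in for whatever technique the normalization processor is configured with. The point it shows is that top-level scores are rewritten between the query and fetch phases, while inner hit scores are produced afterwards and therefore stay raw.

```python
def min_max_normalize(scores):
    """Scale scores to [0, 1]; equal scores map to 1.0 (a convention of this sketch)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def run_search(raw_hits):
    """Toy pipeline: query phase, then normalization, then fetch phase."""
    # Query phase: top-level documents are collected with their raw scores.
    top = [{"id": h["id"], "score": h["score"]} for h in raw_hits]

    # Normalization runs after the query phase and only sees top-level scores.
    normalized = min_max_normalize([h["score"] for h in top])
    for hit, score in zip(top, normalized):
        hit["score"] = score

    # Fetch phase: inner hits are scored here, after normalization has finished,
    # so their scores are never passed through min_max_normalize.
    for hit, raw in zip(top, raw_hits):
        hit["inner_hits"] = [{"score": s} for s in raw["inner_scores"]]
    return top


raw = [
    {"id": "a", "score": 2.0, "inner_scores": [2.0]},
    {"id": "b", "score": 1.0, "inner_scores": [1.0, 0.5]},
]
results = run_search(raw)
for hit in results:
    print(hit)
```

In this model, fixing the mismatch would require either moving inner hit scoring ahead of normalization or adding a second normalization step after fetch, which mirrors the hurdles listed above.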

Signed-off-by: Martin Gaievski <[email protected]>

martin-gaievski · 2024-07-29T16:09:12Z

Closing for now as this needs some additional investigation

Initial version, inner hits work but scores are not normalized

c0f4041

Signed-off-by: Martin Gaievski <[email protected]>

martin-gaievski added the Features and backport 2.x labels Jun 6, 2024

martin-gaievski changed the title ~~Add inner_hits to hybrid query~~ Add inner hits support to hybrid query Jun 6, 2024

martin-gaievski force-pushed the poc_inner_hits_in_hybrid_query branch 3 times, most recently from b5ef151 to f4804af Compare June 7, 2024 00:52

Adding inner hits support for hybrid query

7889caa

Signed-off-by: Martin Gaievski <[email protected]>

martin-gaievski force-pushed the poc_inner_hits_in_hybrid_query branch from f4804af to 7889caa Compare June 7, 2024 01:03

martin-gaievski marked this pull request as ready for review June 7, 2024 02:38

martin-gaievski requested review from heemin32, navneet1v, VijayanB, vamshin, jmazanec15, naveentatikonda, junqiu-lei, sean-zheng-amazon, model-collapse, zane-neo, ylwu-amzn, jngz-es, vibrantvarun and zhichao-aws as code owners June 7, 2024 02:38

martin-gaievski added the Enhancements label and removed the Features label Jun 7, 2024

martin-gaievski closed this Jun 7, 2024

martin-gaievski reopened this Jun 7, 2024

shatejas reviewed Jun 7, 2024

src/test/java/org/opensearch/neuralsearch/query/HybridQueryBuilderTests.java

src/test/java/org/opensearch/neuralsearch/query/HybridQueryIT.java (Outdated)

Refactor test method for better readability

895ae31

Signed-off-by: Martin Gaievski <[email protected]>

martin-gaievski force-pushed the poc_inner_hits_in_hybrid_query branch from e475041 to 895ae31 Compare June 8, 2024 06:34

martin-gaievski closed this Jul 29, 2024

yuye-aws mentioned this pull request Sep 10, 2024

[FEATURE] Hybrid request does not return inner_hits for nested objects. #718

Open
