It can come up in cases where people say "I want all docs where type is X, Y, Z, or null".
The problem is that the type field in these scenarios is very dense (i.e. almost every doc has a type). As a result, the must_not clause spends a lot of time stepping over docs that do have the field, trying to find the "holes".
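To make the cost argument concrete, here is a toy model in Python. It is not how Lucene iterates doc IDs; the function names and the step counting are assumptions made purely for illustration. It compares finding the "holes" by walking past every doc against reading a hypothetical sparse list of the docs that lack the field.

```python
def holes_by_scanning(max_doc, docs_with_field):
    """Find docs lacking the field by visiting every doc ID (toy model)."""
    have = set(docs_with_field)
    steps = 0
    holes = []
    for doc_id in range(max_doc):
        steps += 1  # one step per doc visited, whether or not it is a hole
        if doc_id not in have:
            holes.append(doc_id)
    return holes, steps


def holes_from_postings(missing_postings):
    """Read a sparse list of docs that are known to lack the field."""
    steps = 0
    holes = []
    for doc_id in missing_postings:
        steps += 1  # one step per matching doc only
        holes.append(doc_id)
    return holes, steps


max_doc = 100_000
missing = list(range(0, max_doc, 1000))  # 100 docs lack the field
missing_set = set(missing)
with_field = [d for d in range(max_doc) if d not in missing_set]

scan_holes, scan_steps = holes_by_scanning(max_doc, with_field)
post_holes, post_steps = holes_from_postings(missing)
print(scan_steps, post_steps)
```

Both approaches return the same docs, but the scan's work grows with the total number of docs while the sparse list's work grows only with the number of holes.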
By comparison, if the docs with missing type had a value like "typeMissing": "true", that clause could be rewritten as:
Since the "typeMissing": "true" term is very sparse (as the inverse of something very dense), that clause would be extremely cheap.
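The exact clauses were not preserved in this copy of the issue, so the following is only a sketch of what the two shapes might look like, written as Python dicts in the query DSL. The helper names, the surrounding `should` wrapper, and the `minimum_should_match` setting are assumptions; only the `must_not` / `exists` pattern and the `"typeMissing": "true"` term come from the text above.

```python
import json


def clause_with_not_exists(field, values):
    """'field is one of values, or missing', using a negated exists query."""
    return {
        "bool": {
            "should": [
                {"terms": {field: values}},
                # Expensive on a dense field: has to step over every doc that has it.
                {"bool": {"must_not": {"exists": {"field": field}}}},
            ],
            "minimum_should_match": 1,
        }
    }


def clause_with_missing_term(field, values, missing_field):
    """Same intent, assuming docs without `field` were indexed with missing_field = "true"."""
    return {
        "bool": {
            "should": [
                {"terms": {field: values}},
                # Cheap: a term query on a very sparse value.
                {"term": {missing_field: "true"}},
            ],
            "minimum_should_match": 1,
        }
    }


slow = clause_with_not_exists("type", ["X", "Y", "Z"])
fast = clause_with_missing_term("type", ["X", "Y", "Z"], "typeMissing")
print(json.dumps(fast, indent=2))
```

The two queries only match the same docs if every doc without `type` really was indexed with the extra value, which is the burden this issue is trying to remove.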
Describe the solution you'd like
While telling folks to explicitly index a "missing" value works, I'm wondering if there's something we can do to make it easier.
If folks don't index _source, then an _update_by_query to add the missing field isn't going to work, for example. Then they may be stuck resending all the docs with the missing field. Yuck...
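For the case where `_source` is available, a backfill request body might look roughly like the sketch below. The helper name and the choice to pass the field name as a script parameter are assumptions; the body would be sent to the `_update_by_query` endpoint of whichever index needs the backfill.

```python
import json


def backfill_missing_marker(field, missing_field):
    """Build an _update_by_query body that marks docs lacking `field`.

    Only works when _source is stored, since the script edits ctx._source.
    """
    return {
        # Select only the docs that do not have the field.
        "query": {"bool": {"must_not": {"exists": {"field": field}}}},
        "script": {
            "lang": "painless",
            # Set the marker field named by params.name on each matched doc.
            "source": "ctx._source[params.name] = 'true'",
            "params": {"name": missing_field},
        },
    }


body = backfill_missing_marker("type", "typeMissing")
print(json.dumps(body, indent=2))
```

Note that the backfill itself has to run the expensive negated exists query once, and newly indexed docs still need the marker set at index time.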
Related component
Search:Performance
Describe alternatives you've considered
One thought I had was to add another meta field, like the _field_names field, but for the mapped fields that are not in a given document, maybe _missing_field_names. Then we could detect the negation of a field exists query and turn it into a query on that. It might be a bit messy on docs with nested fields (since the parent and child docs don't have the same mapping).
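As a rough sketch of that idea, the Python below models both halves: computing the value of a hypothetical `_missing_field_names` meta field for a doc, and rewriting a negated exists clause into a term query on it. None of this exists today. The function names are made up, only flat top-level fields are handled (nested docs are exactly the messy part mentioned above), and only absent or null values are treated as missing.

```python
def missing_field_names(mapped_fields, doc):
    """Mapped fields that this doc does not carry (absent or null)."""
    return sorted(f for f in mapped_fields if doc.get(f) is None)


def rewrite_not_exists(clause, meta_field="_missing_field_names"):
    """Turn a bare must_not/exists clause into a term query on the meta field.

    Any clause that is not exactly that shape is returned unchanged.
    """
    bool_body = clause.get("bool")
    if not isinstance(bool_body, dict) or set(bool_body) != {"must_not"}:
        return clause
    must_not = bool_body["must_not"]
    if isinstance(must_not, list):
        if len(must_not) != 1:
            return clause
        must_not = must_not[0]
    if not isinstance(must_not, dict) or set(must_not) != {"exists"}:
        return clause
    exists = must_not["exists"]
    if not isinstance(exists, dict) or "field" not in exists:
        return clause
    return {"term": {meta_field: exists["field"]}}


mapped = ["type", "title", "price"]
doc = {"title": "widget", "price": None}
print(missing_field_names(mapped, doc))

not_exists = {"bool": {"must_not": {"exists": {"field": "type"}}}}
print(rewrite_not_exists(not_exists))
```

A real implementation would presumably do the rewrite per segment inside the query builder and would need to decide what to do for segments written before the meta field existed.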
Another solution could be a way of "materializing" the missing field. The Lucene hacker in me would love to implement a FilterCodecReader that would create the missing term on the fly. Then a merge that wraps segments in that FilterCodecReader could output segments that have the missing term indexed.
Maybe there's another option? More explicit clause caching maybe? (To make sure that we cache the result of the not exists clause?)
Additional context
No response
Is your feature request related to a problem? Please describe
Recently, I've been involved in a couple of different query latency investigations where the culprit was a clause like: