[Feature Request] More efficiently handle NOT exists(...) queries #12426

Open
msfroh opened this issue Feb 22, 2024 · 1 comment
Labels
enhancement Enhancement or improvement to existing feature or request Search:Performance

Comments

@msfroh
Collaborator

msfroh commented Feb 22, 2024

Is your feature request related to a problem? Please describe

Recently, I've been involved in a couple of different query latency investigations where the culprit was a clause like:

"must_not": [
  {
    "exists": {
      "field" : "type"
    }
  }
]

It can come up in cases where people say "I want all docs where type is X, Y, Z, or null".
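For concreteness, a request of that shape might look like the sketch below. The values `X`, `Y`, `Z` are the placeholders from the sentence above, and the helper name `type_in_or_missing` is made up for illustration. The body only uses the standard `bool`, `terms`, and `exists` clauses.

```python
import json


def type_in_or_missing(field, values):
    """Build a query body matching docs where `field` is one of `values` or absent.

    Hypothetical helper, shown only to make the query shape concrete.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {
                        "bool": {
                            "should": [
                                # "type is X, Y, or Z"
                                {"terms": {field: list(values)}},
                                # "... or null": the expensive negated exists clause
                                {"bool": {"must_not": [{"exists": {"field": field}}]}},
                            ],
                            "minimum_should_match": 1,
                        }
                    }
                ]
            }
        }
    }


body = type_in_or_missing("type", ["X", "Y", "Z"])
print(json.dumps(body, indent=2))
```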

The problem is that the type field in these scenarios is very dense (i.e. almost every doc has a type). As a result, the must_not clause spends a lot of time stepping over docs that do have the field, trying to find the "holes".
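To get a feel for the cost, here is a toy model of that "stepping over" behavior. It is not Lucene's actual scorer code; it only assumes that the docs with the field are available as a sorted list of doc IDs and counts how many of them have to be visited to enumerate the holes. The function name `find_holes` and the doc counts are made up.

```python
def find_holes(max_doc, docs_with_field):
    """Return (doc IDs lacking the field, number of postings stepped over).

    `docs_with_field` is a sorted list of doc IDs, standing in for the
    postings behind the exists query.
    """
    holes = []
    steps = 0
    next_candidate = 0
    for doc in docs_with_field:
        steps += 1
        # Every ID between the previous posting and this one is a hole.
        holes.extend(range(next_candidate, doc))
        next_candidate = doc + 1
    # Trailing holes after the last posting.
    holes.extend(range(next_candidate, max_doc))
    return holes, steps


max_doc = 1_000_000
missing = set(range(0, max_doc, 1000))  # 1,000 docs have no `type`
dense = [d for d in range(max_doc) if d not in missing]

holes, steps = find_holes(max_doc, dense)
print(len(holes), steps)  # prints: 1000 999000
```

In this model, finding 1,000 matching docs means visiting 999,000 postings, whereas a sparse term would need only about 1,000 steps.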

By comparison, if the docs with missing type had a value like "typeMissing": "true", that clause could be rewritten as:

"filter": [
  {
    "term": {
      "typeMissing" : "true"
    }
  }
]

Since the "typeMissing": "true" term is very sparse (as the inverse of something very dense), that clause would be extremely cheap.
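On the indexing side, the marker could be added by the client before a doc is sent. A minimal sketch, assuming docs are plain dicts and that an explicit null should count as missing (which is also how the `exists` query treats it); `with_missing_marker` is a hypothetical helper:

```python
def with_missing_marker(doc, field="type", marker="typeMissing"):
    """Return a copy of `doc` with `marker: "true"` when `field` is absent or None."""
    out = dict(doc)
    if out.get(field) is None:
        out[marker] = "true"
    return out


docs = [{"id": 1, "type": "X"}, {"id": 2}, {"id": 3, "type": None}]
indexed = [with_missing_marker(d) for d in docs]
print(indexed)
```

Only the docs without a `type` pick up the extra field, so its postings stay as sparse as the holes are.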

Describe the solution you'd like

While telling folks to explicitly index a "missing" value works, I'm wondering if there's something we can do to make it easier.

If folks have _source disabled, then an _update_by_query to add the missing field isn't going to work, for example. Then they may be stuck resending all the docs with the missing field. Yuck...
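For indexes that do keep `_source`, the backfill could look roughly like the body below, sent as `POST /my-index/_update_by_query`. The index name `my-index`, the helper `backfill_request`, and the `typeMissing` marker are placeholders; the script is a one-line Painless assignment and has not been tuned or tested against a real cluster.

```python
import json


def backfill_request(field="type", marker="typeMissing"):
    """Build an _update_by_query body that tags docs lacking `field`.

    Hypothetical helper. The query pays the slow negated-exists cost once,
    so later searches can use the cheap term filter instead.
    """
    return {
        "query": {"bool": {"must_not": [{"exists": {"field": field}}]}},
        "script": {
            "lang": "painless",
            # Requires _source, since the script rewrites the stored document.
            "source": "ctx._source[params.marker] = 'true'",
            "params": {"marker": marker},
        },
    }


print(json.dumps(backfill_request(), indent=2))
```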

Related component

Search:Performance

Describe alternatives you've considered

One thought I had was to add another meta field, like the _field_names field, but for the mapped fields that are not in a given document, maybe _missing_field_names. Then we could detect the negation of a field exists query and turn it into a query on that. It might be a bit messy on docs with nested fields (since the parent and child docs don't have the same mapping).
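A rough sketch of how that rewrite might behave, operating on query bodies as plain dicts. Everything here is hypothetical: `_missing_field_names` does not exist today, and the real change would live in the query-building code rather than in JSON rewriting. The sketch also bails out when `should` clauses are present, because adding a `filter` clause changes the default `minimum_should_match` of a `bool` query. Nested docs are ignored.

```python
MISSING_FIELD = "_missing_field_names"  # hypothetical meta field


def missing_field_names(doc, mapped_fields):
    """Values that would be indexed into the meta field for one document."""
    return sorted(f for f in mapped_fields if doc.get(f) is None)


def rewrite_not_exists(bool_query):
    """Turn `must_not: exists(f)` into `filter: term(_missing_field_names, f)`."""
    if "should" in bool_query:
        # Adding a filter would change the default minimum_should_match.
        return dict(bool_query)
    must_not = []
    filters = list(bool_query.get("filter", []))
    for clause in bool_query.get("must_not", []):
        exists = clause.get("exists") if len(clause) == 1 else None
        if exists and set(exists) == {"field"}:
            filters.append({"term": {MISSING_FIELD: exists["field"]}})
        else:
            must_not.append(clause)  # leave other negations untouched
    rewritten = {k: v for k, v in bool_query.items()
                 if k not in ("must_not", "filter")}
    if filters:
        rewritten["filter"] = filters
    if must_not:
        rewritten["must_not"] = must_not
    return rewritten


print(missing_field_names({"id": 1, "color": "red"}, ["id", "type", "color"]))
print(rewrite_not_exists({"must_not": [{"exists": {"field": "type"}}]}))
```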

Another solution could be a way of "materializing" the missing field. The Lucene hacker in me would love to implement a FilterCodecReader that would create the missing term on the fly. Then a merge that wraps segments in that FilterCodecReader could output segments that have the missing term indexed.
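The merge-time idea can be modeled without any Lucene APIs. The toy below treats each segment as a `(max_doc, docs_with_field)` pair, concatenates the segments the way a merge would renumber doc IDs, and writes out postings for the "missing" term alongside the existing ones. It ignores deletes and everything else a real `FilterCodecReader` would have to handle; `merge_with_missing_term` is a made-up name.

```python
def merge_with_missing_term(segments):
    """Merge segments and materialize postings for the 'missing' term.

    `segments` is a list of (max_doc, sorted doc IDs that have the field).
    Returns (merged max_doc, merged 'present' postings, merged 'missing' postings).
    """
    merged_present, merged_missing = [], []
    base = 0  # doc ID offset of the current segment in the merged segment
    for max_doc, present in segments:
        present_set = set(present)
        for doc in range(max_doc):
            target = merged_present if doc in present_set else merged_missing
            target.append(base + doc)
        base += max_doc
    return base, merged_present, merged_missing


segments = [(4, [0, 1, 3]), (3, [0, 1, 2])]
print(merge_with_missing_term(segments))  # prints: (7, [0, 1, 3, 4, 5, 6], [2])
```

The dense field is walked once per merge, and every later query against the merged segment can use the short "missing" postings directly.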

Maybe there's another option? More explicit clause caching maybe? (To make sure that we cache the result of the not exists clause?)

Additional context

No response

msfroh added the enhancement and untriaged labels on Feb 22, 2024
@peternied
Member

[Triage - attendees 1 2 3 4 5]
@msfroh Thanks for filing this issue, we'd gladly review a pull request.

getsaurabh02 moved this from 🆕 New to Later (6 months plus) in Search Project Board on Aug 15, 2024