Add Option to Optimize NOT Predicates with Push Down #14432

Open
wants to merge 7 commits into
base: master

ashishjayamohan
Contributor

@ashishjayamohan ashishjayamohan commented Nov 12, 2024

  • Adds a PushDownNotFilterOptimizer class that pushes the NOT operator down into group clauses
  • Applies De Morgan's laws to NOT predicates recursively to fully fragment the predicate group
    • (Example: NOT(x AND y) is converted to NOT(x) OR NOT(y))
    • (Example: NOT(x OR y AND z) is first converted to NOT(x OR y) OR NOT(z) and then converted to (NOT(x) AND NOT(y)) OR NOT(z))
  • Adds an option to run the optimizer (default: false)
  • My initial idea for general push-down optimization came from here
  • Added several relevant tests
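
As a rough illustration of the recursive rewrite described in the list above, here is a minimal, self-contained sketch. It is not the code from this PR: `NotPushDownSketch`, `Expr`, `Kind`, and the helper methods are hypothetical names on a toy expression tree rather than Pinot's filter classes, and the nested example is grouped explicitly to avoid relying on operator precedence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy expression tree for illustration only; every name here is hypothetical. */
final class NotPushDownSketch {
  enum Kind { AND, OR, NOT, PREDICATE }

  static final class Expr {
    final Kind kind;
    final String name;         // set only for PREDICATE leaves
    final List<Expr> children; // empty for PREDICATE leaves

    Expr(Kind kind, String name, List<Expr> children) {
      this.kind = kind;
      this.name = name;
      this.children = children;
    }

    @Override
    public String toString() {
      if (kind == Kind.PREDICATE) {
        return name;
      }
      if (kind == Kind.NOT) {
        return "NOT(" + children.get(0) + ")";
      }
      List<String> parts = new ArrayList<>();
      for (Expr child : children) {
        parts.add(child.toString());
      }
      return "(" + String.join(kind == Kind.AND ? " AND " : " OR ", parts) + ")";
    }
  }

  static Expr pred(String name) {
    return new Expr(Kind.PREDICATE, name, Collections.<Expr>emptyList());
  }

  static Expr and(Expr... children) {
    return new Expr(Kind.AND, null, Arrays.asList(children));
  }

  static Expr or(Expr... children) {
    return new Expr(Kind.OR, null, Arrays.asList(children));
  }

  static Expr not(Expr child) {
    return new Expr(Kind.NOT, null, Collections.singletonList(child));
  }

  /** Rewrites the tree so that NOT wraps only leaf predicates. */
  static Expr pushDown(Expr expr) {
    return rewrite(expr, false);
  }

  // 'negate' records whether an odd number of NOTs sits above this node.
  private static Expr rewrite(Expr expr, boolean negate) {
    switch (expr.kind) {
      case PREDICATE:
        return negate ? not(expr) : expr;
      case NOT:
        // Drop the NOT node and flip the pending negation (double negation cancels).
        return rewrite(expr.children.get(0), !negate);
      default:
        // De Morgan: under a pending negation, AND becomes OR and OR becomes AND.
        Kind kind = expr.kind;
        if (negate) {
          kind = (expr.kind == Kind.AND) ? Kind.OR : Kind.AND;
        }
        List<Expr> rewritten = new ArrayList<>();
        for (Expr child : expr.children) {
          rewritten.add(rewrite(child, negate));
        }
        return new Expr(kind, null, rewritten);
    }
  }
}
```

For example, `NotPushDownSketch.pushDown(not(and(pred("x"), pred("y"))))` prints as `(NOT(x) OR NOT(y))`. Carrying a single `negate` flag down the tree keeps the rewrite to one pass and handles nested NOTs without a separate simplification step.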

@ashishjayamohan ashishjayamohan changed the title from "Push Down Not Optimizer" to "Optimize NOT Predicates By Push Down" Nov 13, 2024
@codecov-commenter

codecov-commenter commented Nov 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 0.00%. Comparing base (59551e4) to head (3127fee).
Report is 1323 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (59551e4) and HEAD (3127fee).

HEAD has 53 uploads less than BASE
Flag BASE (59551e4) HEAD (3127fee)
integration 7 1
integration2 3 1
temurin 12 1
java-21 7 1
skip-bytebuffers-true 3 0
skip-bytebuffers-false 7 1
unittests 5 0
unittests1 2 0
java-11 5 0
unittests2 3 0
integration1 2 0
custom-integration1 2 0
Additional details and impacted files
@@              Coverage Diff              @@
##             master   #14432       +/-   ##
=============================================
- Coverage     61.75%    0.00%   -61.76%     
=============================================
  Files          2436        3     -2433     
  Lines        133233        6   -133227     
  Branches      20636        0    -20636     
=============================================
- Hits          82274        0    -82274     
+ Misses        44911        6    -44905     
+ Partials       6048        0     -6048     
Flag Coverage Δ
custom-integration1 ?
integration 0.00% <ø> (-0.01%) ⬇️
integration1 ?
integration2 0.00% <ø> (ø)
java-11 ?
java-21 0.00% <ø> (-61.63%) ⬇️
skip-bytebuffers-false 0.00% <ø> (-61.75%) ⬇️
skip-bytebuffers-true ?
temurin 0.00% <ø> (-61.76%) ⬇️
unittests ?
unittests1 ?
unittests2 ?

Flags with carried forward coverage won't be shown.

@itschrispeck
Collaborator

Could we add this functionality behind a query option?

I had previously reverted the implicit logic here, since the change made many of our queries hundreds/thousands of times slower.

> Example: NOT(x AND y) is converted to NOT(x) OR NOT(y)

In this case, the AND form often has the benefit of excluding many docs before y is evaluated.

@ashishjayamohan
Contributor Author

That makes sense to me too. I think it would be beneficial to have it - even if it is an option that has to be manually set. I can go ahead and add that option.

@ashishjayamohan ashishjayamohan changed the title from "Optimize NOT Predicates By Push Down" to "Add Option to Optimize NOT Predicates with Push Down" Nov 13, 2024
@Jackie-Jiang
Contributor

In most cases, NOT(x AND y) will be faster than NOT(x) OR NOT(y) because OR is a very expensive operation and quite hard to optimize.
IIRC, NOT(x OR y AND z) should be equivalent to NOT(x OR (y AND z)), since AND has higher precedence than OR.

I think rewriting NOT(x OR y) into NOT(x) AND NOT(y) can probably give better performance. Did you find a query where this rewrite helps improve performance?
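
The precedence point can be checked with a small truth table. The sketch below is illustrative only (the class and method names are hypothetical, and plain Java booleans stand in for predicates). It compares NOT(x OR (y AND z)) against two candidate rewrites: the De Morgan form for that grouping, NOT(x) AND (NOT(y) OR NOT(z)), and the form given in the PR description, (NOT(x) AND NOT(y)) OR NOT(z).

```java
/** Truth-table check for the two candidate rewrites; all names are hypothetical. */
final class PrecedenceCheck {
  // NOT(x OR y AND z) with AND binding tighter than OR.
  static boolean original(boolean x, boolean y, boolean z) {
    return !(x || (y && z));
  }

  // De Morgan applied to that grouping: NOT(x) AND (NOT(y) OR NOT(z)).
  static boolean pushedDown(boolean x, boolean y, boolean z) {
    return !x && (!y || !z);
  }

  // Form from the PR description: (NOT(x) AND NOT(y)) OR NOT(z).
  static boolean prDescriptionForm(boolean x, boolean y, boolean z) {
    return (!x && !y) || !z;
  }

  /** Counts the assignments of (x, y, z) where the chosen rewrite disagrees with the original. */
  static int countMismatches(boolean usePrDescriptionForm) {
    int mismatches = 0;
    for (int bits = 0; bits < 8; bits++) {
      boolean x = (bits & 4) != 0;
      boolean y = (bits & 2) != 0;
      boolean z = (bits & 1) != 0;
      boolean actual = usePrDescriptionForm ? prDescriptionForm(x, y, z) : pushedDown(x, y, z);
      if (actual != original(x, y, z)) {
        mismatches++;
      }
    }
    return mismatches;
  }
}
```

Here `countMismatches(false)` returns 0, while `countMismatches(true)` returns 2: the PR description's form disagrees with the original when x is true and z is false.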
