Treat emptyPredicates as always true, instead of pruning them #17

acerbusace · 2024-11-01T18:04:24Z

Context

Currently empty predicates are ignored. Within an AND context (the normal context), that works correctly. Within an OR context (under an anyOf) it does not work correctly: empty predicates should essentially be treated as true:

predicate AND emptyPredicate should be like predicate AND true which reduces to predicate (and thus is equivalent to ignoring the empty predicate).
predicate OR emptyPredicate should be like predicate OR true which just reduces to true

Notably, filter: A and filter: {anyOf: [A]} should behave the same but right now they do not when A is an empty predicate – filter: emptyPredicate returns all results whereas filter: {anyOf: [emptyPredicate]} returns no results.
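The semantics described above can be sketched with a toy in-memory evaluator. This is only an illustration and not ElasticGraph code: `matches?` and `empty_predicate?` are hypothetical helpers, documents are plain hashes, and `equalToAnyOf` is the only operator supported.

```ruby
# Toy evaluator (hypothetical, not ElasticGraph's implementation).
# A filter is a Hash whose entries are ANDed together; nil and {} are empty predicates.
def empty_predicate?(expr)
  expr.nil? || expr == {}
end

def matches?(filter, doc)
  return true if empty_predicate?(filter) # an empty predicate is treated as `true`

  filter.all? do |field_or_op, expr|
    case field_or_op
    when :anyOf
      expr.any? { |sub_filter| matches?(sub_filter, doc) } # OR over the sub-filters
    when :not
      !matches?(expr, doc)
    else
      # A field predicate; only `equalToAnyOf` is modelled in this sketch.
      values = expr && expr[:equalToAnyOf]
      values.nil? || values.include?(doc[field_or_op])
    end
  end
end

a = {name: {equalToAnyOf: ["test"]}}
doc = {name: "other"}

# predicate AND emptyPredicate behaves like predicate alone.
and_result = matches?({name: {equalToAnyOf: ["test"]}, age: nil}, doc)
# predicate OR emptyPredicate reduces to true.
or_result = matches?({anyOf: [a, {age: nil}]}, doc)
# filter: A and filter: {anyOf: [A]} agree when A is an empty predicate.
direct = matches?({age: nil}, doc)
wrapped = matches?({anyOf: [{age: nil}]}, doc)
```

With these definitions, `and_result` equals `matches?(a, doc)`, while `or_result`, `direct`, and `wrapped` are all true.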

Change

As mentioned above, this PR changes filter: {field: nil} and filter: {field: {}} to evaluate to true. This in turn changes anyOf to behave more like an OR block, as shown below:

filter: {anyOf: [emptyPredicate]} will evaluate to true, instead of false
filter: {anyOf: [emptyPredicate, {name: {equalToAnyOf: ["test"]}}]} will evaluate to true, instead of false

At the same time the behaviour of (filter: {anyOf: []}) matching no documents is still preserved.

Because of the way emptyPredicates are now treated, the behaviour of not changes as follows:

filter: {not: {field: nil}} will evaluate to false, instead of being ignored
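One way to picture these cases is as a translation into Elasticsearch query DSL hashes. The sketch below is an assumption for illustration only and is not the query body ElasticGraph actually emits: `to_query` is a hypothetical helper, it supports only `equalToAnyOf`, and it uses the standard `match_all`, `match_none`, `terms`, and `bool` queries.

```ruby
# Hypothetical translator from a tiny filter language to Elasticsearch query DSL hashes.
MATCH_ALL = {match_all: {}}.freeze
MATCH_NONE = {match_none: {}}.freeze

def empty_predicate?(expr)
  expr.nil? || expr == {}
end

def to_query(filter)
  return MATCH_ALL if empty_predicate?(filter) # empty predicate => matches everything

  clauses = filter.map do |field_or_op, expr|
    case field_or_op
    when :anyOf
      if expr.empty?
        MATCH_NONE # an empty anyOf still matches no documents
      else
        {bool: {should: expr.map { |sub_filter| to_query(sub_filter) }, minimum_should_match: 1}}
      end
    when :not
      {bool: {must_not: [to_query(expr)]}} # negating match_all matches nothing
    else
      values = expr && expr[:equalToAnyOf]
      values.nil? ? MATCH_ALL : {terms: {field_or_op.to_s => values}}
    end
  end

  {bool: {filter: clauses}}
end

empty_any_of = to_query({anyOf: []})
negated_empty = to_query({not: {field: nil}})
```

Here `empty_any_of` wraps a `match_none`, and `negated_empty` wraps a `must_not` around a `match_all`, so both match no documents.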

acerbusace · 2024-11-01T18:22:59Z

...raph-graphql/spec/unit/elastic_graph/graphql/datastore_query/search_index_expression_spec.rb

@@ -369,6 +369,24 @@ class GraphQL

              expect(parts).to target_all_widget_indices


Here, the following returns all indices, which is different than how we treat anyOf when filtering a query. Is this okay, or should we make changes so that everything is more aligned?

parts = search_index_expression_parts_for({"any_of" => []})

acerbusace · 2024-11-01T18:50:25Z

elasticgraph-graphql/lib/elastic_graph/graphql/filtering/filter_interpreter.rb

          sub_filter = build_bool_hash do |inner_node|
            process_filter_hash(inner_node, expression, field_path)
          end

-          return unless sub_filter


Removed this because, in theory, we should never get a nil sub filter: all emptyPredicates are treated as match_all, and code coverage doesn't like a hanging branch. However, this causes problems with steep, because build_bool_hash can return nil and we are trying to access [:bool]. To fix this, I set the sub_filter type to untyped. Pointing this out in case anyone has other paths I can take here.

I think my suggestion about :not and :empty precedence will solve this in a more robust way.

myronmarston

Thanks @acerbusace. This is really great to see! Left some suggestions and thoughts.

myronmarston · 2024-11-02T17:05:54Z

elasticgraph-graphql/lib/elastic_graph/graphql/filtering/filter_interpreter.rb

@@ -102,13 +197,20 @@ def filters_on_sub_fields?(expression)
          end
        end

+        def process_empty_or_nil_expression(bool_node, field_or_op)


Can we just call this process_empty_expression? I consider both field: nil and field: {} to be empty expressions (after all, identify_node_type returns :empty for both cases).

myronmarston · 2024-11-02T17:09:56Z

elasticgraph-graphql/lib/elastic_graph/graphql/filtering/filter_interpreter.rb

@@ -57,11 +59,104 @@ def to_s

        private

+        def reduce_query(bool_node)


There's an invariant I expect from this: I expect it to produce behaviorally equivalent queries to what would be produced if queries were not reduced. That is: for every possible query, we should get the same results from Elasticsearch whether or not the query has been reduced.

However, if you toggle this reduction (e.g. by changing the first line to return bool_node) I found it causes some integration and acceptance specs to fail. (I expect it to break some unit specs since those assert on the query body itself, and I'm not concerned with those). Specifically, it breaks these:

rspec ./elasticgraph-graphql/spec/integration/elastic_graph/graphql/datastore_query/sub_aggregations_spec.rb:196 # ElasticGraph::GraphQL::DatastoreQuery sub-aggregations ignores empty filters
rspec ./elasticgraph-graphql/spec/acceptance/search_spec.rb:521 # ElasticGraph::GraphQL--search with a snake_case schema `list` filtering behavior supports filtering on scalar lists, nested object lists, and embedded object lists
rspec ./elasticgraph-graphql/spec/acceptance/search_spec.rb:521 # ElasticGraph::GraphQL--search with a camelCase schema, alternate derived type naming, and enum value overrides `list` filtering behavior supports filtering on scalar lists, nested

Let's start with the acceptance specs. Here they are:

results = query_teams_with(filter: {current_players_nested: {any_satisfy: {name: {equal_to_any_of: nil}}}})
expect(results).to eq [{"id" => "t1"}, {"id" => "t2"}, {"id" => "t3"}, {"id" => "t4"}]

Without the reduction, only t1, t2, and t3 are returned--so the act of reducing causes it to match t4 when it would not normally.

I dug into why and I think it points to yet another bug with how we handle empty predicates, this time under any_satisfy. Here, the filter is matching teams which have any players that satisfy name: {equal_to_any_of: nil}. While the name: {equal_to_any_of: nil} criteria translates into a match_all, that's a match_all against players, not teams. The outer criteria (the any_satisfy) is against teams and can only be satisfied by teams that have at least one player that satisfies the inner player criteria. A team which has no current players (as is the case with t4) cannot satisfy the filter and should be omitted, I think.
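The argument can be illustrated with plain Ruby collections. This is only an analogy and does not query a datastore: the `Team` and `Player` structs and the player names are made up, and the only detail taken from the spec is that t4 has no current players.

```ruby
# Hypothetical in-memory stand-ins for the indexed teams.
Player = Struct.new(:name)
Team = Struct.new(:id, :current_players)

teams = [
  Team.new("t1", [Player.new("Ann")]),
  Team.new("t2", [Player.new("Bo")]),
  Team.new("t3", [Player.new("Cy")]),
  Team.new("t4", []) # no current players
]

# Stands in for a match_all evaluated against each player.
always_true = ->(_player) { true }

# any_satisfy: a team matches only if at least one of its players satisfies the inner predicate.
matching_ids = teams
  .select { |team| team.current_players.any? { |player| always_true.call(player) } }
  .map(&:id)
```

Even though the inner predicate accepts every player, `matching_ids` omits t4, because `any?` over an empty list is false.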

That said, this might make for a bit of a usability problem for ElasticGraph: if a query has an optional filter on a nested field, and a client does not want to filter on that field and omits it, should we then still filter on whether or not documents have any nested records? Something we should think more about.

The failing integration spec is related to this case as it also involves a nested field. In this case, the reduction changes the query in such a way that there is extra meta returned by the aggregation. I think it might be ok but we should think about it some more.

If we do want strict enforcement of the "query reduction should not impact query behavior" invariant, we may want to actually set up something in our tests so that integration and acceptance specs run each query with and without the reduction and assert on getting the same results.
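A minimal sketch of such a check, using a toy in-memory query model rather than the real datastore. Everything here is hypothetical: the node shapes, `evaluate`, `reduce`, and `same_results?` are stand-ins, and the `reduce` shown is a deliberately small simplification, not the reduce_query from this PR.

```ruby
# Toy query nodes: [:match_all], [:term, field, value], [:and, *nodes], [:or, *nodes], [:not, node]
def evaluate(node, doc)
  type, *args = node
  case type
  when :match_all then true
  when :term then doc[args[0]] == args[1]
  when :and then args.all? { |sub| evaluate(sub, doc) }
  when :or then args.any? { |sub| evaluate(sub, doc) }
  when :not then !evaluate(args[0], doc)
  else raise ArgumentError, "unknown node type: #{type.inspect}"
  end
end

# Small illustrative reduction: drop match_all from :and, collapse an :or that contains match_all.
def reduce(node)
  type, *args = node
  case type
  when :and
    subs = args.map { |sub| reduce(sub) }.reject { |sub| sub == [:match_all] }
    subs.empty? ? [:match_all] : [:and, *subs]
  when :or
    subs = args.map { |sub| reduce(sub) }
    subs.include?([:match_all]) ? [:match_all] : [:or, *subs]
  when :not
    [:not, reduce(args[0])]
  else
    node
  end
end

# The invariant: reducing a query must not change which documents it matches.
def same_results?(query, docs)
  docs.select { |doc| evaluate(query, doc) } == docs.select { |doc| evaluate(reduce(query), doc) }
end

docs = [{name: "test"}, {name: "other"}]
queries = [
  [:or, [:term, :name, "test"], [:and, [:match_all]]],
  [:not, [:and, [:match_all], [:term, :name, "test"]]],
  [:or]
]
all_equivalent = queries.all? { |query| same_results?(query, docs) }
```

In the real suite, the equivalent would be executing each integration or acceptance query twice, once with the reduction disabled, and comparing the responses.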

More generally, this code is very, very complicated and I don't yet understand it. Since it exists purely for optimization (not correctness) we need to be 100% sure it's correct; otherwise we should not do the reduction.

One option to consider: remove the reduction from this PR (so that this PR is just focused on the behavioral change we want to make) and then do a follow-up PR that adds the reduction. One nice property of that is it would make it more visible/obvious in the diff what the reduction is doing, as it would show up in changes to the unit specs.

Thoughts?

I'm open to the idea of enforcing "query reduction should not impact query behaviour". First, I guess we will need to figure out what to do with the any_satisfy bug you caught (my reduction was mostly meant to match the current behaviour).

myronmarston · 2024-11-02T17:14:07Z

elasticgraph-graphql/lib/elastic_graph/graphql/filtering/filter_interpreter.rb

+              # This is an "empty" filter predicate and we can treat it as `true`.
+              process_empty_or_nil_expression(bool_node, field_or_op)


Suggested change

-              # This is an "empty" filter predicate and we can treat it as `true`.
-              process_empty_or_nil_expression(bool_node, field_or_op)
+              process_empty_or_nil_expression(bool_node, field_or_op)

No need for the comment. It was there before to explain why there was nothing in the when :empty branch but now there's something.

myronmarston · 2024-11-04T02:41:59Z

elasticgraph-graphql/lib/elastic_graph/graphql/filtering/filter_interpreter.rb

@@ -102,13 +197,20 @@ def filters_on_sub_fields?(expression)
          end
        end

+        def process_empty_or_nil_expression(bool_node, field_or_op)
+          if field_or_op == schema_names.not


It's a code smell that the handling for not is spread out: it's handled primarily by process_not_expression but also by this method. We want each different type of filtering node to be handled in one place and not have its responsibility spread out.

I understand why you did this, though: the current FilterNodeInterpreter implementation does not allow the :empty case to just handle the empty case and allow the :not case to just handle the not case, because it returns :empty for both field: nil and not: nil. The current implementation looks like this:

def identify_node_type(field_or_op, sub_expression)
  return :empty if sub_expression.nil? || sub_expression == {}
  return :not if field_or_op == schema_names.not
  return :list_any_filter if field_or_op == schema_names.any_satisfy
  return :all_of if field_or_op == schema_names.all_of
  return :any_of if field_or_op == schema_names.any_of
  return :operator if filter_operators.key?(field_or_op)
  return :list_count if field_or_op == LIST_COUNTS_FIELD
  return :sub_field if sub_expression.is_a?(::Hash)
  :unknown
end

Notice that it detects :empty based on sub_expression before it has the chance to detect :not based on field_or_op. That robs process_not_expression from the chance to negate the inner empty expression.

I'm thinking that we should always do the detection from the "outside in". That is, field_or_op is always on the outside and we should first detect a node type based on that. Only once we've exhausted what we can know based on field_or_op should we detect the type based on the sub_expression. I think the implementation should instead be something like this:

def identify_node_type(field_or_op, sub_expression)
  identify_by_field_or_op(field_or_op) || identify_by_sub_expr(sub_expression) || :unknown
end

private

def identify_by_field_or_op(field_or_op)
  return :not if field_or_op == schema_names.not
  return :list_any_filter if field_or_op == schema_names.any_satisfy
  return :all_of if field_or_op == schema_names.all_of
  return :any_of if field_or_op == schema_names.any_of
  return :operator if filter_operators.key?(field_or_op)
  return :list_count if field_or_op == LIST_COUNTS_FIELD
end

def identify_by_sub_expr(sub_expression)
  return :empty if sub_expression.nil? || sub_expression == {}
  return :sub_field if sub_expression.is_a?(::Hash)
end

Now, that change on its own probably breaks some stuff (the various process_* methods haven't had to handle empty or nil sub-expressions before because the :empty branch took precedence), but it could be worthwhile preparatory refactoring in its own right to first fix identify_node_type in its own PR, and then leverage it here so that this can just return match_all and process_not_expression can simply negate it.
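To make the precedence difference concrete, here is a cut-down, self-contained sketch that keeps only the :not, :empty, and :sub_field cases. The `NOT_OP` constant and both method names are hypothetical stand-ins for schema_names.not and the real identify_node_type.

```ruby
# Hypothetical stand-in for schema_names.not.
NOT_OP = "not"

# Current ordering: the sub-expression is inspected first.
def identify_inside_out(field_or_op, sub_expression)
  return :empty if sub_expression.nil? || sub_expression == {}
  return :not if field_or_op == NOT_OP
  return :sub_field if sub_expression.is_a?(::Hash)
  :unknown
end

# Proposed ordering: field_or_op is inspected first.
def identify_outside_in(field_or_op, sub_expression)
  return :not if field_or_op == NOT_OP
  return :empty if sub_expression.nil? || sub_expression == {}
  return :sub_field if sub_expression.is_a?(::Hash)
  :unknown
end

current_for_not_nil = identify_inside_out("not", nil)
proposed_for_not_nil = identify_outside_in("not", nil)
proposed_for_field_nil = identify_outside_in("name", nil)
```

Under the current ordering, not: nil is classified as :empty, so the not handler never sees it. Under the proposed ordering it is classified as :not, while field: nil is still :empty.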

Thoughts?

This makes sense. Originally, I thought of changing the order in which :not and :empty are returned, so that this could be handled in the corresponding method. However, I think this is a better approach, where each method handles this individually.

myronmarston · 2024-11-04T02:42:51Z

elasticgraph-graphql/lib/elastic_graph/graphql/filtering/filter_interpreter.rb

          sub_filter = build_bool_hash do |inner_node|
            process_filter_hash(inner_node, expression, field_path)
          end

-          return unless sub_filter


I think my suggestion about :not and :empty precedence will solve this in a more robust way.

acerbusace requested review from myronmarston and BrianSigafoos-SQ as code owners November 1, 2024 18:04

acerbusace force-pushed the block/alexp/anyofchangesV4 branch 2 times, most recently from f6a4783 to f528b93 Compare November 1, 2024 18:19

acerbusace commented Nov 1, 2024

acerbusace force-pushed the block/alexp/anyofchangesV4 branch from f528b93 to 6512d2c Compare November 1, 2024 18:46

acerbusace commented Nov 1, 2024

acerbusace force-pushed the block/alexp/anyofchangesV4 branch 2 times, most recently from 5d3701c to c23dc6b Compare November 1, 2024 19:10

Always treat emptyPredicate as true

fc20da8

acerbusace force-pushed the block/alexp/anyofchangesV4 branch from c23dc6b to 99b37dd Compare November 1, 2024 19:30

acerbusace mentioned this pull request Nov 1, 2024

Change anyOf behaviour to evaluate to true for empty predicates #6

Merged

myronmarston requested changes Nov 4, 2024

Add more tests for anyOf

6b280b7

acerbusace force-pushed the block/alexp/anyofchangesV4 branch from 99b37dd to 6b280b7 Compare November 4, 2024 15:26

This was referenced Nov 4, 2024

Interpret field_or_op before sub_expression #21

Closed

Interpret field_or_op before sub_expression #22

Merged

myronmarston closed this Nov 12, 2024

myronmarston deleted the block/alexp/anyofchangesV4 branch November 12, 2024 07:43
