-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Treat emptyPredicates as always true, instead of pruning them #17
Conversation
f6a4783
to
f528b93
Compare
@@ -369,6 +369,24 @@ class GraphQL | |||
|
|||
expect(parts).to target_all_widget_indices |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, the following returns all indices, which is different then how we treat anyOf when filtering a query. Is this okay, or should we make changes to make sure everything is more aligned?
parts = search_index_expression_parts_for({"any_of" => []})
f528b93
to
6512d2c
Compare
sub_filter = build_bool_hash do |inner_node| | ||
process_filter_hash(inner_node, expression, field_path) | ||
end | ||
|
||
return unless sub_filter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this, because in theory we should never get a nil sub filter, as all emptyPredicates are treated as match_all
and code coverage doesn't like a hanging branch. This however causes problems with steep, because build_bool_hash
can return nil and we are trying to access [:bool]
... So to fix this, I set sub_filter type to untyped
. Pointing this out, incase anyone has any other paths I can take here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my suggestion about :not
and :empty
precedence will solve this in a more robust way.
5d3701c
to
c23dc6b
Compare
c23dc6b
to
99b37dd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @acerbusace. This is really great to see! Left some suggestions and thoughts.
@@ -102,13 +197,20 @@ def filters_on_sub_fields?(expression) | |||
end | |||
end | |||
|
|||
def process_empty_or_nil_expression(bool_node, field_or_op) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just call this process_empty_expression
? I consider both field: nil
and field: {}
to be empty expressions (after all, identify_node_type
returns :empty
for both cases).
@@ -57,11 +59,104 @@ def to_s | |||
|
|||
private | |||
|
|||
def reduce_query(bool_node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an invariant I expect from this: I expect it to produce behaviorally equivalent queries to what would be produced if queries were not reduced. That is: for every possible query, we should get the same results from Elasticsearch whether or not the query has been reduced.
However, if you toggle this reduction (e.g. by changing the first line to return bool_node
) I found it causes some integration and acceptance specs to fail. (I expect it to break some unit specs since those assert on the query body itself, and I'm not concerned with those). Specifically, it breaks these:
rspec ./elasticgraph-graphql/spec/integration/elastic_graph/graphql/datastore_query/sub_aggregations_spec.rb:196 # ElasticGraph::GraphQL::DatastoreQuery sub-aggregations ignores empty filters
rspec ./elasticgraph-graphql/spec/acceptance/search_spec.rb:521 # ElasticGraph::GraphQL--search with a snake_case schema `list` filtering behavior supports filtering on scalar lists, nested object lists, and embedded object lists
rspec ./elasticgraph-graphql/spec/acceptance/search_spec.rb:521 # ElasticGraph::GraphQL--search with a camelCase schema, alternate derived type naming, and enum value overrides `list` filtering behavior supports filtering on scalar lists, nested
Let's start with the acceptance specs. Here they are:
results = query_teams_with(filter: {current_players_nested: {any_satisfy: {name: {equal_to_any_of: nil}}}})
expect(results).to eq [{"id" => "t1"}, {"id" => "t2"}, {"id" => "t3"}, {"id" => "t4"}]
Without the reduction, only t1
, t2
, and t3
are returned--so the act of reducing causes it to match t4
when it would not normally.
I dug into why and I think it points to yet another bug with how we handle empty predicates! In this case it's a bug under any_satisfy
. In this case, the filter is matching teams which have any players that satisfy name: {equal_to_any_of: nil}
. While the name: {equal_to_any_of: nil}
criteria translates into a match_all
, that's a match_all
against players, not teams. The outer criteria (the any_satisfy
) is against teams and can only be satisfied by teams that have at least one that player that satisfies the inner player criteria. A team which has no current players (as is the case with t4
) cannot satisfy the filter and should be omitted, I think.
That said, this might make for a little bit of a usability problem for ElasticGraph--if a query with has an optional filter on a nested field, and a client does not want to filter on that field and omits, should we then still filter on whether or not documents have any nested records? Something we should think more about.
The failing integration spec is related to this case as it also involves a nested field. in this case, it changes the query in such a way that there is extra meta
returned by the aggregation. I think it might be ok but we should think about it some more.
If we do want strict enforcement of the "query reduction should not impact query behavior" invariant, we may want to actually setup something in our test so that integration and acceptance specs run each query with and without the reduction and assert on getting the same results.
More generally, this code is very, very complicated and I don't yet understand it. Since it exists purely for optimization (not correctness) we need to be 100% sure it's correct; otherwise we should not do the reduction.
One option to consider: remove the reduction from this PR (so that this PR is just focused on the behavioral change we want to make) and then do a follow up PR that adds the reduction. Once nice property of that is it would make it more visible/obvious in the diff what the reduction is doing as it would show up in changes to the unit specs.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to the idea of having an enforcement on the "query reduction should not impact query behaviour". I guess first, we will need to figure out what to do with the any_satisfy
bug you caught (as my reduction, was mostly to match the current behaviour).
# This is an "empty" filter predicate and we can treat it as `true`. | ||
process_empty_or_nil_expression(bool_node, field_or_op) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# This is an "empty" filter predicate and we can treat it as `true`. | |
process_empty_or_nil_expression(bool_node, field_or_op) | |
process_empty_or_nil_expression(bool_node, field_or_op) |
No need for the comment. It was there before to explain why there was nothing in the when :empty
branch but now there's something.
@@ -102,13 +197,20 @@ def filters_on_sub_fields?(expression) | |||
end | |||
end | |||
|
|||
def process_empty_or_nil_expression(bool_node, field_or_op) | |||
if field_or_op == schema_names.not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a code smell that the handling for not
is spread out--it's handled primarily by process_not_expression
but also by this method. we want each different type of filtering node to be handled in one place and not have its responsibility spread out.
I understand why you did this, though: the current FilterNodeInterpreter
implementation does not allow the :empty
case to just handle the empty case and allow the :not
case to just handle the not case, because it returns :empty
for both field: nil
and not: nil
. The current implementation looks like this:
def identify_node_type(field_or_op, sub_expression)
return :empty if sub_expression.nil? || sub_expression == {}
return :not if field_or_op == schema_names.not
return :list_any_filter if field_or_op == schema_names.any_satisfy
return :all_of if field_or_op == schema_names.all_of
return :any_of if field_or_op == schema_names.any_of
return :operator if filter_operators.key?(field_or_op)
return :list_count if field_or_op == LIST_COUNTS_FIELD
return :sub_field if sub_expression.is_a?(::Hash)
:unknown
end
Notice that it detects :empty
based on sub_expression
before it has the chance to detect :not
based on field_or_op
. That robs process_not_expression
from the chance to negate the inner empty expression.
I'm thinking that we should always do the detection from the "outside in". That is, field_or_op
is always on the outside and we should first detect a node type based on that. Only once we've exhausted what we can know based on field_or_op
should we detect the type based on the sub_expression
. I think the implementation should instead be something like this:
def identify_node_type(field_or_op, sub_expression)
identify_by_field_or_op(field_or_op) || identify_by_sub_expr(sub_expression) || :unknown
end
private
def identify_by_field_or_op(field_or_op)
return :not if field_or_op == schema_names.not
return :list_any_filter if field_or_op == schema_names.any_satisfy
return :all_of if field_or_op == schema_names.all_of
return :any_of if field_or_op == schema_names.any_of
return :operator if filter_operators.key?(field_or_op)
return :list_count if field_or_op == LIST_COUNTS_FIELD
end
def identify_by_sub_expr(sub_expression)
return :empty if sub_expression.nil? || sub_expression == {}
return :sub_field if sub_expression.is_a?(::Hash)
end
Now, that change on its own probably breaks some stuff (the various process_*
methods haven't had to handle empty or nil
sub-expressions before because the :empty
branch took precedence) but it could be worthy prepatory refactoring in its own right to first fix identify_node_type
in its own PR, and then leverage it here so that this can just return match_all
and process_not_expression
can simply negate it.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense. Originally, I thought of changing the order of when :not and :empty is returned, so this can be handled in the corresponding method. However, I think this is a better approach, where each method is handling this individually.
sub_filter = build_bool_hash do |inner_node| | ||
process_filter_hash(inner_node, expression, field_path) | ||
end | ||
|
||
return unless sub_filter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my suggestion about :not
and :empty
precedence will solve this in a more robust way.
99b37dd
to
6b280b7
Compare
Context
Currently empty predicates are ignored. Within an AND context (the normal context), that works correctly. Within an OR context (under an anyOf) it does not work correctly: empty predicates should essentially be treated as
true
:predicate AND emptyPredicate
should be likepredicate AND true
which reduces to predicate (and thus is equivalent to ignoring the empty predicate).predicate OR emptyPredicate
should be likepredicate OR true
which just reduces totrue
Notably,
filter: A
andfilter: {anyOf: [A]}
should behave the same but right now they do not whenA
is an empty predicate –filter: emptyPredicate
returns all results whereasfilter: {anyOf: [emptyPredicate]}
returns no results.Change
This PR as mentioned above, looks to change
filter: {field: nil}
andfilter: {field: {}}
to evaluate to true. This in turns changes how anyOf behaves to be more like anOR
block, as shown below:filter: {anyOf: [emptyPredicate]}
will evaluate totrue
, instead offalse
filter: {anyOf: [emptyPredicate, {name: {equalToAnyOf: ["test"]}}]}
will evaluate totrue
, instead offalse
At the same time the behaviour of
(filter: {anyOf: []})
matching no documents is still preserved.Because of the way emptyPredicates are treated now, the following changes occured for how
not
behaves:filter: {not: {field: nil}}
will evaluate tofalse
, instead of being ignored