-
Notifications
You must be signed in to change notification settings - Fork 654
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-1615: LATERAL join #1631
GH-1615: LATERAL join #1631
Conversation
c8c0196
to
e80da23
Compare
Rebased to current main branch. |
if (testSpecialCaseUnused(subOp, joins, remaining)) | ||
return OpTable.empty(); | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would an alternative fix be overriding the transform of OpLateral
in this class to return a direct copy of the original opLateral
ignoring any transforms this might have generated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a possibility - I think it seems to designing for handling a bad query case (if not LATERAL), rather than optimization in LATERAL.
It does seem to be a risk more widely, now and in future changes and not just in LATERAL - e.g. optimizing then substituting (e.g. parameterized queries). (NOT)EXISTS as well which is related to substitution. The fix for JENA-500 is a specific fix for the whole-query case.
The transform walk is bottom up so it will run over the code - the walker needs to be changed to avoid certain ops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the bottom up property means the optimisation would be attempted but then you could discard it when you reached the OpLateral
level but that does mean unnecessary work is done.
Having the walker be able to decide what operations to descend through could be a useful mechanism in general for limiting the scope of transformations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the walker is big change - I've tried it before. I'd like to get this PR in without resorting to that.
I believe the special case is also a potential impact in other places. We haven't seen it reported because e.g. EXISTS {} is often simple.
Is there any pointers as to when it has been beneficial?
I tried skipping it rather than return null
but TestSemanticEquivalence.implicitJoinEvaluation3
and on a SPARQL WG test fail.
In implicitJoinEvaluation3
, it only fails on the query string part - the next two algebra level tests work. This doesn't make sense to me -- the algebra steps are supposed to cover the possibilities of the SPARQL query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any pointers as to when it has been beneficial?
If memory serves this particular piece originated because of users. This originated when I was at Cray and occasionally we'd see users make simple typos in their queries, especially around variable names, that would cause the engine to do a lot of work it later just threw away due to the bad filter clause so this was basically a case of protecting users from themselves
So we'd get a bug of the form "Why does my query take ages and return nothing?" where the returning nothing was due to user error. By spotting these cases in the optimiser we could return nothing much faster, and generally the users figured out the typo themselves in that case because there weren't distracted by the slow performance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Query timeouts will catch that (we hope).
EXISTS uses the same injection of bindings as JENA-500 except it does it dynamically at runtime (like LATERAL - EXIST is the ASK version of LATERAL). Whether that's the current iffy form or the better one written up.
It's is less likely to occur in EXISTS due to usage but it is possible. And like LATERAL, complicated because it may be bound sometimes and not others.
Can we merge as-is which is functionally safe?
I'm still mystified by TestSemanticEquivalence.implicitJoinEvaluation3
.
Maybe the test is wrong and comparing to zero happens for the wrong reasons but I can't see a difference to the algebra sub-tests.
GitHub issue resolved #1615
Pull request Description:
Implementation of LATERAL join.
Currently, there are different commits for the different implementation steps but each step does not necessarily work on its own. That helps inspect the PR.
The PR will be cleaned up before merge.
By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the Contributor's Agreement.