Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-1615: LATERAL join #1631

Merged
merged 8 commits into from
Dec 1, 2022
Merged

GH-1615: LATERAL join #1631

merged 8 commits into from
Dec 1, 2022

Conversation

afs
Copy link
Member

@afs afs commented Nov 20, 2022

GitHub issue resolved #1615

Pull request Description:
Implementation of LATERAL join.

Currently, there are different commits for the different implementation steps but each step does not necessarily work on its own. That helps inspect the PR.

The PR will be cleaned up before merge.


  • Tests are included.
  • Documentation change and updates are provided for the Apache Jena website
  • Commits have been squashed to remove intermediate development commit messages.
  • Key commit messages start with the issue number (GH-xxxx or JENA-xxxx)

By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the Contributor's Agreement.

@afs afs force-pushed the lateral branch 2 times, most recently from c8c0196 to e80da23 Compare November 21, 2022 20:09
@afs afs marked this pull request as ready for review November 21, 2022 20:10
@afs
Copy link
Member Author

afs commented Nov 22, 2022

Rebased to current main branch.

if (testSpecialCaseUnused(subOp, joins, remaining))
return OpTable.empty();
return null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would an alternative fix be overriding the transform of OpLateral in this class to return a direct copy of the original opLateral ignoring any transforms this might have generated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a possibility - I think it seems to designing for handling a bad query case (if not LATERAL), rather than optimization in LATERAL.

It does seem to be a risk more widely, now and in future changes and not just in LATERAL - e.g. optimizing then substituting (e.g. parameterized queries). (NOT)EXISTS as well which is related to substitution. The fix for JENA-500 is a specific fix for the whole-query case.

The transform walk is bottom up so it will run over the code - the walker needs to be changed to avoid certain ops.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the bottom up property means the optimisation would be attempted but then you could discard it when you reached the OpLateral level but that does mean unnecessary work is done.

Having the walker be able to decide what operations to descend through could be a useful mechanism in general for limiting the scope of transformations

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the walker is big change - I've tried it before. I'd like to get this PR in without resorting to that.

I believe the special case is also a potential impact in other places. We haven't seen it reported because e.g. EXISTS {} is often simple.

Is there any pointers as to when it has been beneficial?

I tried skipping it rather than return null but TestSemanticEquivalence.implicitJoinEvaluation3 and on a SPARQL WG test fail.

In implicitJoinEvaluation3, it only fails on the query string part - the next two algebra level tests work. This doesn't make sense to me -- the algebra steps are supposed to cover the possibilities of the SPARQL query.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any pointers as to when it has been beneficial?

If memory serves this particular piece originated because of users. This originated when I was at Cray and occasionally we'd see users make simple typos in their queries, especially around variable names, that would cause the engine to do a lot of work it later just threw away due to the bad filter clause so this was basically a case of protecting users from themselves

So we'd get a bug of the form "Why does my query take ages and return nothing?" where the returning nothing was due to user error. By spotting these cases in the optimiser we could return nothing much faster, and generally the users figured out the typo themselves in that case because there weren't distracted by the slow performance

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Query timeouts will catch that (we hope).

EXISTS uses the same injection of bindings as JENA-500 except it does it dynamically at runtime (like LATERAL - EXIST is the ASK version of LATERAL). Whether that's the current iffy form or the better one written up.

It's is less likely to occur in EXISTS due to usage but it is possible. And like LATERAL, complicated because it may be bound sometimes and not others.

Can we merge as-is which is functionally safe?


I'm still mystified by TestSemanticEquivalence.implicitJoinEvaluation3.

Maybe the test is wrong and comparing to zero happens for the wrong reasons but I can't see a difference to the algebra sub-tests.

@afs afs merged commit 800e865 into apache:main Dec 1, 2022
@afs afs deleted the lateral branch December 1, 2022 10:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LATERAL join
2 participants