LATERAL join #1615
Comments
Interesting! I learned about lateral joins in a project some time ago, from a developer who was familiar with that concept in Postgres. He explained it the same way, that it's similar to a for-loop: https://medium.com/kkempin/postgresqls-lateral-join-bfd6bd0199df Looking forward to trying it out in Jena. |
Thank you for moving forward with this idea! It would be amazing if we could come up with a compatible feature in Jena and Oxigraph. It might be a good baseline if the SPARQL 1.2 project somehow restarts. Thank you for the restriction on introduced variables. I had not thought about it. For sub-selects, I like the idea of binding only the variables in the A question: I am not quite at ease with the A side question: how do you plan to represent LATERALs in SPARQL S-Expressions? I am trying to keep Oxigraph as compatible as possible with Jena and Ruby-RDF. |
This is a feature many people have been missing in SPARQL for a long time, so happy to see it in the standard in the long term and in Jena in the near term - we're already making use of the enhanced Minor: In Scope section
Do you really mean RHS or is it LHS? @afs |
Fixed. |
Description updated with an outline of integration into section 18 : Definition of SPARQL and section 19 : SPARQL Grammar. |
@Tpt - thanks for the comments. In the
The current design is to have an operator in the algebra While it is related to
An existing example that is not so different is At the moment, I'm looking at what the fundamental operations are. Is there one algebra operator needed? For syntax, it may be worthwhile having syntax forms that translate to this/these fundamental operations. Maybe LATERAL should be
In PostgreSQL, In that style, there could be It is not a strict functional form (but again, this is not the first in SPARQL - as well as From the point of view of fundamental operations, Do you have some examples we can explore? If this is the case, and these are common, these can be syntax forms that translate to the same Digression: It would be nice to have "SELECT-less" (sub)queries: |
For interoperability, it would be good if there was agreement for whether variable lists are unsupported, optional or mandatory. To me it seems an 'optional' variable list would be most convenient to use. Also, proper lateral join support would solve an issue with the service enhancer which @LorenzBuehmann found out:
So proper lateral is very welcome! |
Thank you! It definitely makes sense.
That's a good point! Indeed, we have already introduced the substitution operation for FILTER NOT EXISTS, so having it inside the "lateral" operator definition makes sense. This gives a good definition of
Yes, to avoid the |
Unrelated. Whether perfect reversal is possible isn't clear - some kind of reversal isn't looking too hard, but there's an interaction with LeftJoin conditions (expressions), so query equality round-trip is unlikely to be a simple matter to provide. You'd maybe be better off working with unoptimized algebra. Ultimately: the more optimized the algebra expression, the less likely OpAsQuery can reverse it. |
So if I understand correctly,
Applying this approach on different logical join types definitely makes sense to me. So considering |
Does it mean that "SPARQL is evaluated bottom-up" would not be true anymore? |
Yes, lateral adds the feature "For each binding on the left hand side substitute on the right-hand-side and only then evaluate it". So it adds "left-to-right" evaluation. |
Have to be careful here. It's the operator that determines the evaluation, there isn't some policy for the whole expression. Just most current algebra operators are depth first evaluation (AKA functions) and we all say "evaluated bottom-up".
The proposal is that the row is There is a discussion point about whether "eval B with row from A" should or should not use the in-scope rules for variables:
Does the From a SPARQL POV, that sub-query can otherwise be
Just for At the moment, I'm more inclined to the scoping version so that there isn't an eval special case of "inside LATERAL", and because it makes developing big queries piece-by-piece more predictable (arguably), but it does cause a "surprise" case. Another reason is that special cases tend to have complicated consequences. When/if we have query templates and parameterization, unconditionally replacing |
I'd say "no scope rules" (i.e. unconditional substitution) makes more sense when there is just LATERAL because otherwise aggregations would be complex to write. For example, I think that the query below should give for each class the capped count of its instances:
#Q1:
SELECT * {
?type a owl:Class
LATERAL { SELECT (COUNT(*) AS ?cappedInstanceCount) { SELECT * { ?i a ?type } LIMIT 10000 } }
}
# Optional SELECT would be nice here too:
# LATERAL { SELECT (COUNT(*) AS ?cappedInstanceCount) { { ?i a ?type } LIMIT 10000 } }
What good would LATERAL do if that wasn't the case? Otherwise, if scoping rules were applied, one would have to weave the lateral-joining variables into the aggregations, which IMHO is needlessly cumbersome:
#Q2:
SELECT * {
?type a owl:Class
LATERAL {
SELECT ?type (COUNT(*) AS ?cappedInstanceCount) { SELECT * { ?i a ?type } LIMIT 10000 } GROUP BY ?type
}
} |
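The difference between the two readings of #Q1 can be sketched outside SPARQL. The Python below is only an illustration under assumptions: `PAIRS`, `CLASSES` and `CAP` are hypothetical stand-ins for the `?i a ?type` matches, the `owl:Class` rows on the LHS and the `LIMIT 10000`, and neither function is how Jena evaluates the query.

```python
from typing import Dict, List, Tuple

# Hypothetical data: (instance, class) pairs standing in for "?i a ?type".
PAIRS: List[Tuple[str, str]] = [("a1", "A"), ("a2", "A"), ("a3", "A"), ("b1", "B")]
CLASSES: List[str] = ["A", "B"]  # stands in for the LHS "?type a owl:Class"
CAP = 2                          # stands in for LIMIT 10000


def capped_count(rows: List[Tuple[str, str]]) -> int:
    """COUNT(*) over at most CAP rows."""
    return len(rows[:CAP])


def q1_with_substitution() -> List[Dict[str, object]]:
    """Unconditional substitution: ?type is replaced inside the nested sub-select."""
    out: List[Dict[str, object]] = []
    for t in CLASSES:
        rows = [p for p in PAIRS if p[1] == t]  # inner pattern restricted to this class
        out.append({"type": t, "cappedInstanceCount": capped_count(rows)})
    return out


def q1_with_scoping() -> List[Dict[str, object]]:
    """Scoping rules: the inner ?type is hidden, so the count ignores the LHS row."""
    count = capped_count(PAIRS)
    return [{"type": t, "cappedInstanceCount": count} for t in CLASSES]


q1_substituted = q1_with_substitution()
q1_scoped = q1_with_scoping()
```

With substitution each class gets its own capped count; with scoping every class gets the same number, which is why #Q2 has to project and group by `?type` to get the per-class result.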
Those two queries don't do the same thing :-) The only addition is the The variable isn't called What happens if For users, copying in a standalone query fragment isn't necessarily going to work for them because LATERAL can change the results. Lack of predictability limits other extensions, e.g. named tables: results are calculated outside the LATERAL, so results are with-scoping, but put that query text inside the LATERAL (user intuition: named tables are a way to avoid repeated copies of query patterns) and you can get different results. There isn't a perfect answer here. Making LATERAL change the deep scoping rules, rather than just being the evaluation of a new operator, will get complicated, will affect transformations and optimizations, and will have surprises. |
Well yes, the good thing with the way the Jena implementation handles variable scopes with This way, one can add variable scope alignment as an extra step in the lateral definition - something along the lines of
But translating this into something compatible with the sparql spec which only has a notion of in-scope seems painful :/
From a substitution perspective I'd say it is simply not substituted - so in the example below everything would be fetched if
Of course one could also argue that the rhs is simply not evaluated for such a binding in that case - but to me that seems more like a special rule then. |
Not quite. The LATERAL may itself be sub-query nested so it's |
Yes, but the variable scope alignment always aligns the rhs with what is present on the lhs based on the original name. |
See #1628 |
GH-1615: Place LATERAL syntax tests in their own directory
LATERAL join
Proposed experimental feature.
Work-in-progress: the issue description is being edited in-place.
A LATERAL join is like a foreach loop, looping on the results from the left-hand side (LHS), the pattern before the LATERAL keyword, and executing the right-hand side (RHS) query pattern once for each row, with the variables from the current LHS row in-scope during each RHS evaluation.
A regular join only executes the RHS once, and the variables from the LHS are only used for the join condition after evaluation of the left and right sub-patterns.
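The contrast between the two kinds of join can be sketched in a few lines of Python. This is an illustration only, not Jena code: rows are plain dicts, the RHS of the lateral join is modelled as a function of the current LHS row, and `LABELS`, `first_label` and `first_label_overall` are hypothetical names made up for the example.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]  # one solution: variable name -> value


def compatible(a: Row, b: Row) -> bool:
    """Two rows are compatible if every shared variable has the same value."""
    return all(b[k] == v for k, v in a.items() if k in b)


def regular_join(lhs: Iterable[Row], rhs: Iterable[Row]) -> List[Row]:
    """Both sides are already evaluated; rows are only compared afterwards."""
    rhs_rows = list(rhs)
    return [{**l, **r} for l in lhs for r in rhs_rows if compatible(l, r)]


def lateral_join(lhs: Iterable[Row], rhs: Callable[[Row], Iterable[Row]]) -> List[Row]:
    """The RHS is evaluated once per LHS row and can see that row (a flatmap)."""
    out: List[Row] = []
    for l in lhs:
        for r in rhs(l):
            if compatible(l, r):
                out.append({**l, **r})
    return out


# Hypothetical data: two subjects, each with a few labels.
LABELS = {"s1": ["a", "b", "c"], "s2": ["x", "y"]}
lhs_rows: List[Row] = [{"s": "s1"}, {"s": "s2"}]


def first_label(row: Row) -> List[Row]:
    """RHS meaning 'at most one label of the current ?s' (like LIMIT 1)."""
    return [{"label": lab} for lab in LABELS[row["s"]][:1]]


def first_label_overall() -> List[Row]:
    """The same pattern evaluated once, without seeing any LHS row."""
    rows = [{"s": s, "label": lab} for s, labs in LABELS.items() for lab in labs]
    return rows[:1]


lateral_result = lateral_join(lhs_rows, first_label)
regular_result = regular_join(lhs_rows, first_label_overall())
```

In this sketch the lateral join yields one label per subject, while the regular join yields a single row overall, because the limit was applied before the join condition was checked.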
Another way to think of a lateral join is as a flatmap.
Examples:
{ OPTIONAL ... is the same as writing { {} OPTIONAL ... .
{ } evaluates to the join identity, a table of one row of zero columns.
Syntax
The LATERAL keyword takes the graph pattern so far (from the { starting the current block) as its left-hand side and the { } block afterwards as its right-hand side.
Possible addition: LATERAL ( ?var1 ?var2 ...) to specify certain variables to expose to the RHS. Other variables would be (inner)joined as usual. This may be an unnecessary feature.
Scope
A sub-select may have variables of the same name that are not lateral-joined to a variable of the same name from the LHS.
The inner ?s in the SELECT ?label is not the outer ?s because the SELECT ?label does not pass out ?s. As a sub-query, the ?s could be any name except ?label for the same results.
This is the same situation as a sub-query in other situations.
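A small Python sketch of this scoping rule, under stated assumptions: the graph, the `match` evaluator and the `sub_select` helper are hypothetical and only model the "scoping" reading described here, where an LHS variable is substituted into a sub-select only if the sub-select projects it.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Row = Dict[str, str]

# Hypothetical data: two resources with labels.
GRAPH: List[Triple] = [
    ("s1", "type", "T"),
    ("s1", "label", "A"),
    ("s2", "label", "B"),
]


def match(patterns: List[Triple], graph: List[Triple]) -> List[Row]:
    """Evaluate a list of triple patterns; terms starting with '?' are variables."""
    rows: List[Row] = [{}]
    for pat in patterns:
        next_rows: List[Row] = []
        for row in rows:
            for triple in graph:
                candidate = dict(row)
                ok = True
                for term, value in zip(pat, triple):
                    if term.startswith("?"):
                        # Bind the variable, or check it against an earlier binding.
                        if candidate.setdefault(term, value) != value:
                            ok = False
                            break
                    elif term != value:
                        ok = False
                        break
                if ok:
                    next_rows.append(candidate)
        rows = next_rows
    return rows


def substitute(patterns: List[Triple], row: Row) -> List[Triple]:
    """Replace every variable that is bound in `row` by its value."""
    return [tuple(row.get(term, term) for term in pat) for pat in patterns]


def sub_select(project: List[str], body: List[Triple], outer: Row,
               graph: List[Triple]) -> List[Row]:
    """Only projected variables are visible, so only those are substituted."""
    visible = {v: outer[v] for v in project if v in outer}
    rows = match(substitute(body, visible), graph)
    return [{v: r[v] for v in project if v in r} for r in rows]


outer_row = {"?s": "s1"}
body = [("?s", "label", "?label")]

hidden = sub_select(["?label"], body, outer_row, GRAPH)         # SELECT ?label
exposed = sub_select(["?s", "?label"], body, outer_row, GRAPH)  # SELECT ?s ?label
```

With `SELECT ?label` the inner `?s` is independent of the outer one, so labels of every subject come back; projecting `?s` as well is what ties the sub-query to the current LHS row.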
There needs to be a new syntax restriction: there can be no variable introduced by AS (BIND, or sub-query) or VALUES in-scope at the top level of the LATERAL RHS that has the same name as any in-scope variable from the LHS.
See SPARQL Grammar note 12.
In ARQ, LET would work. LET for a variable that is bound acts like a filter.
Evaluation
Substituting variables from the LHS into the RHS (with the same restrictions), then executing the pattern, gives the evaluation of LATERAL.
Notes
There is a similarity to filter NOT EXISTS/EXISTS expressed as the not-legal FILTER ( ASK { pattern } ) where the variables of the row being filtered are available to "pattern". This is similar to an SQL correlated subquery.
Elsewhere
JOIN LATERAL or Correlated Subquery w3c/sparql-dev#100
Jena's SERVICE loop:
Oxigraph: oxigraph/issues/267, oxigraph/pull/274
https://docs.stardog.com/query-stardog/stored-query-service#correlated-subqueries
https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-LATERAL
https://dev.mysql.com/doc/refman/8.0/en/lateral-derived-tables.html
https://en.wikipedia.org/wiki/Correlated_subquery
Spec updates
Syntax
LATERAL is added to the SPARQL grammar at rule [[56] GraphPatternNotTriples](https://www.w3.org/TR/sparql11-query/#rGraphPatternNotTriples). As a syntax form, it is similar to OPTIONAL.
Algebra
The new algebra operator is lateral, which takes two expressions
is translated to:
Evaluation
To evaluate lateral:
Inject variable bindings into the second argument
Evaluate this pattern
Add to results
Outline:
where inject is the corrected substitute operation.
An alternative style is to define Lateral more like "evaluate P such that μ is in-scope" in some way, rather than rely on inject which is a mechanism.
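The three evaluation steps can be sketched as a loop over the solutions of the first argument. The Python below is a minimal illustration under assumptions, not the definition from the spec text: patterns are lists of triple patterns only, `inject` is modelled as plain substitution (the corrections mentioned above are not modelled), and `GRAPH`, `eval_bgp` and `eval_lateral` are hypothetical names.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Row = Dict[str, str]

# Hypothetical data: two classes and their instances.
GRAPH: List[Triple] = [
    ("c1", "type", "Class"),
    ("c2", "type", "Class"),
    ("i1", "type", "c1"),
    ("i2", "type", "c1"),
    ("i3", "type", "c2"),
]


def eval_bgp(patterns: List[Triple], graph: List[Triple]) -> List[Row]:
    """Evaluate triple patterns; terms starting with '?' are variables."""
    rows: List[Row] = [{}]
    for pat in patterns:
        next_rows: List[Row] = []
        for row in rows:
            for triple in graph:
                candidate = dict(row)
                ok = True
                for term, value in zip(pat, triple):
                    if term.startswith("?"):
                        if candidate.setdefault(term, value) != value:
                            ok = False
                            break
                    elif term != value:
                        ok = False
                        break
                if ok:
                    next_rows.append(candidate)
        rows = next_rows
    return rows


def inject(patterns: List[Triple], mu: Row) -> List[Triple]:
    """Assumed here to be plain substitution of the variables bound in mu."""
    return [tuple(mu.get(term, term) for term in pat) for pat in patterns]


def eval_lateral(left: List[Triple], right: List[Triple],
                 graph: List[Triple]) -> List[Row]:
    results: List[Row] = []
    for mu in eval_bgp(left, graph):            # each solution of the first argument
        injected = inject(right, mu)            # 1. inject bindings into the second
        for mu2 in eval_bgp(injected, graph):   # 2. evaluate this pattern
            results.append({**mu, **mu2})       # 3. add to results
    return results


solutions = eval_lateral(
    left=[("?type", "type", "Class")],
    right=[("?i", "type", "?type")],
    graph=GRAPH,
)
```

Merging `mu` with `mu2` needs no compatibility check in this sketch because every variable bound in `mu` has already been replaced by its value in the injected pattern.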