Outline Query Hints #33

Open
dwhitney opened this issue Sep 11, 2019 · 3 comments
Labels
question Further information is requested

Comments

@dwhitney
Contributor

dwhitney commented Sep 11, 2019

Hi, I'm doing some performance optimization over the next couple of weeks in our app, and I'm wondering if you could outline some of the existing query hints - how they work and what they do, etc.

Also, I see a query hint called SORTED_TRIPLES, but I don't think there is an implementation for it. In my custom graph, my triples are indexed and sorted, so I think I could get some pretty good benefits from this particular query hint. Could you describe how it's supposed to work? If I get a chance, I will attempt to implement it and make a PR.

Thanks!

@Callidon
Owner

Hello

Query Hints are designed to work similarly to those implemented by Blazegraph.
The idea is that you write very specific RDF triples into your SPARQL query, and they give the query optimizer hints about how to do its job. Of course, these "query hint triples" are not evaluated as part of the query itself; they are just a convenient way of embedding query execution logic into the query. I really like this idea because, in my opinion, it's pretty elegant and very portable.

For example, in sparql-engine, the following query forces the optimizer to use the symmetric hash join operator to resolve all joins in the Basic Graph Pattern.

    PREFIX dblp-pers: <https://dblp.org/pers/m/>
    PREFIX dblp-rdf: <https://dblp.uni-trier.de/rdf/schema-2017-04-18#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX hint: <http://callidon.github.io/sparql-engine/hints#>
    SELECT ?name ?article WHERE {
      hint:Group hint:SymmetricHashJoin true.
      ?s rdf:type dblp-rdf:Person .
      ?s dblp-rdf:primaryFullPersonName ?name .
      ?s dblp-rdf:authorOf ?article .
    }

Here, the hint is the triple hint:Group hint:SymmetricHashJoin true. It does not require any more configuration than a classic query: you just put the hint into the query and execute it!
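To make the idea concrete, here is a minimal TypeScript sketch of how an engine could separate hint triples from ordinary triple patterns before evaluating a Basic Graph Pattern. The `Triple` type and the `splitHints` function are hypothetical names made up for this example and are not the actual sparql-engine internals; only the hints namespace is taken from the query above.

```typescript
// Hypothetical types and names for illustration; not the sparql-engine API.
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

// Namespace bound to the hint: prefix in the query above.
const HINT_NS = "http://callidon.github.io/sparql-engine/hints#";

// Separate hint triples from the patterns that must be matched against the data.
function splitHints(bgp: Triple[]): { hints: Triple[]; patterns: Triple[] } {
  const hints: Triple[] = [];
  const patterns: Triple[] = [];
  for (const triple of bgp) {
    if (triple.predicate.startsWith(HINT_NS)) {
      hints.push(triple);
    } else {
      patterns.push(triple);
    }
  }
  return { hints, patterns };
}

const RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
const DBLP_NS = "https://dblp.uni-trier.de/rdf/schema-2017-04-18#";

const bgp: Triple[] = [
  { subject: HINT_NS + "Group", predicate: HINT_NS + "SymmetricHashJoin", object: "true" },
  { subject: "?s", predicate: RDF_NS + "type", object: DBLP_NS + "Person" },
  { subject: "?s", predicate: DBLP_NS + "primaryFullPersonName", object: "?name" },
];

const { hints, patterns } = splitHints(bgp);
console.log(hints.length, patterns.length);
```

In this sketch, the optimizer would read `hints` to pick its join strategy, and only `patterns` would be matched against the graph.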

Initially, I had planned to add various hints to the engine, including one that leverages sorted indexes for query processing (SORTED_TRIPLES). However, due to schedule constraints, I have only implemented the hint that enables symmetric hash joins.
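For readers unfamiliar with the operator, the following TypeScript sketch shows the general symmetric hash join technique: each input keeps its own hash table, and every incoming row is inserted into its own table and then probed against the other one. It is an illustration under simplifying assumptions (in-memory arrays, a single join variable, alternating reads to simulate two streams), and `symmetricHashJoin` is a hypothetical name, not the operator shipped in sparql-engine.

```typescript
// Illustrative sketch only; not the operator implemented in sparql-engine.
type Bindings = Record<string, string>;

function symmetricHashJoin(left: Bindings[], right: Bindings[], joinVar: string): Bindings[] {
  const leftTable = new Map<string, Bindings[]>();
  const rightTable = new Map<string, Bindings[]>();
  const results: Bindings[] = [];

  // Insert a row into its own table, then probe the opposite table for matches.
  function insertAndProbe(
    row: Bindings,
    own: Map<string, Bindings[]>,
    other: Map<string, Bindings[]>
  ): void {
    const key = row[joinVar];
    if (key === undefined) return; // row does not bind the join variable
    const bucket = own.get(key);
    if (bucket) {
      bucket.push(row);
    } else {
      own.set(key, [row]);
    }
    for (const match of other.get(key) ?? []) {
      results.push({ ...match, ...row });
    }
  }

  // Simulate two input streams by alternating between the arrays.
  const steps = Math.max(left.length, right.length);
  for (let i = 0; i < steps; i++) {
    if (i < left.length) insertAndProbe(left[i], leftTable, rightTable);
    if (i < right.length) insertAndProbe(right[i], rightTable, leftTable);
  }
  return results;
}

const authors: Bindings[] = [
  { "?s": "a", "?name": "Alice" },
  { "?s": "b", "?name": "Bob" },
];
const articles: Bindings[] = [
  { "?s": "b", "?article": "p1" },
  { "?s": "a", "?article": "p2" },
  { "?s": "a", "?article": "p3" },
];

const joined = symmetricHashJoin(authors, articles, "?s");
console.log(joined);
```

Because neither side has to be fully read before results are produced, this kind of join can emit matches as soon as both matching rows have arrived.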

If you want to implement new query hints, feel free to! However, for the SORTED_TRIPLES hint, I already have the code that implements it; I'm just too busy to test it properly. If that works for you, I should be able to release it at the end of September.
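This thread does not say how SORTED_TRIPLES is meant to work internally. One common way to exploit inputs that are already sorted on the join variable is a merge join, sketched below in TypeScript. This is an assumption about the kind of technique such a hint could enable, not a description of the unreleased code, and `mergeJoin` is a hypothetical name.

```typescript
// Illustrative sketch only; assumes both inputs are sorted by joinVar
// using plain string comparison and that every row binds joinVar.
type Row = Record<string, string>;

function mergeJoin(left: Row[], right: Row[], joinVar: string): Row[] {
  const results: Row[] = [];
  let i = 0;
  let j = 0;
  while (i < left.length && j < right.length) {
    const l = left[i][joinVar];
    const r = right[j][joinVar];
    if (l < r) {
      i++;
    } else if (l > r) {
      j++;
    } else {
      // Find the end of the run of equal keys on each side.
      let iEnd = i;
      while (iEnd < left.length && left[iEnd][joinVar] === l) iEnd++;
      let jEnd = j;
      while (jEnd < right.length && right[jEnd][joinVar] === l) jEnd++;
      // Emit every combination of rows within the two runs.
      for (let a = i; a < iEnd; a++) {
        for (let b = j; b < jEnd; b++) {
          results.push({ ...left[a], ...right[b] });
        }
      }
      i = iEnd;
      j = jEnd;
    }
  }
  return results;
}

const sortedPeople: Row[] = [
  { "?s": "a", "?name": "Alice" },
  { "?s": "b", "?name": "Bob" },
  { "?s": "d", "?name": "Dan" },
];
const sortedArticles: Row[] = [
  { "?s": "a", "?article": "p1" },
  { "?s": "a", "?article": "p2" },
  { "?s": "c", "?article": "p3" },
  { "?s": "d", "?article": "p4" },
];

const merged = mergeJoin(sortedPeople, sortedArticles, "?s");
console.log(merged);
```

Compared with a hash join, this approach needs no hash tables and reads each input once, which is why sorted indexes can pay off.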

If you have any more questions, feel free to ask!

Callidon added the question label (Further information is requested) on Sep 12, 2019
@dwhitney
Contributor Author

dwhitney commented Sep 12, 2019 via email

@dwhitney
Contributor Author

dwhitney commented Sep 24, 2019

Having some thoughts on this...

I'm not sure if you are familiar with Datomic. It's a database created by Rich Hickey, who also created Clojure, and, among other things, it uses Datalog as a query language. One of the things you must do when defining a schema is specify a cardinality for each attribute that you define. In the example I linked to, a person has one name. I would imagine this is required because it could dramatically speed up a join pipeline if you know there is only one value to find as opposed to multiple. I'd imagine adding a query hint for cardinality in sparql-engine would likely have the same effect. Perhaps that would be worth adding? Do you think it would be difficult? I could take a crack at it.
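As a rough illustration of the intuition, the TypeScript sketch below shows how a lookup could stop early when a predicate is known to be single-valued. Everything here is hypothetical: the `lookup` function, the `singleValued` flag, and the linear scan are assumptions made for the example, and no such cardinality hint exists in sparql-engine as far as this thread shows.

```typescript
// Hypothetical sketch; a real store would use indexes instead of a linear scan.
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

// Return the matching objects together with the number of triples scanned.
function lookup(
  triples: Triple[],
  subject: string,
  predicate: string,
  singleValued: boolean
): { objects: string[]; scanned: number } {
  const objects: string[] = [];
  let scanned = 0;
  for (const triple of triples) {
    scanned++;
    if (triple.subject === subject && triple.predicate === predicate) {
      objects.push(triple.object);
      // With cardinality one, no further match can exist, so stop here.
      if (singleValued) break;
    }
  }
  return { objects, scanned };
}

const data: Triple[] = [
  { subject: ":alice", predicate: ":name", object: "Alice" },
  { subject: ":alice", predicate: ":authorOf", object: ":p1" },
  { subject: ":alice", predicate: ":authorOf", object: ":p2" },
  { subject: ":bob", predicate: ":name", object: "Bob" },
];

const withHint = lookup(data, ":alice", ":name", true);
const withoutHint = lookup(data, ":alice", ":name", false);
console.log(withHint.scanned, withoutHint.scanned);
```

Both calls return the same object, but the hinted call stops at the first match while the unhinted one has to scan the rest of the data to rule out further values.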


2 participants