Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move away from SPARQL.js (and potentially N3) #22

Open
dwhitney opened this issue May 1, 2019 · 3 comments
Open

Move away from SPARQL.js (and potentially N3) #22

dwhitney opened this issue May 1, 2019 · 3 comments
Labels
discussion Discussing about the future (productivity not required) question Further information is requested

Comments

@dwhitney
Copy link
Contributor

dwhitney commented May 1, 2019

Is your feature request related to a problem? Please describe.

SPARQL.js is a major performance bottleneck! I've found that in the majority of my queries, parsing the query takes the most time. For example this query takes just over 100ms to parse, and about 6ms to process the data. I realize it's not a trivial query, but 100ms!

What I like about this library is that I can cache small data sets locally in the browser and query large data sets remotely via SERVICE clauses (very cool!), but if it takes 100ms to parse the query, I may as well just do everything remotely, because caching isn't saving me any time.

Describe the solution you'd like

I did some research on JS parsers and found a library under pretty active development called Chevrotain that pretty much blows jison out of the water from a performance perspective. (jison is what SPARQL.js uses). Then I found another library that already has a SPARQL and Turtle parser called millan. Just some quick bench marking on the query I linked to above showed a 100% performance improvement on the first parse and then 500% performance improvement upon subsequent parses.

Milan has a bunch of "junk" in their library that I don't think is necessary for sparql-engine, so I think I'd maybe take that library as inspiration rather than simply using it directly. Also Milan is running an older version of Chevrotain, and I can see a bunch of places that performance improvements could be made to their current parser.

What do you think about this idea:

  • abstract out the parser and make the PlanBuilder take a parser param of type string => Algebra.RootNode.
  • create another project called sparql-engine-sparqljs-parser that provides an implementation with the current functionality
  • create another project called sparql-engine-chevrotain-parser that provides an implementation with a Chevrotain parser.

This would let people choose which they liked better -- maybe they are already using one or the other.

Also down the road N3 could probably be replaced with a Chevrotain parser as well. I have had issues with N3 in the past, and I find the source code very difficult to read, but it's fine for now.

Let me know your thoughts and I am happy to do this when I get to it.

@mielvds
Copy link

mielvds commented May 21, 2019

If jison is the problem, why don't you suggest SPARQL.js to replace it by Chevrotain?

@dwhitney
Copy link
Contributor Author

dwhitney commented May 21, 2019 via email

@RubenVerborgh
Copy link

May I suggest to also report issues in the original repositories? This can help us fix the problem for more people, and I also can give you some background on the decisions made back then and what could help to improve things in other libraries.

SPARQL.js is a major performance bottleneck! I've found that in the majority of my queries, parsing the query takes the most time. For example this query takes just over 100ms to parse, and about 6ms to process the data. I realize it's not a trivial query, but 100ms!

Yes, the underlying jison library was the best at the time, but it was never really good. An even worse case is with syntactically incorrect queries, which can get you very extreme times.

Back when I developed SPARQL.js, there was no single fully compliant SPARQL library for JavaScript. People were using the rdfstore.js parser, which was incomplete and buggy.

So my main goals with SPARQL.js were to:

  1. set up a testing framework to assess compliance with the SPARQL 1.1 spec
  2. have a fast path to a compatible parser

Because of 2), I resorted to a parser generator framework (jison) rather than hand-rolling my own as I did with N3. It was the workflow that could deliver the parser the fastest way, not necessarily the most performant parser.

Now jison had many issues, some of which I fixed upstream, but the author wasn't very responsive with accepting pull requests, so I new that this was not a sustainable route. But at least we had a spec-compatible parser now, with acceptable performance for the kinds of queries we needed.

I haven't checked performance of your query; maybe it is one specific fixable thing causing trouble, maybe it's more.

I did some research on JS parsers and found a library under pretty active development called Chevrotain that pretty much blows jison out of the water from a performance perspective.

One would hope indeed!

From my perspective, I would invite you to reuse the testing framework of SPARQL.js. That was arguably what took most time, and then you'll immediately know how compatible your parser is. If you want, I'm also happy to have you as a contributor on SPARQL.js itself; there's no need for keeping an old library around if we have an equally compatible but faster one. Please let me know.

Milan has a bunch of "junk" in their library that I don't think is necessary for sparql-engine,

Proper imports and a sufficiently smart packer should get you rid of that.

Also down the road N3 could probably be replaced with a Chevrotain parser as well.

Will it be as fast, and will it be streaming?
N3 is an extremely optimized hand-rolled parser, with a performance that will be hard to beat by parser generators.

I have had issues with N3 in the past

Then please report them 🙂 I'm known for my fast response times.


All of the above aside, it's great that there are different options for dealing with RDF and SPARQL in JavaScript. However, we might want to recall that we're a very small community. Shall we try to work together whenever possible and meaningful? I don't think we need many different SPARQL parsers, that will only break up the community and lower general adoption. Can we think collaboration first, and if that does not work out, working in parallel to a common interoperable spec and format (RDF/JS and rdf-ext)? This will help us all.

@Callidon Callidon added question Further information is requested discussion Discussing about the future (productivity not required) labels Apr 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Discussing about the future (productivity not required) question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants