
Persistence Strategy

Jim Amsden edited this page Feb 21, 2018 · 2 revisions

Maybe a good place to start is listing what we want to do with the OSLC/LDP persistent store:

  1. Be able to do CRUD operations on at least Linked Data resources
  2. Support (hopefully standard) query on Linked Data resources
  3. Support read and write of Turtle, JSON-LD, RDF/XML and optionally N3 content types
  4. Be able to deploy everything in Bluemix
  5. Have sufficient performance and scalability that at least simple integration apps can be quickly and easily created.
  6. Be sufficiently standard or common that swapping out a different persistence technology for production purposes would not be too difficult.
  7. Be useful for client- and server-side application components
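As a rough illustration of requirements 1, 3 and 6, here is a minimal TypeScript sketch of a small store contract that application code could depend on, with a throwaway in-memory implementation behind it. All names (`Triple`, `LdpStore`, `InMemoryLdpStore`) are hypothetical and are not taken from LDP.js or any other library. Only N-Triples/Turtle output is filled in; the other content types are deliberately left as a visible gap.

```typescript
// Hypothetical sketch: Triple, LdpStore and InMemoryLdpStore are illustrative names,
// not the API of LDP.js, rdflib.js or any other existing library.

// Terms are kept in N-Triples syntax: IRIs as <...>, literals as "...".
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

// The contract callers code against (requirements 1, 3 and 6).
interface LdpStore {
  create(uri: string, triples: Triple[]): void;
  read(uri: string): Triple[] | undefined;
  update(uri: string, triples: Triple[]): void;
  remove(uri: string): boolean;
  serialize(uri: string, contentType: string): string;
}

class InMemoryLdpStore implements LdpStore {
  private readonly resources = new Map<string, Triple[]>();

  create(uri: string, triples: Triple[]): void {
    if (this.resources.has(uri)) throw new Error(`already exists: ${uri}`);
    this.resources.set(uri, [...triples]);
  }

  read(uri: string): Triple[] | undefined {
    const triples = this.resources.get(uri);
    return triples && [...triples]; // return a copy so callers cannot mutate the store
  }

  update(uri: string, triples: Triple[]): void {
    if (!this.resources.has(uri)) throw new Error(`not found: ${uri}`);
    this.resources.set(uri, [...triples]); // full replacement, like an HTTP PUT
  }

  remove(uri: string): boolean {
    return this.resources.delete(uri);
  }

  serialize(uri: string, contentType: string): string {
    const triples = this.resources.get(uri);
    if (!triples) throw new Error(`not found: ${uri}`);
    // N-Triples is a subset of Turtle, so the same text serves both media types.
    if (contentType === "application/n-triples" || contentType === "text/turtle") {
      return triples.map((t) => `${t.subject} ${t.predicate} ${t.object} .\n`).join("");
    }
    // JSON-LD, RDF/XML and N3 writers would have to be plugged in here.
    throw new Error(`unsupported content type: ${contentType}`);
  }
}

const store: LdpStore = new InMemoryLdpStore();
const uri = "http://example.org/cr/1"; // hypothetical resource URI
store.create(uri, [
  { subject: `<${uri}>`, predicate: "<http://purl.org/dc/terms/title>", object: '"Fix login"' },
]);
console.log(store.serialize(uri, "text/turtle"));
// prints: <http://example.org/cr/1> <http://purl.org/dc/terms/title> "Fix login" .
```

Keeping the contract this small is one way to make requirement 6 plausible: a MongoDB-backed or triple-store-backed class could implement the same interface without changing its callers.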

Possibly the first question is what the query interface should be. If we consider the RDF vs. JSON focus, then we're probably talking about SPARQL vs. MongoDB/sift.js. There may be other choices based on different NoSQL options, but the arguments are probably pretty similar. Some observations on each of the requirements above:

  1. Any database can do CRUD operations on resources. And I don't really care what the stored data format is as long as the query interface is usable and the content types are supported. Since OSLC is built on LDP which is built on RDF and HTTP, that's a strong argument for an RDF-friendly persistence strategy.

  2. Users tend to struggle with SPARQL, and IBM did its best to use Shapes to turn RDF into a typical visual, structured query language for Jazz Reporting. However, I recall some pretty complex SQL queries and resistance to SQL adoption in the past. And is the MongoDB query language or sift.js really that much simpler than SPARQL, especially if the LDPRs are stored as N-Triples in MongoDB (not JSON-LD)? I suspect not when used in production apps.

  3. As LDP.js demonstrates, it is possible to use MongoDB to store LDP resources as triples and use N3.js to provide the required content types, except RDF/XML, which is required for Jazz products and OSLC2 compatibility. So this is a gap in the current N3.js/MongoDB implementation.

  4. There is currently no RDF storage service on Bluemix, which seems like a significant functional gap. Apache Jena TDB, 4store, 5store, Stardog, Virtuoso, and many others are possible candidates. In the meantime, Bluemix apps can be configured to access storage services offered by other cloud service providers. Starting this way might generate the demand necessary to establish an RDF service.

  5. There are certainly production storage services for triple stores and other NoSQL databases. IBM has had some difficulties getting scalable performance for JRS using LQE. It's not clear what the contributing factors are; there are probably many. One possibility is the overhead of not using SPARQL directly and instead attempting to treat the triple store as a typical structured data storage model.

  6. RDF and SPARQL are currently the only standards for a NoSQL data model and query language, and RDF is itself the LDP resource format.

  7. Clients typically get result sets from data queries that represent relatively unstructured name/value pairs which, although easy to consume, are often a semantic mismatch with the application's models, views and/or controllers. But both SPARQL and MongoDB/sift.js support additional client-side (sub)queries. For SPARQL, a CONSTRUCT query can be used to return a graph, and RDF APIs that support in-memory basic graph queries can then be used to do additional rich queries on the client side. Similarly, sift.js can be used to query tree-structured JSON.
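To make the CONSTRUCT-then-query-locally idea concrete, here is a minimal TypeScript sketch of in-memory basic graph pattern (BGP) matching. It is a hand-rolled illustration of the technique, not rdflib.js's API: terms are plain strings in N-Triples syntax, strings starting with `?` are variables, and the example.org resources are made up. The `rdf:type`, `dcterms:title` and `oslc_cm:ChangeRequest` IRIs are the real vocabulary terms.

```typescript
// Hypothetical sketch of basic graph pattern matching over an in-memory graph,
// e.g. one returned by a SPARQL CONSTRUCT. Not the API of any particular library.

interface Triple {
  subject: string;
  predicate: string;
  object: string;
}
type Bindings = Record<string, string>;

const isVariable = (term: string): boolean => term.startsWith("?");

// Extend the bindings so that patternTerm matches term, or return null on a conflict.
function unify(patternTerm: string, term: string, bindings: Bindings): Bindings | null {
  if (!isVariable(patternTerm)) return patternTerm === term ? bindings : null;
  const bound = bindings[patternTerm];
  if (bound !== undefined) return bound === term ? bindings : null;
  return { ...bindings, [patternTerm]: term };
}

// Every pattern must match some triple; variables shared between patterns act as joins.
function matchBgp(graph: Triple[], patterns: Triple[]): Bindings[] {
  let solutions: Bindings[] = [{}];
  for (const pattern of patterns) {
    const next: Bindings[] = [];
    for (const bindings of solutions) {
      for (const triple of graph) {
        const s = unify(pattern.subject, triple.subject, bindings);
        const p = s && unify(pattern.predicate, triple.predicate, s);
        const o = p && unify(pattern.object, triple.object, p);
        if (o) next.push(o);
      }
    }
    solutions = next;
  }
  return solutions;
}

const RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>";
const TITLE = "<http://purl.org/dc/terms/title>";
const CHANGE_REQUEST = "<http://open-services.net/ns/cm#ChangeRequest>";

// A small graph such as a CONSTRUCT query might return (example.org URIs are made up).
const graph: Triple[] = [
  { subject: "<http://example.org/cr/1>", predicate: RDF_TYPE, object: CHANGE_REQUEST },
  { subject: "<http://example.org/cr/1>", predicate: TITLE, object: '"Fix login"' },
  { subject: "<http://example.org/req/7>", predicate: TITLE, object: '"Export to CSV"' },
];

// "Titles of all change requests": req/7 has a title but no type, so the join drops it.
const results = matchBgp(graph, [
  { subject: "?cr", predicate: RDF_TYPE, object: CHANGE_REQUEST },
  { subject: "?cr", predicate: TITLE, object: "?title" },
]);
for (const b of results) console.log(b["?cr"], b["?title"]);
// prints: <http://example.org/cr/1> "Fix login"
```

The nested-loop join is the simplest correct evaluation order; a real RDF library would index triples by subject and predicate rather than scan the whole graph for every pattern.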

These observations lead me to conclude that OSLC/LDP would benefit greatly from using an RDF triple store for persistence that supported the required content types and standard SPARQL queries. This may not be true for other integration technologies, but it seems to be clearly the case for OSLC (for better or worse).
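For a sense of what standard SPARQL queries buy in practice, the sketch below builds a SPARQL 1.1 Protocol request using the "query via POST directly" form, where the raw query is the body and the `Content-Type` is `application/sparql-query`. The endpoint URL and the helper name `buildSparqlQueryRequest` are hypothetical. The point is that the client codes against the protocol rather than a product, which is what would make swapping stores (requirement 6) tractable.

```typescript
// Hypothetical sketch: build a SPARQL 1.1 Protocol "query via POST directly" request.
// The endpoint below is made up; Virtuoso, Stardog, Apache Jena (via Fuseki) and other
// protocol-compliant stores expose their own endpoint paths but should accept this shape.

interface SparqlHttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildSparqlQueryRequest(
  endpoint: string,
  query: string,
  accept = "text/turtle", // suits CONSTRUCT; a SELECT would usually ask for application/sparql-results+json
): SparqlHttpRequest {
  return {
    url: endpoint,
    method: "POST",
    headers: { "Content-Type": "application/sparql-query", Accept: accept },
    body: query, // sent as-is, not form-encoded
  };
}

const CONSTRUCT_CHANGE_REQUESTS = `
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX oslc_cm: <http://open-services.net/ns/cm#>
CONSTRUCT { ?cr dcterms:title ?title }
WHERE { ?cr rdf:type oslc_cm:ChangeRequest ; dcterms:title ?title }
`;

const request = buildSparqlQueryRequest("http://localhost:3030/oslc/sparql", CONSTRUCT_CHANGE_REQUESTS);
// With a global fetch (Node 18+ or a browser) it could be sent as:
//   const turtle = await (await fetch(request.url, request)).text();
console.log(request.method, request.headers["Content-Type"]);
// prints: POST application/sparql-query
```

The Turtle that comes back is exactly the kind of graph that can then be queried further in memory on the client.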

rdflib.js probably provides the closest RDF API to meet all the requirements. It supports all the content types (except writing JSON-LD, but that's easily added), and has the in-memory basic graph query capability that makes getting information from RDF resources easier. This Node module is still being actively developed and is being led by Tim Berners-Lee (TBL) himself. Plus, clients can still get JSON-LD and use sift.js to query it if they want. It's much harder to add SPARQL to a non-RDF or pseudo-RDF store to get both options. rdflib.js is a bit bulky for browser-based clients, though, and that could be a problem.
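For comparison, this is roughly what the JSON-LD-plus-sift.js route looks like. sift.js is normally applied as `array.filter(sift(query))` with MongoDB-style operators; rather than depend on it, the sketch below hand-rolls a tiny subset (plain equality, `$in`, `$exists`) in TypeScript, so `matches` is a hypothetical stand-in, not sift.js itself. The nodes assume compacted JSON-LD with a context that defines the `dcterms`, `oslc_cm` and `oslc_rm` prefixes, and the example.org resources are made up.

```typescript
// Hypothetical stand-in for a sift.js-style filter: supports only equality, $in and $exists.
// Object values are compared by reference, so nested JSON-LD value objects are out of scope.

type JsonLdNode = Record<string, unknown>;
type Query = Record<string, unknown>;

// MongoDB-style semantics: an array-valued field matches if any element matches.
function valueMatches(value: unknown, expected: unknown): boolean {
  if (Array.isArray(value)) return value.some((v) => valueMatches(v, expected));
  return value === expected;
}

function isOperatorObject(x: unknown): x is Record<string, unknown> {
  return typeof x === "object" && x !== null && !Array.isArray(x);
}

function matches(node: JsonLdNode, query: Query): boolean {
  return Object.entries(query).every(([field, condition]) => {
    const value = node[field];
    if (isOperatorObject(condition) && "$exists" in condition) {
      return (value !== undefined) === Boolean(condition["$exists"]);
    }
    const candidates = isOperatorObject(condition) ? condition["$in"] : undefined;
    if (Array.isArray(candidates)) {
      return candidates.some((c) => valueMatches(value, c));
    }
    return valueMatches(value, condition);
  });
}

// Compacted JSON-LD nodes (assumed context; note @type is an array in one node, a string in another).
const nodes: JsonLdNode[] = [
  { "@id": "http://example.org/cr/1", "@type": ["oslc_cm:ChangeRequest"], "dcterms:title": "Fix login", "oslc_cm:closed": false },
  { "@id": "http://example.org/cr/2", "@type": "oslc_cm:ChangeRequest", "dcterms:title": "Add export", "oslc_cm:closed": true },
  { "@id": "http://example.org/req/7", "@type": "oslc_rm:Requirement", "dcterms:title": "Export to CSV" },
];

const openChangeRequests = nodes.filter((n) =>
  matches(n, { "@type": "oslc_cm:ChangeRequest", "oslc_cm:closed": false }),
);
console.log(openChangeRequests.map((n) => n["@id"]));
// prints: [ 'http://example.org/cr/1' ]
```

The data shows one practical cost of querying the JSON tree: compacted JSON-LD may write `@type` as a single string or as an array, so the filter has to paper over serialization shape, whereas a graph pattern query over the same resources does not care how they were serialized.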
