Batch Mode for JSON-LD Generation
Karma can be used in batch mode to generate JSON-LD for large datasets, using either the command-line utility OfflineRdfGenerator or the Karma JSON-LD Generation API.
OfflineRdfGenerator is a command-line utility that loads a model and a source, then generates RDF and JSON-LD. The source can be JSON, XML, CSV, or a database. With a database source, it loads 10,000 rows at a time. The Karma home setting KARMA_USER_HOME should be set appropriately; see Configuration.
To build the offline jar, go to the karma-offline subdirectory and execute the following:
cd karma-offline
mvn install -P shaded
This builds a standalone jar, karma-offline-0.0.1-SNAPSHOT-shaded.jar, in the target sub-folder of karma-offline. The jar can be used to generate RDF and JSON-LD in batch mode.
To generate JSON-LD when the source is a file, go to the karma-offline/target sub-directory of Karma and execute the following command:
java -cp karma-offline-0.0.1-SNAPSHOT-shaded.jar edu.isi.karma.rdf.OfflineRdfGenerator \
--sourcetype <sourcetype> \
--filepath <filepath> \
--modelfilepath <modelfilepath> \
--sourcename <sourcename> \
--outputfile <rdf-outputfile> \
--jsonoutputfile <json-outputfile> \
[--contextfile <contextfile> | --contexturl <contextUrl>] \
[--selection <selectionName>] \
[--root <rootClassIDForJsonLD>] \
[--killtriplemap <triplemapid to stop from expansion> ] \
[--stoptriplemap <stop the rdf generation from this triplemapid onwards> ]
Example invocation for a JSON file:
java -cp karma-offline-0.0.1-SNAPSHOT-shaded.jar edu.isi.karma.rdf.OfflineRdfGenerator \
--sourcetype JSON \
--filepath "/files/data/wikipedia.json" \
--modelfilepath "/files/models/model-wikipedia.ttl" \
--sourcename wikipedia \
--outputfile wikipedia-rdf.n3 \
--contextfile wiki-context.json \
--root "http://schema.org/Document1" \
--jsonoutputfile wikipedia.json
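When many files share one model, the invocation above can be assembled programmatically. The following is a minimal sketch, assuming a hypothetical helper class `OfflineJsonLdCommand` that is not part of Karma. It uses only the flags shown above and assumes output names of the form `<sourcename>-rdf.n3` and `<sourcename>.json`, as in the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Karma: it only assembles the documented
// OfflineRdfGenerator command line for a JSON source file.
class OfflineJsonLdCommand {
    static final String JAR = "karma-offline-0.0.1-SNAPSHOT-shaded.jar";
    static final String MAIN_CLASS = "edu.isi.karma.rdf.OfflineRdfGenerator";

    // Output names follow the <sourcename>-rdf.n3 / <sourcename>.json pattern
    // of the example above; that naming is an assumption, not a Karma rule.
    static List<String> forJsonFile(String dataFile, String modelFile,
                                    String sourceName, String contextFile,
                                    String rootClass) {
        return new ArrayList<>(Arrays.asList(
            "java", "-cp", JAR, MAIN_CLASS,
            "--sourcetype", "JSON",
            "--filepath", dataFile,
            "--modelfilepath", modelFile,
            "--sourcename", sourceName,
            "--outputfile", sourceName + "-rdf.n3",
            "--contextfile", contextFile,
            "--root", rootClass,
            "--jsonoutputfile", sourceName + ".json"));
    }

    public static void main(String[] args) {
        List<String> cmd = forJsonFile(
            "/files/data/wikipedia.json",
            "/files/models/model-wikipedia.ttl",
            "wikipedia", "wiki-context.json",
            "http://schema.org/Document1");
        // Print the command; to run it from karma-offline/target, use
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
        System.out.println(String.join(" ", cmd));
    }
}
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.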
For a CSV file, you can specify additional parameters, such as the delimiter, text qualifier, header start index, and data start index. Example invocation for a CSV file with tab as the delimiter and quotes as the text qualifier:
java -cp karma-offline-0.0.1-SNAPSHOT-shaded.jar edu.isi.karma.rdf.OfflineRdfGenerator \
--sourcetype CSV \
--filepath "/files/data/wikipedia.csv" \
--delimiter TAB \
--textqualifier '\\\"' \
--headerindex 1 \
--dataindex 2 \
--modelfilepath "/files/models/model-wikipedia.ttl" \
--sourcename wikipedia \
--outputfile wikipedia-rdf.n3 \
--contextfile wiki-context.json \
--root "http://schema.org/Document1" \
--jsonoutputfile wikipedia.json
To generate JSON-LD from a database table, go to the karma-offline/target subdirectory of Karma and run the following command from a terminal:
java -cp karma-offline-0.0.1-SNAPSHOT-shaded.jar edu.isi.karma.rdf.OfflineRdfGenerator \
--sourcetype DB \
--modelfilepath <modelfilepath> \
--outputfile <outputfile> \
--jsonoutputfile <json-outputfile> \
[--contextfile <contextfile> | --contexturl <contextUrl>] \
[--selection <selectionName>] \
[--root <rootClassIDForJsonLD>] \
[--killtriplemap <triplemapid to stop from expansion> ] \
[--stoptriplemap <stop the rdf generation from this triplemapid onwards> ] \
--dbtype <dbtype> \
--hostname <hostname> \
--username <username> \
--password <password> \
--portnumber <portnumber> \
--dbname <dbname> \
--tablename <tablename>
Valid argument values for dbtype are Oracle, MySQL, SQLServer, PostGIS, and Sybase.
Example invocation:
java -cp mysql-connector-java-5.0.8-bin.jar:karma-offline-0.0.1-SNAPSHOT-shaded.jar \
edu.isi.karma.rdf.OfflineRdfGenerator \
--sourcetype DB \
--dbtype MySQL \
--hostname localhost \
--username root \
--password mypassword \
--portnumber 3306 \
--dbname karma \
--tablename offlineUsers \
--modelfilepath "/Users/dipsy/karma-projects/offlineUsers-model.ttl" \
--outputfile offlineUsers-rdf.n3 \
--contextfile person-context.json \
--jsonoutputfile offlineUsers-jsonld.json \
--root "http://schema.org/Person1"
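The example above puts the JDBC driver jar on the classpath ahead of the Karma jar, using `:` as the separator. That separator is the Unix convention; on Windows the JVM expects `;`. A minimal sketch of building the classpath portably, using a hypothetical helper `KarmaClasspath` that is not part of Karma:

```java
import java.io.File;

// Hypothetical helper, not part of Karma: joins jar paths with the
// platform's classpath separator (":" on Unix-like systems, ";" on Windows).
class KarmaClasspath {
    static String join(String... jars) {
        return String.join(File.pathSeparator, jars);
    }

    public static void main(String[] args) {
        // Driver jar first, then the shaded Karma jar, as in the example above.
        System.out.println(join(
            "mysql-connector-java-5.0.8-bin.jar",
            "karma-offline-0.0.1-SNAPSHOT-shaded.jar"));
    }
}
```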
If the model requires a selection, the selection name 'DEFAULT_TEST' needs to be passed to OfflineRdfGenerator as the command-line argument --selection. This makes it possible to execute the same model with or without a selection in offline mode.
Example invocation:
java -cp karma-offline-0.0.1-SNAPSHOT-shaded.jar edu.isi.karma.rdf.OfflineRdfGenerator \
--sourcetype DB --dbtype SQLServer \
--hostname example.com --username root --password secret \
--portnumber 1433 --dbname Employees --tablename Person \
--modelfilepath "/files/models/db-r2rml-model.ttl" \
--outputfile db-rdf.n3 \
--contextfile db-context.json \
--root "http://schema.org/Person1" \
--sourcename wikipedia \
--selection "DEFAULT_TEST" \
--jsonoutputfile db.json
To generate the context from the model on the command line, you can use the following utility:
java -cp karma-offline-0.0.1-SNAPSHOT-shaded.jar edu.isi.karma.rdf.GenerateContextFromModel \
--modelpath <path-to-model-file> \
--outputfile <optional, output-file-name>
Example:
java -cp karma-offline-0.0.1-SNAPSHOT-shaded.jar edu.isi.karma.rdf.GenerateContextFromModel \
--modelpath language-model-1.txt \
--outputfile language-context.json
{"@context": {
"a": "@type",
"prefLabel": {"@id": "http://www.w3.org/2008/05/skos#prefLabel"},
"Concept": {
"@type": "@id",
"@id": "http://www.w3.org/2008/05/skos#Concept"
},
"url": "@id"
}}
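To show what the context above expresses, the sketch below looks up each short term and returns the keyword or IRI it stands for. It is a deliberately simplified illustration, not a JSON-LD processor: the class `ContextTermLookup` is hypothetical, the term table is copied by hand from the context above, and returning unknown terms unchanged is a simplification.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified illustration, not a JSON-LD processor: maps each term of the
// context shown above to the keyword or IRI it is defined as.
class ContextTermLookup {
    // Term definitions copied by hand from the context above.
    static final Map<String, String> TERMS = new LinkedHashMap<>();
    static {
        TERMS.put("a", "@type");   // keyword alias
        TERMS.put("prefLabel", "http://www.w3.org/2008/05/skos#prefLabel");
        TERMS.put("Concept", "http://www.w3.org/2008/05/skos#Concept");
        TERMS.put("url", "@id");   // keyword alias
    }

    // Returns the definition of a term; unknown terms come back unchanged,
    // which is a simplification made for this sketch.
    static String expand(String term) {
        return TERMS.getOrDefault(term, term);
    }

    public static void main(String[] args) {
        for (String term : TERMS.keySet()) {
            System.out.println(term + " -> " + expand(term));
        }
    }
}
```

In the generated JSON-LD, a key such as `prefLabel` therefore stands for the full SKOS property IRI, while `a` and `url` stand in for the JSON-LD keywords `@type` and `@id`.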
The Karma JSON-LD Generation API is meant for repeated RDF/JSON-LD generation from the same model. In this setting, the models are loaded once at the beginning, and each subsequent query uses the loaded model to generate RDF. The input can be JSON, CSV, or XML, supplied as a File, String, or InputStream.
edu.isi.karma.rdf.GenericRDFGenerator
API to add a model to the RDF Generator
// modelIdentifier : Provides a name and location of the model file
void addModel(R2RMLMappingIdentifier modelIdentifier);
API to generate the JSON-LD for a request
// request: Provides all details of the inputs to the RDF generator, such as the input data and the provenance setting
void generateRDF(RDFGeneratorRequest request);
edu.isi.karma.rdf.RDFGeneratorRequest
API to set the input data
//inputData : Input Data as String
public void setInputData(String inputData)
//inputStream: Input data as a Stream
public void setInputStream(InputStream inputStream)
//inputFile: Input data file
public void setInputFile(File inputFile)
API to set the input data type
//dataType: Valid values: CSV,JSON,XML,AVRO
public void setDataType(InputType dataType)
Setting to generate provenance information
//addProvenance -> flag to indicate if provenance information should be added to the RDF
public void setAddProvenance(boolean addProvenance)
The writer for RDF
//writer -> Writer for the output. For JSON-LD generation, this should be JSONKR2RMLRDFWriter
public void addWriter(KR2RMLRDFWriter writer)
Example use:
GenericRDFGenerator rdfGenerator = new GenericRDFGenerator();
// Construct an R2RMLMappingIdentifier that provides the location of the model and a name for the model, then add the model to the GenericRDFGenerator. You can add multiple models using this API.
R2RMLMappingIdentifier modelIdentifier = new R2RMLMappingIdentifier(
"people-model", new File("/files/models/people-model.ttl").toURI().toURL());
rdfGenerator.addModel(modelIdentifier);
String filename = "files/data/people.json";
StringWriter sw = new StringWriter();
PrintWriter pw = new PrintWriter(sw);
JSONKR2RMLRDFWriter writer = new JSONKR2RMLRDFWriter(pw);
RDFGeneratorRequest request = new RDFGeneratorRequest("people-model", filename);
request.setInputFile(new File(filename));
request.setAddProvenance(true);
request.setDataType(InputType.JSON);
request.addWriter(writer);
rdfGenerator.generateRDF(request);
String jsonld = sw.toString();
System.out.println("Generated JSON-LD: " + jsonld);
If the model requires a selection, GenericRDFGenerator provides a constructor that takes the selection name 'DEFAULT_TEST' as its argument.
Example use:
GenericRDFGenerator rdfGenerator = new GenericRDFGenerator("DEFAULT_TEST");
// Construct an R2RMLMappingIdentifier that provides the location of the model and a name for the model, then add the model to the GenericRDFGenerator. You can add multiple models using this API.
R2RMLMappingIdentifier modelIdentifier = new R2RMLMappingIdentifier(
"people-model", new File("/files/models/people-model.ttl").toURI().toURL());
rdfGenerator.addModel(modelIdentifier);
String filename = "files/data/people.json";
StringWriter sw = new StringWriter();
PrintWriter pw = new PrintWriter(sw);
JSONKR2RMLRDFWriter writer = new JSONKR2RMLRDFWriter(pw);
RDFGeneratorRequest request = new RDFGeneratorRequest("people-model", filename);
request.setInputFile(new File(filename));
request.setAddProvenance(true);
request.setDataType(InputType.JSON);
request.addWriter(writer);
rdfGenerator.generateRDF(request);
String jsonld = sw.toString();
System.out.println("Generated JSON-LD: " + jsonld);