
Batch Mode for Tomcat

dkapoor edited this page Aug 19, 2014 · 4 revisions

Karma provides two ways to generate RDF in batch mode. Batch mode is meant for bulk use and can handle very large datasets. To get started, copy karma-offline.jar to a folder.

OfflineRDFGenerator

This is a command-line utility that loads a model and a source and then generates RDF. The source can be a JSON, XML, or CSV file, or a database. When the source is a database, the API loads 10,000 rows at a time.
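The 10,000-row batching mentioned above can be pictured with a small sketch. This is not Karma's code: the class and method names (`BatchSketch`, `inBatches`) are hypothetical, and a list of integers stands in for database rows. It only illustrates handing a large source to a processor in fixed-size chunks.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {
    // Batch size taken from the text above; everything else is illustrative.
    public static final int BATCH_SIZE = 10_000;

    // Splits rows into consecutive batches of at most batchSize elements.
    public static <T> List<List<T>> inBatches(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            batches.add(new ArrayList<>(rows.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a table with 25,000 rows.
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 25_000; i++) {
            rows.add(i);
        }
        for (List<Integer> batch : inBatches(rows, BATCH_SIZE)) {
            // A real generator would map each batch to RDF at this point.
            System.out.println("Processing batch of " + batch.size() + " rows");
        }
    }
}
```

A real database reader would normally fetch each batch lazily from the result set rather than hold all rows in memory first; the list is used here only to keep the sketch self-contained.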

To generate RDF when the source is a file, go to the folder containing karma-offline.jar and execute the following command:

java -jar karma-offline.jar --sourcetype <sourcetype> --filepath <filepath> --modelfilepath <modelfilepath> --sourcename <sourcename> --outputfile <outputfile>

Example invocation for a JSON file:

java -jar karma-offline.jar --sourcetype JSON --filepath "/files/data/wikipedia.json" --modelfilepath "/files/models/model-wikipedia.n3" --sourcename wikipedia --outputfile wikipedia-rdf.n3
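When the file-based invocation has to be launched from a Java program rather than typed by hand, the argument list can be assembled as in the sketch below. The class name `OfflineFileCommand` and its `build` helper are hypothetical; only the flags and example values come from the command above, and the sketch assumes karma-offline.jar sits in the working directory.

```java
import java.util.Arrays;
import java.util.List;

public class OfflineFileCommand {
    // Builds the argument list for the file-based invocation shown above.
    public static List<String> build(String sourceType, String filePath,
            String modelFilePath, String sourceName, String outputFile) {
        return Arrays.asList(
                "java", "-jar", "karma-offline.jar",
                "--sourcetype", sourceType,
                "--filepath", filePath,
                "--modelfilepath", modelFilePath,
                "--sourcename", sourceName,
                "--outputfile", outputFile);
    }

    public static void main(String[] args) {
        List<String> command = build("JSON", "/files/data/wikipedia.json",
                "/files/models/model-wikipedia.n3", "wikipedia", "wikipedia-rdf.n3");
        System.out.println(String.join(" ", command));
        // To actually run it, pass the list to ProcessBuilder, for example:
        // int exit = new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.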

To generate RDF from a database table, you need the JDBC driver for the database. Assuming the driver is db-connector.jar, run the following command from a terminal:

java -cp db-connector.jar:karma-offline.jar edu.isi.karma.rdf.OfflineRdfGenerator --sourcetype DB --modelfilepath <modelfilepath> --outputfile <outputfile> --dbtype <dbtype> --hostname <hostname> --username <username> --password <password> --portnumber <portnumber> --dbname <dbname> --tablename <tablename>

Valid values for dbtype are Oracle, MySQL, SQLServer, PostGIS, and Sybase.

Example invocation:

java -cp mysql-connector-java-5.0.8-bin.jar:karma-offline.jar edu.isi.karma.rdf.OfflineRdfGenerator --sourcetype DB --dbtype MySQL --hostname example.com --username root --password secret --portnumber 3306 --dbname Employees --tablename Person --modelfilepath "/files/models/db-r2rml-model.ttl" --outputfile db-rdf.n3
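The `:` between the two jars in the command above is the classpath separator on Unix-like systems; on Windows the JVM expects `;`. A minimal sketch of building the same invocation portably is shown below. The class name `OfflineDbCommand` and its `build` helper are hypothetical; the flags, main class, and example values are the ones from the command above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OfflineDbCommand {
    // Builds the database invocation; options maps flag names (without "--") to values.
    public static List<String> build(String driverJar, Map<String, String> options) {
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-cp");
        // File.pathSeparator is ":" on Unix-like systems and ";" on Windows.
        command.add(driverJar + File.pathSeparator + "karma-offline.jar");
        command.add("edu.isi.karma.rdf.OfflineRdfGenerator");
        command.add("--sourcetype");
        command.add("DB");
        for (Map.Entry<String, String> option : options.entrySet()) {
            command.add("--" + option.getKey());
            command.add(option.getValue());
        }
        return command;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the flags in insertion order.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("dbtype", "MySQL");
        options.put("hostname", "example.com");
        options.put("username", "root");
        options.put("password", "secret");
        options.put("portnumber", "3306");
        options.put("dbname", "Employees");
        options.put("tablename", "Person");
        options.put("modelfilepath", "/files/models/db-r2rml-model.ttl");
        options.put("outputfile", "db-rdf.n3");
        List<String> command = build("mysql-connector-java-5.0.8-bin.jar", options);
        System.out.println(String.join(" ", command));
    }
}
```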

JSONRDFGenerator

This is deprecated. Please use GenericRDFGenerator instead.

This API is meant for repeated RDF generation from the same model. In this setting, the models are loaded once at the beginning, and each time the user issues a query the loaded model is used to generate RDF. This API currently accepts only JSON as an input source. To use it, include karma-offline.jar in your classpath.

edu.isi.karma.rdf.JSONRDFGenerator

API to add a model to the RDF Generator

// modelIdentifier : Provides a name and location of the model file
void addModel(R2RMLMappingIdentifier modelIdentifier); 

API to generate the RDF given a model name and JSON data

//sourceName -> The name used for the model when it was added using the addModel API
//jsonData -> The input JSON data
//addProvenance -> Flag to indicate whether provenance information should be added to the RDF
//pw -> Writer for the RDF output
void generateRDF(String sourceName, String jsonData, boolean addProvenance, PrintWriter pw);
   

Example use:

JSONRDFGenerator rdfGenerator = JSONRDFGenerator.getInstance();

//Construct an R2RMLMappingIdentifier that provides the location of the model and a name for the model,
//and add the model to the JSONRDFGenerator. You can add multiple models using this API.
R2RMLMappingIdentifier modelIdentifier = new R2RMLMappingIdentifier(
				"people-model", new File("/files/models/people-model.ttl").toURI().toURL());
rdfGenerator.addModel(modelIdentifier);

String filename = "files/data/people.json";
String jsonData = EncodingDetector.getString(new File(filename),
					"utf-8");
StringWriter sw = new StringWriter();
PrintWriter pw = new PrintWriter(sw);
rdfGenerator.generateRDF("people-model", jsonData, true, pw);
String rdf = sw.toString();
System.out.println("Generated RDF: " + rdf);

GenericRDFGenerator

This API is meant for repeated RDF generation from the same model. In this setting, the models are loaded once at the beginning, and each time the user issues a query the loaded model is used to generate RDF. The input can be JSON, CSV, or XML, supplied as a File, String, or InputStream. To use it, include karma-offline.jar in your classpath.

edu.isi.karma.rdf.GenericRDFGenerator

API to add a model to the RDF Generator

// modelIdentifier : Provides a name and location of the model file
void addModel(R2RMLMappingIdentifier modelIdentifier); 

API to generate the RDF given a model name and JSON/CSV/XML data

//modelName -> The name used for the model when it was added using the addModel API
//sourceName -> The name of the file (or other source) from which the data was read. This is used as a hint for dataType detection if dataType is passed as null.
//data -> The input data. This can be in JSON, CSV, or XML format
//dataType -> The input data type, if known. Values: InputType.CSV / InputType.JSON / InputType.XML. If set to null, Karma will try to detect the dataType.
//addProvenance -> Flag to indicate whether provenance information should be added to the RDF
//writer -> Writer for the RDF output. This can be an N3KR2RMLRDFWriter, JSONKR2RMLRDFWriter, or BloomFilterKR2RMLRDFWriter
void generateRDF(String modelName, String sourceName, String data, InputType dataType, int maxNumLines, boolean addProvenance, KR2RMLRDFWriter writer);

API to generate the RDF given a model name and a JSON/CSV/XML file

//modelName -> The name used for the model when it was added using the addModel API
//inputFile -> The input file. This can be in JSON, CSV, or XML format
//inputType -> The input file type, if known. Values: InputType.CSV / InputType.JSON / InputType.XML. If set to null, Karma will try to detect the inputType.
//addProvenance -> Flag to indicate whether provenance information should be added to the RDF
//writer -> Writer for the RDF output. This can be an N3KR2RMLRDFWriter, JSONKR2RMLRDFWriter, or BloomFilterKR2RMLRDFWriter
void generateRDF(String modelName, File inputFile, InputType inputType, boolean addProvenance, KR2RMLRDFWriter writer);

API to generate the RDF given a model name and an InputStream

//modelName -> The name used for the model when it was added using the addModel API
//sourceName -> The name of the file (or other source) from which the data was read. This is used as a hint for dataType detection if dataType is passed as null.
//data -> The input data stream. This can be in JSON, CSV, or XML format
//dataType -> The input data type, if known. Values: InputType.CSV / InputType.JSON / InputType.XML. If set to null, Karma will try to detect the dataType.
//addProvenance -> Flag to indicate whether provenance information should be added to the RDF
//writer -> Writer for the RDF output. This can be an N3KR2RMLRDFWriter, JSONKR2RMLRDFWriter, or BloomFilterKR2RMLRDFWriter
void generateRDF(String modelName, String sourceName, InputStream data, InputType dataType, boolean addProvenance, KR2RMLRDFWriter writer);
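The comments above say that sourceName serves as a hint for type detection when dataType is null. How Karma actually performs that detection is not described here, so the sketch below is only an assumption about one plausible approach: guessing from the file extension. The class `InputTypeHintSketch`, its nested `InputType` enum (a stand-in for Karma's own type), and `guessFromName` are all hypothetical names.

```java
import java.util.Locale;

public class InputTypeHintSketch {
    // Stand-in for Karma's InputType, limited to the three values named above.
    public enum InputType { CSV, JSON, XML }

    // Returns a guess based on the file extension, or null when the name gives no hint.
    public static InputType guessFromName(String sourceName) {
        if (sourceName == null) {
            return null;
        }
        String lower = sourceName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".json")) {
            return InputType.JSON;
        }
        if (lower.endsWith(".csv")) {
            return InputType.CSV;
        }
        if (lower.endsWith(".xml")) {
            return InputType.XML;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(guessFromName("files/data/people.json"));
        System.out.println(guessFromName("people.txt"));
    }
}
```

A name without a recognizable extension yields no guess in this sketch, which is one reason to pass the type explicitly whenever it is known.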

Example use:

GenericRDFGenerator rdfGenerator = new GenericRDFGenerator();

//Construct an R2RMLMappingIdentifier that provides the location of the model and a name for the model,
//and add the model to the GenericRDFGenerator. You can add multiple models using this API.
R2RMLMappingIdentifier modelIdentifier = new R2RMLMappingIdentifier(
				"people-model", new File("/files/models/people-model.ttl").toURI().toURL());
rdfGenerator.addModel(modelIdentifier);

String filename = "files/data/people.json";
StringWriter sw = new StringWriter();
PrintWriter pw = new PrintWriter(sw);
URIFormatter uriFormatter = new URIFormatter();
KR2RMLRDFWriter outWriter = new N3KR2RMLRDFWriter(uriFormatter, pw);
rdfGenerator.generateRDF("people-model", new File(filename), InputType.JSON, true, outWriter);
String rdf = sw.toString();
System.out.println("Generated RDF: " + rdf);