Configuration
Configuration topics:
- Amount of Memory Karma Can Use
- Karma User Home
- Information Karma Learns
- Modeling Configuration
- UI Configuration
- User Preferences
- Files Where Karma Saves Your Work
- Automatically Loading Commonly-Used Ontologies
- Locations of Files Karma Publishes
- Reset
## Amount of Memory Karma Can Use
If you want to use Karma with files containing thousands of rows or ontologies containing hundreds of classes and properties, then you need to increase the amount of memory allocated to Karma.
* If using Jetty, define the `MAVEN_OPTS` environment variable and set it to a value larger than `-Xmx1024m`.

If you are using `bash` (e.g., on the Mac), put the following in the `.profile` file found in your home directory. The example shows how to give Karma 4GB of memory. You can also put a convenient alias in your `.profile` so that you can invoke Karma by typing `karma`:

    export MAVEN_OPTS=-Xmx4096m
    alias karma="cd ~/Web-Karma/karma-web;mvn jetty:run"

* If using Tomcat, define the `CATALINA_OPTS` environment variable and set it to a value larger than `-Xmx1024m`.

If you are using `bash` (e.g., on the Mac), put the following in the `.profile` file found in your home directory. The example shows how to give Karma 4GB of memory:

    export CATALINA_OPTS="$CATALINA_OPTS -Xms2048m"
    export CATALINA_OPTS="$CATALINA_OPTS -Xmx4096m"

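
To double-check which maximum heap size an options string actually requests, a small Python sketch like the one below can help. The function name `max_heap_mb` is hypothetical and not part of Karma; the sketch only understands the common `-Xmx<number>[k|m|g]` forms and assumes that, when the flag appears more than once, the last occurrence is the one that counts.

```python
import re

# Bytes per unit suffix accepted after -Xmx; no suffix means plain bytes.
_UNIT_BYTES = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def max_heap_mb(opts):
    """Return the -Xmx value found in `opts` in megabytes, or None if absent.

    Assumption: if -Xmx appears several times, the last occurrence is used.
    """
    matches = re.findall(r"-Xmx(\d+)([kKmMgG]?)", opts)
    if not matches:
        return None
    number, unit = matches[-1]
    return int(number) * _UNIT_BYTES[unit.lower()] / 1024 ** 2
```

For example, passing the value of `MAVEN_OPTS` or `CATALINA_OPTS` from your environment (`os.environ.get("MAVEN_OPTS", "")`) shows whether the setting is above the 1024 MB mentioned above.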
## <a name="karma_home"></a>Karma User Home
Karma stores all user settings in the folder `{user.home}/karma`. You can change this default location from the **footer of the Karma UI**. This page refers to that folder as `KARMA_USER_HOME`. The `KARMA_USER_HOME` folder contains the following subfolders:
- `AlignmentGraph`: you can ignore this folder, it is for a new feature we have not released yet.
- [`SemanticTypeModels`](#SemanticTypeModels): stores the learned semantic types.
- `JSON`: you can ignore this folder. It is for a new save as JSON feature.
- [`UserPrefs`](#user_preferences): stores the files that hold your user preferences.
- `UserUploadedFiles`: a copy of all the files you loaded in Karma from your local disk. You can delete them if you want.
- `history`: files for internal Karma use.
- [`preloaded-ontologies`](#preloaded_ontologies): put here the `owl` and `rdf` files you want Karma to load each time you load the Karma Web page.
- `python`: Allows you to define your own Python library.
All published data (models, RDF) is under the `karma-web/src/main/webapp/publish` folder.
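
As a rough way to see which of the subfolders listed above already exist in a given Karma home, the following Python sketch may be useful. The names `DOCUMENTED_SUBFOLDERS`, `default_karma_home`, and `missing_subfolders` are hypothetical helpers, and the sketch assumes that Java's `user.home` matches the home directory Python reports.

```python
from pathlib import Path

# Subfolder names taken from the list above.
DOCUMENTED_SUBFOLDERS = [
    "AlignmentGraph",
    "SemanticTypeModels",
    "JSON",
    "UserPrefs",
    "UserUploadedFiles",
    "history",
    "preloaded-ontologies",
    "python",
]


def default_karma_home():
    """Default location, assuming {user.home} is the current user's home."""
    return Path.home() / "karma"


def missing_subfolders(karma_home):
    """Return the documented subfolders that do not exist under karma_home."""
    home = Path(karma_home)
    return [name for name in DOCUMENTED_SUBFOLDERS if not (home / name).is_dir()]
```

A missing folder is not necessarily a problem; this only reports what is on disk.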
## <a name="SemanticTypeModels"></a>Information Karma Learns `Semantic Types`
Karma includes a component that learns to assign semantic types to columns of data.
Each time you assign a semantic type, Karma stores information about the semantic type in the `KARMA_USER_HOME/SemanticTypeModels` directory.
Your SemanticTypeModels directory can grow large and it's OK to remove it if it gets over a megabyte.
If you do, Karma will stop offering you suggestions when you assign semantic types.
Karma will learn from subsequent assignments and start offering suggestions again.
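
To check whether the directory has grown past the one-megabyte mark mentioned above, a short Python sketch such as the following could be used. The function names are hypothetical, and the sketch simply sums the sizes of all regular files below the directory you pass in.

```python
from pathlib import Path

ONE_MEGABYTE = 1024 * 1024


def directory_size_bytes(directory):
    """Sum the sizes of all regular files below `directory`."""
    return sum(p.stat().st_size for p in Path(directory).rglob("*") if p.is_file())


def is_over_one_megabyte(directory):
    """True when the files below `directory` total more than one megabyte."""
    return directory_size_bytes(directory) > ONE_MEGABYTE
```

You would call it with the path of your own `SemanticTypeModels` directory, for example `is_over_one_megabyte(Path.home() / "karma" / "SemanticTypeModels")` if you use the default location.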
## <a name="preloaded_ontologies"></a>Automatically Loading Commonly-Used Ontologies `preloaded-ontologies`
When Karma starts, it automatically loads all the ontology files found in the `preloaded-ontologies` subdirectory of the `KARMA_USER_HOME` directory.
Here is an example of a `preloaded-ontologies` folder with 6 ontology files that Karma loads each time it starts, i.e., each time you load the Karma web page.
```shell
szeke:~ szekely> cd karma
szeke:karma szekely> ls preloaded-ontologies
dcterms.rdf dublincore.owl foaf.rdf skos-xl.rdf skos.rdf vp-basic.owl
```
## Modeling Configuration
Karma modeling behavior can be configured using the `KARMA_USER_HOME/Config/modeling.properties` file.

Here is a sample `modeling.properties` file:

    ##########################################################################################
    #
    # Graph Builder
    #
    ##########################################################################################
    manual.alignment=false
    thing.node=false
    node.closure=true
    properties.direct=true
    properties.indirect=true
    properties.subclass=true
    properties.with.only.domain=true
    properties.with.only.range=true
    properties.without.domain.range=false
    ##########################################################################################
    #
    # Prefixes
    #
    ##########################################################################################
    karma.source.prefix=http://isi.edu/integration/karma/sources/
    karma.service.prefix=http://isi.edu/integration/karma/services/
    ##########################################################################################
    #
    # Model Learner
    #
    ##########################################################################################
    learner.enabled=true
    max.queued.mappings=100
    max.candidate.models=5
    multiple.same.property.per.node=true
    # scoring coefficients, should be in range [0..1]
    scoring.confidence.coefficient=1.0
    scoring.coherence.coefficient=1.0
    scoring.size.coefficient=1.0
    models.json.dir=JSON/
    models.graphviz.dir=GRAPHVIZ/
    alignment.graph.dir=AlignmentGraph/
    ##########################################################################################
    #
    # Other Settings
    #
    ##########################################################################################
    models.display.nomatching=false
    history.store.old=false

- `manual.alignment`: if you have a large ontology and you do not want Karma to generate data structures that link the classes on its own, you can turn on manual alignment by setting `manual.alignment=true`.
- `thing.node`: in automatic alignment mode, the suggested model may include disconnected components. If you set this flag to true, Karma creates one connected component by adding a root node called Thing and then establishing an rdfs:subClassOf link between the disconnected components and the Thing node.
- `node.closure`: in automatic alignment, once a node is added to the model, Karma finds the nodes that are connected to the new node in the ontology and adds them to its data structure. Setting this flag to false prevents Karma from computing the node closure. In manual alignment, the code sets this flag to false. In automatic alignment, disabling this flag is not recommended.
- `properties.direct`: in automatic alignment, Karma tries to connect different nodes in the model. If this flag is true, for each pair of class nodes A and B, Karma takes into account the object properties that explicitly have A (or B) in the domain and B (or A) in the range definitions.
- `properties.indirect`: if this flag is true, Karma considers inherited properties when connecting the class nodes in the model.
- `properties.subclass`: if this flag is true, Karma considers rdfs:subClassOf definitions in the imported ontologies as possible links between class nodes in the model.
- `properties.with.only.domain`: some properties in the ontology have only a domain definition (no range is specified). If this flag is set to true, for a particular class node A, Karma considers all object properties that have A (or its superclasses) in their domain as possible outgoing links from node A.
- `properties.with.only.range`: some properties in the ontology have only a range definition (no domain is specified). If this flag is set to true, for a particular class node B, Karma considers all object properties that have B (or its superclasses) in their range as possible incoming links to node B.
- `properties.without.domain.range`: if this flag is set to true, all the object properties without domain and range can be used to connect the class nodes in the domain. The default value is false, because we do not want to establish a link between every pair of nodes in our graph structure.
- `karma.source.prefix`: this is the prefix that Karma uses to publish source models in the Jena repository.
- `karma.service.prefix`: this is the prefix that Karma uses to publish service models in the Jena repository.
- `learner.enabled` can be modified to turn the Model Learner ON or OFF. It is ON by default. The following properties configure the Model Learner:
  - `models.json.dir` configures the name of the directory under your `KARMA_USER_HOME` where the Model Learner stores its JSON files.
  - `models.graphviz.dir` configures the name of the directory under your `KARMA_USER_HOME` where the Model Learner stores its GraphViz files.
  - `alignment.graph.dir` configures the name of the directory under your `KARMA_USER_HOME` where the Model Learner stores its learnings.
  - `max.queued.mappings`, `max.candidate.models`, `multiple.same.property.per.node`, `scoring.confidence.coefficient`, `scoring.coherence.coefficient`, and `scoring.size.coefficient` configure Model Learner details and should not be changed by the user.
- By default, when you Apply R2RML Models from Repository, Karma tries to match the column names of the worksheet with those of the models in the repository and shows only the models that contain overlapping columns. You can turn this feature off by setting `models.display.nomatching=true`. This is useful for older models that do not contain enough information to match the column names.
- Setting `history.store.old=false` causes the uncompacted history to be written. This is the old way of writing the history that can be used for debugging purposes.
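
If you edit `modeling.properties` by hand, a quick check of the result can catch typos. The Python sketch below reads simple `key=value` lines and reports scoring coefficients that fall outside the `[0..1]` range noted in the sample file. Both function names are hypothetical, and the parser is a deliberate simplification: it ignores parts of the Java properties format such as `:` separators, `!` comments, escapes, and line continuations.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if separator:  # ignore lines that contain no '=' at all
            props[key.strip()] = value.strip()
    return props


def out_of_range_coefficients(props):
    """Return the scoring.*.coefficient keys whose value is outside [0, 1]."""
    bad = []
    for key, value in props.items():
        if key.startswith("scoring.") and key.endswith(".coefficient"):
            if not 0.0 <= float(value) <= 1.0:
                bad.append(key)
    return bad
```

Reading the file with `open(path).read()` and passing the text to `parse_properties` gives a plain dictionary of strings, which keeps the sketch independent of Karma's own configuration loader.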
## UI Configuration
The Karma UI can be configured using the `KARMA_USER_HOME/Config/ui.properties` file.

Here is a sample `ui.properties` file:

    google.earth.enabled=true
    max.loaded.classes=-1
    max.loaded.properties=-1
    d3.display.charts=true

- To disable Google Earth, set `google.earth.enabled=false`. By default it is enabled.
- By default, the Karma UI loads all classes and properties as a list. This can slow the system if the ontology is large. In those cases, you can set `max.loaded.classes` to a small number, for example `max.loaded.classes=100`, so that if the number of classes grows beyond 100, Karma does not render the class list and the user types in the class name manually. Karma provides type-ahead to assist the user while entering the class name. The default value is `max.loaded.classes=-1`, which causes the class list to always show.
- Similar to classes, you can also limit the properties list using `max.loaded.properties`.
- If the source is very wide, generating the charts for the source can take a lot of time, which makes the worksheet load very slowly. To make it faster, you can disable the loading of the charts by setting `d3.display.charts=false`.
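
The limit rule described above for `max.loaded.classes` and `max.loaded.properties` can be written down as a tiny Python sketch. This is only a reading of the description on this page, not Karma's actual code, and the function name `shows_full_list` is hypothetical.

```python
def shows_full_list(item_count, max_loaded):
    """Decide whether the full list is rendered for a given limit.

    -1 means "no limit". Otherwise the list is rendered only while the
    number of items does not go beyond the limit.
    """
    if max_loaded == -1:
        return True
    return item_count <= max_loaded
```

With `max.loaded.classes=100`, an ontology with 100 classes would still get the list under this reading, while one with 101 classes would fall back to typing the class name with type-ahead.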
## <a name="user_preferences"></a>User Preferences
User preferences enable you to control how much information Karma shows on the screen.
To specify your preferences you need to enter them in a JSON file in the `<user.home>/karma/UserPrefs` directory.
Here is an example of preferences stored in `WSP1.json`:

    szeke:UserPrefs szekely> pwd
    /Users/szekely/karma/UserPrefs
    szeke:UserPrefs szekely> cat WSP1.json
    {
      "ViewPreferences": {
        "defaultRowsToShowInTopTables": 150,
        "maxCharactersInCell": 80,
        "maxCharactersInHeader": 10,
        "maxRowsToShowInNestedTables": 25
      }
    }

- `defaultRowsToShowInTopTables`: the number of top-level rows shown in the browser. The worksheet may have many more rows, and in the current version you cannot scroll to rows that are not visible (the little controls on the screen don't work).
- `maxCharactersInCell`: maximum number of characters of a cell value that will be shown on the screen. To see the full value click on the `Expand` menu.
- `maxCharactersInHeader`: maximum number of characters of a column header that will be shown on the screen. Currently, there is no way in the user interface to see the full string.
- `maxRowsToShowInNestedTables`: Karma supports worksheets where the values of cells can themselves be tables. These are called nested tables. This parameter controls the number of rows from nested tables that will be shown in the browser.

Karma keeps track of different users by defining a workspace for each one: this is why the preference files are called `WSP1.json`, `WSP2.json`, etc.
When you install Karma on your local machine you will most likely have only one preferences file, so simply edit the one you have.
Most likely it is called `WSP1.json`.
Note: a future version of Karma will enable users to log in, and workspaces will be tied to login names.
Your preferences file may also contain information that Karma saves so that it can use it in future sessions. Examples include the URLs for database connections and default namespaces for RDF export. You can edit the information in these commands so that Karma uses the values you enter in the future. If you subsequently change the settings in the interface, those settings will be stored in your user preferences file, overwriting your edits.
The `Config` directory under `KARMA_USER_HOME` also contains a `WorkspacePref.template` file.
Karma copies the preferences for each workspace from this template file, so you can edit it to change the default preferences for all workspaces.
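
Editing a preferences file can also be scripted. The Python sketch below changes one entry under `ViewPreferences` and leaves everything else in the file untouched. The function name `set_view_preference` is hypothetical, and the sketch assumes the file is plain JSON with the structure shown in the `WSP1.json` example above.

```python
import json
from pathlib import Path


def set_view_preference(prefs_file, name, value):
    """Set one ViewPreferences entry in a preferences file and save it."""
    path = Path(prefs_file)
    prefs = json.loads(path.read_text(encoding="utf-8"))
    # Create the ViewPreferences object if the file does not have one yet.
    prefs.setdefault("ViewPreferences", {})[name] = value
    path.write_text(json.dumps(prefs, indent=2) + "\n", encoding="utf-8")
    return prefs
```

For instance, `set_view_preference(Path.home() / "karma" / "UserPrefs" / "WSP1.json", "maxCharactersInHeader", 25)` would widen the column headers, assuming the default location and file name.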
## Files Where Karma Saves Your Work
Karma automatically saves an R2RML model containing all the commands you perform in the user interface.
This is why there is no Save button in Karma: it saves your work behind the scenes as you go.
The auto-saved R2RML files are stored in the `KARMA_USER_HOME/R2RML` directory.
Here is an example of an `R2RML` directory:

    szeke:History szekely> pwd
    /Users/szekely/karma/R2RML
    szeke:R2RML szekely> ls
    WSP1_DMA American Dataset_DMA American Dataset.csv-auto-model.ttl WSP1_crystal-bridges-records_Sheet1.json-auto-model.ttl
    WSP1_DMA-artist-labels.csv-auto-model.ttl WSP1_fis_departmentsdat-auto-model.ttl
    WSP1_DMA-artwork-labels.csv-auto-model.ttl WSP1_ima-artworks-demo.xml-auto-model.ttl
    WSP1_alignment-geonames-saam-jarowinkler-01-07a.xml-auto-model.ttl WSP1_npg-artist-death.json-auto-model.ttl

The file names match the names of the files you load in Karma, and they contain all the state-changing commands that you performed on that file (e.g., publish commands are not in the history). You should not edit the content of these files, but you may want to make copies or rename them so that you can preserve the models or transformations you defined for a file. If you don't preserve copies of these files, Karma will overwrite them with whatever new commands you perform in the user interface.
Sometimes things get messed up. In that case, you can save a copy of the auto-model and then, after restarting Karma, apply the auto-model to the file to return to the same state as before.
Karma has a `Reset` command that can delete all the history files.
Warning: this is a pretty drastic measure, so before hitting the `Reset` button try moving some of the history files to other directories.
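
One way to preserve your auto-saved models before they are overwritten or reset is to copy them elsewhere. The Python sketch below copies every file whose name ends in `-auto-model.ttl` (the pattern visible in the listing above) into a backup directory. The function name and the backup location are hypothetical choices, not something Karma provides.

```python
import shutil
from pathlib import Path


def backup_auto_models(r2rml_dir, backup_dir):
    """Copy all *-auto-model.ttl files from r2rml_dir into backup_dir.

    Returns the names of the copied files. The originals are left in place.
    """
    source = Path(r2rml_dir)
    target = Path(backup_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for model in sorted(source.glob("*-auto-model.ttl")):
        shutil.copy2(model, target / model.name)  # copy2 also keeps timestamps
        copied.append(model.name)
    return copied
```

Copying rather than moving keeps Karma's own directory unchanged, so the sketch is safe to run at any time; a file with the same name in the backup directory is overwritten.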
## Locations of Files Karma Publishes
The `Publish` commands in Karma publish different kinds of files.
Karma makes these files available as links in the Web browser, so they are not in your `KARMA_USER_HOME` folder.
Instead, Karma puts these files in `Web-Karma/karma-web/src/main/webapp/publish/`.

Karma publishes the following types of files:
- RDF files containing worksheet data converted to RDF according to a model
- R2RML containing an RDF representation of the model
- CSV representations of worksheet data in case an Excel spreadsheet is imported
- KML and Shapefile representations of worksheet data when data is modeled using a geospatial ontology
- GRAPHVIZ contains dot files representing the model
- JSON representations of worksheet data
- REPORT contains a .MD file that gives a report of all transformations done on the data and all the semantic types that were assigned. This is useful if one needs to document how the model was generated.
- AutoOntology contains the Ontology that gets generated by Karma using the Worksheet Menu -> Suggest Model -> Generate New Ontology
Karma stores the RDF files generated from worksheets in the `karma-web/src/main/webapp/RDF` subdirectory of your Karma home directory.
The files do not have particularly descriptive names, but Karma shows them as links in the user interface.
If you lose track of the files, you can look for them in this directory.
Here is an example:

    szeke:RDF szekely> pwd
    /Users/szekely/Web-Karma/karma-web/src/main/webapp/RDF
    szeke:RDF szekely> ls -lt
    total 848
    -rw-r--r-- 1 szekely staff 78930 May 14 18:49 WSP1VW1.n3
    -rw-r--r-- 1 szekely staff 46323 Apr 27 20:17 WSP1VW3.n3
    -rw-r--r-- 1 szekely staff 29950 Apr 27 20:07 WSP1VW2.n3

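
Because the file names are not descriptive, sorting by modification time is often the quickest way to find the file you just published, much like `ls -lt` in the listing above. The following Python sketch does the same thing; the function name `newest_first` is hypothetical, and the `*.n3` pattern is taken from the example listing.

```python
from pathlib import Path


def newest_first(directory, pattern="*.n3"):
    """Return files in `directory` matching `pattern`, newest first."""
    files = [p for p in Path(directory).glob(pattern) if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```

The first element of the returned list is the most recently modified file.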
The model files contain the RDF representation of the models you build in Karma.
You need these files if you want to use the batch RDF generation facility in Karma.
Karma stores them in the `karma-web/src/main/webapp/R2RML` subdirectory of your Karma home directory.
Karma generates a new file each time you publish a model. You can determine from the name of the file which source it corresponds to. Here is an example:

    szeke:R2RML szekely> pwd
    /Users/szekely/Web-Karma/karma-web/src/main/webapp/R2RML
    szeke:R2RML szekely> ls
    AAT_CONTRIB_RELS_NOTE_2C213AAE-8A33-5BEF-6CBF-ABFF497A483C.n3
    DMA American Dataset_DMA American Dataset.csv_36DBBD9F-FDC8-AA9C-6AF5-16F8B43478F7.n3

You can use Karma to publish CSV files of your worksheet data.
The CSV files are stored in the `karma-web/src/main/webapp/publish/CSV` subdirectory of your Karma home directory.
Here is an example:

    szeke:CSV szekely> pwd
    /Users/szekely/Web-Karma/karma-web/src/main/webapp/publish/CSV
    szeke:CSV szekely> ls
    AAT_SUBJECT.csv alignment-geonames-saam-jarowinkler-01-07a.xml.csv
    DMA American Dataset_DMA American Dataset.csv all-met.json.csv

You can use Karma to publish KML and Shapefiles of your worksheet data that you model according to a geospatial ontology.
These files are stored in the `karma-web/src/main/webapp/publish/SpatialData` subdirectory of your Karma home directory.
Here is the location of the directory (an example listing with KML and Shapefiles is not yet available):

    szeke:SpatialData szekely> pwd
    /Users/szekely/Web-Karma/karma-web/src/main/webapp/publish/SpatialData

## Reset
The Karma `Reset` command is shown below.
This command will delete the learned semantic types (more info: Information Karma Learns) and the histories (more info: Files Where Karma Saves Your Work).
Warning: this command cannot be undone, and if you delete the histories you cannot get them back.