1. CSV Mapping
The first step lets you upload one or more CSV files that serve as the input of your Cube, and define from them a set of output tables that will be part of the cube. This step as a whole is called CSV Mapping (i.e. mapping from CSV to Cube).
In this step you create one or more tables based on the CSV input. At least one table must be defined to give the cube its structure; this table is called the Cube Table. The fields of the Cube Table define all Dimensions of the Cube. Additional tables can be used to create Multilingual Concepts used by the Cube Table.
With "+" you can upload a new syntactically valid CSV file. After the upload you should see a preview of the columns and per column the first three rows of the input data on the left side of the screen. It is possible to replace the uploaded CSV later with a newer version.
To create a new table, select on the left (input) side of the screen the columns of a CSV that you would like to transform into a table. Column mappings can be added, changed or removed at any time. Each table can only use the input from one CSV. On the other hand, you can create multiple tables based on one input CSV (e.g. to create the concept tables based on multilingual fields).
Every cube has one main table to represent the observations, with their multiple dimensions.
This table is called the cube table and is created by checking the "Cube table" checkbox.
Additional tables can be created to provide (multilingual) concepts connected to the observation table.
In the observation table it is important to distinguish between key dimensions and measurement dimensions.
If a dimension in the cube table is based on Concepts you can first create the concept tables and afterwards add links to the concept table in the mapping of the cube table.
By default, columns are mapped as literal attributes. If you want to treat a column as a concept (a resource with a unique identifier) you have to create a Concept table from that column and link to it using the "Link to another table" feature.
The concept table provides multilingual labels and additional information (for grouping or external identifiers) for concepts.
A concept table is created without checking the "Cube table" checkbox.
To be used in the generated cube, each concept table must be linked through a "Link to another table" from either the Cube table or another Concept table.
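The linking rule above can be pictured as a reachability check. The following is only a sketch, not something Cube Creator exposes: the `links` dictionary, the table names, and the `unlinked_tables` function are hypothetical and merely model which tables link to which.

```python
from collections import deque

# Hypothetical project description: table name -> tables it links to.
links = {
    "observation": ["canton", "year"],  # the cube table
    "canton": ["region"],               # concept table linking to another concept table
    "year": [],
    "region": [],
    "unit": [],                         # concept table that nothing links to
}


def unlinked_tables(links, cube_table):
    """Return the tables that cannot be reached from the cube table via links."""
    reached = {cube_table}
    queue = deque([cube_table])
    while queue:
        current = queue.popleft()
        for target in links.get(current, []):
            if target not in reached:
                reached.add(target)
                queue.append(target)
    return sorted(set(links) - reached)


print(unlinked_tables(links, "observation"))  # ['unit']
```

In this model, `unit` would not appear in the generated cube until the cube table or another linked concept table points to it.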
A unique URI string identifying every row of a dimension.
For each table, an identifier template is needed to build a unique identifier for each row of the table. If this notion is too technical or if you don't fully understand it, please just leave the content of the field empty to get an auto-generated identifier.
Best practice: start an identifier template with a fixed prefix corresponding to the table name (automatically proposed by the tool), followed by the list of CSV columns that uniquely identify each row of that table.
In the field, a column name must be written between curly brackets (`{}`) and the columns are separated from each other with a slash (`/`). The identifier is generated as a URL, so the template must produce a URL-safe string.
If a key dimension does not provide a unique identifier, preferably use the English name of the concept as an identifier.
This field has an auto-complete that shows up when you type an opening curly bracket (`{`). It helps you choose the columns from the CSV and thus avoid misspellings.
Warning: if your project contains multiple tables, a table's identifier template must also avoid collisions between the rows of different tables. This is why the table name is proposed as a fixed prefix.
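To make the template syntax more concrete, here is a minimal sketch of how such a template could be expanded for one CSV row. It is an illustration only: `expand_template`, the column names, and the use of percent-encoding are assumptions, and the identifiers Cube Creator actually generates may be built and encoded differently.

```python
import re
from urllib.parse import quote


def expand_template(template, row):
    """Replace each {column} placeholder with the percent-encoded value from the row."""
    def substitute(match):
        column = match.group(1)
        # safe="" also encodes "/" so a value cannot add extra path segments.
        return quote(str(row[column]), safe="")
    return re.sub(r"\{([^{}]+)\}", substitute, template)


def has_collisions(template, rows):
    """Return True if two rows would receive the same identifier."""
    identifiers = [expand_template(template, row) for row in rows]
    return len(identifiers) != len(set(identifiers))


rows = [
    {"canton": "Zürich", "year": "2020"},
    {"canton": "Zürich", "year": "2021"},
]
print(expand_template("observation/{canton}/{year}", rows[0]))
# observation/Z%C3%BCrich/2020
print(has_collisions("observation/{canton}", rows))         # True
print(has_collisions("observation/{canton}/{year}", rows))  # False
```

The collision check shows why the template needs enough columns to tell rows apart: with only `{canton}`, both rows end up with the same identifier.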
The display color is used only inside the Cube Creator to visually connect the CSV input columns on the left to the mapped table rows.
A target property is proposed based on the input data. If you change the target property, it must be URI-safe (ideally only letters and numbers, without spaces).
If you know specific ontologies you would like to reuse in your data cube, an auto-complete is available and appears when you start typing the name of a common ontology (e.g. "schema").
Commonly used target properties:
Property | Description | Notes |
---|---|---|
`schema:identifier` | To add identifiers also valid outside the data set. | optional |
To define multilingual concepts the Target Properties commonly used are:
Property | Description | Notes |
---|---|---|
`schema:name` | The name of the concept. | mandatory, needs a language tag |
`schema:description` | A description of the concept. | optional, needs a language tag |
`schema:position` | The position of the concept. | mandatory for concepts used in ordinal scales |
`schema:identifier` | An identifier also valid outside the data set. | optional |
See the language and translations paragraph about how to map multilingual values.
It is possible to attach further information relevant to the concept, e.g. geographical coordinates or categorizations, and even to link to other concept tables. Try to reuse already existing properties, e.g. from schema.org.
Semantically relevant properties are:
Property | Description | Notes |
---|---|---|
`schema:latitude` | WGS84 coordinate. | mandatory for symbols on a map visualization |
`schema:longitude` | WGS84 coordinate. | mandatory for symbols on a map visualization |
The correct data type allows the software consuming your cubes to choose the correct presentation. It matters most for measurement dimensions. Based on the data type, Cube Creator checks whether the syntax of all the input data of the mapped column is correct.
The following data types are available for your input data:
Data type | Format | Example |
---|---|---|
boolean | `true` or `false` (`0` and `1` are not supported as of January 2021) | `true`, `false` |
date | YYYY-MM-DD | `1879-04-19` |
dateTime | YYYY-MM-DDThh:mm:ss | `1972-06-25T22:30:00` |
decimal | Decimal separator is `.`, thousands separators are not allowed | `123.456`, `+1234.456`, `-.456` |
int | Thousands separators are not allowed | `-2147483648`, `0`, `-0000000000000000000005`, `2147483647` |
string | A value containing the separator character must be enclosed in `"`; a `"` inside a quoted string must be written as `""` | `Müller`, `"Müller, Hans"`, `"Hans ""Johnny"" Müller"` |
time | hh:mm:ss | `21:32:52`, `21:32:52+02:00`, `19:32:52Z`, `19:32:52+00:00`, `21:32:52.12679` |
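The quoting rules for the string type are the usual CSV conventions. As an illustration, the sketch below reads the three string examples with Python's standard `csv` module; the single-column layout and the `name` header are assumptions made for the example, and Cube Creator's own CSV parser is not shown here.

```python
import csv
import io

# One column named "name", containing the three string examples from the table.
raw = 'name\nMüller\n"Müller, Hans"\n"Hans ""Johnny"" Müller"\n'

rows = list(csv.reader(io.StringIO(raw)))
values = [row[0] for row in rows[1:]]  # skip the header row
print(values)
# ['Müller', 'Müller, Hans', 'Hans "Johnny" Müller']
```

The surrounding quotes are removed, the comma inside the quoted value is kept as part of the value, and each doubled quote becomes a single quote character.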
If the transformation encounters a type mismatch (i.e. a value in the CSV doesn't match the type defined in the mapping), it will fail and an error message will be displayed in the logs of the transform job.
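Before starting a transformation, it can help to look for such mismatches in the CSV yourself. The following is a rough sketch under stated assumptions: the `CHECKS` table and `find_mismatches` are hypothetical helpers that approximate the formats listed in the table above for a few data types (the time type and any numeric range limits are left out), and they are not the validation Cube Creator itself performs.

```python
import re
from datetime import datetime


def _regex_check(pattern):
    """Build a check that accepts values matching the whole pattern."""
    compiled = re.compile(pattern)
    return lambda value: compiled.fullmatch(value) is not None


def _datetime_check(shape, fmt):
    """Build a check that requires a fixed shape and a valid calendar value."""
    compiled = re.compile(shape)

    def check(value):
        if compiled.fullmatch(value) is None:
            return False
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            return False
        return True

    return check


# Approximations of the formats in the data type table (assumption).
CHECKS = {
    "boolean": _regex_check(r"true|false"),
    "int": _regex_check(r"[+-]?[0-9]+"),
    "decimal": _regex_check(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)"),
    "date": _datetime_check(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", "%Y-%m-%d"),
    "dateTime": _datetime_check(
        r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}",
        "%Y-%m-%dT%H:%M:%S",
    ),
}


def find_mismatches(values, data_type):
    """Return (row number, value) pairs that do not match the data type."""
    check = CHECKS[data_type]
    return [(i, v) for i, v in enumerate(values, start=1) if not check(v)]


print(find_mismatches(["12.5", "1'234.5", "-.456", "n/a"], "decimal"))
# [(2, "1'234.5"), (4, 'n/a')]
```

Row numbers are counted from 1 over the data values, so they need to be shifted by one if the CSV has a header row.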
It is always possible to run a transformation without any data type attached. Be aware that the application consuming the final data might not behave correctly if no data type is specified.
For dimensions which provide strings, a language should be specified.
For strings with translations in different languages, ideally no strings are attached directly to the Cube table; a Concept table should be created instead. The normal way to handle translations is to have the distinction in the original CSV, with one language per column. Those columns are then mapped to the Concept table with the same `schema:name` target property, a "string" datatype and the corresponding language.
Caution: `schema:name` must always be provided inside a concept table, otherwise the visualization will fail. Where `schema:description` is used, it should also be provided in multiple languages.
In case there is only one language available, still provide the correct language tag. (It will be used as a fallback in other language settings.)
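The one-language-per-column layout can be sketched as follows. This is an illustration only: the column names (`name_de`, `name_fr`, `name_en`), the `column_mapping` dictionary, and `labels_for` are hypothetical and simply model what the mapping expresses, namely several columns feeding the same target property with different language tags.

```python
# Hypothetical mapping: CSV column -> (target property, language tag).
column_mapping = {
    "name_de": ("schema:name", "de"),
    "name_fr": ("schema:name", "fr"),
    "name_en": ("schema:name", "en"),
}


def labels_for(row, column_mapping):
    """Collect (property, value, language) triples for the non-empty columns of a row."""
    labels = []
    for column, (prop, language) in column_mapping.items():
        value = row.get(column, "")
        if value:
            labels.append((prop, value, language))
    return labels


row = {"id": "GE", "name_de": "Genf", "name_fr": "Genève", "name_en": "Geneva"}
for label in labels_for(row, column_mapping):
    print(label)
# ('schema:name', 'Genf', 'de')
# ('schema:name', 'Genève', 'fr')
# ('schema:name', 'Geneva', 'en')
```

A CSV with a single language would use the same structure with one entry in the mapping, still carrying its language tag.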
If an empty or missing value in your column has a specific meaning (e.g. being equal to 0), you can define a default value to be written to the cube instead of a missing value. If a default value is set, no explicitly missing values are put in the cube.
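The effect of a default value can be pictured with a small sketch. The `apply_default` function is hypothetical, and representing a missing value as `None` is an assumption made for the illustration, not how Cube Creator stores it.

```python
def apply_default(values, default=None):
    """Replace empty cells with the default; without a default they stay missing (None)."""
    return [value if value != "" else default for value in values]


column = ["3", "", "7"]
print(apply_default(column))               # ['3', None, '7']
print(apply_default(column, default="0"))  # ['3', '0', '7']
```

With the default set to `"0"`, the column no longer contains any missing value.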