Robert Henschel edited this page Aug 31, 2022 · 27 revisions

Why should you create Cubes?

A good amount of data in organisations is maintained in tables with multiple columns, typically spread over a multitude of Excel or CSV documents. This data often has a dimension that describes the time, sometimes other dimensions for classifying the data, and in most cases some actual observed values or counts.

Such tables hold the data in a structured form, but most of the time they lack the information needed to understand the columns, as well as the metadata required to create useful charts and visualisations.

When creating Cubes, you as the data provider and domain specialist can augment and annotate your data with everything necessary to understand the input data, directly in the dataset that will be published. Fully annotated Cubes can then be used to visualize your data with tools like https://visualize.admin.ch.

How can we create Cubes?

Cube Creator allows us to transform data provided as clean CSV into a standardised RDF Cube format. In a second step in the Cube Creator (the Cube Designer), it is possible to annotate the Cube with the necessary descriptive and technical metadata. Furthermore, it is possible to map common values to known concepts (e.g. Cantons, Municipalities, Companies, Departments), which further augments the data at hand. Finally, the Cube Creator lets you manage the publishing of the Cube on LINDAS, where it can be consumed through https://visualize.admin.ch (for end-users) and queried on https://lindas.admin.ch/sparql/ (for developers).

What do you need to create a Cube?

Multiple Dimensions

Another way to think of Cubes is as multi-dimensional representations of your input tables. (See also https://statistics.gov.scot/help/data_cubes for an in-depth illustration of the data cube concept.)

In the image above we see a cube with 3 dimensions: Year, Location and Season. The cube records the average temperature for each of these combinations.

The source of this cube might be provided through the following table:

Year  Location  Season  Average Temperature
2019  Bern      Summer  22 °C
2020  Bern      Summer  23 °C
2021  Bern      Summer  24 °C
2019  Zürich    Summer  21 °C
2020  Zürich    Summer  22 °C
2021  Zürich    Summer  23 °C
2019  Bern      Winter  12 °C
2020  Bern      Winter  13 °C
2021  Bern      Winter  14 °C
2019  Zürich    Winter  11 °C
2020  Zürich    Winter  12 °C
2021  Zürich    Winter  13 °C

Observations

For every combination of these three dimensions, we have a value that we call an observation. A cube can have more than three dimensions, in which case every combination of all of them provides an observation. It is also possible to record multiple measured values per combination of dimensions.
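As a minimal sketch of this idea, the example table can be held in plain Python as a mapping from a combination of dimension values to the observed value. The variable names are illustrative only and are not part of Cube Creator; the rows are copied from the table above.

```python
from itertools import product

# Rows from the example table: (year, location, season, average temperature in °C).
rows = [
    (2019, "Bern", "Summer", 22), (2020, "Bern", "Summer", 23), (2021, "Bern", "Summer", 24),
    (2019, "Zürich", "Summer", 21), (2020, "Zürich", "Summer", 22), (2021, "Zürich", "Summer", 23),
    (2019, "Bern", "Winter", 12), (2020, "Bern", "Winter", 13), (2021, "Bern", "Winter", 14),
    (2019, "Zürich", "Winter", 11), (2020, "Zürich", "Winter", 12), (2021, "Zürich", "Winter", 13),
]

# Each combination of the three dimensions points to one observed value.
observations = {(year, location, season): temp for year, location, season, temp in rows}

# Collect the distinct values of every dimension.
years = sorted({row[0] for row in rows})
locations = sorted({row[1] for row in rows})
seasons = sorted({row[2] for row in rows})

# List the combinations for which the table provides no observation.
missing = [key for key in product(years, locations, seasons) if key not in observations]

print(observations[(2020, "Zürich", "Winter")])  # 12
print(len(observations), missing)  # 12 []
```

Looking up a value by the tuple (year, location, season) mirrors picking one cell out of the three-dimensional cube.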

Different Types of Dimensions

A cube needs to have at least two different types of dimensions: the Key Dimension and the Measurement Dimension.

Key Dimension

One or multiple key dimensions together uniquely define an observation. In the example above, the key dimensions to identify an observation are year, location and season combined. It is also possible to have only one key dimension to identify an observation, often the point in time the observation was made.
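Before importing a CSV, it can help to check whether the columns you intend to use as key dimensions really identify each row uniquely. The following is a small sketch of such a check; the helper `duplicate_keys` and the column names are hypothetical and not part of Cube Creator.

```python
from collections import Counter


def duplicate_keys(rows, key_columns):
    """Return the key combinations that occur more than once in rows (a list of dicts)."""
    counts = Counter(tuple(row[column] for column in key_columns) for row in rows)
    return [key for key, count in counts.items() if count > 1]


rows = [
    {"year": 2019, "location": "Bern", "season": "Summer", "temperature": 22},
    {"year": 2019, "location": "Bern", "season": "Winter", "temperature": 12},
    {"year": 2020, "location": "Bern", "season": "Summer", "temperature": 23},
]

# Year, location and season together identify every row.
print(duplicate_keys(rows, ["year", "location", "season"]))  # []

# Year and location alone do not: 2019/Bern appears for both seasons.
print(duplicate_keys(rows, ["year", "location"]))  # [(2019, 'Bern')]
```

An empty result suggests the chosen columns can serve as key dimensions; any reported combination points to a missing key dimension or to duplicate rows.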

An observation can also be a statistical aggregation, for example how many people live in one place during a specific period of time. The incoming data often contains such an aggregation for one specific period only (e.g. the year 2021, or Q2). It is advised to add a key dimension representing this period even if the incoming data initially covers only one period.
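One way to follow this advice is to add the period as an explicit column before the data is turned into a cube. The sketch below assumes the input is a list of dicts; the helper `add_period`, the column names, and the population counts are invented purely for illustration.

```python
def add_period(rows, period):
    """Return copies of rows with an explicit 'period' column added."""
    return [{"period": period, **row} for row in rows]


# Hypothetical input that covers a single year and therefore has no time column.
population = [
    {"location": "Bern", "inhabitants": 1000},
    {"location": "Zürich", "inhabitants": 2000},
]

with_period = add_period(population, 2021)
print(with_period[0])  # {'period': 2021, 'location': 'Bern', 'inhabitants': 1000}
```

With the period in place, data for later periods can be appended without changing the structure of the cube.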

Measurement Dimension

The measurement dimension is the actual value of the observation. Often you only have one measurement dimension, but you can have multiple values for one observation. In the example above we have the temperature. Another possible measurement at the same location would be humidity, for example.

Every measurement dimension needs to use one single unit of measurement, e.g. °C in the example above. If your input data has multiple units in one dimension, this is a strong indicator that it actually contains multiple dimensions.
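A quick way to spot mixed units is to split each raw cell into its number and its unit and then collect the distinct units per column. This is only a sketch that assumes cells are written as a number, a space, and a unit (as in "22 °C"); the helper names are hypothetical.

```python
def split_value_and_unit(cell):
    """Split a cell such as '22 °C' into the number 22.0 and the unit '°C'."""
    value, _, unit = cell.strip().partition(" ")
    return float(value), unit


def units_in_column(cells):
    """Return the set of distinct units used in a column of raw cells."""
    return {split_value_and_unit(cell)[1] for cell in cells}


temperature = ["22 °C", "23 °C", "24 °C"]
mixed = ["22 °C", "65 %"]

print(units_in_column(temperature))    # {'°C'}
print(sorted(units_in_column(mixed)))  # ['%', '°C']
```

A column that yields more than one unit, like `mixed` here, is a candidate for being split into separate measurement dimensions.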

The example above also records an Average Temperature, which shows that a measurement can itself be an aggregate of some sort over the period of time.

From texts to multilingual concepts

A dimension itself can have multiple forms (we will introduce this in detail later, together with scales of measure). It might be numerical, e.g. 22, 23, 24 / 2001, 2002, or textual (strings), e.g. Summer, Winter. Because cubes should be built to be multilingual, the goal is to move from textual descriptions to concepts, which in turn can have multilingual textual descriptions attached.

So instead of having a text saying Summer, the cube will have a concept that has the labels Sommer, Été, Estate, Summer attached. A description, other attributes or even links to other concepts can be attached to such a concept. To build these concepts, we need to have translations ready as inputs for our cubes.
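To illustrate the difference between a plain string and a concept, here is a minimal sketch of a concept that carries one label per language. The `Concept` class, its identifier, and the fallback rule are assumptions made for this example and do not reflect how Cube Creator stores concepts internally.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A hypothetical concept: one identifier with labels in several languages."""
    identifier: str
    labels: dict = field(default_factory=dict)

    def label(self, language, fallback="en"):
        """Return the label in the given language, else the fallback, else the identifier."""
        return self.labels.get(language, self.labels.get(fallback, self.identifier))


summer = Concept(
    identifier="season/summer",  # illustrative identifier, not a real IRI
    labels={"de": "Sommer", "fr": "Été", "it": "Estate", "en": "Summer"},
)

print(summer.label("fr"))  # Été
print(summer.label("rm"))  # Summer (no Romansh label, falls back to English)
```

The observation then refers to the concept once, and every consumer can pick the label in its own language.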

Mappings to existing concepts (Shared Dimensions)

Finally we have concepts that are used over and over in many cubes. These concepts can be reused inside your data cubes.

  • This allows you to profit from translations and descriptions.
  • Your data can be connected to other data (cubes) through the common use of the same concepts.

To profit from these effects, a one-time mapping from your texts/strings to already defined concepts needs to be provided.
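Such a one-time mapping can be thought of as a lookup table from the strings in your input to the identifiers of existing concepts. The sketch below uses made-up identifiers under https://example.org/; they are placeholders and not real shared dimension identifiers, and the helper `map_to_concepts` is hypothetical.

```python
# Placeholder identifiers for illustration only.
SHARED_CONCEPTS = {
    "Bern": "https://example.org/location/bern",
    "Zürich": "https://example.org/location/zuerich",
}


def map_to_concepts(values, mapping):
    """Map input strings to concept identifiers and report the strings left unmapped."""
    mapped, unmapped = {}, []
    for value in values:
        key = value.strip()
        if key in mapping:
            mapped[value] = mapping[key]
        else:
            unmapped.append(value)
    return mapped, unmapped


mapped, unmapped = map_to_concepts(["Bern", "Zürich", "Zurich"], SHARED_CONCEPTS)
print(mapped["Bern"])  # https://example.org/location/bern
print(unmapped)        # ['Zurich']
```

The list of unmapped strings shows which spellings still need to be cleaned up or added to the mapping before publishing.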