This repository has been archived by the owner on Sep 23, 2024. It is now read-only.

Create a Vocabulary for cataloging datasets #23

Open
mikepizzo opened this issue Oct 25, 2016 · 1 comment

@mikepizzo

Open Data datasets are described using common terms, such as license, publisher, creation date, and update frequency.

In the open data community, DCAT (http://www.w3.org/TR/vocab-dcat/) defines common terms for this cataloging information, pulling also from Dublin Core (http://dublincore.org/documents/dcmi-terms/).

We should define OData vocabularies to allow marking up an OData service with the same terms for general dataset cataloging.

@ralfhandl

I've toyed with this idea for some time; here's what I've come up with so far.

This is based on the W3C Data Catalog Vocabulary (DCAT), including a subset of the additional properties defined by the DCAT Application Profile for data portals in Europe (DCAT-AP).

Construction principles:

  • DCAT classes are represented as OData terms whose name is the DCAT class name without the namespace prefix
  • DCAT properties are represented as properties of the complex type of the corresponding OData term; each property name is the DCAT property name without the namespace prefix
  • A dcat:Catalog corresponds to an OData Service Catalog
  • A dcat:Dataset corresponds to an OData Service and covers the deployment-independent aspects
  • A dcat:Distribution captures the deployment-specific aspects of an OData Service or CDS Context published on a concrete host

The correspondence between a dcat:Dataset and an OData service is consistent with the use of "data set" to mean a "collection of closely related tables" (see https://en.wikipedia.org/wiki/Data_set) and with the use of DataSet in the .NET framework, where "A DataSet represents a complete set of data including the tables that contain, order, and constrain the data, as well as the relationships between the tables." (see https://msdn.microsoft.com/en-us/library/ss7fbaez(v=vs.110).aspx).

      <Term Name="Catalog" Type="Catalog.CatalogType" AppliesTo="EntitySet">
        <Annotation Term="Core.Description" String="A data catalog is a curated collection of metadata about datasets." />
      </Term>
      <ComplexType Name="CatalogType">
        <Property Name="title" Type="Edm.String">
          <Annotation Term="Core.Description" String="A name given to the catalog." />
        </Property>
        <Property Name="description" Type="Edm.String">
          <Annotation Term="Core.Description" String="A free-text account of the catalog." />
        </Property>
        <Property Name="modified" Type="Edm.Date">
          <Annotation Term="Core.Description" String="Most recent date on which the catalog was changed, updated or modified." />
        </Property>
        <Property Name="homepage" Type="Edm.String">
          <Annotation Term="Core.Description" String="The homepage of the catalog." />
        </Property>
        <Property Name="dataset" Type="Collection(Catalog.DatasetType)">
          <Annotation Term="Core.Description" String="A dataset that is part of the catalog." />
        </Property>
      </ComplexType>

      <Term Name="Dataset" Type="Catalog.DatasetType" AppliesTo="EntitySet">
        <Annotation Term="Core.Description"
          String="A collection of data, published or curated by a single agent, and available for access or download in one or more formats." />
      </Term>
      <ComplexType Name="DatasetType">
        <Property Name="identifier" Type="Edm.String">
          <Annotation Term="Core.Description" String="A unique identifier of the dataset." />
        </Property>
        <Property Name="title" Type="Edm.String">
          <Annotation Term="Core.Description" String="A name given to the dataset." />
        </Property>
        <Property Name="description" Type="Edm.String">
          <Annotation Term="Core.Description" String="A free-text account of the dataset." />
        </Property>
        <Property Name="modified" Type="Edm.Date">
          <Annotation Term="Core.Description" String="Most recent date on which the dataset was changed, updated or modified." />
        </Property>
        <Property Name="publisher" Type="Edm.String">
          <Annotation Term="Core.Description" String="An entity responsible for making the dataset available." />
        </Property>
        <Property Name="keyword" Type="Collection(Edm.String)">
          <Annotation Term="Core.Description" String="A keyword or tag describing the dataset." />
        </Property>
        <Property Name="distribution" Type="Collection(Catalog.DistributionType)">
          <Annotation Term="Core.Description" String="Connects a dataset to its available distributions." />
        </Property>
        <!-- properties from DCAT-AP -->
        <Property Name="conformsTo" Type="Edm.String">
          <Annotation Term="Core.Description" String="An implementing rule or other specification." />
        </Property>
        <Property Name="accessRights" Type="Edm.String">
          <Annotation Term="Core.Description" String="Indicates whether the Dataset is open data, has access restrictions or is not public." />
          <!-- TODO: annotation allowedValues 'public', 'restricted', and 'non-public' -->
        </Property>
        <Property Name="versionInfo" Type="Edm.PrimitiveType">
          <Annotation Term="Core.Description" String="A version number or other version designation of the Dataset." />
        </Property>
        <!-- candidates from DCAT-AP: hasVersion, isVersionOf -->
      </ComplexType>

      <Term Name="Distribution" Type="Catalog.DistributionType" AppliesTo="EntitySet">
        <Annotation Term="Core.Description">
          <String>Represents a specific available form of a dataset. 
Each dataset might be available in different forms, these forms might represent different formats of the dataset or different endpoints. 
Examples of distributions include a downloadable CSV file, an API or an RSS feed.</String>
        </Annotation>
      </Term>
      <ComplexType Name="DistributionType">
        <Property Name="title" Type="Edm.String">
          <Annotation Term="Core.Description" String="A name given to the distribution." />
        </Property>
        <Property Name="description" Type="Edm.String">
          <Annotation Term="Core.Description" String="A free-text account of the distribution." />
        </Property>
        <Property Name="modified" Type="Edm.Date">
          <Annotation Term="Core.Description" String="Most recent date on which the distribution was changed, updated or modified." />
        </Property>
        <Property Name="accessURL" Type="Edm.String">
          <Annotation Term="Core.Description"
            String="A landing page, feed, SPARQL endpoint or other type of resource that gives access to the distribution of the dataset." />
        </Property>
      </ComplexType>
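
To make the intended use more concrete, here is a rough sketch of how a service might apply the Dataset term. It is only an illustration: the namespace `Example`, the container `Container`, the entity set `Products`, and all property values are made up, and it assumes the vocabulary is referenced with the alias `Catalog` as in the draft above.

```xml
<!-- Hypothetical annotation target; "Example.Container/Products" is not part of the proposal -->
<Annotations Target="Example.Container/Products">
  <Annotation Term="Catalog.Dataset">
    <Record>
      <!-- All values below are illustrative placeholders -->
      <PropertyValue Property="identifier" String="example-products" />
      <PropertyValue Property="title" String="Example product data" />
      <PropertyValue Property="description" String="Illustrative dataset description." />
      <PropertyValue Property="modified" Date="2016-10-25" />
      <PropertyValue Property="publisher" String="Example Publisher" />
      <PropertyValue Property="keyword">
        <Collection>
          <String>products</String>
          <String>open data</String>
        </Collection>
      </PropertyValue>
      <PropertyValue Property="accessRights" String="public" />
    </Record>
  </Annotation>
</Annotations>
```

The TODO on accessRights could possibly be covered by the AllowedValues term of the OData Validation vocabulary. The fragment below is a sketch of what would go inside the accessRights property, assuming that vocabulary is referenced with the alias `Validation`:

```xml
<!-- Sketch only: assumes the alias "Validation" for Org.OData.Validation.V1 -->
<Annotation Term="Validation.AllowedValues">
  <Collection>
    <Record>
      <PropertyValue Property="Value" String="public" />
    </Record>
    <Record>
      <PropertyValue Property="Value" String="restricted" />
    </Record>
    <Record>
      <PropertyValue Property="Value" String="non-public" />
    </Record>
  </Collection>
</Annotation>
```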
