This repository has been archived by the owner on Sep 23, 2024. It is now read-only.
Create a Vocabulary for cataloging datasets #23
I've toyed with this idea for some time; here's what I've come up with so far. It is based on the W3C Data Catalog Vocabulary (DCAT) and includes a subset of the additional properties defined by the DCAT Application Profile for data portals in Europe (DCAT-AP). Construction principles:
Mapping a dcat:Dataset to an OData service is consistent with the use of "data set" as a "collection of closely related tables" (see https://en.wikipedia.org/wiki/Data_set) and with the DataSet class in the .NET Framework, where "A DataSet represents a complete set of data including the tables that contain, order, and constrain the data, as well as the relationships between the tables." (see https://msdn.microsoft.com/en-us/library/ss7fbaez(v=vs.110).aspx).
<Term Name="Catalog" Type="Catalog.CatalogType" AppliesTo="EntitySet">
<Annotation Term="Core.Description" String="A data catalog is a curated collection of metadata about datasets." />
</Term>
<ComplexType Name="CatalogType">
<Property Name="title" Type="Edm.String">
<Annotation Term="Core.Description" String="A name given to the catalog." />
</Property>
<Property Name="description" Type="Edm.String">
<Annotation Term="Core.Description" String="A free-text account of the catalog." />
</Property>
<Property Name="modified" Type="Edm.Date">
<Annotation Term="Core.Description" String="Most recent date on which the catalog was changed, updated or modified." />
</Property>
<Property Name="homepage" Type="Edm.String">
<Annotation Term="Core.Description" String="The homepage of the catalog." />
</Property>
<Property Name="dataset" Type="Collection(Catalog.DatasetType)">
<Annotation Term="Core.Description" String="A dataset that is part of the catalog." />
</Property>
</ComplexType>
<Term Name="Dataset" Type="Catalog.DatasetType" AppliesTo="EntitySet">
<Annotation Term="Core.Description"
String="A collection of data, published or curated by a single agent, and available for access or download in one or more formats." />
</Term>
<ComplexType Name="DatasetType">
<Property Name="identifier" Type="Edm.String">
<Annotation Term="Core.Description" String="A unique identifier of the dataset." />
</Property>
<Property Name="title" Type="Edm.String">
<Annotation Term="Core.Description" String="A name given to the dataset." />
</Property>
<Property Name="description" Type="Edm.String">
<Annotation Term="Core.Description" String="A free-text account of the dataset." />
</Property>
<Property Name="modified" Type="Edm.Date">
<Annotation Term="Core.Description" String="Most recent date on which the dataset was changed, updated or modified." />
</Property>
<Property Name="publisher" Type="Edm.String">
<Annotation Term="Core.Description" String="An entity responsible for making the dataset available." />
</Property>
<Property Name="keyword" Type="Collection(Edm.String)">
<Annotation Term="Core.Description" String="A keyword or tag describing the dataset." />
</Property>
<Property Name="distribution" Type="Catalog.DistributionType">
<Annotation Term="Core.Description" String="Connects a dataset to its available distributions." />
</Property>
<!-- properties from DCAT-AP -->
<Property Name="conformsTo" Type="Edm.String">
<Annotation Term="Core.Description" String="An implementing rule or other specification." />
</Property>
<Property Name="accessRights" Type="Edm.String">
<Annotation Term="Core.Description" String="Indicates whether the Dataset is open data, has access restrictions or is not public." />
<!-- TODO: annotation allowedValues 'public', 'restricted', and 'non-public' -->
</Property>
<Property Name="versionInfo" Type="Edm.PrimitiveType">
<Annotation Term="Core.Description" String="A version number or other version designation of the Dataset." />
</Property>
<!-- candidates from DCAT-AP: hasVersion, isVersionOf -->
</ComplexType>
<Term Name="Distribution" Type="Catalog.DistributionType" AppliesTo="EntitySet">
<Annotation Term="Core.Description">
<String>Represents a specific available form of a dataset.
Each dataset might be available in different forms; these forms might represent different formats of the dataset or different endpoints.
Examples of distributions include a downloadable CSV file, an API or an RSS feed.</String>
</Annotation>
</Term>
<ComplexType Name="DistributionType">
<Property Name="title" Type="Edm.String">
<Annotation Term="Core.Description" String="A name given to the distribution." />
</Property>
<Property Name="description" Type="Edm.String">
<Annotation Term="Core.Description" String="A free-text account of the distribution." />
</Property>
<Property Name="modified" Type="Edm.Date">
<Annotation Term="Core.Description" String="Most recent date on which the distribution was changed, updated or modified." />
</Property>
<Property Name="accessURL" Type="Edm.String">
<Annotation Term="Core.Description"
String="A landing page, feed, SPARQL endpoint or other type of resource that gives access to the distribution of the dataset." />
</Property>
</ComplexType>
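As a rough illustration of how these terms might be applied, the fragment below annotates an entity set with `Catalog.Dataset`. It is only a sketch: the target `Example.Container/Products`, the use of `Catalog` as the vocabulary alias, and every property value are hypothetical placeholders, not part of the proposal.

```xml
<!-- Hypothetical usage sketch: the target, alias, and all values are placeholders -->
<Annotations Target="Example.Container/Products">
  <Annotation Term="Catalog.Dataset">
    <Record>
      <PropertyValue Property="identifier" String="example-products" />
      <PropertyValue Property="title" String="Example product list" />
      <PropertyValue Property="description" String="Placeholder description of the dataset." />
      <!-- Edm.Date values use the Date constant expression -->
      <PropertyValue Property="modified" Date="2016-01-31" />
      <PropertyValue Property="publisher" String="Example Publisher" />
      <!-- Collection(Edm.String) values are written as a Collection of String elements -->
      <PropertyValue Property="keyword">
        <Collection>
          <String>products</String>
          <String>example</String>
        </Collection>
      </PropertyValue>
      <PropertyValue Property="accessRights" String="public" />
    </Record>
  </Annotation>
</Annotations>
```

Properties that are not needed for a given dataset, such as `conformsTo` or `versionInfo`, are simply left out of the record in this sketch.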
Open Data datasets are described using common terms, such as license, publisher, creation date, and update frequency.
In the open data community, DCAT (http://www.w3.org/TR/vocab-dcat/) defines common terms for this cataloging information, pulling also from Dublin Core (http://dublincore.org/documents/dcmi-terms/).
We should define OData vocabularies to allow marking up an OData service with the same terms for general dataset cataloging.