Emerging Core URI patterns

skwlilac edited this page Mar 7, 2013 · 1 revision

Version 1.0 (and 2.0d) patterns

The URI patterns established by v1.0 (and by the partially completed v2.0 that Paul Davidson and John Sheridan had been working on) can be summarised as:

For vocabularies and terms:

| Pattern | Used for |
|---|---|
| http://{sector}.data.gov.uk/def/{vocabulary} | A vocabulary, typically a collection of some kind of concepts, classes, properties or attributes. |
| http://{sector}.data.gov.uk/def/{vocabulary}/{term} | An individual concept, class, property or attribute. |

For reference data about 'real-world-things'

| Pattern | Used for |
|---|---|
| http://{sector}.data.gov.uk/d/{concept} | URI set |
| http://{sector}.data.gov.uk/id/{concept}/{reference} | A particular instance of a concept ('school', 'road'...), a 'thing' |
| http://{sector}.data.gov.uk/doc/{concept}/{reference} | A document containing reference data about the corresponding ../id/... |

Note that the id -> doc pattern is just one patterned approach to creating distinct URI for a 'thing' and its description. Other patterns are possible. The core requirement is to have distinct, durable URI that distinguish between a 'thing' and one or more 'descriptions' of that 'thing'.

For datasets and data items (statistics, observations that refer to reference items)

| Pattern | Used for |
|---|---|
| http://{sector}.data.gov.uk/data/{dataset} | A dataset, eg. http://environment.data.gov.uk/data/bathing-water-quality |
| http://{sector}.data.gov.uk/data/{dataset}(/{subset})* | Subset(s) of some root dataset, eg. http://environment.data.gov.uk/data/bathing-water-quality/compliance (annual compliance assessments) and http://environment.data.gov.uk/data/bathing-water-quality/in-season (in-season weekly sample assessments) |
| http://{sector}.data.gov.uk/data/{dataset}(/{subset})*/{item}/{reference} | Data items with a definite reference key |
| http://{sector}.data.gov.uk/data/{dataset}(/{subset})*(/{dimension}/{ordinate})* | Data items organised as an n-dimensional cube, eg. http://environment.data.gov.uk/data/bathing-water-quality/in-season/sample/point/03600/date/20120927/time/103000/recordDate/20120927 |
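As a rough illustration of the cube-style pattern, the following Python sketch assembles a data item URI from a list of subset segments and an ordered list of (dimension, ordinate) pairs. The function name `cube_item_uri` is hypothetical, and it is an assumption here that the `sample` segment in the example URI can be read as a further subset segment and that ordinates are already URL-safe.

```python
def cube_item_uri(sector, dataset, subsets, coordinates):
    """Build a /data URI from subset segments and (dimension, ordinate) pairs.

    Hypothetical helper: not part of any data.gov.uk tooling.
    """
    uri = f"http://{sector}.data.gov.uk/data/{dataset}"
    for subset in subsets:
        uri += f"/{subset}"
    # Dimension order matters: the same cell must always be spelled with the
    # same ordering of dimensions, otherwise it ends up with several URI.
    for dimension, ordinate in coordinates:
        uri += f"/{dimension}/{ordinate}"
    return uri


sample_uri = cube_item_uri(
    "environment",
    "bathing-water-quality",
    ["in-season", "sample"],  # assumption: 'sample' treated as a subset segment
    [
        ("point", "03600"),
        ("date", "20120927"),
        ("time", "103000"),
        ("recordDate", "20120927"),
    ],
)
print(sample_uri)
```

With these inputs the sketch reproduces the in-season sample URI given in the table.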

The patterns used under /data are less well defined in the v1.0/v2.0 guidance and have been evolving through use.

The dataset and subset URI serve as points with which to associate publication and licensing metadata. The precise nature of what a dataset is remains a little vague, and there is unlikely to be a single all-fitting notion of a dataset. Dataset lifecycles vary with the dynamism of their data. Some are relatively static and republished as complete versioned releases, either as an 'in-place' replacement for an earlier release, as a 'large' whole-dataset increment published alongside its predecessor, or as a baseline dataset and a succession of increments.

Shortcomings of earlier patterns

TBD...

A hashed alternative to 303 redirects.

For ../id/.. URI for 'things' (aka. non-Information Resources) it has long been data.gov.uk practice to use a 303 redirect to redirect a reference to a 'thing', say a bathing-water, to a document about that bathing-water, eg.

http://environment.data.gov.uk/id/bathing-water/ukc2102-03600 redirects to http://environment.data.gov.uk/doc/bathing-water/ukc2102-03600
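The id -> doc mapping behind such a redirect is purely lexical, which the following Python sketch tries to show. The function name `doc_uri_for` is hypothetical and the code only computes the redirect target; it is a minimal sketch, not a description of how the data.gov.uk servers are actually implemented.

```python
from urllib.parse import urlsplit, urlunsplit


def doc_uri_for(id_uri):
    """Return the /doc/ URI that an /id/ URI would be 303-redirected to.

    Hypothetical helper illustrating the id -> doc pattern only.
    """
    parts = urlsplit(id_uri)
    # "/id/bathing-water/x" splits into ["", "id", "bathing-water", "x"].
    segments = parts.path.split("/")
    if len(segments) < 2 or segments[1] != "id":
        raise ValueError(f"not an /id/ URI: {id_uri}")
    segments[1] = "doc"
    return urlunsplit(parts._replace(path="/".join(segments)))


print(doc_uri_for("http://environment.data.gov.uk/id/bathing-water/ukc2102-03600"))
```

A server following this practice would answer a request for the /id/ URI with status 303 and a Location header carrying the computed /doc/ URI.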

An alternate practice that would avoid the 303 redirection is to use a pattern where:

| Pattern | Description |
|---|---|
| {uri}#id | refers to a real-world thing |
| {uri} | refers to a document that describes the real-world thing |

This approach is workable with infrastructure such as the Linked Data API (LDA), both for individual item pages and for lists of individuals.

Eg.

| Pattern | Description |
|---|---|
| http://environment.data.gov.uk/bathing-water/ukc2102-03600#id | refers to a bathing-water |
| http://environment.data.gov.uk/bathing-water/ukc2102-03600 | refers to a document about that bathing water |
| http://environment.data.gov.uk/bathing-water/ukc2102-03600/v/{ver} or http://environment.data.gov.uk/bathing-water/ukc2102-03600:{ver} | refers to a version of the document about that bathing water |

This change would result in the identifier for the 'thing' (a bathing-water) and the identifier for its describing document being syntactically aligned. Other identifiers for the same thing (bathing-water) known to the publisher could be included with owl:sameAs links (or some more refined form of expression of sameness).
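The reason the hashed pattern needs no redirect is that a client strips the fragment before making the request, so the server only ever sees the document URI. A small Python sketch of that client-side step follows; the function name `resolve` and the 'thing'/'document' labels are illustrative assumptions.

```python
from urllib.parse import urldefrag


def resolve(uri):
    """Split a URI into (what it denotes, the document URI actually fetched).

    Hypothetical helper: '#id' is taken to mark the real-world thing.
    """
    document_uri, fragment = urldefrag(uri)
    denotes = "thing" if fragment == "id" else "document"
    return denotes, document_uri


print(resolve("http://environment.data.gov.uk/bathing-water/ukc2102-03600#id"))
print(resolve("http://environment.data.gov.uk/bathing-water/ukc2102-03600"))
```

Both calls yield the same document URI to fetch, which is the syntactic alignment described above.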

Synthesis from alternate suggestions

There has been a raft of alternate suggestions, worked through to a greater or lesser extent by different individuals. The key problems that all are trying to address are:

  • delegated administration of URI sub-spaces
  • URI-based grouping of related entities (terms in a vocabulary, items in a dataset or URI set)
  • persistent identity management in the face of organisational change
  • distinct URI for a 'thing' and for each (at least one) of its descriptions
  • efficient dispatch of web requests to data serving infrastructure
  • shared URI spaces at various levels

Given a URI space, there are different, and possibly orthogonal, views one can take of it.

There is an administrative view centred on dividing up the URI space, either assigning individual URI to individual resources or (sub-)delegating portions of that space to others to further delegate and/or make assignments. In this view there is pressure to emphasise dataset organisation and data ownership (which may be orthogonal to both URI space ownership in general and the operational ownership of data publishing infrastructure).

There is also an operational view, which is much more concerned with setting up infrastructure to serve resource requests. In this view one is more concerned with efficient routing of requests to infrastructure, and there is pressure to dominate the higher-order parts of the URI (the LH end) with fields that facilitate easy dispatch.

The candidate components of a URI pattern that recur across these suggestions are:

  • {uri-authority}
  • {theme}
  • {data-publishing-authority}
  • {collection}
  • {type}
  • {item}

{data-publishing-authority} is clearly some expression of 'organisational' ownership or custodianship of some collection of data. If it is expressed by some token within a URI, there needs to be an understanding that, whilst some level of organisational branding may show through in that token, the requirement for URI persistence is such that changes in organisational identity should not lead directly to changes in the value of such tokens present in URI. Some transition over time may occur and the data may be 're-homed' at a new location in URI space, but the metadata associated with the publication should be clear about guaranteed persistence, and transitions to a new space should respect the stated persistence declarations (the period over which the URI are stated to be accessible), at least through the provision of redirects to any re-homed URI.

{uri-authority} is some expression of 'organisational' ownership or governance of some tranche of URI space. This may be aligned with data-publishing-authority, but this need not be the case. A given data-publishing-authority might publish data under one or more uri-authority. The original {sector}.data.gov.uk URI scheme vests URI authority in stakeholder groups that are 'thematically' aligned with the given sector. Each data publisher may be a member of more than one such sector-based stakeholder group. This kind of multi-tenancy of the URI space, together with the practical near-absence of any such stakeholder governance groups, has hampered the growth of data publishing using the previous patterns.

{theme} is some general thematic expression of the topic of a data publication. In the earlier data.gov.uk patterns the sense of {theme} is largely captured by the notion of a {sector}. In INSPIRE the specification of Application Schema (also referred to as data specifications) is organised into 34 themes. Thematic organisation of data can be somewhat arbitrary, and sometimes more than one theme is relevant to a given vocabulary, reference item or dataset item. There may be value in the notion of thematic groupings that span URI authorities eg:

and so forth where there is alignment of the use of a {sector} (education, health, transport, finance....) and some common practice within those sectors. One suggestion, due to Peter Winstanley, is to use UN COFOG (Classification of the Functions of Government) rather than English words as the basis for the tokens used in sectoring eg:

{collection} allows a publisher to publish more than one collection (URI set, dataset, vocabulary...) under a given uri-authority.

{type} provides a means of signalling whether the URI is for reference data about things (ref), vocabularies and definitions (def), or datasets and data items (data).

{item} finally is the means of referencing/naming an item within a collection. It may be as simple as a single URI segment reference/key, or it may be a more complex cube-like reference elaborating a series of dimensions (axes) and their ordinates.

The point here is that whatever the concrete URI pattern scheme, it is likely to incorporate some or all of the components above in some prearranged lexical ordering. Schemes differ in the choice of ordering and in whether some of the elements are omitted.

http://{uri-authority}/{data-publishing-authority}/{collection}/{theme}/{type}/{item}[#{type}]

Sectored domains - omitting {theme} overloaded by {sector}

| URI | Description |
|---|---|
| http://environment.data.gov.uk/common/bathing-water/def/BathingWater | the class of bathing waters |
| http://environment.data.gov.uk/common/bathing-water/def/bathingWater | a property for referring to a bathing water |
| http://environment.data.gov.uk/eaew/bathing-water/ref/ukc2102-03600#id | a particular bathing water |
| http://environment.data.gov.uk/eaew/bathing-water/ref/ukc2102-03600 | a document about a particular bathing water |
| http://environment.data.gov.uk/eaew/bathing-water-quality.in-season/data/sample/point/03600/date/20120927/time/103000/recordDate/20120927 | a data record for a bwq in-season sample assessment |

Published elsewhere (with sector as theme in path)

| URI | Description |
|---|---|
| http://data.defra.gov.uk/eaew/bathing-water/environment/ref/ukc2102-03600#id | a particular bathing water |
| http://data.defra.gov.uk/eaew/bathing-water-quality.in-season/environment/data/sample/point/03600/date/20120927/time/103000/recordDate/20120927 | a bathing water in-season sample assessment |
| http://data.gov.uk/nwkr/railway-station/transport/ref/crs.MANPIC#id | Manchester Piccadilly railway station |
| http://data.gov.uk/nwkr/railway-station/transport/ref/tiploc.MNCPIC#id | a timing location point at Manchester Piccadilly railway station |
| http://data.gov.uk/dot/road/transport/ref/M5#id | the M5 motorway |

Arranged as:

http://{uri-authority}/{data-publishing-authority}/{collection}/{theme}/{type}/{item}[#{type}]
       \___________________________  _________________________/\______________  _____________/
                                   \/                                         \/
                   Request Routing and URI Space delegation          Unique entity identification									   

The left-hand end of the URI tends toward enabling coarse-grained request routing, while the right-hand end of the URI (in conjunction with the LH end) serves to establish unique entity identification.
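To make the split between the routing fields and the identifying fields concrete, here is a Python sketch that parses a URI against the generic pattern. The names `PatternParts` and `parse_pattern_uri`, and the `has_theme` switch for the sectored-domain variant that omits {theme}, are assumptions made for illustration; the set of {type} tokens is taken from the ref/def/data values described earlier.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit

TYPE_TOKENS = {"ref", "def", "data"}


class PatternParts(NamedTuple):
    uri_authority: str
    data_publishing_authority: str
    collection: str
    theme: Optional[str]
    resource_type: str
    item: str
    fragment: str


def parse_pattern_uri(uri, has_theme=True):
    """Split a URI into the components of the generic pattern (a sketch)."""
    parts = urlsplit(uri)
    segments = parts.path.strip("/").split("/")
    fixed = 4 if has_theme else 3  # segments that precede {item}
    if len(segments) <= fixed:
        raise ValueError(f"too few path segments: {uri}")
    resource_type = segments[fixed - 1]
    if resource_type not in TYPE_TOKENS:
        raise ValueError(f"unexpected type token {resource_type!r}")
    return PatternParts(
        uri_authority=parts.netloc,
        data_publishing_authority=segments[0],
        collection=segments[1],
        theme=segments[2] if has_theme else None,
        resource_type=resource_type,
        item="/".join(segments[fixed:]),  # may span several segments (cube)
        fragment=parts.fragment,
    )


print(parse_pattern_uri("http://data.gov.uk/dot/road/transport/ref/M5#id"))
```

Passing `has_theme=False` handles the sectored-domain examples, where the {sector} in the host name stands in for {theme}.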

A registry managed by a 'uri-authority' (the owner of a corresponding domain name) can be used to manage shared use of the corresponding URI space. The registry may be used to drive proxy and redirection infrastructure to route requests to infrastructure owned and operated by a 'data-publishing-authority'. 'collection' may be used to do more fine-grained request routing either at the 'uri-authority' or at the 'data-publishing-authority'.
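One way to picture such a registry is as a lookup table keyed on the left-hand fields of the URI. The Python sketch below is only an assumption about how a registry might drive proxying: the `REGISTRY` contents, the `.example` backend hosts and the function name `route` are all invented for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical registry kept by the uri-authority:
# (uri-authority, data-publishing-authority) -> backend base URL.
REGISTRY = {
    ("environment.data.gov.uk", "eaew"): "http://eaew-backend.example",
    ("environment.data.gov.uk", "common"): "http://shared-vocab.example",
}


def route(uri):
    """Return the backend URL a proxy would forward to, or None if unregistered."""
    parts = urlsplit(uri)
    segments = parts.path.strip("/").split("/")
    publisher = segments[0]  # first path segment = data-publishing-authority
    backend = REGISTRY.get((parts.netloc, publisher))
    if backend is None:
        return None
    return backend + parts.path


print(route("http://environment.data.gov.uk/eaew/bathing-water/ref/ukc2102-03600"))
```

Finer-grained routing on 'collection' could be sketched the same way by adding the second path segment to the registry key.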

'theme' in the lower orders of the URI pattern may be somewhat redundant [views sought].

In some cases (trading funds, devolved administrations, local authorities: basically non-central-government public bodies) 'uri-authority' and 'data-publishing-authority' may be converged (or not).

Within the {fields} above, should further structure be required, it is suggested that a dot ('.') separated approach be used to introduce structure within the given field (big-endian/little-endian question?)
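The big-endian/little-endian question is simply whether the most significant part of a dotted field comes first or last. The following Python sketch shows both orderings; the function names `join_field` and `split_field` are hypothetical, and reading `bathing-water-quality.in-season` as big-endian (dataset first, subset second) is an assumption based on the examples above.

```python
def join_field(parts, big_endian=True):
    """Join sub-parts (most significant first) into one dot-separated field."""
    ordered = parts if big_endian else list(reversed(parts))
    return ".".join(ordered)


def split_field(field, big_endian=True):
    """Inverse of join_field: return sub-parts, most significant first."""
    parts = field.split(".")
    return parts if big_endian else list(reversed(parts))


parts = ["bathing-water-quality", "in-season"]
print(join_field(parts))                    # big-endian ordering
print(join_field(parts, big_endian=False))  # little-endian ordering
```

Whichever ordering is chosen, it would need to be applied consistently so that a field can be split back into its parts unambiguously.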