Standard / DCAT (and profiles) export #7600
Proposal to better organize the various flavors of DCAT. The target is to support:
* DCAT
* EU DCAT-AP
* EU DCAT-AP mobility
* EU GeoDCAT-AP
…1 accrualPeriodicity.
…ccessRights and rights.
… from EU vocabulary as skos:Concept.
…ic is required even if isPrimaryTopic is defined. Only one rights allowed.
…conformity for the CatalogueRecord.
There are some vocabularies, like data-themes, HVD and licenses, that are provided in the formatters folder, as they seem to be used in the XSLT. The HVD vocabulary seems to require keywords from that vocabulary in the metadata records. The licenses vocabulary also seems to need to be used in use-constraint elements in the metadata records. The data-themes vocabulary, however, does not seem to be referenced from the metadata (https://github.com/geonetwork/core-geonetwork/pull/7600/files#diff-e032efa415655cf3ab6df515d6853952475f431571cbbaf1e2005257761accf2R216-R244), which looks bizarre, but I don't have all the context. Can these vocabularies be downloaded from the INSPIRE Registry? Otherwise, the formatters folder does not seem a good location for them if users need to load them in GeoNetwork.
For licenses, SHACL validation may expect an EU codelist value, e.g. http://publications.europa.eu/resource/authority/licence/CC0. So we need to map more general values like "http://creativecommons.org/publicdomain/zero/1.0/" or "https://creativecommons.org/publicdomain/zero/1.0/", or labels like "Creative Commons Atribución 4.0 Internacional", to the EU value.
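A minimal Python sketch of that normalization, assuming a lookup table keyed on the URL without its scheme and trailing slash, or on the lowercased label. The table is hypothetical and partial, and the `CC_BY_4_0` code is assumed from the EU licence authority list (check it there); in GeoNetwork this logic would live in the XSLT formatter, not in Python.

```python
from typing import Optional
from urllib.parse import urlparse

EU_LICENCE = "http://publications.europa.eu/resource/authority/licence/"

# Hypothetical, partial alias table. Keys are either a normalized URL
# (host + path, no scheme, no trailing slash) or a lowercased label.
LICENCE_ALIASES = {
    "creativecommons.org/publicdomain/zero/1.0": "CC0",
    "creativecommons.org/licenses/by/4.0": "CC_BY_4_0",
    "creative commons atribución 4.0 internacional": "CC_BY_4_0",
}


def normalize_key(value: str) -> str:
    """Drop the http/https scheme and trailing slash of URLs; lowercase labels."""
    value = value.strip()
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return (parsed.netloc.lower() + parsed.path).rstrip("/")
    return value.lower()


def to_eu_licence(value: str) -> Optional[str]:
    """Return the EU licence authority URI for a license value, or None if unknown."""
    if value.startswith(EU_LICENCE):
        return value  # already an EU codelist value
    code = LICENCE_ALIASES.get(normalize_key(value))
    return EU_LICENCE + code if code else None
```

With this, both the `http://` and `https://` CC0 URLs resolve to `http://publications.europa.eu/resource/authority/licence/CC0`, and unknown values return `None` so the caller can decide whether to keep the original value or skip it.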
The SEMICeu conversion provides a mapping from topic category or INSPIRE theme to data theme, so the data theme does not need to be encoded in the record.
It depends. For example, HVD defines its own vocabularies. Also, for applicable legislation, the vocabulary does not exist yet (http://data.europa.eu/r5r/applicableLegislation), so a specific thesaurus was created. This type of issue is hard to track, because you first need to set up a proper template with the required vocabularies, depending on the DCAT flavor you want to produce. I'm not sure how we could improve that; by default, it would require providing templates and vocabularies.
<!-- Resource
Unsupported:
* dcat:first|previous(sameAs replaces, previousVersion?)|next|last|hasVersion (using the Associated API, navigate to series and sort by date?)
* dct:isReferencedBy (using the Associated API)
What does "(using the Associated API)" mean?
Some of the relations to other records are not stored in the record itself, and they also depend on privileges on the related records. So another approach could be to use the Associated API, which resolves relations in both directions and filters records based on privileges.
<entry key="dcat:keyword">mdb:MD_Metadata/mdb:identificationInfo/mri:MD_DataIdentification/mri:descriptiveKeywords/mri:MD_Keywords/mri:keyword</entry>
<entry key="dcat:keyword">mdb:MD_Metadata/mdb:identificationInfo/srv:SV_ServiceIdentification/mri:descriptiveKeywords/mri:MD_Keywords/mri:keyword</entry>
Are these entries used? It seems keywords are handled in https://github.com/geonetwork/core-geonetwork/pull/7600/files#diff-466d4f123bce63f74e41e43d0a57f7be6bde3f8e6f59713fbb03670f8337128d instead of https://github.com/geonetwork/core-geonetwork/pull/7600/files#diff-ce8fa96060c124fa54925719f00ade2da31b5a6d2e56073ffd6de504350cffdfR144-R160, where isoToDcatCommonNames is used.
Removed. Indeed, there are two more specialized templates handling keywords with CharacterString and Anchor, so these entries would never be used.
<xsl:template mode="iso19115-3-to-dcat"
              match="mdb:identificationInfo/*/mri:descriptiveKeywords/*/mri:keyword[gcx:Anchor/@xlink:href != '']"
              priority="2">
  <dcat:theme>
    <skos:Concept>
      <xsl:call-template name="rdf-object-ref-attribute"/>
      <xsl:call-template name="rdf-localised">
        <xsl:with-param name="nodeName"
                        select="'skos:prefLabel'"/>
      </xsl:call-template>
    </skos:Concept>
  </dcat:theme>
</xsl:template>

<xsl:template mode="iso19115-3-to-dcat"
              match="mdb:identificationInfo/*/mri:descriptiveKeywords/*/mri:keyword[gco:CharacterString/text() != '']"
              priority="2">
  <xsl:call-template name="rdf-localised">
    <xsl:with-param name="nodeName"
                    select="'dcat:keyword'"/>
  </xsl:call-template>
</xsl:template>
`dcat:theme` should be used when the keyword comes from a thesaurus, but here only `gcx:Anchor` is checked.
`dcat:keyword` should be used for free-text keywords, assuming they use `gco:CharacterString`; but keywords from a thesaurus can also use `gco:CharacterString`, no?
Shouldn't we check whether the keyword has a `thesaurusName` element?
It depends a bit on what applications do with themes and keywords. The main restriction from a DCAT-AP point of view is https://github.com/SPW-DIG/metawal-core-geonetwork/blob/dcat/services/src/test/resources/org/fao/geonet/api/records/formatters/shacl/eu-dcat-ap-3.0.0/mdr-vocabularies.shape.ttl#L121-L130
In practice, it looks like sometimes only the http://publications.europa.eu/resource/authority/data-theme vocabulary is used to set `dcat:theme`, and all other keywords are `dcat:keyword`. Here, the conversion should be SHACL-valid as long as you have a topic category or INSPIRE theme mapped to a data theme. If there is a thesaurus, we could indeed produce a `dcat:theme` with a `skos:Concept` instead of a simple keyword.
Most of the time, we configure the editor to encode ISO keywords using Anchor (as requested for INSPIRE). This is facilitated by #8118
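The reviewer's suggestion (treat a keyword as a theme when it has an Anchor URI, or when its `mri:MD_Keywords` block has a `mri:thesaurusName`) can be sketched in Python with `xml.etree.ElementTree`. This is a hypothetical illustration of the decision rule only, not the XSLT in this PR; the function name and tuple output are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

NS = {
    "mri": "http://standards.iso.org/iso/19115/-3/mri/1.0",
    "gco": "http://standards.iso.org/iso/19115/-3/gco/1.0",
    "gcx": "http://standards.iso.org/iso/19115/-3/gcx/1.0",
    "xlink": "http://www.w3.org/1999/xlink",
}
XLINK_HREF = "{%s}href" % NS["xlink"]


def classify_keywords(md_keywords: ET.Element) -> List[Tuple[str, str, Optional[str]]]:
    """Return (property, label, concept URI or None) for each keyword of an mri:MD_Keywords block."""
    from_thesaurus = md_keywords.find("mri:thesaurusName", NS) is not None
    result = []
    for keyword in md_keywords.findall("mri:keyword", NS):
        anchor = keyword.find("gcx:Anchor", NS)
        if anchor is not None and anchor.get(XLINK_HREF, "") != "":
            # Same condition as the Anchor template: a concept with a URI.
            result.append(("dcat:theme", (anchor.text or "").strip(), anchor.get(XLINK_HREF)))
            continue
        text = keyword.findtext("gco:CharacterString", default="", namespaces=NS).strip()
        if text:
            # Suggested rule: thesaurus keywords become themes even without a URI.
            result.append(("dcat:theme" if from_thesaurus else "dcat:keyword", text, None))
    return result
```

A theme without a URI would produce a `skos:Concept` without `rdf:about`, which may not satisfy the DCAT-AP vocabulary shape linked above; that trade-off is why Anchor encoding is preferred.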
<entry key="dct:description">mdb:MD_Metadata/mdb:distributionInfo/mrd:MD_Distribution/mrd:transferOptions/mrd:MD_DigitalTransferOptions/mrd:onLine/cit:CI_OnlineResource/cit:description</entry>
<entry key="dct:description">mdb:MD_Metadata/mdb:distributionInfo/mrd:MD_Distribution/mrd:distributor/mrd:MD_Distributor/mrd:distributorTransferOptions/mrd:MD_DigitalTransferOptions/mrd:onLine/cit:CI_OnlineResource/cit:description</entry>
<entry key="owl:versionInfo">mdb:MD_Metadata/mdb:metadataStandard/cit:CI_Citation/cit:edition</entry>
<entry key="adms:versionNotes">mdb:MD_Metadata/mdb:resourceLineage/mrl:LI_Lineage/mrl:statement</entry>
This does not seem to be used.
Rule:
* Use mimetype if any
* Use WWW:DOWNLOAD:(.*=format) if any
* fallback to ancestor::mrd:MD_DigitalTransferOptions/mrd:distributionFormat/*/mrd:formatSpecificationCitation
This is a bit unclear: apparently `dcat:mediaType`, `dcat:compressFormat` and `dcat:packageFormat` rely on `mrd:MD_DigitalTransferOptions/mrd:distributionFormat/*/mrd:formatSpecificationCitation`?
But in https://github.com/geonetwork/core-geonetwork/pull/7600/files#diff-5fa6856eb0e0615450b025fa6ac293be136ee98f4b82b908997ebf6667438ef8R399-R404, `elementName` is not set for any of them.
For `dcat:compressFormat` there is another template, and it is also unclear whether it is used: https://github.com/geonetwork/core-geonetwork/pull/7600/files#diff-5fa6856eb0e0615450b025fa6ac293be136ee98f4b82b908997ebf6667438ef8R407-R413
Not sure if you are pointing to https://github.com/SPW-DIG/metawal-core-geonetwork/blob/dcat/schemas/iso19115-3.2018/src/main/plugin/iso19115-3.2018/formatter/dcat/dcat-core-distribution.xsl#L416-L417, but if so, the template defines a default value for `elementName`.
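The fallback order of the format rule discussed above can be sketched in Python. It assumes the three inputs have already been extracted from one `cit:CI_OnlineResource` and its transfer options; the function name, parameters and the exact shape of the `WWW:DOWNLOAD:<format>` protocol value are assumptions, not the formatter's code.

```python
import re
from typing import Optional, Sequence

# Assumption: protocol values like "WWW:DOWNLOAD:application/zip" carry the format after the prefix.
DOWNLOAD_PROTOCOL = re.compile(r"^WWW:DOWNLOAD:(.+)$")


def pick_format(mime_type: Optional[str],
                protocol: Optional[str],
                format_titles: Sequence[str]) -> Optional[str]:
    """Apply the rule in order: mimetype, WWW:DOWNLOAD:<format>, then formatSpecificationCitation."""
    if mime_type and mime_type.strip():
        return mime_type.strip()
    if protocol:
        match = DOWNLOAD_PROTOCOL.match(protocol.strip())
        if match:
            return match.group(1)
    for title in format_titles:
        if title.strip():
            return title.strip()
    return None
```

For example, `pick_format(None, "WWW:DOWNLOAD:application/zip", ["Shapefile"])` returns `"application/zip"`, while a plain `WWW:LINK` protocol falls through to the first non-empty format title.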
schemas/iso19115-3.2018/src/main/plugin/iso19115-3.2018/formatter/dcat/dcat-core-associated.xsl
<xsl:template mode="iso19115-3-to-dcat-distribution"
              match="mrd:distributionFormat/*/mrd:fileDecompressionTechnique">
  <xsl:call-template name="rdf-format-as-mediatype">
    <xsl:with-param name="elementName" select="'dcat:compressFormat'"/>
    <xsl:with-param name="format" select="*/text()"/>
  </xsl:call-template>
</xsl:template>
Unclear, as it doesn't seem to work. Also, it seems the same value is assigned to all the online resources?
Yes, this can be challenging, as a DCAT distribution contains a number of properties that ISO assigns to the resource. It works fine if a record has only one download link, or if the record is encoded using separate distribution blocks, but that is usually not the case.
For example, in the tests,
https://github.com/SPW-DIG/metawal-core-geonetwork/blob/dcat/services/src/test/resources/org/fao/geonet/api/records/formatters/iso19115-3.2018-dcat-dataset.xml#L1254-L1294
is converted to
https://github.com/SPW-DIG/metawal-core-geonetwork/blob/dcat/services/src/test/resources/org/fao/geonet/api/records/formatters/iso19115-3.2018-eu-dcat-ap-dataset-core.rdf#L554-L570
A dedicated template exists in dcat-core-keyword.
* Add mapping for referenceSystem
* Add test
* Disable distribution for now and delegate to DCAT-AP.
…intenanceFrequency which can be mapped to mobility DCAT vocabulary https://mobilitydcat-ap.github.io/controlled-vocabularies/update-frequency/latest/index.html#.
Match first resource constraints and then map to access and use elements.
Cardinality:
* ISO 0..n
* DCAT 0..n
* DCAT-AP 0..1
* Mobility DCAT 1..1 (in ISO, either use the corresponding period, e.g. P0Y0M0DT1H0M0S, or extend the codelist with the proper vocabulary)

The accrualPeriodicity mapping is done using the ISO to Dublin Core value mapping, but additional checks are done when ISO records extend the codelist and may use the EU Publications Office frequency codes or the Mobility DCAT-AP update frequency codes. Domain-specific codelists take priority over the DC or ISO codelists, e.g. <mmi:MD_MaintenanceFrequencyCode codeListValue="15min"/>.

multipleAccrualPeriodicityAllowed is a parameter that can be set to true to allow multiple accrualPeriodicity values. It defaults to false for the EU formatters and to true for DCAT.
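The priority and cardinality rules above can be sketched in Python. For brevity, the sketch maps ISO codes straight to the EU frequency vocabulary, whereas the commit describes going through the ISO to Dublin Core value mapping. The ISO mapping table is partial, the Mobility concept base URI is a hypothetical placeholder, and `multiple_allowed` stands in for the `multipleAccrualPeriodicityAllowed` parameter.

```python
from typing import Iterable, List

EU_FREQ = "http://publications.europa.eu/resource/authority/frequency/"
# Hypothetical placeholder: replace with the concept URIs published by Mobility DCAT-AP.
MOBILITY_FREQ = "https://example.org/mobility-update-frequency/"

# Partial, illustrative ISO MD_MaintenanceFrequencyCode -> EU frequency mapping.
ISO_TO_EU = {
    "daily": "DAILY",
    "weekly": "WEEKLY",
    "monthly": "MONTHLY",
    "quarterly": "QUARTERLY",
    "annually": "ANNUAL",
    "irregular": "IRREG",
    "unknown": "UNKNOWN",
}
EU_CODES = set(ISO_TO_EU.values())
MOBILITY_CODES = {"15min"}  # hypothetical subset: only the example code from this PR


def map_accrual_periodicity(codes: Iterable[str], multiple_allowed: bool = False) -> List[str]:
    """Map codeListValue strings to frequency URIs, domain-specific vocabularies first."""
    ranked = []
    for code in codes:
        if code in MOBILITY_CODES:
            ranked.append((0, MOBILITY_FREQ + code))
        elif code in EU_CODES:
            ranked.append((1, EU_FREQ + code))
        elif code in ISO_TO_EU:
            ranked.append((2, EU_FREQ + ISO_TO_EU[code]))
        # Unknown codes are ignored rather than emitted as invalid URIs.
    uris: List[str] = []
    for _, uri in sorted(ranked, key=lambda pair: pair[0]):  # stable sort keeps record order
        if uri not in uris:
            uris.append(uri)
    return uris if multiple_allowed else uris[:1]
```

With the default `multiple_allowed=False` (the EU formatter setting), a record listing both `daily` and `15min` yields only the Mobility `15min` concept, matching the "domain-specific first" rule and the 0..1 / 1..1 cardinalities.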
Hi @fxprunayre, could you please clarify something for me: why do we need different export profiles (HVD, Mobility, etc.)? Why can't we just output one graph that provides the proper elements to fulfill the requirements of all of these profiles? I feel like RDF allows creating graphs that are quite rich and contain many different statements targeting various consumers. Is there a reason, technical or other, for this profile-based approach? Thanks
Some open data portals expect a particular DCAT profile, but true, we could mix them all in one. However, if you look at each profile's model and SHACL validation, one element can be encoded differently, can be optional in one profile and mandatory in others, or may not have the same definition (e.g. vocabularies). Across versions of a profile, the encoding of an element can also change over time.
https://semiceu.github.io/DCAT-AP/r5r/releases/3.0.0/#applicableLegislation Add the element to DCAT-AP base. Element is 0..n in DCAT-AP and should be present in extensions (mobility, hvd, geodcat). HVD requires at least http://data.europa.eu/eli/reg_impl/2023/138/oj and cardinality is 1..n. Do not restrict to a particular legislation list. A sample vocabulary is provided but it can be extended depending on catalogue domains.
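The HVD constraint described above (1..n, including at least the HVD implementing regulation, with no restriction on other legislation) can be sketched as a Python check. The function name and the `hvd` flag are hypothetical; in practice this constraint is enforced by the SHACL shapes rather than by code like this.

```python
from typing import List, Sequence

HVD_REGULATION = "http://data.europa.eu/eli/reg_impl/2023/138/oj"


def check_applicable_legislation(uris: Sequence[str], hvd: bool) -> List[str]:
    """Return human-readable problems; an empty list means the constraint holds."""
    problems = []
    if hvd:
        if not uris:
            problems.append("HVD: applicableLegislation is 1..n but none is present")
        elif HVD_REGULATION not in uris:
            problems.append("HVD: applicable legislation must include " + HVD_REGULATION)
    # No allow-list: other legislation URIs are accepted, since catalogues may extend the vocabulary.
    return problems
```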
In DCAT and DCAT-AP, `dct:identifier` is 0..n. Mobility DCAT restricts it to 0..1. In DCAT-AP and its extensions, only the first identifier is converted to `dct:identifier`; the others become `adms:identifier`.
Use `:` separator for URN like identifiers.
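A Python sketch of these two identifier rules. The URN handling follows the commit; joining non-URN code spaces with `/` is an assumption, and the function names and the `dcat_ap` flag are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def join_identifier(code: str, code_space: Optional[str] = None) -> str:
    """Prefix the code with its code space; URN-like code spaces use ':' as separator."""
    if not code_space:
        return code
    # Assumption: non-URN code spaces (e.g. HTTP URIs) are joined with '/'.
    sep = ":" if code_space.lower().startswith("urn:") else "/"
    return code_space.rstrip(sep) + sep + code


def identifier_properties(identifiers: Sequence[str], dcat_ap: bool) -> List[Tuple[str, str]]:
    """DCAT keeps every identifier as dct:identifier; DCAT-AP and extensions keep only the first."""
    if not dcat_ap:
        return [("dct:identifier", value) for value in identifiers]
    return [("dct:identifier" if i == 0 else "adms:identifier", value)
            for i, value in enumerate(identifiers)]
```

In the RDF output, an `adms:identifier` would typically point to an `adms:Identifier` node carrying the value in `skos:notation`; the sketch only decides which property is used.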
For DCAT-AP NL, we extend EU DCAT-AP with some specific overrides:
- For `dct:LinguisticSystem`, map the ISO 3-letter code DUT to NLD as required (this may also be the case for EU DCAT-AP, but I'm not sure).
- Dataset status is not defined in EU DCAT-AP and requires https://publications.europa.eu/resource/authority/dataset-status
- For distributions, some fields are mandatory in DCAT-AP NL, so the parameter has to be overridden to copy them from the dataset.
For now, there are not many differences, but having a way to override the output seems useful.
Also, the list of DCAT outputs can be configured, so in most cases catalogues will use only the national DCAT flavor.
…AT language codes (terminology codes)
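A Python sketch of the language fix discussed above: convert ISO 639-2 bibliographic codes (such as `dut`) to terminology codes (such as `nld`) before building the EU language authority URI. The table lists the ISO 639-2 codes whose bibliographic and terminology forms differ; the assumption that the EU authority uses the upper-cased terminology code should be checked against the authority table for less common languages.

```python
EU_LANGUAGE = "http://publications.europa.eu/resource/authority/language/"

# ISO 639-2 bibliographic codes that differ from their terminology counterparts.
ISO639_2_B_TO_T = {
    "alb": "sqi", "arm": "hye", "baq": "eus", "bur": "mya", "chi": "zho",
    "cze": "ces", "dut": "nld", "fre": "fra", "geo": "kat", "ger": "deu",
    "gre": "ell", "ice": "isl", "mac": "mkd", "mao": "mri", "may": "msa",
    "per": "fas", "rum": "ron", "slo": "slk", "tib": "bod", "wel": "cym",
}


def to_eu_language(code: str) -> str:
    """Return the EU language authority URI for an ISO 639-2 (B or T) code."""
    normalized = code.strip().lower()
    terminology = ISO639_2_B_TO_T.get(normalized, normalized)
    return EU_LANGUAGE + terminology.upper()
```

So `to_eu_language("dut")` gives `http://publications.europa.eu/resource/authority/language/NLD`, while codes that are identical in both forms, like `eng`, pass through unchanged.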
The backport to 4.2.x failed.
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-4.2.x 4.2.x
# Navigate to the new working tree
cd .worktrees/backport-4.2.x
# Create a new branch
git switch --create backport-7600-to-4.2.x
# Cherry-pick the merged commits of this pull request (space-separated) and resolve the conflicts
git cherry-pick 262422f5a3b507e6fcb1b24ffb28ebf7c24a71cd b1555e93479aec1c677320ad74b1fdec326bd673 0d0667435c237e7aba677273a22892f0c25cf81c 94318cb8edd35173ae6d56e3537a5d7644975682 10ee379856357a14147fc43e181682aada5a0793 1944cd41ef581245f5c578ac3ba21da932ab3f83 18fb48c365073c8ca558761d3d0c59d275e5aaec 18285eb8201f439c47e4a0f1ed2ae7a343ad0326 87fcbf2c3b50b646eed2a8d633f6436eea734bb4 19b11e5f62e04e5aaf4837475afda5903ba81a19 b59f45d56c2d96327e5c62b4b480e453ba4ae625 354ba253e29603f01cd3af1ca9d6a7da3e0eabd1 2ff0c73d505c823be2cc3cdb706af18a8675e734 cb6e1524cb900cb2ed3482f8a28c8a25c5567b5c ebb11041a5306cbc188c57290dfa687babdb1c75 26297e248eed8917bc93c4aa5b62439b7b4cf229 087111340642a5b381e592ac58d83a4650632c2c eccaf59fea124cfa15d5f99793c350127a7c81c4 0fbd299115aab86e5764dfc85850d76cbe1f25cb 374682be5c61b4850f5084597a5db698939c10ac f0bf45e6cf8253e49acb3738b60705ababa87d36 b85e1e50571ae310bb1272124659364814885360 d8b4c7a20583343329c11e93f2cc27c8bee14ca3 e3d27809b101ff4aa16e064f94ce356ec6a0bd84 9aae81d1f47216c2c5ffe9c9582239cad8ca9b6c 92638f0840bd3f8cf9030dd61b8caa6f7601cb4b bfa921cb3e688811da096b25cca86372c8c895c8 e914ce0101a0141dfed549d6ede3d642d7fd88e8 191b5d245c7ea976f8a04c15902d5a9ec51067bb 52e7b15d0503393a0a47b5ae4706b2ea09cb1645
# Push it to GitHub
git push --set-upstream origin backport-7600-to-4.2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-4.2.x
Then, create a pull request where the
The `own` outputSchema returns the XML as it is stored in the database. It is not defined in the CSW specification and is a GeoNetwork-specific `outputSchema`. When a CSW request is made with one of the DCAT-related output schemas (added in geonetwork/core-geonetwork#7600), the DCAT record is returned as is. There is no need to define brief, full and summary conversions when they are all the same.
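The "one conversion for all element sets" idea can be sketched in Python: pick the XSLT matching the requested `ElementSetName`, and fall back to the single configured one when an output schema defines only one. The dictionary-shaped configuration and the file names are hypothetical; GeoNetwork's actual `outputSchema` configuration is not shown here.

```python
from typing import Dict


def pick_conversion(conversions: Dict[str, str], element_set: str) -> str:
    """Choose the XSLT for an ElementSetName (brief, summary or full)."""
    if element_set in conversions:
        return conversions[element_set]
    if len(conversions) == 1:
        # Single conversion: the DCAT output is the same whatever the element set.
        return next(iter(conversions.values()))
    raise KeyError("no conversion for element set " + repr(element_set))
```

For example, a schema configured as `{"full": "dcat.xsl"}` serves `brief`, `summary` and `full` requests with the same stylesheet.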
Based on the recent publication of various DCAT profiles (at least in Europe), GeoNetwork needs to improve its capacity to export metadata records in DCAT. GeoNetwork's DCAT export, initially implemented in 2012, targeted interaction with semantic services and semantic sitemap support. Later, some changes related to GeoDCAT-AP were added to improve the mapping, but the mapping was not fully consistent with the SEMICeu work (e.g. the ISO19139 to GeoDCAT-AP XSL conversion). Now, new DCAT profiles are defined, and some datasets and services managed in catalogues are in the scope of those profiles (e.g. HVD, mobility).
DCAT mappings and formatters
This proposal adds:
The mapping is done from ISO19115-3 to DCAT*. An ISO19139 to ISO19115-3 conversion can be applied beforehand if needed.
The SEMICeu XSLT conversion is also included, with minor improvements (https://github.com/SEMICeu/iso-19139-to-dcat-ap/pulls). This conversion goes from ISO19139 to RDF; if needed, an ISO19115-3 to ISO19139 conversion is applied first.
The mapping was created with:
Each DCAT format is available as a formatter, e.g. http://localhost:8080/geonetwork/srv/api/records/be44fe5a-65ca-4b70-9d29-ac5bf1f0ebc5/formatters/eu-dcat-ap
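A small Python helper that builds such a formatter URL from a catalogue base URL, a record UUID and a formatter identifier (`eu-dcat-ap` and `eu-dcat-ap-hvd` appear in this proposal). The helper name is hypothetical; it only assembles the path shown in the example.

```python
from urllib.parse import quote


def formatter_url(base_url: str, uuid: str, formatter: str) -> str:
    """Build the record formatter URL, e.g. for the eu-dcat-ap formatter."""
    return "{}/srv/api/records/{}/formatters/{}".format(
        base_url.rstrip("/"), quote(uuid, safe=""), quote(formatter, safe=""))
```

The resulting URL can be fetched with any HTTP client, e.g. `urllib.request.urlopen(url).read()`.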
Validation
Validation in test:
Online validation tool:
Opendata portal testing
Successfully tested with the following data portals:
Mapping discussion
Embedded objects vs. references
ISO rarely contains references to objects, but they can be expressed with various encodings:
* Anchor, e.g. keywords
* @uuid (sometimes used in contacts)

The DCAT mapping provides an entry point to customize where object references are picked up.
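The reference-vs-embed decision for these two encodings can be sketched in Python with `xml.etree.ElementTree`: use the Anchor `xlink:href` when present, otherwise build a URI from `@uuid` if a pattern is configured, otherwise embed the object. The function name and the `uuid_pattern` parameter are hypothetical stand-ins for the customization entry point.

```python
import xml.etree.ElementTree as ET
from typing import Optional

GCX_ANCHOR = "{http://standards.iso.org/iso/19115/-3/gcx/1.0}Anchor"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def object_reference(element: ET.Element, uuid_pattern: Optional[str] = None) -> Optional[str]:
    """Return a URI to reference instead of embedding the object, or None to embed it."""
    anchor = element if element.tag == GCX_ANCHOR else element.find(GCX_ANCHOR)
    if anchor is not None and anchor.get(XLINK_HREF):
        return anchor.get(XLINK_HREF)
    uuid = element.get("uuid")
    if uuid and uuid_pattern:
        # Hypothetical pattern such as "https://example.org/organisations/{uuid}".
        return uuid_pattern.format(uuid=uuid)
    return None
```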
DCAT in CSW service
All DCAT profiles are also accessible using CSW protocol.
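As a sketch, a CSW 2.0.2 `GetRecordById` request for a DCAT output schema can be assembled from its key-value parameters in Python. The helper name is hypothetical; `urlencode` percent-encodes the `outputSchema` URL, which is the standard encoded form of the same request.

```python
from urllib.parse import urlencode


def get_record_by_id_url(csw_endpoint: str, record_id: str, output_schema: str) -> str:
    """Build a CSW 2.0.2 GetRecordById KVP request URL."""
    params = {
        "SERVICE": "CSW",
        "VERSION": "2.0.2",
        "REQUEST": "GetRecordById",
        "ID": record_id,
        "outputSchema": output_schema,
    }
    return csw_endpoint + "?" + urlencode(params)
```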
A `GetRecordById` operation can be used. For example, this request:
http://localhost:8080/geonetwork/srv/eng/csw?SERVICE=CSW&VERSION=2.0.2&REQUEST=GetRecordById&ID=da165110-88fd-11da-a88f-000d939bc5d8&outputSchema=https://semiceu.github.io/DCAT-AP/releases/2.2.0-hvd/
is equivalent to the API call:
http://localhost:8080/geonetwork/srv/api/records/da165110-88fd-11da-a88f-000d939bc5d8/formatters/eu-dcat-ap-hvd
For this, the `outputSchema` configuration is improved so that it is no longer mixed with `typenames`.
If an `outputSchema` does not provide `brief`, `summary` and `full` variations, only one XSLT can be provided (so it is easier to bridge CSW output to a formatter):
DCAT download from the record view
To add the formatter in the record view download list:
Related item
Future work & known limitation
Supported by
Funded by BRGM
Funded by Wallonia region (SPW)