OGC Metadata Codesprint 2024 - Proposals and use cases #17

Open
ByronCinNZ opened this issue Sep 26, 2024 · 37 comments
Labels: Codesprint OGC Metadata Codesprint Sydney 2024

@ByronCinNZ
Collaborator

ByronCinNZ commented Sep 26, 2024

Please use this space to submit ideas for issues that could be addressed during the OGC Metadata Codesprint, which will be held from November 18 to 19, 2024, in Sydney, Australia, and online.

Alternatively, create a new issue for your codesprint idea and tag it with Codesprint.

Participants are encouraged to collaborate and comment on proposals.

@ByronCinNZ ByronCinNZ added the Codesprint OGC Metadata Codesprint Sydney 2024 label Sep 26, 2024
@ByronCinNZ
Collaborator Author

Proposal - Can Metadata Standards Play Nicely?

The idea is to determine and demonstrate where and when different standards (namely STAC, OGC API Records, ISO 19115, and GeoDCAT) can work in tandem to provide the most relevant information to the right users at the right time. The goal is to create a demonstration that highlights the strengths of each standard when they are used together.

@rob-metalinkage
Collaborator

OGC will provide an open source framework for testing mappings between standards, using test case examples covering Records, STAC and DCAT. Codesprint participants would be able to extend this to XML-based metadata standards such as ISO 19115, in a way that is extensible to the various profiles in use. This could leverage skills in any or all of coding, data modelling, metadata standards, or application domain requirements.

@PeterParslow
Contributor

PeterParslow commented Sep 26, 2024

Proposal: what's the future for ISO 19115?

Background: roughly half the ISO/TC 211 members consider that ISO 19115-1:2014 should be revised, and yet many areas have not yet moved from the earlier version. There is a lot of discussion, e.g. in Europe, about moving from ISO 19115:2003 to DCAT instead.

I would like to canvass views on what could be useful for ISO/TC 211 to do, for example:

  • a formal document on how to map ISO 19115-1 to DCAT
  • a new version of ISO 19115-1 restructured using the DCAT terms and structure
  • in both cases, something about aligning the encoding of ISO 19115-1 with one or more encodings of DCAT
  • almost a separate request: some would like to see an RDF/XML and/or OWL "encoding" of ISO 19115-1; how official should that be?

I'm sure I've not thought of all the "possibly useful futures"; I hope the assembled brains will help!

@rob-metalinkage
Collaborator

Proposal: what's the future for ISO 19115?

Background: roughly half the ISO/TC 211 members consider that ISO 19115-1:2014 should be revised, and yet many areas have not yet moved from the earlier version. There is a lot of discussion, e.g. in Europe, about moving from ISO 19115:2003 to DCAT instead.

I would like to canvass views on what could be useful for ISO/TC 211 to do, for example:

  • a formal document on how to map ISO 19115-1 to DCAT
  • a new version of ISO 19115-1 restructured using the DCAT terms and structure
  • in both cases, something about aligning the encoding of ISO 19115-1 with one or more encodings of DCAT
  • almost a separate request: some would like to see an RDF/XML and/or OWL "encoding" of ISO 19115-1; how official should that be?

I'm sure I've not thought of all the "possibly useful futures"; I hope the assembled brains will help!

Hi Peter,

in terms of a specific "code sprint" activity, we could look to define and exercise the "mapping" - this could be done several ways, and the GeoDCAT Building Blocks can be used to execute, test and publish these mappings. Five options I can immediately see:

  1. XML -> JSON using existing libraries, plus JSON-LD uplift (often this needs an intermediate transform)
  2. 19115 UML -> JSON Schema and OWL using ShapeChange, plus JSON-LD uplift (bonus marks for making ShapeChange generate the JSON-LD mapping) - and transforms from 19115 OWL to GeoDCAT
  3. Use existing transform languages and libraries designed for relational -> RDF mapping, such as R2RML
  4. Custom scripts taking a mapping table and generating or incorporating custom translation code per element
  5. Take a 19115 -> OGC Records mapping and use the Records -> DCAT mapping under development

Note that option 4 could be used to generate the transforms for options 1, 2 or 5. Option 5 would be the most OGC API-friendly solution.
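
As a minimal sketch of option 1 in Python, using the xmltodict and pyld libraries (the XML fragment and the one-term mapping are placeholders, not a real ISO 19115 structure):

import xmltodict
from pyld import jsonld

# Toy stand-in for an ISO 19115 record; element names are illustrative only.
xml = "<metadata><title>Example dataset</title></metadata>"
doc = xmltodict.parse(xml)

# Intermediate transform: flatten the parsed XML into keys that a
# JSON-LD context can bind to RDF terms.
flat = {"title": doc["metadata"]["title"]}

# JSON-LD uplift: the context maps each flat key onto a DCAT/DC property.
context = {"title": "http://purl.org/dc/terms/title"}
print(jsonld.expand({"@context": context, **flat}))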

@PeterParslow
Contributor

Thanks Rob; I was thinking more of a short 'think & talk' break where people just give their views on what they'd like to see improve or change in ISO 19115. The revision project in TC 211 hasn't even started yet, so we're not ready for any hands-on work.
Of course, if people's ideas of what to change are based on practical work they've done, that's even better. One example could be item 3 - I have heard others say they'd like to see an "official" RDF (even specifically, OWL) representation of ISO 19115. Question: would that actually be more beneficial as an "official" ISO/TC 211 DCAT, then represented in RDF?

The activities you list would all sit in the area of using the current ISO 19115-1:2014 (with its accompanying XML spec, 19115-3, and the soon-to-come JSON spec, 19115-4), or even the old ISO 19115:2003/ISO 19139:2006 (as still used quite a lot in Europe). Paul Jansen is leading that second project, which has involved some of items 1 & 2 in your list.

Option 5 looks like a useful input to the upcoming ISO 19115-1:2014 revision project.

@dr-shorthair
Collaborator

I would suggest skipping OWL and going straight to SHACL.

@christinhenzen

Hi all,

We would like to submit a proposal on FAIRness evaluation & GeoDCAT:

Evaluating the FAIRness of geospatial data - Extending the F-UJI tool for GeoDCAT

Evaluating the FAIRness of geospatial data requires specific adaptations of the FAIR principles and related evaluation tools. Taking F-UJI (https://github.com/pangaea-data-publisher/fuji), the well-known open-source evaluation tool, as the software project for our use case, we consider the following GeoDCAT extensions for further brainstorming:

  • Findable: Evaluate spatial information as defined in GeoDCAT, e.g., using the bounding box
  • Accessible: Check if the metadata contains information about a geospatial Web service, e.g., WMS or WFS
  • Interoperable: Evaluate the proper use of geospatial standards/specifications, e.g., CRS
  • Reusable: Evaluate spatial reference system, spatial resolution, and spatial data formats, e.g. GeoJSON

We intend to share the developed extension with the Earth System Sciences community as open source, facilitating its reuse in geodata portals/catalogues such as the European Data Portal, the NFDI4Earth Services Knowledge Hub, or OneStop4All.
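
A rough sketch of the "Findable" check against a (Geo)DCAT graph using rdflib; the function and its wiring into F-UJI's plugin machinery are hypothetical:

from rdflib import Graph, Namespace

DCAT = Namespace("http://www.w3.org/ns/dcat#")

def has_bounding_box(metadata_ttl: str) -> bool:
    """Findable check: does the metadata graph carry any dcat:bbox?

    In (Geo)DCAT the bounding box sits on a dct:Location reached via
    dct:spatial, so a simple triple-pattern scan is enough here.
    """
    g = Graph().parse(data=metadata_ttl, format="turtle")
    return any(g.triples((None, DCAT.bbox, None)))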

@akuegeler

Proposal: GeoDCAT response for OGC API Records
How about implementing a GeoDCAT response for OGC API Records?
A good place to start should be Rob’s GeoDCAT mapping for OGC API Records. @rob-metalinkage do you think this is doable/a good idea?

We could use pygeoapi's CSW facade as a starting point and transform the ISO19115/ISO19139 metadata from a CSW to GeoDCAT.

Your thoughts on this would be greatly appreciated (especially as we have only limited knowledge on OGC API Records).
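
As a sketch of that starting point: harvesting ISO records from a CSW with OWSLib before handing them to an ISO-to-GeoDCAT transform (the endpoint URL is a placeholder, and the transform itself is not shown):

from owslib.csw import CatalogueServiceWeb

csw = CatalogueServiceWeb("https://example.org/csw")  # hypothetical endpoint
csw.getrecords2(
    outputschema="http://www.isotc211.org/2005/gmd",  # request ISO 19139 records
    esn="full",
    maxrecords=10,
)
for ident, record in csw.records.items():
    iso_xml = record.xml  # raw ISO XML, ready for an ISO -> GeoDCAT transform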

@PeterParslow
Contributor

PeterParslow commented Oct 24, 2024

Proposal: GeoDCAT response for OGC API Records How about implementing a GeoDCAT response for OGC API Records? A good place to start should be Rob’s GeoDCAT mapping for OGC API Records. @rob-metalinkage do you think this is doable/a good idea?

We could use pygeoapi's CSW facade as a starting point and transform the ISO19115/ISO19139 metadata from a CSW to GeoDCAT.

Your thoughts on this would be greatly appreciated (especially as we have only limited knowledge on OGC API Records).

Does the GeoNetwork (draft?) API Records implementation provide an alternative starting point? It already provides an OGC API Records DCAT output for records that are "natively" stored as ISO 19139 (or presumably other forms). See the example (with unattractive GUI layout!) at https://osmetadata.astuntechnology.com/geonetwork/api/collections/main/items (use ?f=dcat or click the link).
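
For a quick look at that output, a minimal fetch of the DCAT serialisation using the f=dcat parameter mentioned above:

import requests

url = "https://osmetadata.astuntechnology.com/geonetwork/api/collections/main/items"
resp = requests.get(url, params={"f": "dcat"})
print(resp.text[:500])  # start of the DCAT (RDF) output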

@rob-metalinkage
Collaborator

Review https://docs.ogc.org/bp/17-084r1/17-084r1.html and explore how this overlaps with Records or other options for JSON encoding. (Thanks Uwe)

@PalmJanssen

Hi All,

ISO/TC 211 Working Group 7 (Information Communities) is developing a JSON implementation of the 19115 metadata standard: 19115-4, Geographic information — Metadata — Part 4: JSON schema implementation of metadata fundamentals.

We are looking for help on creating a GeoJSON schema and want to bring that into the code sprint.

Our approach is the following:

  • 19115-1 and 19157-1 (data quality) are in scope
  • encoding will be in GeoJSON
  • automated generation from UML to GeoJSON schema is possible, but would create an over-complex JSON schema which would not be fit for purpose
  • a comprehensive 'real life' XML dataset has been created and converted to a preferred GeoJSON encoding (this work is done)
  • the GeoJSON example dataset is a subset (profile) of 19115-1 and 19157-1, but is considered to cover the essential part
  • the GeoJSON example dataset is considered to be the set of requirements for the GeoJSON schema (to be developed)
  • on the basis of this GeoJSON dataset and the conceptual UML model, a GeoJSON schema needs to be generated
  • a first attempt at creating a GeoJSON schema from the conceptual UML is available, but this needs to be reworked to fit the GeoJSON example

The goal, and the work to be done, is reworking the first draft of the GeoJSON schema on the basis of the agreed GeoJSON data example; see the sketch after this comment.

What is needed:

  • JSON/GeoJSON expertise
  • 19115 metadata domain knowledge
  • hands-on help
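
To make the "example dataset as requirements" idea concrete, a minimal validation sketch with the Python jsonschema library; the schema below is an invented, drastically cut-down stand-in, not the draft 19115-4 schema:

import jsonschema

# Hypothetical mini-schema: a GeoJSON Feature whose properties must carry
# a couple of core 19115-1 fields.
schema = {
    "type": "object",
    "required": ["type", "properties"],
    "properties": {
        "type": {"const": "Feature"},
        "properties": {
            "type": "object",
            "required": ["title", "dateInfo"],
        },
    },
}

record = {
    "type": "Feature",
    "geometry": None,
    "properties": {"title": "Example dataset", "dateInfo": "2024-11-18"},
}

jsonschema.validate(record, schema)  # raises ValidationError if a field is missing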

@rob-metalinkage
Collaborator

Hi All,

ISO/TC 211 Working Group 7 (Information Communities) is developing a JSON implementation of the 19115 metadata standard: 19115-4, Geographic information — Metadata — Part 4: JSON schema implementation of metadata fundamentals.

We are looking for help on creating a GeoJSON schema and want to bring that into the code sprint.

Our approach is the following:

  • 19115-1 and 19157-1 (data quality) are in scope
  • encoding will be in GeoJSON
  • automated generation from UML to GeoJSON schema is possible, but would create an over-complex JSON schema which would not be fit for purpose
  • a comprehensive 'real life' XML dataset has been created and converted to a preferred GeoJSON encoding (this work is done)
  • the GeoJSON example dataset is a subset (profile) of 19115-1 and 19157-1, but is considered to cover the essential part
  • the GeoJSON example dataset is considered to be the set of requirements for the GeoJSON schema (to be developed)
  • on the basis of this GeoJSON dataset and the conceptual UML model, a GeoJSON schema needs to be generated
  • a first attempt at creating a GeoJSON schema from the conceptual UML is available, but this needs to be reworked to fit the GeoJSON example

The goal, and the work to be done, is reworking the first draft of the GeoJSON schema on the basis of the agreed GeoJSON data example.

What is needed:

  • JSON/GeoJSON expertise
  • 19115 metadata domain knowledge
  • hands-on help

Note I have moved this to a new issue to allow targeted discussions.

@uvoges

uvoges commented Oct 28, 2024

A valuable input for a specification of a JSON(-LD) encoding of the ISO metadata could be the OGC BP https://docs.ogc.org/bp/17-084r1/17-084r1.html.
Here we defined a GeoJSON and JSON-LD encoding for metadata for Earth Observation (EO) collections (dataset series). It mainly encodes ISO 19115, with some extensions for the Earth Observation domain (e.g. satellite/platform, instrument, ...) which were integrated from ISO 19115-2.

A convergence of the proposed JSON-LD mapping with GeoDCAT-AP [OR15] was a further intention.

The GeoJSON encoding is defined as a compaction, through a normative context, of the proposed JSON-LD encoding, with some extensions.

JSON is human readable and easily parseable. However, the BP is based on a normative JSON-LD context, which allows each property to be explicitly defined as a URI. Furthermore, the JSON encoding is defined using JSON Schema, which allows validation of instances against these schemas.
The specification defines a GeoJSON-based serialization syntax for Earth Observation collection metadata that conforms to a subset of the [NR13] syntax constraints but does not require JSON-LD processing.

JSON-LD context / compaction
While expansion removes context from a given input, compaction's primary function is to perform the opposite operation: to express a given input according to a particular context. Compaction applies a context that specifically tailors the way information is expressed for a particular person or application. This simplifies applications that consume JSON or JSON-LD by expressing the data in application-specific terms, and it makes the data easier to read by humans [OR4].
JSON-LD uses the special @context property to define the processing context. The value of the @context property is defined by the JSON-LD specification. Implementations producing EO collection metadata documents should include a @context property with a value that includes a reference to the normative JSON-LD @context definition, using the URL "https://www.opengis.net/spec/eoc-geojson/1.0".
JSON-LD 1.1 [OR21] aware clients can apply a JSON-LD 1.1 @context to interpret the GeoJSON encoding as JSON-LD:
{
  "@context": "http://bp.schemas.opengis.net/17-084r1/eoc-geojson/1.0/eoc-geojson.jsonld"
}
A context is a set of rules for interpreting a JSON-LD document, as specified in the section "The Context" of the JSON-LD specification [NR13]. The proposed context can be found here: http://bp.schemas.opengis.net/17-084r1/eoc-geojson/1.0/eoc-geojson.jsonld

[NR13] JSON-LD 1.0: A JSON-based Serialization for Linked Data, W3C Recommendation, 16 January 2014, http://www.w3.org/TR/json-ld/
[OR4] JSON-LD 1.0 Processing Algorithms and API, W3C Recommendation, 16 January 2014, https://www.w3.org/TR/json-ld-api/
[OR15] GeoDCAT-AP: A geospatial extension to DCAT application profiles for data portals in Europe, Version 1.0.1, 2 August 2016, https://semiceu.github.io/GeoDCAT-AP/releases/1.0.1/geodcat-ap_1.0.1.pdf

For more details, examples, etc., refer to https://docs.ogc.org/bp/17-084r1/17-084r1.html

You can also experiment (compaction/expansion, ...) with the BP examples in the JSON-LD Playground: https://json-ld.org/playground/
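
The same compaction/expansion round trip can be scripted with the Python pyld library; the one-term inline context below is a stand-in for the BP's normative context:

from pyld import jsonld

# Stand-in for the normative context at
# http://bp.schemas.opengis.net/17-084r1/eoc-geojson/1.0/eoc-geojson.jsonld
ctx = {"title": "http://purl.org/dc/terms/title"}

doc = {"@context": ctx, "title": "EO collection example"}
expanded = jsonld.expand(doc)              # context removed, full URIs
compacted = jsonld.compact(expanded, ctx)  # context re-applied, short terms
print(expanded)
print(compacted)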

@rob-metalinkage
Collaborator

@uvoges I have created a "BuildingBlock" using these resources - refactoring a little to use more modern schemas and reference existing standard BuildingBlocks: https://ogcincubator.github.io/iso19115-json/bblock/ogc.metacat.iso19115.bp17-084

I think we could easily refactor further to use standard provenance building blocks and push other components into reusable pieces - making the 19115-specific elements of the context inherit standard contexts, and adding SHACL validation rules.


@uvoges

uvoges commented Oct 29, 2024

Proposal: Extend the API Records response to support GeoDCAT, as proposed by the API Records SWG (Peter Vretanos):

Records, on purpose, defines a very lightweight "default" model for describing resources to be discovered, along the same lines as csw:Record. The idea is that the record is extended as necessary by communities of interest. In order to make it clear how exactly a record has been extended, the model includes a "conformsTo" member, which is a place where conformance URIs can be included in the record. This allows clients to determine if the record contains extensions they understand. So, for example, a record extended with GeoDCAT elements would include a URI in the "conformsTo" member to indicate that the record includes GeoDCAT members. GeoDCAT-aware clients can look for this URI (or URIs, if there are a number of GeoDCAT conformance classes) and know right away if the GeoDCAT extensions are present.
An added side bonus of this approach is that multiple models can co-exist in a single record, thus supporting multiple different catalogue clients that understand different models.
So, while we looked at adopting a specific existing model as the "default" model, the SWG decided that this more extensible and flexible approach (similar to what was done in CSW with csw:Record) would be more useful.
Suggestion to the GeoDCAT SWG would be this:
(1) Complete your work on defining the GeoDCAT model and, I presume, the JSON encoding for that model.
(2) When that is done, a new part - OGC API - Records - Part X: GeoDCAT extension - of the OGC API Records suite of standards can be written that defines the GeoDCAT extension for Records. This would be a very small document, simply defining the conformance URIs that should be used in the "conformsTo" member and then pointing to the GeoDCAT specification for the set of additional GeoDCAT properties that should exist in the record. Peter volunteers to write this part once the GeoDCAT SWG is done.
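
A sketch of the client side of this mechanism; the conformance URI is invented (the future Part X would define the real one), and placing "conformsTo" inside the record's properties follows the draft core model:

# Invented conformance URI; Part X would define the real one.
GEODCAT_CONF = "http://www.opengis.net/spec/ogcapi-records-geodcat/1.0/conf/core"

def has_geodcat_extension(record: dict) -> bool:
    """Return True if the record advertises the GeoDCAT extension."""
    conforms = record.get("properties", {}).get("conformsTo", [])
    return GEODCAT_CONF in conforms

record = {"properties": {"conformsTo": [GEODCAT_CONF], "title": "Example"}}
print(has_geodcat_extension(record))  # True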

@uvoges

uvoges commented Oct 29, 2024

Review https://docs.ogc.org/bp/17-084r1/17-084r1.html and explore how this overlaps with Records or other options for JSON encoding. (Thanks Uwe.) See also: #17 (comment)

@uvoges

uvoges commented Oct 29, 2024

Proposal: GeoDCAT response for OGC API Records How about implementing a GeoDCAT response for OGC API Records? A good place to start should be Rob’s GeoDCAT mapping for OGC API Records. @rob-metalinkage do you think this is doable/a good idea?

We could use pygeoapi's CSW facade as a starting point and transform the ISO19115/ISO19139 metadata from a CSW to GeoDCAT.

Your thoughts on this would be greatly appreciated (especially as we have only limited knowledge on OGC API Records).

See also: #17 (comment)

@davidblasby

davidblasby commented Nov 15, 2024

Proposal: GeoNetwork 5 - OGC API Records with DCAT-AP/GeoDCAT (country-specific APs)

Jose Garcia, Jeroen Ticheler, and I have been working with François Prunayre on making BIG improvements to GeoNetwork's OGC API Records implementation (see recently merged pull requests). François Prunayre has recently improved GeoNetwork 4's DCAT-AP/GeoDCAT (country-specific APs) support, and the next step is to incorporate this into GeoNetwork 5, along with the new OGC API Records implementation.

We will be working in the EU (UTC+1) and West Coast Canada (UTC-8) time zones. We hope to attend most of the main meetings.

[diagram of the planned work]

See the tickets for the work outlined in the diagram above.

GeoNetwork 5 board: https://github.com/orgs/geonetwork/projects/4
In particular:

@rob-metalinkage
Collaborator

Great. Can you provide details yet on how you handle multiple specific APs, and what we might focus on to make these machine-actionable, perhaps for configuration?

@davidblasby

davidblasby commented Nov 15, 2024

Great. Can you provide details yet on how you handle multiple specific APs, and what we might focus on to make these machine-actionable, perhaps for configuration?

Hi, Rob,

The plan is to move the GeoNetwork 4 FormatterApi, along with all its supporting code, over to GeoNetwork 5. The core of this is a set of XSLTs that take the underlying ISO XML documents and output DCAT-AP (country-specific) RDF/XML files. This is already working in GeoNetwork 4 (with François' very recent changes). This will give the OGC API Records infrastructure access to all of GeoNetwork's output formats (more than just DCAT).

Right now, choosing the output format selects a specific XSLT that will produce the AP-specific DCAT output (i.e. DCAT-AP, GeoDCAT-AP, DCAT-AP-Mobility, or DCAT-AP-HVD). This XSLT takes ISO 19115-3 to the DCAT* output. For ISO 19139 documents, a conversion from ISO 19139 to ISO 19115-3 is done first.

This is quite a bit of work, so I expect it will not be completed next week.
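
In code, that per-format XSLT step looks roughly like the sketch below (file names are placeholders; note that lxml only supports XSLT 1.0, so GeoNetwork's own stylesheets may need an XSLT 2/3 processor such as Saxon):

from lxml import etree

# Placeholders for an ISO 19115-3 -> DCAT-AP stylesheet and an input record.
transform = etree.XSLT(etree.parse("iso19115-3-to-dcat-ap.xsl"))
iso_doc = etree.parse("record-19115-3.xml")

rdf_xml = transform(iso_doc)  # AP-specific DCAT output as RDF/XML
print(etree.tostring(rdf_xml, pretty_print=True).decode())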

@ticheler

Right now choosing the output format selects a specific XSLT that will produce the AP-specific DCAT output (i.e. DCAT-AP, GeoDCAT-AP, DCAT-AP-Mobility, or DCAT-AP-HVD). This XSLT takes ISO19115-3 to the DCAT*. For ISO19193 documents, a conversion a ISO19193 to ISO19115-3 conversion is done first.

Note: ISO19193 => ISO19139

@rob-metalinkage
Collaborator

@avillar can we have RDF/XML examples validated with the current building block pipeline? What about XSLT transforms?

If we do this, then we can define various profiles in SHACL and also test Records and native DCAT examples.

@PeterParslow
Contributor

Right now choosing the output format selects a specific XSLT that will produce the AP-specific DCAT output (i.e. DCAT-AP, GeoDCAT-AP, DCAT-AP-Mobility, or DCAT-AP-HVD). This XSLT takes ISO19115-3 to the DCAT*. For ISO19193 documents, a conversion a ISO19193 to ISO19115-3 conversion is done first.

Note: ISO19193 => ISO19139
@ticheler: those things will be really useful for a probable imminent formal ISO/TC 211 document mapping ISO 19115-1:2014 to GeoDCAT - it will be at the conceptual level, but having an input that works at the encoding level (from ISO 19115-3) will be great. I am tasked with drafting the scope of the project after this code sprint, followed by a ballot to start the project in the new year. I'd like to contact you directly about contributing.

@brunslo

brunslo commented Nov 18, 2024

Proposal: Data Format and Validation Service For Interoperability Between Analytics and Data

As systems become more complicated, and a system-of-systems approach becomes the norm rather than the dream, it's important that datasets and containerised analytics are able to declare, trust, and verify that data meets certain assertions and standards. At AURIN we've been throwing around the idea of building a Data Format and Validation Registry, first internally and then externally, that would mix persistent identifiers, human-readable data assertions, and machine-readable, executable assertions on data formats to add an automated "trust, with verification" layer to our data/analytics ecosystem.

The basic sketch -- which is as far as we've progressed it -- would be to have a minted persistent identifier (similar to a DOI or an ORCID) that links to a human-readable and machine-readable page describing what the data format is and what assertions data would have to meet to be considered valid. A back-of-the-envelope service could have something like a standard DOI-type link that would resolve to a page with human-readable metadata about the data format, plus validation scripts or attached assertions in a validation language like Great Expectations or something similar.

The challenge, beyond the obvious one of vetting whether this is a good approach in the first place, would be to think about what metadata should be provided at the other end of that "format persistent identifier" link to allow a service to independently and automatically verify that a dataset complies with the standard it claims to adhere to.

A lot to unpack there, and not a lot of direction, but hopefully useful grist for the mill!
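
As a back-of-the-envelope sketch of that resolve-then-verify flow (every URL, field name, and payload below is invented for illustration):

import requests
import jsonschema

# Invented registry PID that dereferences to machine-readable assertions.
pid = "https://registry.example.org/format/ab12-cd34"

assertions = requests.get(pid, headers={"Accept": "application/json"}).json()

dataset_descriptor = {"crs": "EPSG:4326", "format": "GeoJSON"}  # toy input

# "Trust, with verification": validate the dataset against the executable
# assertions published behind the persistent identifier.
jsonschema.validate(dataset_descriptor, assertions["schema"])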

@ticheler

ticheler commented Nov 18, 2024

There is an extensive mapping that you can find here.
An explanation of how this works can be found here.

We are also thinking of using a facility like this to validate the different DCAT-AP outputs.

@rob-metalinkage
Collaborator

Hi all,

We would like to submit a proposal on FAIRness evaluation & GeoDCAT:

Evaluating the FAIRness of geospatial data - Extending the F-UJI tool for GeoDCAT

Evaluating the FAIRness of geospatial data requires specific adaptations of the FAIR principles and related evaluation tools. Taking F-UJI (https://github.com/pangaea-data-publisher/fuji), the well-known open-source evaluation tool, as the software project for our use case, we consider the following GeoDCAT extensions for further brainstorming:

  • Findable: Evaluate spatial information as defined in GeoDCAT, e.g., using the bounding box
  • Accessible: Check if the metadata contains information about a geospatial Web service, e.g., WMS or WFS
  • Interoperable: Evaluate the proper use of geospatial standards/specifications, e.g., CRS
  • Reusable: Evaluate spatial reference system, spatial resolution, and spatial data formats, e.g. GeoJSON

We intend to share the developed extension with the Earth System Sciences community as open-source, facilitating its reuse in geodata portals/catalogues, such as the European data portal or the NFDI4Earth Services Knowledge Hub or OneStop4All.

This use case is now tracked in #25.

@ticheler

@rob-metalinkage How could we integrate such a check into GeoNetwork? Is there some endpoint where we can submit a record and retrieve the resulting report?

@rob-metalinkage
Collaborator

rob-metalinkage commented Nov 20, 2024

@rob-metalinkage How could we integrate such a check into GeoNetwork? Is there some endpoint where we can submit a record and retrieve the resulting report?

This crosses a whole bunch of implementation concerns, including infrastructure etc. IMHO the first step is to have registries of profiles (of standards) for different application domains, and to focus on the machine-readability of these. Hence the OGC Building Blocks - but we need to articulate the patterns better. Please join in the report writing, where we will try to outline what's common here.

I would think GeoNetwork should embed profile validation - using cacheable resources from "fine-grained" profiles - so you can validate parts of resources quickly and elegantly in user interactions if you choose. We will be adding XSLT transforms to the Building Block toolkits so we can exploit SHACL validation rules, and could possibly include XSD validation too. The utility of XSD alone seems low IMHO, as so much often depends on what you put into the schema, not whether you follow the structure.

So, let's look at it from the perspective of ISO and OGC: what can we do to make GeoNetwork users' lives easier? What's the goal, and what are the first steps we can take?
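
As a sketch of the SHACL profile-validation step with pyshacl (file names are placeholders):

from rdflib import Graph
from pyshacl import validate

data = Graph().parse("record.ttl")            # a DCAT/Records RDF instance
shapes = Graph().parse("profile-shapes.ttl")  # SHACL shapes for one profile

conforms, report_graph, report_text = validate(data, shacl_graph=shapes)
print(conforms)
print(report_text)  # human-readable validation report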

@nmtoken

nmtoken commented Nov 21, 2024

...
An explanation of how this works can be found here

No equivalent field in ISO (e.g. where to store spdx:checksum in ISO?)

Could you use the describes element at the end of the ISO XML record:

  <describes>
    <gmx:MX_DataSet>
      <has>spdx:checksum</has>
      <gmx:dataFile>
        <gmx:MX_DataFile>
          <gmx:fileName></gmx:fileName>
          <gmx:fileDescription></gmx:fileDescription>
          <gmx:fileType></gmx:fileType>
          <gmx:fileFormat></gmx:fileFormat>
        </gmx:MX_DataFile>
      </gmx:dataFile>
    </gmx:MX_DataSet>
  </describes>
</gmd:MD_Metadata>

@PeterParslow
Contributor

spdx:checksum

If the idea is to validate the file which is at the end of a gcx:Anchor, I'd see that as more of a limitation of the xlink namespace - it could probably do with a checksum attribute.

If it's about validating a file at the end of one of the URLs that are more specifically part of ISO 19115 (e.g. a citation or online resource), then I guess that is a "limitation" of ISO 19115-1: it didn't imagine also having to cope with information security/integrity.

@nmtoken

nmtoken commented Nov 21, 2024

For xlink, one could possibly use xlink:role to point to a URI for the spdx:checksumAlgorithm and use xlink:label for the spdx:checksumValue.

@rob-metalinkage
Collaborator

For any proposed solutions, I'd like to see how you would map them to DCAT - we are adding support for XSLT to the building block register so we can make this more easily found and shared (FAIR).

@nmtoken

nmtoken commented Nov 22, 2024

I'm probably misunderstanding, but as checksum is, for example, a property of the DCAT 3 Distribution class, you would want a mapping in ISO 19115 associated with MD_Distribution?

I took spdx:checksumAlgorithm and spdx:checksumValue from DCAT-AP [1] / EPOS-DCAT AP [2]

[1] https://semiceu.github.io/DCAT-AP/releases/3.0.0/#Checksum
[2] https://epos-eu.github.io/EPOS-DCAT-AP/v3/#properties-for-checksum

@rob-metalinkage
Collaborator

@nmtoken - thanks for the profile example. It will be useful when looking at the GeoDCAT scope. We need some 19115 XML aficionados to provide XML examples and profile descriptions.

For example, I don't see how the mapping here works: https://wiki.earthdata.nasa.gov/pages/viewpage.action?pageId=6226303

/gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/gmd:CI_Citation/gmd:identifier/gmd:MD_Identifier/gmd:code/gco:CharacterString

seems to be just smuggling it into a generic identifier...

@PeterParslow
Contributor

seems to be just smuggling it into a generic identifier...

A very philosophical question - would a checksum be a good identifier? It's likely (but not guaranteed!) to be unique, but it's hardly likely to actually be used to retrieve the data object (although I guess if a system designer chose to make the checksum the identifier, then it could be).

Anyway, if NASA's DataFileContainer.Checksum is their identifier for the dataset, then that seems a reasonable place. But you would need a separate MD_DataIdentification for each file (distribution) - very different from many people's view of a dataset having many distributions (e.g. formats).

I agree with Rob: that isn't what I'd see as a generic mapping.

Note: I've only really considered mapping from ISO 19115 to DCAT, not the other way round....

@nmtoken

nmtoken commented Nov 25, 2024

I actually missed that DCAT 3 also has a Checksum class (https://www.w3.org/TR/vocab-dcat-3/#Class:Checksum), which isn't associated with the Distribution class. I assume that leaves it open as to how to map to ISO 19115, if you wanted to do that exercise.
