
Determine how to represent specimens as OWL restrictions #61

Open
gaurav opened this issue Feb 27, 2019 · 4 comments
Labels
manuscript Issue requiring summarization in manuscript

Comments

@gaurav
Member

gaurav commented Feb 27, 2019

In Model 2.0, we represent taxonomic units (TUs) based on scientific names as an OWL restriction of the form:

phyloref:includes_TU some (tc:hasName some (ICZN_Name and dwc:scientificName value "scientific name"))

I think a TU represented by a single specimen counts as a single dwc:Organism. In that case, we could represent a phyloreference that includes it as:

phyloref:includes_TU some (dwc:organismID value "specimen voucher number")
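A minimal Python sketch of how both class expressions could be generated as OWL Manchester syntax text from plain strings. The helper names (`quote_literal`, `name_based_tu`, `specimen_based_tu`) are hypothetical, and escaping only backslashes and double quotes is an assumption about what the literals will contain.

```python
# Hypothetical helpers that build OWL Manchester syntax class expressions as text.


def quote_literal(value: str) -> str:
    # Escape backslashes first, then double quotes, and wrap in double quotes.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def name_based_tu(scientific_name: str, name_class: str = "ICZN_Name") -> str:
    # Model 2.0 form: a TU that has a name of the given class with this scientific name.
    return (
        "phyloref:includes_TU some (tc:hasName some "
        f"({name_class} and dwc:scientificName value {quote_literal(scientific_name)}))"
    )


def specimen_based_tu(organism_id: str) -> str:
    # Proposed form: a TU that is a single organism identified by its voucher number.
    return f"phyloref:includes_TU some (dwc:organismID value {quote_literal(organism_id)})"


print(name_based_tu("scientific name"))
print(specimen_based_tu("specimen voucher number"))
```

With the placeholder values from this comment, the sketch reproduces the two expressions verbatim; only the innermost restriction differs between them.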

Unfortunately, we don't have many examples of clade definitions that use specimen identifiers as taxonomic units. The best ones we've seen are in Fisher et al., 2007, which provides a few clade definitions that use specimens as specifiers:

Phyloreference | Specifier | Identified as | Note
Leucophanes | Wall 2527, Fiji (uc) | Exodictyon incrassatum (Mitt.) Cardot | Only external specifier for this definition
Exostratum | Mishler 7/24/98(3), Queensland, Australia (uc) | Exostratum blumii (Nees ex Hampe) L.T. Ellis | The genus Exostratum as a whole is an additional internal specifier
Arthrocormus | Mishler 7/24/98 (5) Queensland, Australia (UC) | Arthrocormus schimperi Dozy & Molk | The species A. schimperi is listed separately as an internal specifier

Note that the specimen-based specifiers are completely redundant with scientific-name-based specifiers in two of the three cases, and none of these specifiers use globally unique identifiers.

I propose that we use dwc:organismID for now, but possibly re-evaluate this once we have more phyloreferences with specimen identifiers to look at.

@gaurav
Member Author

gaurav commented Feb 27, 2019

A related term is TaxonConcept:circumscribedBy, which is used to indicate a "specimen that forms part of the circumscription of this taxon". We could therefore instead state that a phyloreference includes a TU circumscribed by the specimen:

phyloref:includes_TU some (TaxonConcept:circumscribedBy some (dwc:organismID value "specimen voucher number"))

I'm not sure we gain anything with this additional complexity, however.
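To make the extra layer concrete, here is a hypothetical Python sketch that builds both specimen-based forms around the same dwc:organismID restriction and reports their parenthesis nesting. The function names are illustrative only; the depth counter skips anything inside quoted literals so that voucher strings such as "Mishler 7/24/98(3)" don't skew the count.

```python
# Hypothetical comparison of the direct and circumscribedBy specimen-based forms.


def quote_literal(value: str) -> str:
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def organism_restriction(organism_id: str) -> str:
    # The shared core: an organism identified by its voucher number.
    return f"dwc:organismID value {quote_literal(organism_id)}"


def direct_tu(organism_id: str) -> str:
    return f"phyloref:includes_TU some ({organism_restriction(organism_id)})"


def circumscribed_tu(organism_id: str) -> str:
    # Same core, wrapped in one more existential restriction.
    return (
        "phyloref:includes_TU some "
        f"(TaxonConcept:circumscribedBy some ({organism_restriction(organism_id)}))"
    )


def nesting_depth(expr: str) -> int:
    # Maximum parenthesis depth, ignoring characters inside double-quoted literals.
    depth = deepest = 0
    in_literal = escaped = False
    for ch in expr:
        if in_literal:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_literal = False
        elif ch == '"':
            in_literal = True
        elif ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
    return deepest


for build in (direct_tu, circumscribed_tu):
    expr = build("Mishler 7/24/98(3)")
    print(nesting_depth(expr), expr)
```

The only structural difference is one extra existential hop through TaxonConcept:circumscribedBy; the information carried about the specimen is identical.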

@gaurav
Member Author

gaurav commented Jun 10, 2019

I did a quick survey of how other RDF resources record specimen identifiers. Most use separate fields for dwc:collectionID and dwc:catalogNumber rather than a single field that combines both pieces of information. Specimens have an rdf:type of either dsw:Specimen (e.g. Phenoscape) or dwc:Occurrence with a dwc:basisOfRecord (e.g. GBIF's Beginner's Guide to Persistent Identifiers, and the BiSciCol Triplifier, with an example).

When these fields are combined into a single specimen identifier, dwc:occurrenceID appears to be the correct place to put the Darwin Core Triple, rather than dwc:organismID. For example, iDigBio uses dwc:organismID to store a dataset-specific identifier (in this example), while VertNet uses dwc:occurrenceID but not dwc:organismID.
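A minimal sketch of building and splitting such a combined identifier, assuming the `urn:catalog:[institutionID]:[collectionID]:[catalogNumber]` pattern proposed in this comment, and assuming the institution and collection parts never contain colons (the catalog number may). `DwcTriple`, `to_occurrence_id` and `parse_occurrence_id` are hypothetical names, and the example values are made up.

```python
from typing import NamedTuple

PREFIX = "urn:catalog:"


class DwcTriple(NamedTuple):
    institution: str
    collection: str
    catalog_number: str


def to_occurrence_id(triple: DwcTriple) -> str:
    # Assumption: only the catalog number may contain ':'; otherwise parsing is ambiguous.
    for part in (triple.institution, triple.collection):
        if ":" in part:
            raise ValueError(f"':' not allowed in institution or collection: {part!r}")
    return f"{PREFIX}{triple.institution}:{triple.collection}:{triple.catalog_number}"


def parse_occurrence_id(occurrence_id: str) -> DwcTriple:
    if not occurrence_id.startswith(PREFIX):
        raise ValueError(f"not a urn:catalog identifier: {occurrence_id!r}")
    # maxsplit=2 keeps any further colons inside the catalog number.
    parts = occurrence_id[len(PREFIX):].split(":", 2)
    if len(parts) != 3:
        raise ValueError(f"expected institution:collection:catalogNumber in {occurrence_id!r}")
    return DwcTriple(*parts)


example = DwcTriple("EXAMPLE", "HERB", "0001")  # hypothetical values
occurrence_id = to_occurrence_id(example)
print(occurrence_id)
assert parse_occurrence_id(occurrence_id) == example
```

Rejecting colons in the first two parts is what makes the round trip lossless; without that restriction a single string could not be split back into its three fields unambiguously.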

It therefore looks like we should choose between:

phyloref:includes_TU some (dwc:occurrenceID value "urn:catalog:[institutionID]:[collectionID]:[catalogNumber]" and dwc:basisOfRecord value <https://terms.tdwg.org/wiki/dwc:PreservedSpecimen>)

or

phyloref:includes_TU some (dwc:institutionID value "[institutionID]" and dwc:collectionID value "[collectionID]" and dwc:catalogNumber value "[catalogNumber]" and dwc:basisOfRecord value <https://terms.tdwg.org/wiki/dwc:PreservedSpecimen>)

I prefer the dwc:occurrenceID approach since it is more compact and easier to read. We could also add support for using a URI in that field.
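A hypothetical Python sketch that generates both candidate expressions from the same three values, so they can be compared side by side. It writes the basisOfRecord IRI in angle brackets, as Manchester syntax requires for full IRIs; the function names are illustrative assumptions, not part of any existing Phyloref tooling.

```python
# Hypothetical generator for the two candidate specimen-based TU expressions.

BASIS_OF_RECORD = "<https://terms.tdwg.org/wiki/dwc:PreservedSpecimen>"


def quote_literal(value: str) -> str:
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def compact_tu(institution: str, collection: str, catalog_number: str) -> str:
    # Option 1: a single dwc:occurrenceID holding the Darwin Core Triple.
    occurrence_id = f"urn:catalog:{institution}:{collection}:{catalog_number}"
    return (
        "phyloref:includes_TU some ("
        f"dwc:occurrenceID value {quote_literal(occurrence_id)} and "
        f"dwc:basisOfRecord value {BASIS_OF_RECORD})"
    )


def expanded_tu(institution: str, collection: str, catalog_number: str) -> str:
    # Option 2: one restriction per Darwin Core field.
    return (
        "phyloref:includes_TU some ("
        f"dwc:institutionID value {quote_literal(institution)} and "
        f"dwc:collectionID value {quote_literal(collection)} and "
        f"dwc:catalogNumber value {quote_literal(catalog_number)} and "
        f"dwc:basisOfRecord value {BASIS_OF_RECORD})"
    )


for build in (compact_tu, expanded_tu):
    print(build("[institutionID]", "[collectionID]", "[catalogNumber]"))
```

The compact form keeps the whole identifier in one literal, which would also make it straightforward to accept a URI in that field later; the expanded form could instead let queries match on individual fields.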

Semantic Darwin Core adds a new class, dsw:Token, which is derived from a dwc:Organism and serves as evidence for a dwc:Occurrence. I don't think we need this extra layer of complexity.

@hlapp
Member

hlapp commented Jun 11, 2019

How does OpenBioDiv do this? I don't recall off the top of my head, but it's certainly worth checking.

@gaurav
Member Author

gaurav commented Jun 12, 2019

OpenBioDiv appears to defer to Darwin-SW for encoding occurrences (see pensoft/OpenBiodiv#14 or the OpenBioDiv-O paper). I couldn't find an example of encoded occurrence data in its GitHub repository; the closest I found was the use of dwcFP:hasOccurrenceID to record a dataset-specific occurrence ID (example).

Looking through the OpenBioDiv repository reminded me that the TDWG Ontology used to have a Specimen OWL class with a specimenID property, but that class (along with its planned successor, TaxonOccurrence) has been deprecated since 2015.

I also found an entry in the Darwin Core RDF Guide that recommends the use of dwc:basisOfRecord/institutionCode/collectionCode/catalogNumber.
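For comparison, a minimal sketch of serializing a specimen record with those four terms as Turtle. The subject IRI and field values are hypothetical, and using plain string literals for every term (including basisOfRecord) is an assumption; `http://rs.tdwg.org/dwc/terms/` is the standard Darwin Core terms namespace.

```python
# Hypothetical Turtle serialization of a specimen record using four Darwin Core terms.

DWC_NS = "http://rs.tdwg.org/dwc/terms/"
FIELDS = ("basisOfRecord", "institutionCode", "collectionCode", "catalogNumber")


def turtle_literal(value: str) -> str:
    # Escape characters that cannot appear raw in a short double-quoted Turtle string.
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def specimen_turtle(subject_iri: str, record: dict[str, str]) -> str:
    present = [field for field in FIELDS if field in record]
    if not present:
        raise ValueError("record contains none of the expected Darwin Core terms")
    lines = [f"@prefix dwc: <{DWC_NS}> .", "", f"<{subject_iri}>"]
    for i, field in enumerate(present):
        # Predicate-object pairs are separated by ';' and the statement ends with '.'.
        terminator = " ." if i == len(present) - 1 else " ;"
        lines.append(f"    dwc:{field} {turtle_literal(record[field])}{terminator}")
    return "\n".join(lines)


print(specimen_turtle(
    "http://example.org/specimen/1",  # hypothetical subject IRI
    {
        "basisOfRecord": "PreservedSpecimen",
        "institutionCode": "EXAMPLE",
        "collectionCode": "HERB",
        "catalogNumber": "0001",
    },
))
```

Fields are emitted in a fixed order and missing ones are skipped, so a record that only has a catalog number still produces valid Turtle.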

@hlapp hlapp added the manuscript Issue requiring summarization in manuscript label Jul 9, 2019

2 participants