Use wikidata to provide skos:definition to owl:Class'es #201
I recommend (a) also including additional definitions, and (b) not declaring the Wikidata ones as preferred or definitive, in part because Wikidata may contain inaccurate, incorrect, or unreliable definitions.
From where?
Which ones are inaccurate, incorrect, or unreliable? If you have an example, please point it out.
It depends, but as with other resources, there can be a diversity of sources. Finding out which definitions are inaccurate would involve going through them on Wikidata, ideally with subject-matter experts for the discipline-specific ones. Depending on the concept or term, definitions may come from textbooks, dictionaries, or other publications. I've seen distinct yet similar definitions for a common term, each of which provided insight not found in the others. So it's certainly valuable for us to include definitions from other sources; some may be more precise or technical than others.

Where the definitions come from, which ones to use, and so on are all questions under the topic of definitions and descriptions, and I think that's a topic we should get into. We can also ask whether SWEET's original intention with respect to descriptions of its concepts is clear: was it meant to have definitions or descriptions at all, and for every term, or are there concepts or terms that should not have a definition (e.g., due to their generality, variety of senses, or semantic drift)? We could use the answer as guidance.

Just as the structure under skos:definition has rdfs:comment for the Wikidata descriptions/definitions, it (or another structure) could list more than one rdfs:comment or more than one skos:definition for descriptions from elsewhere. I think that would be helpful. There are also different types of definition and description that can be asserted, e.g., 'lexical definition', 'description of...', etc.
I agree that we should not declare Wikidata as preferred or definitive. And I agree it would be good to have more sources, but I do not think we should hold up this extremely good change while we wait for someone to generate another set of annotations from another source. Let's not make perfect the enemy of the good.
@rrovetto I'm not sure how to reply. What you've stated seems rather tangential to the contents of this pull request. I am looking for actionable input if you have any. Thanks.
Agreed. This issue has already been a long, long time coming. Any review would be appreciated.
I think this proposed way of doing this is natural and coherent. Of course, I prefer the axiom annotation model used in OBO, but I won't push this further.

I would advocate for the principle of DRY: use dc:source or prov:wasDerivedFrom, but not both.

I'm not totally sure about using rdfs:comment to connect the blank node to the definition string. I'm not sure what else to suggest without doing a bit of further research to see what others have done, but I'd advise putting some thought into this.

It's not clear to me whether you intend to allow more than one definition per class (and do you intend to use ShEx/SHACL to constrain this?). If so, I would strongly recommend a mechanism to designate the preferred definition (or restricting to one definition per language, while allowing unlimited alternate descriptions), but my opinions here may be stronger than others'.

As an aside, you may want to consider a standard Turtle serialization to eliminate spurious diffs and unnecessary blank node renderings, e.g. as in:

:IndoorAirQuality rdf:type owl:Class ;
    rdfs:subClassOf :AirQuality ;
    rdfs:label "indoor air quality"@en ;
    skos:definition _:genid6 .
_:genid6 dcterms:created "2020-07-17T10:55:59.639"^^xsd:dateTime ;
    dcterms:creator <https://orcid.org/0000-0003-2185-928X> ;
    dcterms:source <http://www.wikidata.org/entity/Q905504> ;
    rdfs:comment "air quality within and around buildings and structures"@en ;
    prov:wasDerivedFrom <http://www.wikidata.org/entity/Q905504> .
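If the "one definition per language" option were adopted, a SHACL shape along the following lines could enforce it. This is a sketch under assumptions, not an agreed SWEET constraint: the `ex:` namespace and the shape name are hypothetical, and the shape assumes every SWEET class is typed `owl:Class` and that each definition text hangs off `rdfs:comment` on the `skos:definition` node, as in the example above.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <https://example.org/shapes#> .   # hypothetical namespace

# Hypothetical shape: each class has at most one definition text per language.
ex:OneDefinitionPerLanguageShape
    a sh:NodeShape ;
    sh:targetClass owl:Class ;
    sh:property [
        # Sequence path: follow skos:definition to the definition node,
        # then rdfs:comment to the definition text itself.
        sh:path ( skos:definition rdfs:comment ) ;
        # Require language-tagged strings, since sh:uniqueLang ignores untagged literals.
        sh:datatype rdf:langString ;
        # No two definition texts reached by the path may share a language tag.
        sh:uniqueLang true ;
    ] .
```

Note that this shape would report two English definitions from different sources on the same class as a violation, so it only fits if the thread settles on one definition per language; allowing several would mean dropping `sh:uniqueLang` or scoping it differently.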
I agree here. I was trying to match what had been implemented in the recent cryospheric work. I would be happy to remove either one... any preferences, folks?
At the ESIP meeting in January we discussed only having one.
I don't particularly like the way the OWL API (Java) wrote the data with blank nodes... is that the issue here? Thanks @cmungall
My recollection, from an exchange that I thought was more recent than that (no idea where, sorry: Semantic Cluster or a GitHub ticket), was that we wanted the flexibility of allowing multiple definitions, so long as they were not seriously contradictory. I think there are good arguments for this, and no convincing counter-arguments that should rule out providing multiple definitions. I don't think the community has weighed in on this. And per my previous comment, I don't think we should delay this change in order to make a final decision on that. In other words, don't preclude additional definitions. If we decide to provide multiple definitions, we can decide then whether we want to consider one of them authoritative.

Multiple definitions can be in different languages. If they are from the same source, they will be embedded within the context above. If they are from a different source, they will have their own entry. This seems straightforward and intuitive to me.
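A minimal Turtle sketch of that layout, under stated assumptions: the `:` namespace, the second source's IRI, and the angle-bracketed placeholder literals are hypothetical and stand in for real content. Only `dcterms:source` is used to record the source, which is one of the two options discussed in this thread. Same-source texts in several languages share one definition node; a definition from a different source gets its own node.

```turtle
@prefix :        <https://example.org/sweet#> .   # placeholder for the SWEET module namespace
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

:IndoorAirQuality a owl:Class ;
    rdfs:label "indoor air quality"@en ;
    # One definition node for the Wikidata source, holding several languages.
    skos:definition [
        dcterms:source <http://www.wikidata.org/entity/Q905504> ;
        rdfs:comment "air quality within and around buildings and structures"@en ,
                     "<French description, if Wikidata provides one>"@fr
    ] ,
    # A second, hypothetical source gets its own definition node.
    [
        dcterms:source <https://example.org/glossary#indoor-air-quality> ;
        rdfs:comment "<definition text from the second source>"@en
    ] .
```

Because each node carries its own `dcterms:source`, a consumer can group or filter definitions by source without any one of them being marked as preferred.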
These strategies are important if the definition is meant to be normative (or authoritative, if you prefer). I claim (rather vigorously, if challenged) that these definitions are not and cannot be normative/authoritative; they are strictly informative, and therefore have equal weight. As inspection of parallel definitions will quickly establish, definitions from different sources differ subtly: each is informative, while being complementary to or subtly contradicting the others. The subtle contradictions are incredibly valuable for understanding the concept, and SWEET will never be a system used for heavy reasoning unless it takes on a totally different form. If there are major contradictions among definitions, then someone will have to choose (or, in some cases, annotate) one or more to clarify what SWEET means by the concept. As a dictionary of definition sources, SWEET could prove immensely popular.
Love how readable that example is! I don't know what it means to 'consider' it: is it just that we start using that as the standard format in order to gain the readability and diff improvements?
They seem subtly different to me, so my preference is to understand why the cryospheric people used both, then decide. Maybe it was that prov:wasDerivedFrom is a clear statement of provenance (useful), while dc:source feels more like a citation (differently useful). (It seems to me they could differ in some cases, but in your application they won't, so dc:source feels a bit more precise if you choose one.)
I wasn't aware of that/can't remember being part of those conversation(s). Maybe this is something we can raise with the Semantic Harmonization cluster?
+1
The way the Turtle is written is not configurable, afaict. Writing the blank nodes like that is the only way I could find. Again, RE:
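For comparison, Turtle syntax itself allows a blank node that is referenced once to be written inline with `[ ... ]`. This produces an equivalent graph with no `_:genid6` label to churn between serializations. The sketch below restates the example from earlier in this thread in that form. The `:` namespace IRI is a placeholder assumption, and whether the OWL API's Turtle writer can be made to emit this form is not something this sketch establishes.

```turtle
@prefix :        <https://example.org/sweet#> .   # placeholder for the SWEET module namespace
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix prov:    <http://www.w3.org/ns/prov#> .

:IndoorAirQuality rdf:type owl:Class ;
    rdfs:subClassOf :AirQuality ;
    rdfs:label "indoor air quality"@en ;
    # Same triples as the _:genid6 version, with the definition node written inline.
    skos:definition [
        dcterms:created "2020-07-17T10:55:59.639"^^xsd:dateTime ;
        dcterms:creator <https://orcid.org/0000-0003-2185-928X> ;
        dcterms:source <http://www.wikidata.org/entity/Q905504> ;
        rdfs:comment "air quality within and around buildings and structures"@en ;
        prov:wasDerivedFrom <http://www.wikidata.org/entity/Q905504>
    ] .
```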
Note that I am a big fan of PROV, but in its place.
So it sounds like I should submit a new pull request that:
1. removes prov:wasDerivedFrom, and
2. retains dcterms:source.
I'll go ahead and do that: I'll close this PR and create a new one.
This is a new branch which fixes all of the issues identified in #200. Thanks for the feedback @dr-shorthair @smrgeoinfo so far.
I've resolved all of the cryo issues. This PR is ready to be reviewed folks.