Use wikidata to provide skos:definition to owl:Class'es #201

lewismc · 2020-07-17T19:05:05Z

This is a new branch which fixes all of the issues identified in #200. Thanks for the feedback @dr-shorthair @smrgeoinfo so far.

I've resolved all of the cryo issues. This PR is ready to be reviewed folks.

rrovetto · 2020-07-17T19:55:16Z

I recommend (a) including also additional definitions, and (b) not declaring the WikiData ones as the preferred or definitive one, in part because Wiki may have inaccurate, incorrect or unreliable definitions.

lewismc · 2020-07-17T20:15:45Z

@rrovetto

including also additional definitions

From where?

not declaring the WikiData ones as the preferred or definitive one, in part because Wiki may have inaccurate, incorrect or unreliable definitions.

Which ones are inaccurate, incorrect or unreliable? If you have an example, please point it out.

rrovetto · 2020-07-17T21:36:03Z

It can depend, but like other resources, there can be a diversity of sources. To find out which are inaccurate would involve going through them on wiki, and for those that are specific to a discipline ideally with subject-matter experts.

Depending on the concept or term, definitions may come from textbooks, dictionaries, or some other publication. I've seen unique yet similar definitions for a common term, each of which provides insight not gleaned from the others. So it's certainly valuable for us to include other sources of definitions. Some may be more precise, technical, etc. than others.
We can also create definitions, have subject-matter experts provide input on subject matter concepts, etc.

Where the definitions come from, which ones to use, and so on are all questions on the topic of definitions/descriptions. I think that's a topic we should get into. We can also ask whether the original intention of SWEET with respect to descriptions of its concepts is clear: was it intended to have definitions/descriptions, and/or a definition for every term, or are there some concepts or terms that should not have a definition (e.g., due to their generality, variety of senses, or semantic drift)? We could use that as guidance.

Just as the structure under skos:definition has rdfs:comment for Wikidata descriptions/definitions, so it (or another structure) can list more than one rdfs:comment or more than one skos:definition for descriptions from elsewhere. I think that would be helpful.

There are also different types of definition and description that can be asserted, e.g., 'lexical def', 'description of...', etc.

graybeal · 2020-07-17T21:37:17Z

I recommend (a) including also additional definitions, and (b) not declaring the WikiData ones as the preferred or definitive one, in part because Wiki may have inaccurate, incorrect or unreliable definitions.

I agree that we should not declare WikiData as preferred or definitive.

And I agree it would be good to have more sources, but:

I do not think we should hold up this extremely good change while we wait for someone to generate another set of annotations from another source. Let's not make perfect the enemy of the good.

lewismc · 2020-07-17T22:15:14Z

@rrovetto I'm unsure what to reply to you. What you've stated seems rather tangential to the contents of this pull request. I am looking for actionable input if you have any. Thanks

@graybeal

I do not think we should hold up ...

Agreed. This issue has already been a long, long time coming. Any review would be appreciated.

cmungall · 2020-07-17T23:18:52Z

I think this proposed way of doing this is natural and coherent. Of course, I prefer the axiom annotation model used in OBO, but I won't push this further.

I would advocate for the principle of DRY: use dc:source or prov:wasDerivedFrom, but not both

I'm not totally sure about rdfs:comment to connect the blank node to the definition string. I'm not sure what else to suggest without doing a bit of further research to see what others have done, but I'd advise putting some thought into this.

It's not clear to me if you intend to allow >1 def per class (do you intend to use ShEx/SHACL to constrain?). If so I would strongly recommend a mechanism to designate the preferred definition (or restricting to one definition per language, but allowing unlimited alternate descriptions), but my opinions here may be stronger than others.
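For illustration, the two constraints mentioned here could be sketched in SHACL roughly as follows. This is a minimal sketch, not something SWEET defines: the `ex:` namespace and both shape names are hypothetical, and the second shape assumes the definition string hangs off the definition node via rdfs:comment, as in this PR.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/shapes#> .   # hypothetical namespace

# Alternative A (hypothetical shape): at most one skos:definition per class.
ex:SingleDefinitionShape
    a sh:NodeShape ;
    sh:targetClass owl:Class ;
    sh:property [
        sh:path skos:definition ;
        sh:maxCount 1 ;
    ] .

# Alternative B (hypothetical shape): any number of definition nodes,
# but at most one definition string per language tag across them.
ex:OneDefinitionPerLanguageShape
    a sh:NodeShape ;
    sh:targetClass owl:Class ;
    sh:property [
        sh:path ( skos:definition rdfs:comment ) ;
        sh:uniqueLang true ;
    ] .
```

The two shapes are alternatives, not a pair: a project would pick whichever policy it settles on, or neither if multiple definitions are left unconstrained.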

As an aside, you may want to consider a standard Turtle serialization to eliminate spurious diffs and unnecessary blank node renderings, e.g., as in:

:IndoorAirQuality rdf:type owl:Class ;
                  rdfs:subClassOf :AirQuality ;
                  rdfs:label "indoor air quality"@en ;
                  skos:definition _:genid6 .

_:genid6 dcterms:created "2020-07-17T10:55:59.639"^^xsd:dateTime ;
          dcterms:creator <https://orcid.org/0000-0003-2185-928X> ;
          dcterms:source <http://www.wikidata.org/entity/Q905504> ;
          rdfs:comment "air quality within and around buildings and structures"@en ;
          prov:wasDerivedFrom <http://www.wikidata.org/entity/Q905504> .

lewismc · 2020-07-17T23:42:08Z

I would advocate for the principle of DRY: use dc:source or prov:wasDerivedFrom, but not both

I agree here. I was trying to match what had been implemented in the recent cryospheric work. I would be happy to remove either one... any preferences folks?

it's not clear to me if you intend to allow >1 def per class

At the ESIP meeting in January we discussed only having one skos:definition. I am on board with that.

as an aside you may want to consider a standard turtle serialization to eliminate spurious diffs, and unnecessary blank node renderings

I don't particularly like the way OWLAPI Java wrote the data with blank nodes... is that the issue here?

Thanks @cmungall

graybeal · 2020-07-18T00:47:59Z

it's not clear to me if you intend to allow >1 def per class

At the ESIP meeting in January we discussed only having one skos:definition. I am on board with that.

My recollection from an exchange that I thought was more recent than that (no idea where, sorry; Semantic Cluster, or a GitHub ticket) was that we wanted the flexibility of allowing multiple definitions, so long as they were not seriously contradictory. I think there are good arguments for this, and no convincing counter-arguments that should rule out providing multiple definitions. I don't think the community has weighed in on this. And per my previous comment, I don't think we should delay this change in order to make a final decision on that.

In other words, don't preclude additional definitions. If we decide to provide multiple definitions, we can make the decision then about whether we want to consider one of them authoritative.

Multiple definitions can be in different languages. If they are from the same source, they will be embedded within the context above. If they are from a different source, they will have their own entry. This seems straightforward and intuitive to me.
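A minimal sketch of that layout, assuming the definition-node structure used in this PR. The `:` namespace, the second source IRI, and the placeholder strings are hypothetical and stand in for real content; only the English Wikidata description is taken from this thread.

```turtle
@prefix :        <http://example.org/sweet#> .   # placeholder namespace, not the real SWEET IRI
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

:IndoorAirQuality skos:definition
    # Same source: language variants share one definition node.
    [ dcterms:source <http://www.wikidata.org/entity/Q905504> ;
      rdfs:comment "air quality within and around buildings and structures"@en ;
      rdfs:comment "(placeholder: the same source's description in another language)"@fr ] ,
    # Different source: a separate definition node with its own provenance.
    [ dcterms:source <http://example.org/another-source> ;   # hypothetical source
      rdfs:comment "(placeholder: definition text from a different source)"@en ] .
```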

If so [multiple definitions] I would strongly recommend a mechanism to designate the preferred definition (or restricting to one definition per language, but allowing unlimited alternate descriptions),

These strategies are important if the definition is meant to be normative (or authoritative, if you prefer). I claim (rather vigorously if challenged) these definitions are not and cannot be normative/authoritative; they are strictly informative, and therefore have equal weight.

As inspection of parallel definitions will quickly establish, there are subtle differences between definitions from different sources that are each informative while being complementary or subtly contradictory. The subtle contradictions are incredibly valuable for understanding the concept, and SWEET will never be a system used for heavy reasoning unless it takes on a totally different form. If there are major contradictions among definitions, then someone(s) will have to choose (or annotate, in some cases) one or more to clarify what SWEET means by the concept.

As a dictionary of definition sources, SWEET could prove immensely popular.

as an aside you may want to consider a standard turtle serialization to eliminate spurious diffs, and unnecessary blank node renderings, …

Love how readable that example is! I don't know what it means to 'consider' it, though: just that we start working with that as the standard format in order to gain the readability and diff improvements?

I was trying to match what had been implemented in the recent cryospheric work. I would be happy to remove either one... any preferences folks?

They seem subtly different to me, so my preference is to understand why the cryospheric people used both, then decide. Maybe it was that prov:wasDerivedFrom is a clear statement of provenance (useful), while dc:source feels more like a citation (differently useful). (It seems to me they could be different in some cases, but in your application they won't be, so dc:source feels a bit more precise if you choose one.)

lewismc · 2020-07-18T02:08:55Z

was that we wanted the flexibility of allowing multiple definitions

I wasn't aware of that/can't remember being part of those conversation(s). Maybe this is something we can raise with the Semantic Harmonization cluster?

I don't think we should delay this change in order to make a final decision on that.

+1

just that we start working with that as the standard format in order to gain the readability and diff improvements?

The way the Turtle is written is not configurable, afaict. Writing the blank nodes like that is the only way I could find.

Again, RE: dcterms:source instead of prov:wasDerivedFrom in this case, I think that makes sense @graybeal.

dr-shorthair · 2020-07-18T06:08:18Z

I agree with pretty much all that @graybeal wrote above
+1 to using Dublin Core where it has the right semantics, PROV only for the more complex cases.
But note that DCMI recommends the dcterms: namespace in preference to the dc: one: "While the /elements/1.1/ namespace will be supported indefinitely, DCMI gently encourages use of the /terms/ namespace."
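For reference, the two namespaces in question, with their conventional prefixes, followed by the source statement from this PR written against the /terms/ namespace. The blank node label `_:def` is only illustrative.

```turtle
# Legacy Dublin Core Metadata Element Set, conventionally prefixed dc:
@prefix dc:      <http://purl.org/dc/elements/1.1/> .
# DCMI Metadata Terms, conventionally prefixed dcterms:
@prefix dcterms: <http://purl.org/dc/terms/> .

# Source statement using the /terms/ namespace (illustrative node label).
_:def dcterms:source <http://www.wikidata.org/entity/Q905504> .
```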

Note that I am a big fan of PROV, but in its place.

lewismc · 2020-07-18T22:08:23Z

So it sounds like I am submitting a new pull request to (1) remove prov:wasDerivedFrom and (2) retain use of dcterms:source. I'll go ahead and do that: I'll close this PR and create a new one.
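As a sketch of where this lands, here is the IndoorAirQuality example from earlier in the thread with prov:wasDerivedFrom dropped and dcterms:source retained. The prefix declarations are added so the fragment parses on its own; the `:` namespace is a placeholder, since the thread does not give the real one, and the actual output of the follow-up PR may differ.

```turtle
@prefix :        <http://example.org/sweet#> .   # placeholder namespace, not the real SWEET IRI
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .

:IndoorAirQuality a owl:Class ;
    rdfs:subClassOf :AirQuality ;
    rdfs:label "indoor air quality"@en ;
    skos:definition _:genid6 .

# Provenance is carried by dcterms:source alone.
_:genid6 dcterms:created "2020-07-17T10:55:59.639"^^xsd:dateTime ;
    dcterms:creator <https://orcid.org/0000-0003-2185-928X> ;
    dcterms:source <http://www.wikidata.org/entity/Q905504> ;
    rdfs:comment "air quality within and around buildings and structures"@en .
```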

Use wikidata to provide skos:definition to owl:Class'es

3f6ede0

lewismc requested review from cmungall, charlesvardeman, brandonnodnarb, smrgeoinfo, dr-shorthair, rduerr and pbuttigieg July 17, 2020 19:05

lewismc self-assigned this Jul 17, 2020

lewismc linked an issue Jul 17, 2020 that may be closed by this pull request

Use wikidata to provide skos:definition to owl:Class'es #125

Open

lewismc added this to the 3.6.0 milestone Jul 17, 2020

lewismc added the enhancement label Jul 17, 2020

lewismc mentioned this pull request Jul 17, 2020

comments on SWEET annotation convention #183

Open

lewismc closed this Jul 18, 2020

lewismc deleted the ISSUE-125 branch July 18, 2020 22:53

This was referenced Jul 19, 2020

ISSUE-125 Use wikidata to provide skos:definition to owl:Class'es #202

Closed

ISSUE-125 Use wikidata to provide skos:definition to owl:Class'es #208

Closed
