Use wikidata to provide skos:definition to owl:Class'es #125
I have candidate term definitions for ~2K SWEET terms/classes pulled from Earth science glossaries that we can sort through, although I'm not sure of the best way to do that at present. |
Excellent :) @brandonnodnarb where do they exist? Do you have them in electronic format somewhere? At lunch, @dr-shorthair and I were discussing possibly just providing a dct:description (although that would introduce a brand new namespace into SWEET) which is essentially a link to an alternate, maintained description which exists elsewhere e.g. DBPedia, ENVO, .... The keyword here is maintained. I think it would be a bad decision right now for us to go ahead and implement a whole bunch of descriptions which exist solely within SWEET. On the other hand, if they do link to other, better defined, maintained descriptions then it would make sense to link to them. Any comments @brandonnodnarb ? |
What about definitions for terms that are defined in other ontologies (notably ENVO)? There are many ENVO terms that now use the GCW terminology definitions (though that document isn't published yet). It would be good to reference them directly, rather than reinvent the wheel.
Ruth
|
I completely agree @rduerr |
@lewismc these are in a spreadsheet. I'll see if I can clean it up and post it somewhere for review. Also, I didn't think of this until you mentioned it, but another option could be to push these things to wikipedia/dbpedia, or make sure they are included and cited (and maintained) there. Hmmm...let me think about this a bit. |
Yes, ideally we could even get to this on Thursday. I think pushing to DBPedia would be an excellent idea. It would be excellent for us to re-use and/or make available as much of this to the wider audience as possible. As this is a pretty large task, the best way may in fact be the easiest way, e.g. automating pulling comments from DBPedia. A very simple example SPARQL query is as follows:
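(The query embedded in the original comment was not preserved; a minimal sketch of the kind of DBpedia lookup being described, using the Data_set resource cited later in this thread as a stand-in subject IRI, might look like this.)

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Example only: fetch the English rdfs:comment for a single DBpedia resource.
# Run against the public endpoint at https://dbpedia.org/sparql
SELECT ?comment WHERE {
  <http://dbpedia.org/resource/Data_set> rdfs:comment ?comment .
  FILTER ( lang(?comment) = "en" )
}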
Of course we would merely substitute the subject IRI with whatever term we get from ESIP and then experiment with |
From a usability standpoint, an embedded description is much nicer (because it is there in front of you), and a little more confidence-inducing, because (a) it implies the author of the ontology (SWEET) vouches for it, (b) it is likely to be coherent with the purposes of the ontology, and (c) it is unlikely to drift without explicit reason. Add to that the opportunity for providing definitions that specifically disambiguate the term from its siblings, and fill the term space. Some of these sources may achieve some of these goals. @lewismc, is your particular proposal to find the comments and 'bring them in', or simply to reference them in their original location? If the former, will the process be re-run every few years, or will we freeze this moment in our definitions? Is it worth considering both a local copy of the definition and a reference to the source comment? I'll be OK with whatever approach y'all think is reasonable and achievable. If there is decision-making involved, an ideal presentation would be a Google spreadsheet with the SWEET ontology name, term name, label, and external descriptions from whatever sources we are considering. That would make it easy to review the whole set at once as well as comment or vote on sources for particular terms, should it come to that. |
Hi @graybeal excellent questions, thanks for jumping in. You make some good points which I appreciate.
If we did this, we would be essentially duplicating the content (and it would be appropriate to use one of rdfs:comment, skos:definition or dct:description). This is not to say that the things being represented are equal but merely that the way the thing is described is identical at that point in time. As you state, the actual literal values (from where they were acquired and where they exist within SWEET) will most likely diverge over time. Is this OK? It may be... but it may not be. More below...
We could also look into using the following; consider lines 84 to 87 in eb81063:
Once the above work is done it would look as follows:
"If the former, will the process be re-run every few years, or will we freeze this moment in our definitions?" I'm not sure about this. We need to think it through. |
A bit confused by this discussion as I am used to ontologies where the definitions are authored by the developers of that ontology (sometimes adapted from an external source, with attribution). Randomly bringing in dictionary definitions could lead to incoherence, and how do we know the definitions reflect the intended meaning? Yes, I have my handy OntoTip blog post entry on text definitions as well. Regardless of who writes them and what pipeline you use, it's super-important to track the provenance of definitions, e.g. via axiom annotation |
OK, so who is the developer of SWEET these days? (Presumably the people who are currently maintaining it?) And how does that developer now create appropriate definitions, if not by referencing existing expertise? |
OK, so for the developer question, I do think ENVO's micro-citation is useful. In other words, if I make a change to a term (any change) I annotate the change with my ORCID. I also like using DBXREFs to cite the original source of the definitions. I haven't looked at DBPedia, but perhaps that should be where I dump all the GCW terms and definitions? While I do like having embedded definitions, I really hate the idea of having to update the same definition in more than one place. Would having them in DBPedia help with this problem? Also, I note that all the cryospheric terms and definitions and sources for those can be provided as a CSV file if that is helpful. Thoughts? |
Yes it would; we would then look at the comment over in the DBPedia resource and determine whether we want a hard mapping. @cmungall thanks for chiming in. I agree with @graybeal here; in response to your question about who the developer of SWEET is these days:
That is essentially us. Raskin et al. never added simple labels or verbose descriptions so it is down to us to annotate and contextualize whatever we feel is necessary. IMHO DBPedia is the best resource I've come across where we can leverage existing knowledge. We can even do this one Class at a time with one pull request. Then every one of the proposed augmentations could be scrutinized. Does this sound logical or is it way off? |
I'm wrestling with implications here, mostly because these external definitions are not versioned, are they? So please pardon my TLDR comments.
If we embed (copy) a definition we are then claiming it as our own, and ours won't track any changes to the original source (which may be the best thing); or if it does track changes, we'll have an ongoing monitoring task. In any case, yes, we'll need to evaluate each one.
If we link to definitions that live elsewhere, we still have the monitoring issue (what if that definition changes enough to make it wrong for SWEET?). And if we make the link a hard one (along the lines of sameAs or exactMatch) we are effectively claiming it as our own, and therefore still have to track any changes made to the original to see if we agree. So we're effectively back at the first option.
I don't think we can support either of these approaches, even if we could create a great first version. And SWEET is not an authoritative real-world model that can be used for detailed reasoning about the world, and we can't pretend we will be able to come up with all-knowing definitions for these terms. It makes more sense to me to give people pointers to helpful information, and maintain SWEET as a relatively minimalist description of these earth science concepts.
So I think it would be best to have the definitions be notional, not authoritative. The relationship would then be 'notionallyDescribedBy', or better words to that effect, and there could be several of them, even with some contradictions between them. This best reflects the real world of SWEET in my opinion.
With that approach they could be either embedded (with the definitions sourced in the provenance, and updated automatically from the original content); or referenced remotely (though that makes SWEET less handy to use). I'd prefer the embedded option, where multiple embedded definitions have been pulled from other sources (with date, source citation, and process citation). That follows best practices as far as I'm concerned. |
@pbuttigieg @cmungall Your take on this? |
How about
The range of |
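(The pattern proposed in the preceding comment, and the range statement that followed, were not preserved in this extraction. Judging from the fuller Dataset example posted later in the thread, it was presumably along the lines of the following hypothetical sketch, in which the class IRI, definition text, and source are placeholders rather than the original content.)

@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:      <http://example.org/> .

# Hypothetical reconstruction, not the original proposal:
# the definition is a structured object carrying text, source, and harvest date.
ex:SomeClass skos:definition [
    skos:definition "text of an externally sourced definition"@en ;
    dcterms:source  <http://example.org/external/definition/source> ;
    dcterms:created "2019-08-15"^^xsd:date ;
] .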
This makes sense to me. |
Ignoring the temptation to comment on the definition :-), I like this. Presumably there can be multiple definitions, which I think is helpful to prevent people from trying to "reason over the definitions" (or argue over the definitions, equally to the point). Good general-purpose definitions are very hard to build, so most aren't that good; the meaning is in the interplay of definitions.
In the interest of rigor, can the date be an ISO 8601 date+time+time zone? How does RDFS feel about (read: tolerate) that format?
@lewismc What about doing all the pull requests automatically in a branch, then pushing them all to a Google table (or similar) for review/comment? (a) You don't want to give someone carpal tunnel approving pull requests, and (b) the likelihood should significantly favor acceptance, with a definition rejected only if there is agreement that it is clearly unacceptable or represents a different concept. (And in the former case, that it's just a poor definition, the disapproval could be represented by annotating the definition, rather than by not including it.)
Some system to keep track of the issues and rejections for future updates would be very helpful to minimize future maintenance costs. But treating this as a "handy dandy reference" and not as a rigorous definition means reviews could be pretty superficial, just: Is it the right concept or the wrong concept? |
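(For what it's worth on the ISO 8601 question: RDF literals are not limited to plain dates, so a full date+time+zone can be carried as an xsd:dateTime. A minimal sketch, with a made-up timestamp:)

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .

# xsd:date (as used elsewhere in this thread) allows an optional zone offset;
# xsd:dateTime carries the full ISO 8601 date+time+zone form.
[] dcterms:created "2019-08-15T15:44:00-07:00"^^xsd:dateTime .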
Makes sense. I might go with wikidata rather than dbpedia. I also have code to do wikidata matching. See EnvironmentOntology/envo#833
We should request a SWEET ID property in wikidata, see https://www.wikidata.org/wiki/Property:P3859
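(A sketch of the kind of Wikidata lookup this implies. Q1172284, the "data set" item cited later in this thread, stands in for whatever item a SWEET class gets matched to; the matching step itself is the harder problem.)

PREFIX wd:     <http://www.wikidata.org/entity/>
PREFIX schema: <http://schema.org/>

# Example only: fetch the English schema:description for one Wikidata item.
# Run against the Wikidata Query Service at https://query.wikidata.org/
SELECT ?description WHERE {
  wd:Q1172284 schema:description ?description .
  FILTER ( lang(?description) = "en" )
}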
|
I could do this... however mapping from those determinations back to the correct logic for the pull request cannot be automated. We are talking about thousands of labels for which we are looking to obtain definitions. I completely understand why manual curation is your desire... but once that spreadsheet is populated, then I am left to manually map all of those decisions back into source code... that would take me literally years... |
Maybe a statistical sampling would work? |
Thanks for the suggestions @wdduncan. It's looking more and more like this is not going to be a reproducible process... which is what I would have preferred. |
Seems like we could put the schema:description content from wikidata in rdfs:comment, and then wouldn't have to import another big namespace. Especially considering that the content of wikidata descriptions is likely to be pretty heterogeneous. If we're going to add definitions, I'd prefer using skos:definition directly or one of the approaches outlined above (#125 (comment), #125 (comment)). |
Reviving this issue following the SemTech meeting today: here is a preliminary pattern for coordinating labels and definitions harvested from multiple external sources.
### http://sweetontology.net/reprDataProduct/Dataset
dprepr:Dataset rdf:type owl:Class ;
rdfs:subClassOf dprepr:DataProduct ;
rdfs:label "dataset"@en ; # Up to here, this is from SWEET, the following is pulled from alternative external sources
skos:definition [
skos:definition "A data set (or dataset, although this spelling is not present in many contemporary dictionaries) is a collection of data. Most commonly a data set corresponds to the contents of a single database table, or a single statistical data matrix, where every column of the table represents a particular variable, and each row corresponds to a given member of the data set in question. The data set lists values for each of the variables, such as height and weight of an object, for each member of the data set. Each value is known as a datum. The data set may comprise data for one or more members, corresponding to the number of rows."@en ;
skos:prefLabel "Data set"@en ;
dcterms:source <http://dbpedia.org/resource/Data_set> ;
dcterms:created "2019-08-16"^^xsd:date ;
] ;
skos:definition [
skos:definition "collection of data"@en ;
skos:prefLabel "data set"@en ;
dcterms:source <https://www.wikidata.org/wiki/Q1172284> ;
dcterms:created "2021-08-25"^^xsd:date ;
] ;
skos:definition [
skos:definition "A collection of data, published or curated by a single agent, and available for access or download in one or more representations"@en ;
skos:prefLabel "Dataset"@en ;
dcterms:source <https://www.w3.org/TR/vocab-dcat/#Class:Dataset> ;
dcterms:created "2021-08-25"^^xsd:date ;
] ;
.
The pattern above is not necessarily the ultimate solution. But it records separate labels and definitions - here embedded in blank-nodes - clearly identified by source, so data formatted this way could be transformed to another pattern with SPARQL queries if another pattern is preferred. @rduerr does this help? |
Yup - absolutely!!! |
@dr-shorthair why would you use dct:created instead of dcterms:created? I thought a best practice was to use dcterms for everything because dct was so, you know … old. |
My mistake. |
@dr-shorthair sorry ... I haven't been able to make the calls in a while. But I am not sure what you are trying to model by nesting the skos:definitions. Using Protege, you can annotate a class with multiple definitions. The "@" in the upper right hand corner allows you to put annotations on the definitions. Here is a simple demo of what it looks like. If you are interested in the Turtle, it looks like this:
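(The Turtle snippet embedded in the original comment was not preserved. A sketch of the standard OWL 2 axiom-annotation pattern being described, reusing the Wikidata-sourced definition from the earlier Dataset example and assuming the dprepr: prefix from that example, would be:)

@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dprepr:  <http://sweetontology.net/reprDataProduct/> .

# The plain annotation assertion on the class...
dprepr:Dataset skos:definition "collection of data"@en .

# ...and an owl:Axiom node that annotates that specific assertion with its provenance.
[] rdf:type owl:Axiom ;
   owl:annotatedSource   dprepr:Dataset ;
   owl:annotatedProperty skos:definition ;
   owl:annotatedTarget   "collection of data"@en ;
   dcterms:source <https://www.wikidata.org/wiki/Q1172284> .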
You can, of course, add other annotation properties like you have in the example above. |
@wdduncan as I hinted in my comment after the example, the exact pattern is not final. And since these are just annotations it is not really important. I was merely suggesting that under the agreed plan to Reimagine SWEET as a compilation of textual definitions, then (i) label (ii) textual definition (iii) source (iv) date are probably the minimum items needed that would support usefully associating each external definition with a SWEET class. I'm not wedded to any particular model. (I do not routinely use Protege, so am not bound to its OWLy view of the world.) |
@dr-shorthair sorry, I didn't catch your drift at the end of the example :) I think what I proposed satisfies your criteria. I understand that not everyone uses Protege, but it is good to stay within the OWLy realm if possible. |
This is basically a reification pattern. The downside is that a label and definition from the same source would appear in separate axioms. They can be linked through having the same source and date, but a small overhead. |
Yes. It is the OWL-reification pattern.
Sorry, I'm not following this point. In the turtle, the axioms are tied to the class via the
|
If you look up at my original proposal, the label and definition from a single source are part of the same object, rather than being in two separate objects. |
Yes. I noticed that. However, I think it better to stick with a well-defined standard rather than making up a new way to do it. |
So that’s what the Protege @ is for! A decade of using Protege (sporadically) and I never knew… Despite my happiness at learning the above, I think that where a “pure RDF” pattern and an OWL pattern can be used to communicate something, the pure RDF pattern should be preferred. Unless the OWL pattern is in wide use and this can be demonstrated, which is a step beyond just that the pattern is available in “a well defined standard” (OWL). Having said that, it’s a bit cheeky to use |
Well, we could stay within the DC world this way:
### http://sweetontology.net/reprDataProduct/Dataset
dprepr:Dataset rdf:type owl:Class ;
rdfs:subClassOf dprepr:DataProduct ;
rdfs:label "dataset"@en ; # Up to here, this is from SWEET, the following is pulled from alternative external sources
skos:definition [
dcterms:description "A data set (or dataset, although this spelling is not present in many contemporary dictionaries) is a collection of data. Most commonly a data set corresponds to the contents of a single database table, or a single statistical data matrix, where every column of the table represents a particular variable, and each row corresponds to a given member of the data set in question. The data set lists values for each of the variables, such as height and weight of an object, for each member of the data set. Each value is known as a datum. The data set may comprise data for one or more members, corresponding to the number of rows."@en ;
dcterms:title "Data set"@en ;
dcterms:source <http://dbpedia.org/resource/Data_set> ;
dcterms:created "2019-08-16"^^xsd:date ;
] ;
skos:definition [
dcterms:description "collection of data"@en ;
dcterms:title "data set"@en ;
dcterms:source <https://www.wikidata.org/wiki/Q1172284> ;
dcterms:created "2021-08-25"^^xsd:date ;
] ;
skos:definition [
dcterms:description "A collection of data, published or curated by a single agent, and available for access or download in one or more representations"@en ;
dcterms:title "Dataset"@en ;
dcterms:source <https://www.w3.org/TR/vocab-dcat/#Class:Dataset> ;
dcterms:created "2021-08-25"^^xsd:date ;
] ;
.
But I think @wdduncan is commenting on the fact that the blank-nodes are untyped - just a bag of properties, if you like. Fair enough. OTOH are |
I am not sure how much it's used outside the biosciences but it's very widely used in OBO. Many of our ontologies have detailed axiom-level provenance. I will hopefully have a blog about it in this series soon https://douroucouli.wordpress.com/2020/09/11/edge-properties-part-1-reification/ |
I'd say there are zero semantics to owl:Axiom rather than weak semantics |
@dr-shorthair I understand that what you are proposing is completely valid in the SKOS world. The notes in the SKOS reference state:
However, when users query for definitions, they will receive a complex object instead of text, and will then have to further process the complex object if they are only interested in the text of the definition. In my experience, when I query for definitions, I am looking for (or expecting) text, and I am not interested in the provenance of the definition. If I am interested in the provenance, then I do a different query. But, perhaps that is only my experience. For what it is worth, when I encounter |
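(To illustrate the point about query overhead: under the nested pattern, a consumer who just wants definition text has to reach one level deeper than usual. A sketch, assuming the nested skos:definition shape from the earlier Dataset example:)

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# Return definition text whether it is asserted directly as a literal
# or wrapped inside the nested skos:definition blank-node pattern.
SELECT ?class ?text WHERE {
  ?class skos:definition ?def .
  OPTIONAL { ?def skos:definition ?nested }
  BIND ( COALESCE(?nested, ?def) AS ?text )
  FILTER ( isLiteral(?text) )
}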
If a SWEET class has a collection of definitions, then the class might represent different concepts (when the definitions are not consistent). If each definition might also have a distinct label, then the SWEET class does not represent a 'word' (lexical item) either. So what does the SWEET class represent? I think under the proposal adopted in #211, there must only be one prefLabel associated with the SWEET class, making it essentially a dictionary-- a mapping between a lexical item (word) and possible meanings. altLabels might be associated with specific definitions, but there would need to be some clear logic on when the 'label' is different enough that it should be a different SWEET word class. Use of blank nodes for the definitions is also problematic-- what if someone wants to link to a particular definition? |
See discussion https://github.com/ESIPFed/sweet/discussions/259 for a proposal on this topic. As for blank nodes with these definition blocks, anyone want to tackle a proposal for that? I'd like to move forward on this stuff!!! |
If someone can provide me with the proposed data model I can write the solution. I stepped away from this because we were still unsure about the data model. Did we come to a consensus? Thanks
|
Building on #20, this issue simply aims to provide rdfs:comment (and/or skos:definition or dct:description) text for all terms.
Open tasks involve collectively agreeing on which vocabulary we wish to use, e.g. rdfs:comment (and/or skos:definition or dct:description), and whether we manually curate the comments or automate this by fetching them from wikipedia/dbpedia/a dictionary or elsewhere.
Any comments here?
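(For concreteness, a minimal sketch of what the three candidate annotation properties would look like side by side on a single class. The class IRI and the literal text are placeholders, and in practice only one of the properties would likely be chosen.)

@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.org/> .

# Placeholder class showing the three candidate documentation properties.
ex:SomeClass
    rdfs:comment        "short human-readable note about the class"@en ;
    skos:definition     "statement of the intended meaning of the concept"@en ;
    dcterms:description "account of the resource, possibly sourced externally"@en .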