-
Notifications
You must be signed in to change notification settings - Fork 315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OBO parser] Axiom annotations using arbitrary properties are not always properly translated #1112
Comments
This commit rewrites part of the OWLAPIObo2Owl#oboIdToIRI_load() method so that the oboInOwlDefault parameter is only used when translating unprefixed IDs (where it instructs to create an IRI in the oboInOwl namespace instead of the ontology's default namespace). When translating a prefixed ID, the IRI to construct should always use the URL prefix dictated by the prefix. closes owlcs#1112
This commit rewrites part of the OWLAPIObo2Owl#oboIdToIRI_load() method so that the oboInOwlDefault parameter is only used when translating unprefixed IDs (where it instructs to create an IRI in the oboInOwl namespace instead of the ontology's default namespace). When translating a prefixed ID, the IRI to construct should always use the URL prefix dictated by the prefix. closes owlcs#1112
@gouttegd thank you so much for doing this analysis, and sorry for apparently not adequately testing this change before the release! It looks like you're already implementing some fixes. Let me know if I can help. I will be away at a project meeting most of this week, so I probably can't do much over the next few days. |
Given the fact that the ID cache hides the problem away in most cases, I don’t think any amount of testing on small files would have caught the issue. We only found it when using the new ROBOT on a “real world” file (Uberon). I believe I indeed have a fix ready, but I want to test it intensively before we go through the whole cycle of releasing a new OWL API, a new ROBOT, a new ODK, and a new Protégé. |
This commit rewrites part of the OWLAPIObo2Owl#oboIdToIRI_load() method so that the oboInOwlDefault parameter is only used when translating unprefixed IDs (where it instructs to create an IRI in the oboInOwl namespace instead of the ontology's default namespace). When translating a prefixed ID, the IRI to construct should always use the URL prefix dictated by the prefix. closes #1112
@gouttegd thanks for the excellent analysis and the pull request. Keeping this open to update the other branches. |
This commit rewrites part of the OWLAPIObo2Owl#oboIdToIRI_load() method so that the oboInOwlDefault parameter is only used when translating unprefixed IDs (where it instructs to create an IRI in the oboInOwl namespace instead of the ontology's default namespace). When translating a prefixed ID, the IRI to construct should always use the URL prefix dictated by the prefix. closes #1112 2nd commit message: Ensure longest namespace match is used. Disallow prefixes that are substrings of OBO namespace. 3rd commit message: If a prefix is declared in idspaces, don't use # separator for "non-canonical" ids. 4th commit message: Drop conflicting id annotations rather than stuff into owl-axioms header. 5th commit message: Corrected key comparison in OBOFormatPrefixManager. Exclude OBO IRIs from idspaces. 7th commit message: Don't inject default semweb prefixes into OBO document format. 8th commit message: Test prefixes aren't injected.
This commit rewrites part of the OWLAPIObo2Owl#oboIdToIRI_load() method so that the oboInOwlDefault parameter is only used when translating unprefixed IDs (where it instructs to create an IRI in the oboInOwl namespace instead of the ontology's default namespace). When translating a prefixed ID, the IRI to construct should always use the URL prefix dictated by the prefix. closes #1112
Since PR #1099 (present in OWL API 4.5.26), it is supposedly possible to use arbitrary annotation properties in OBO “qualifier tags”, e.g. one can do this:
to annotate the
SubClassOf
axiom with ahttp://purl.obolibrary.org/obo/PROP_456
property.However, this does not always work. For example, in Uberon, the
IAO:0000116
tag is most of the times translated correctly ashttp://purl.obolibrary.org/obo/IAO_0000116
, but sometimes translated (incorrectly) ashttp://www.geneontology.org/formats/oboInOwl#0000116
— which among other consequences makes the ontology impossible to serialise in RDF/XML, because the RDF/XML writer tries to write the annotation as a<oboInOwl:0000116>
XML tag (which it cannot do because0000116
is not a valid XML tag name).The root cause of the issue lies in the oboIdToIRI_load() method, which I believe does not behave as intended. When called with the
oboInOwlDefault
argument set totrue
, it completely ignores the existing prefix and forcibly translate the Curie into thehttp://www.geneontology.org/formats/oboInOwl#
namespace (lines 1729–1731).That is,
oboIdToIRI_load("IAO:0000116", true)
will returnhttp://www.geneontology.org/formats/oboInOwl#0000116
.I don’t see how this can be the desired behaviour. The
oboInOwlDefault
argument should only have an effect in cases where the tag is not qualified (e.g., to translate{source="lorem ipsum"}
intohttp://www.geneontology.org/formats/oboInOwl#source
). When the tag includes an explicit prefix (e.g.{IAO:0000116="lorem ipsum"}
), the prefix should always be taken into account regardless of theoboInOwlDefault
value.To make things worse, the issue is in many cases hidden by the use of the
idToIRICache
. When translating any OBO ID to an IRI, that cache is queried before any call tooboIdToIRI_load()
. But importantly, theidiToIRICache
does not grow indefinitely: when it reaches a certain size, older entries are removed from it.I believe what happens in the aforementioned Uberon issue is something along the lines of:
editor_note
type definition, which contains a cross-reference toIAO:0000116
. The ID is translated correctly (theoboIdToIRI_load
method is called withoboInOwlDefault
set tofalse
in this context) and put into the ID cache. This always happens first, because even though type definitions are physically located at the end of an OBO file, theTypedef
frames are always translated before theTerm
frames.Term
frame that contains aIAO:0000116="..."
qualifier, the cache is queried for theIAO:0000116
ID. At the beginning, the cache will still contain that ID and will therefore return the correct IRI.IAO:0000116
ID will be dropped from the cache. Then the next time the parser will have to translate aIAO:0000116="..."
qualifier, the cache will miss, andoboIdToIRI_load
will be called to perform the translation withoboInOwlDefault
set totrue
, thereby resulting in the ID being incorrectly translated into thehttp://www.geneontology.org/formats/oboInOwl#
namespace.This would explain why only some
IAO:0000116
annotations are incorrectly translated. This would also explain why the unit test included in PR #1099 didn’t catch the issue: when the parser needs to translate this bit:the
MYONT:20
ID is still in the ID cache (associated with the correct IRI) because it has been correctly translated when processing thesource
type definition:and the test file is not large enough for the ID to have been removed from the cache.
The text was updated successfully, but these errors were encountered: