[OBO parser] Axiom annotations using arbitrary properties are not always properly translated #1112

gouttegd · 2023-10-14T23:17:58Z

Since PR #1099 (present in OWL API 4.5.26), it is supposedly possible to use arbitrary annotation properties in OBO “qualifier tags”, e.g. one can do this:

is_a: TEST:123 {PROP:456="lorem ipsum"}

to annotate the SubClassOf axiom with a http://purl.obolibrary.org/obo/PROP_456 property.

However, this does not always work. For example, in Uberon, the IAO:0000116 tag is most of the times translated correctly as http://purl.obolibrary.org/obo/IAO_0000116, but sometimes translated (incorrectly) as http://www.geneontology.org/formats/oboInOwl#0000116 — which among other consequences makes the ontology impossible to serialise in RDF/XML, because the RDF/XML writer tries to write the annotation as a <oboInOwl:0000116> XML tag (which it cannot do because 0000116 is not a valid XML tag name).

The root cause of the issue lies in the oboIdToIRI_load() method, which I believe does not behave as intended. When called with the oboInOwlDefault argument set to true, it completely ignores the existing prefix and forcibly translate the Curie into the http://www.geneontology.org/formats/oboInOwl# namespace (lines 1729–1731).

That is, oboIdToIRI_load("IAO:0000116", true) will return http://www.geneontology.org/formats/oboInOwl#0000116.

I don’t see how this can be the desired behaviour. The oboInOwlDefault argument should only have an effect in cases where the tag is not qualified (e.g., to translate {source="lorem ipsum"} into http://www.geneontology.org/formats/oboInOwl#source). When the tag includes an explicit prefix (e.g. {IAO:0000116="lorem ipsum"}), the prefix should always be taken into account regardless of the oboInOwlDefault value.

To make things worse, the issue is in many cases hidden by the use of the idToIRICache. When translating any OBO ID to an IRI, that cache is queried before any call to oboIdToIRI_load(). But importantly, the idiToIRICache does not grow indefinitely: when it reaches a certain size, older entries are removed from it.

I believe what happens in the aforementioned Uberon issue is something along the lines of:

The OBO parser processes the editor_note type definition, which contains a cross-reference to IAO:0000116. The ID is translated correctly (the oboIdToIRI_load method is called with oboInOwlDefault set to false in this context) and put into the ID cache. This always happens first, because even though type definitions are physically located at the end of an OBO file, the Typedef frames are always translated before the Term frames.
When processing a Term frame that contains a IAO:0000116="..." qualifier, the cache is queried for the IAO:0000116 ID. At the beginning, the cache will still contain that ID and will therefore return the correct IRI.
But at some point, provided that the ontology is large enough, the IAO:0000116 ID will be dropped from the cache. Then the next time the parser will have to translate a IAO:0000116="..." qualifier, the cache will miss, and oboIdToIRI_load will be called to perform the translation with oboInOwlDefault set to true, thereby resulting in the ID being incorrectly translated into the http://www.geneontology.org/formats/oboInOwl# namespace.

This would explain why only some IAO:0000116 annotations are incorrectly translated. This would also explain why the unit test included in PR #1099 didn’t catch the issue: when the parser needs to translate this bit:

def: "Definition of term two." [] {MYONT:20="A nested annotation value."}

the MYONT:20 ID is still in the ID cache (associated with the correct IRI) because it has been correctly translated when processing the source type definition:

[Typedef]
id: source
name: source
xref: MYONT:20

and the test file is not large enough for the ID to have been removed from the cache.

The text was updated successfully, but these errors were encountered:

This commit rewrites part of the OWLAPIObo2Owl#oboIdToIRI_load() method so that the oboInOwlDefault parameter is only used when translating unprefixed IDs (where it instructs to create an IRI in the oboInOwl namespace instead of the ontology's default namespace). When translating a prefixed ID, the IRI to construct should always use the URL prefix dictated by the prefix. closes owlcs#1112

balhoff · 2023-10-16T14:02:33Z

@gouttegd thank you so much for doing this analysis, and sorry for apparently not adequately testing this change before the release! It looks like you're already implementing some fixes. Let me know if I can help. I will be away at a project meeting most of this week, so I probably can't do much over the next few days.

gouttegd · 2023-10-16T15:41:13Z

Given the fact that the ID cache hides the problem away in most cases, I don’t think any amount of testing on small files would have caught the issue. We only found it when using the new ROBOT on a “real world” file (Uberon).

I believe I indeed have a fix ready, but I want to test it intensively before we go through the whole cycle of releasing a new OWL API, a new ROBOT, a new ODK, and a new Protégé.

This commit rewrites part of the OWLAPIObo2Owl#oboIdToIRI_load() method so that the oboInOwlDefault parameter is only used when translating unprefixed IDs (where it instructs to create an IRI in the oboInOwl namespace instead of the ontology's default namespace). When translating a prefixed ID, the IRI to construct should always use the URL prefix dictated by the prefix. closes #1112

ignazio1977 · 2023-10-18T21:30:01Z

@gouttegd thanks for the excellent analysis and the pull request. Keeping this open to update the other branches.

This commit rewrites part of the OWLAPIObo2Owl#oboIdToIRI_load() method so that the oboInOwlDefault parameter is only used when translating unprefixed IDs (where it instructs to create an IRI in the oboInOwl namespace instead of the ontology's default namespace). When translating a prefixed ID, the IRI to construct should always use the URL prefix dictated by the prefix. closes #1112 2nd commit message: Ensure longest namespace match is used. Disallow prefixes that are substrings of OBO namespace. 3rd commit message: If a prefix is declared in idspaces, don't use # separator for "non-canonical" ids. 4th commit message: Drop conflicting id annotations rather than stuff into owl-axioms header. 5th commit message: Corrected key comparison in OBOFormatPrefixManager. Exclude OBO IRIs from idspaces. 7th commit message: Don't inject default semweb prefixes into OBO document format. 8th commit message: Test prefixes aren't injected.

This commit rewrites part of the OWLAPIObo2Owl#oboIdToIRI_load() method so that the oboInOwlDefault parameter is only used when translating unprefixed IDs (where it instructs to create an IRI in the oboInOwl namespace instead of the ontology's default namespace). When translating a prefixed ID, the IRI to construct should always use the URL prefix dictated by the prefix. closes #1112

gouttegd mentioned this issue Oct 17, 2023

Fix forced use of oboInOwl namespace when translating OBO IDs. #1113

Merged

ignazio1977 self-assigned this Oct 18, 2023

ignazio1977 added the bug label Oct 18, 2023

ignazio1977 closed this as completed in 6bb2d3a Jan 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[OBO parser] Axiom annotations using arbitrary properties are not always properly translated #1112

[OBO parser] Axiom annotations using arbitrary properties are not always properly translated #1112

gouttegd commented Oct 14, 2023 •

edited

Loading

balhoff commented Oct 16, 2023

gouttegd commented Oct 16, 2023

ignazio1977 commented Oct 18, 2023

[OBO parser] Axiom annotations using arbitrary properties are not always properly translated #1112

[OBO parser] Axiom annotations using arbitrary properties are not always properly translated #1112

Comments

gouttegd commented Oct 14, 2023 • edited Loading

balhoff commented Oct 16, 2023

gouttegd commented Oct 16, 2023

ignazio1977 commented Oct 18, 2023

gouttegd commented Oct 14, 2023 •

edited

Loading