Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OBO parser] Axiom annotations using arbitrary properties are not always properly translated #1112

Closed
gouttegd opened this issue Oct 14, 2023 · 3 comments
Assignees
Labels

Comments

@gouttegd
Copy link
Contributor

gouttegd commented Oct 14, 2023

Since PR #1099 (present in OWL API 4.5.26), it is supposedly possible to use arbitrary annotation properties in OBO “qualifier tags”, e.g. one can do this:

is_a: TEST:123 {PROP:456="lorem ipsum"}

to annotate the SubClassOf axiom with a http://purl.obolibrary.org/obo/PROP_456 property.

However, this does not always work. For example, in Uberon, the IAO:0000116 tag is most of the times translated correctly as http://purl.obolibrary.org/obo/IAO_0000116, but sometimes translated (incorrectly) as http://www.geneontology.org/formats/oboInOwl#0000116 — which among other consequences makes the ontology impossible to serialise in RDF/XML, because the RDF/XML writer tries to write the annotation as a <oboInOwl:0000116> XML tag (which it cannot do because 0000116 is not a valid XML tag name).

The root cause of the issue lies in the oboIdToIRI_load() method, which I believe does not behave as intended. When called with the oboInOwlDefault argument set to true, it completely ignores the existing prefix and forcibly translate the Curie into the http://www.geneontology.org/formats/oboInOwl# namespace (lines 1729–1731).

That is, oboIdToIRI_load("IAO:0000116", true) will return http://www.geneontology.org/formats/oboInOwl#0000116.

I don’t see how this can be the desired behaviour. The oboInOwlDefault argument should only have an effect in cases where the tag is not qualified (e.g., to translate {source="lorem ipsum"} into http://www.geneontology.org/formats/oboInOwl#source). When the tag includes an explicit prefix (e.g. {IAO:0000116="lorem ipsum"}), the prefix should always be taken into account regardless of the oboInOwlDefault value.

To make things worse, the issue is in many cases hidden by the use of the idToIRICache. When translating any OBO ID to an IRI, that cache is queried before any call to oboIdToIRI_load(). But importantly, the idiToIRICache does not grow indefinitely: when it reaches a certain size, older entries are removed from it.

I believe what happens in the aforementioned Uberon issue is something along the lines of:

  1. The OBO parser processes the editor_note type definition, which contains a cross-reference to IAO:0000116. The ID is translated correctly (the oboIdToIRI_load method is called with oboInOwlDefault set to false in this context) and put into the ID cache. This always happens first, because even though type definitions are physically located at the end of an OBO file, the Typedef frames are always translated before the Term frames.
  2. When processing a Term frame that contains a IAO:0000116="..." qualifier, the cache is queried for the IAO:0000116 ID. At the beginning, the cache will still contain that ID and will therefore return the correct IRI.
  3. But at some point, provided that the ontology is large enough, the IAO:0000116 ID will be dropped from the cache. Then the next time the parser will have to translate a IAO:0000116="..." qualifier, the cache will miss, and oboIdToIRI_load will be called to perform the translation with oboInOwlDefault set to true, thereby resulting in the ID being incorrectly translated into the http://www.geneontology.org/formats/oboInOwl# namespace.

This would explain why only some IAO:0000116 annotations are incorrectly translated. This would also explain why the unit test included in PR #1099 didn’t catch the issue: when the parser needs to translate this bit:

def: "Definition of term two." [] {MYONT:20="A nested annotation value."}

the MYONT:20 ID is still in the ID cache (associated with the correct IRI) because it has been correctly translated when processing the source type definition:

[Typedef]
id: source
name: source
xref: MYONT:20

and the test file is not large enough for the ID to have been removed from the cache.

gouttegd added a commit to gouttegd/owlapi that referenced this issue Oct 15, 2023
This commit rewrites part of the OWLAPIObo2Owl#oboIdToIRI_load() method
so that the oboInOwlDefault parameter is only used when translating
unprefixed IDs (where it instructs to create an IRI in the oboInOwl
namespace instead of the ontology's default namespace). When translating
a prefixed ID, the IRI to construct should always use the URL prefix
dictated by the prefix.

closes owlcs#1112
gouttegd added a commit to gouttegd/owlapi that referenced this issue Oct 15, 2023
This commit rewrites part of the OWLAPIObo2Owl#oboIdToIRI_load() method
so that the oboInOwlDefault parameter is only used when translating
unprefixed IDs (where it instructs to create an IRI in the oboInOwl
namespace instead of the ontology's default namespace). When translating
a prefixed ID, the IRI to construct should always use the URL prefix
dictated by the prefix.

closes owlcs#1112
@balhoff
Copy link
Contributor

balhoff commented Oct 16, 2023

@gouttegd thank you so much for doing this analysis, and sorry for apparently not adequately testing this change before the release! It looks like you're already implementing some fixes. Let me know if I can help. I will be away at a project meeting most of this week, so I probably can't do much over the next few days.

@gouttegd
Copy link
Contributor Author

Given the fact that the ID cache hides the problem away in most cases, I don’t think any amount of testing on small files would have caught the issue. We only found it when using the new ROBOT on a “real world” file (Uberon).

I believe I indeed have a fix ready, but I want to test it intensively before we go through the whole cycle of releasing a new OWL API, a new ROBOT, a new ODK, and a new Protégé.

ignazio1977 pushed a commit that referenced this issue Oct 18, 2023
This commit rewrites part of the OWLAPIObo2Owl#oboIdToIRI_load() method
so that the oboInOwlDefault parameter is only used when translating
unprefixed IDs (where it instructs to create an IRI in the oboInOwl
namespace instead of the ontology's default namespace). When translating
a prefixed ID, the IRI to construct should always use the URL prefix
dictated by the prefix.

closes #1112
@ignazio1977
Copy link
Contributor

@gouttegd thanks for the excellent analysis and the pull request. Keeping this open to update the other branches.

@ignazio1977 ignazio1977 self-assigned this Oct 18, 2023
ignazio1977 pushed a commit that referenced this issue May 7, 2024
This commit rewrites part of the OWLAPIObo2Owl#oboIdToIRI_load() method
so that the oboInOwlDefault parameter is only used when translating
unprefixed IDs (where it instructs to create an IRI in the oboInOwl
namespace instead of the ontology's default namespace). When translating
a prefixed ID, the IRI to construct should always use the URL prefix
dictated by the prefix.

closes #1112

2nd commit message:
Ensure longest namespace match is used. Disallow prefixes that are
substrings of OBO namespace.

3rd commit message:
If a prefix is declared in idspaces, don't use # separator for
"non-canonical" ids.

4th commit message:
Drop conflicting id annotations rather than stuff into owl-axioms
header.

5th commit message:
Corrected key comparison in OBOFormatPrefixManager. Exclude OBO IRIs
from idspaces.

7th commit message:
Don't inject default semweb prefixes into OBO document format.

8th commit message:
Test prefixes aren't injected.
ignazio1977 pushed a commit that referenced this issue Aug 3, 2024
This commit rewrites part of the OWLAPIObo2Owl#oboIdToIRI_load() method
so that the oboInOwlDefault parameter is only used when translating
unprefixed IDs (where it instructs to create an IRI in the oboInOwl
namespace instead of the ontology's default namespace). When translating
a prefixed ID, the IRI to construct should always use the URL prefix
dictated by the prefix.

closes #1112
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants