Decouple annotations/associations from the main release obo/owl files #13

Open
cmungall opened this issue Feb 1, 2024 · 1 comment
Labels
help wanted Extra attention is needed

cmungall commented Feb 1, 2024

pyobo currently includes annotations (in the sense of GO annotations, not OWL annotations) modeled as relationships (i.e., S subClassOf R some O).

An example of this is ec.obo:

[Term]
id: eccode:1.1.1.1
name: alcohol dehydrogenase
is_a: eccode:1.1.1 ! With NAD(+) or NADP(+) as acceptor
relationship: RO:0002327 GO:0004022 ! enables alcohol dehydrogenase (NAD+) activity
relationship: RO:0002351 uniprot:A0A0H2URT2 ! has member ADHE_STRPN
relationship: RO:0002351 uniprot:A0A0H2ZM56 ! has member ADHE_STRP2
[many rows deleted]

This has a number of practical and semantic disadvantages:

  1. It bloats the file size (ec.obo is 14x bigger with relationships)
  2. Danger of ontological errors (this is a real problem: the composed products will simply not work in OWL environments unless everything is modeled just so)
  3. Lack of modularity: it is harder to recompose into application-specific products (e.g. what if I want EC + just human proteins?)
  4. The product becomes stale sooner
  5. Lack of separation of concerns
  6. For associations it's important to have evidence and provenance. While this can be done in ontology formats using axiom annotations, it can get bulky and awkward. A TSV is often simpler and better
  7. Directionality issues (are links to EC distributed with uniprot? links to uniprot distributed with EC? both?)
  8. Shoreline issues (ec.obo includes all swissprot annotations but not, say, an arguably more useful set like reference proteomes for core species. Why?)
  9. It's broadly understood that distributing annotations and "contingent knowledge" in the ontology, and in models like OWL, is not a good strategy; see e.g. https://doi.org/10.1016/j.yjbinx.2019.100002. See also slides 51 onwards

Instead, decouple the associations / annotations / contingent knowledge. Use TSVs without OWL semantics and all its pitfalls. KGX is a good choice. Some associations are better modeled as SSSOM. By all means distribute these as .obo/.owl as well, and by all means distribute merged products too. The key is to focus on the "conceptual coat hanger", as Rector calls it, and allow people to hang their coats as they please.

In practical terms something like this:

This is less work for pyobo/obo-db-ingest overall. Sometimes you can simply say "we are only providing the coat rack today, we may get to the associations later"
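
To make the decoupling idea concrete, here is a minimal Python sketch. It is not the layout originally attached to this issue and it does not use pyobo. It assumes a single well-formed OBO stanza like the ec.obo excerpt above, and it writes the `relationship:` lines to a three-column TSV whose column names (`subject`, `predicate`, `object`) are only loosely modeled on KGX edge files. The function names `split_stanza` and `edges_to_tsv` are hypothetical.

```python
import csv
import io

# The ec.obo excerpt from this issue (the "[many rows deleted]" part is omitted).
OBO_STANZA = """\
[Term]
id: eccode:1.1.1.1
name: alcohol dehydrogenase
is_a: eccode:1.1.1 ! With NAD(+) or NADP(+) as acceptor
relationship: RO:0002327 GO:0004022 ! enables alcohol dehydrogenase (NAD+) activity
relationship: RO:0002351 uniprot:A0A0H2URT2 ! has member ADHE_STRPN
relationship: RO:0002351 uniprot:A0A0H2ZM56 ! has member ADHE_STRP2
"""


def split_stanza(stanza):
    """Split one OBO stanza into (core stanza text, list of association edges).

    The core keeps everything except `relationship:` lines (the "coat rack");
    each edge is a (subject, predicate, object) tuple of CURIEs.
    """
    subject = None
    kept = []
    edges = []
    for line in stanza.splitlines():
        if line.startswith("id: "):
            subject = line[len("id: "):].strip()
        if line.startswith("relationship: "):
            # Drop the trailing "! comment", then split "PREDICATE OBJECT".
            body = line[len("relationship: "):].split(" ! ", 1)[0]
            predicate, obj = body.split()
            edges.append((subject, predicate, obj))
        else:
            kept.append(line)
    return "\n".join(kept) + "\n", edges


def edges_to_tsv(edges):
    """Serialize edges as a tab-separated table with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["subject", "predicate", "object"])  # assumed column names
    writer.writerows(edges)
    return buf.getvalue()


core, edges = split_stanza(OBO_STANZA)
tsv = edges_to_tsv(edges)
print(core)
print(tsv)
```

With the associations in their own table, an application-specific product (point 3 above) becomes a row filter plus an optional merge, and evidence or provenance columns (point 6) can be added without touching the ontology file.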


cmungall commented Nov 2, 2024

This is still a major impediment to reusing the fantastic work in obo-db-ingest.

For example, here is the latest rhea ingest:

image

cthoyt referenced this issue in biopragmatics/pyobo Nov 4, 2024
cthoyt transferred this issue from biopragmatics/pyobo Nov 5, 2024
cthoyt added a commit to biopragmatics/pyobo that referenced this issue Nov 5, 2024
References biopragmatics/obo-db-ingest#13

A demonstration of the results is in biopragmatics/obo-db-ingest#12.

This PR enables serializing to OBO but skipping object properties, as requested by @cmungall.