coordinate with SIB on conversion of swisslipids #140

Open
cmungall opened this issue Feb 14, 2023 · 10 comments

Comments

@cmungall

cmungall commented Feb 14, 2023

I am sitting next to @JervenBolleman, who is showing me his conversion of SwissLipids to OBO/OWL based on https://beta.sparql.swisslipids.org/. It would be great if we could agree on a canonical serialization:

  • agree on use of prefix: SLM vs swisslipids vs SWISSLIPIDS.
    • SWISSLIPIDS, as this is consistent with the Bioregistry entry and with the use of caps for ontology prefixes.
    • SIB currently uses SLM; this is canonical in their documentation and in their URIs, e.g. https://swisslipids.org/rdf/SLM_000000002 (not yet resolvable)
    • PyOBO uses swisslipids
  • how are subspecies axiomatized? subClassOf?
  • connecting equivalent IDs.
    • PyOBO uses xref (annoyingly with the non-interoperable chebi:NNN rather than CHEBI:NNN)
    • SIB uses owl:equivalentClass.
    • The latter is more correct, but a lot of frameworks don't handle equivalent-class axioms well
    • note that robot relax will turn an equivalent-classes axiom between named classes into reciprocal subClassOf axioms. This is confusing, but not doing the relax is equally confusing. See
    • See for example this query
    • a radical option would be to merge, but I think this is where OBO standards (one unique representation of a concept) and database autonomy may conflict. Merging may be best left downstream (but we would definitely do this for use in OBO ontologies like OBA, cc @rays22)
  • which vocabulary should be used for annotation properties, e.g. ?
    • the current CHEBI hash vocabulary?
    • chemrof (my choice)
    • cheminf?
    • this should obviously be synced with what CHEBI should use in the future
  • level in hierarchy
    • we need a broad standard for this in OBO; every ontology does this differently (PR, NCBITaxon) or doesn't do it at all when it should (CHEBI)
    • subsets? (a la NCBITaxon)
    • comments? (a la PR)
    • an explicit enum/vocab of categories, a la chemrof?
    • ad-hoc annotation property plus string?
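Two of the points above are mechanical enough to sketch. Below is a minimal Python sketch with hypothetical helper names (not PyOBO or ROBOT APIs): the first function rewrites PyOBO-style `chebi:NNN` xrefs to `CHEBI:NNN`; the second models, on plain (child, parent) pairs rather than a real OWL library, how relaxing an equivalence between two named classes yields reciprocal subClassOf edges, so that each class ends up listed as a parent of the other.

```python
"""Illustrative helpers only; the names are hypothetical, not PyOBO or ROBOT APIs."""


def normalize_chebi_curie(curie: str) -> str:
    """Rewrite a 'chebi:NNN' xref to the OBO-style 'CHEBI:NNN'; leave others unchanged."""
    prefix, sep, local = curie.partition(":")
    if sep and prefix.lower() == "chebi":
        return f"CHEBI:{local}"
    return curie


def relax_named_equivalences(equivalences: list[tuple[str, str]]) -> set[tuple[str, str]]:
    """Model 'A EquivalentTo B' (both named) as two (child, parent) edges: A->B and B->A."""
    edges: set[tuple[str, str]] = set()
    for a, b in equivalences:
        edges.add((a, b))
        edges.add((b, a))
    return edges


def mutual_parents(edges: set[tuple[str, str]]) -> list[tuple[str, str]]:
    """Pairs where each class is a parent of the other (the confusing case after relax)."""
    return sorted((a, b) for a, b in edges if a < b and (b, a) in edges)


if __name__ == "__main__":
    print(normalize_chebi_curie("chebi:78102"))  # prints CHEBI:78102
    edges = relax_named_equivalences([("CHEBI:78102", "SLM:000001362")])
    print(mutual_parents(edges))  # prints [('CHEBI:78102', 'SLM:000001362')]
```

A browser that draws a tree from these edges has to show the two classes as parent and child of each other, which is the confusion described above; keeping the equivalence unrelaxed avoids the cycle but is ignored by tools that only read subClassOf.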

cc @dosumis

@dosumis

dosumis commented Feb 14, 2023

CC @rays22

@JervenBolleman

This is of interest to us at SwissLipids as well.

@cthoyt
Member

cthoyt commented Feb 21, 2023

@cmungall @JervenBolleman is there any possibility the SIB can host me for a week or two to work on this coordination, or that we can work together to get project funding for this? Otherwise, asking to change all of the useful practicalities of PyOBO to align externally is a pretty big ask. I like the idea, though.

@JervenBolleman

@cthoyt let's talk about this at Biocuration. In the meantime, maybe @cmungall can introduce us by email.

@rays22 rays22 moved this from Todo Next to Requested in @rays22's triage and progress Feb 23, 2023
@JervenBolleman

JervenBolleman commented Feb 27, 2023

I just wanted to add an example of what comes out of running ROBOT convert on the Swiss-Lipids.rdf:

OBO

[Term]
id: SLM:000003492
name: 1-(21Z,24Z,27Z,30Z-hexatriacontatetraenoyl)-2-tetradecanoyl-sn-glycero-3-phospho-L-serine
is_a: SLM:000000336 ! 1,2-diacyl-sn-glycero-3-phospho-L-serine
is_a: SLM:000114461 ! Phosphatidylserine (36:4/14:0)
relationship: BFO:0000051 SLM:000000825 ! tetradecanoate
relationship: BFO:0000051 SLM:000001232 ! (21Z,24Z,27Z,30Z)-hexatriacontatetraenoate
property_value: altLabel PS(36:4(21Z,24Z,27Z,30Z)/14:0) xsd:string
property_value: CHEMINF:000412 SLM:000003492 xsd:string
property_value: http://purl.obolibrary.org/obo/chebi/charge "-1" xsd:string
property_value: http://purl.obolibrary.org/obo/chebi/formula "C56H101NO10P" xsd:string
property_value: http://purl.obolibrary.org/obo/chebi/inchi "InChI=1S/C56H102NO10P/c1-3-5-7-9-11-13-15-16-17-18-19-20-21-22-23-24-25-26-27-28-29-30-31-32-33-34-35-36-38-39-41-43-45-47-54(58)64-49-52(50-65-68(62,63)66-51-53(57)56(60)61)67-55(59)48-46-44-42-40-37-14-12-10-8-6-4-2/h11,13,16-17,19-20,22-23,52-53H,3-10,12,14-15,18,21,24-51,57H2,1-2H3,(H,60,61)(H,62,63)/p-1/b13-11-,17-16-,20-19-,23-22-/t52-,53+/m1/s1" xsd:string
property_value: http://purl.obolibrary.org/obo/chebi/inchikey "GASNTXNDIBBTJQ-JRWRWSKCSA-M" xsd:string
property_value: http://purl.obolibrary.org/obo/chebi/smiles "CCCCCCCCCCCCCC(=O)O[C@H](COC(=O)CCCCCCCCCCCCCCCCCCC\\C=C/C\\C=C/C\\C=C/C\\C=C/CCCCC)COP([O-])(=O)OC[C@H]([NH3+])C([O-])=O" xsd:string
property_value: seeAlso https://rdf.metanetx.org/chem/MNXM253867
property_value: SLM:hasPart SLM:000000825
property_value: SLM:hasPart SLM:000001232
property_value: SLM:rank https://swisslipids.org/rdf/SLM_Isomeric_Subspecies

RDF

SLM:000003492 a owl:Class ;
  SLid: 'SLM:000003492' ;
  SLM:rank SLM:Isomeric_Subspecies ;
  rdfs:label "1-(21Z,24Z,27Z,30Z-hexatriacontatetraenoyl)-2-tetradecanoyl-sn-glycero-3-phospho-L-serine" ; 
  skos:altLabel "PS(36:4(21Z,24Z,27Z,30Z)/14:0)" ; 
  rdfs:subClassOf SLM:000000336 ;
  rdfs:subClassOf SLM:000114461 ;
  chebislash:inchi "InChI=1S/C56H102NO10P/c1-3-5-7-9-11-13-15-16-17-18-19-20-21-22-23-24-25-26-27-28-29-30-31-32-33-34-35-36-38-39-41-43-45-47-54(58)64-49-52(50-65-68(62,63)66-51-53(57)56(60)61)67-55(59)48-46-44-42-40-37-14-12-10-8-6-4-2/h11,13,16-17,19-20,22-23,52-53H,3-10,12,14-15,18,21,24-51,57H2,1-2H3,(H,60,61)(H,62,63)/p-1/b13-11-,17-16-,20-19-,23-22-/t52-,53+/m1/s1" ; 
  chebislash:inchikey "GASNTXNDIBBTJQ-JRWRWSKCSA-M" ; 
  rdfs:seeAlso metanetx:MNXM253867 ;
  chebislash:charge "-1" ; 
  rdfs:subClassOf [ 
   a owl:Restriction ;
   owl:onProperty haspart: ;
   owl:someValuesFrom SLM:000001232 ] ;
  rdfs:subClassOf [ 
   a owl:Restriction ;
   owl:onProperty haspart: ;
   owl:someValuesFrom SLM:000000825 ] ;
  SLM:hasPart SLM:000001232 ,
    SLM:000000825 ;
  chebislash:smiles '''CCCCCCCCCCCCCC(=O)O[C@H](COC(=O)CCCCCCCCCCCCCCCCCCC\\C=C/C\\C=C/C\\C=C/C\\C=C/CCCCC)COP([O-])(=O)OC[C@H]([NH3+])C([O-])=O''' ; 
  chebislash:formula "C56H101NO10P" .

I believe it should be possible to make the OBO output nicer to look at.
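One possible cleanup is to compact the full IRIs in the OBO output using the same prefixes the Turtle serialization already uses. The prefix map below is inferred by comparing the two serializations (`chebislash:charge` vs `http://purl.obolibrary.org/obo/chebi/charge`, `metanetx:MNXM253867` vs `https://rdf.metanetx.org/chem/MNXM253867`, `SLM:Isomeric_Subspecies` vs `https://swisslipids.org/rdf/SLM_Isomeric_Subspecies`); the function names are hypothetical. This is a plain-text sketch: for OBO parsers and ROBOT to expand these prefixes back, they would presumably have to be declared in the file header, which is not shown, and round-tripping through ROBOT is untested.

```python
"""Sketch: compact full IRIs in ROBOT's OBO output with a hypothetical prefix map."""

# Hypothetical prefix map, inferred by comparing the OBO and Turtle serializations.
PREFIX_MAP = {
    "http://purl.obolibrary.org/obo/chebi/": "chebislash:",
    "https://rdf.metanetx.org/chem/": "metanetx:",
    "https://swisslipids.org/rdf/SLM_": "SLM:",
}


def compact_line(line: str, prefix_map: dict[str, str] = PREFIX_MAP) -> str:
    """Replace each known IRI base with its prefix, trying longer bases first."""
    for base in sorted(prefix_map, key=len, reverse=True):
        line = line.replace(base, prefix_map[base])
    return line


def compact_obo(text: str) -> str:
    """Apply compact_line to every line of an OBO document."""
    return "\n".join(compact_line(line) for line in text.splitlines())


if __name__ == "__main__":
    stanza = """property_value: http://purl.obolibrary.org/obo/chebi/charge "-1" xsd:string
property_value: seeAlso https://rdf.metanetx.org/chem/MNXM253867
property_value: SLM:rank https://swisslipids.org/rdf/SLM_Isomeric_Subspecies"""
    print(compact_obo(stanza))
```

Plain string replacement is enough here because none of the chemical values (InChI, SMILES, formula) contain these IRI bases; a real converter would do the compaction when writing the file rather than afterwards.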

SwissLipids is a proper extension of ChEBI, so the OBO has a bunch of stanzas like this:

[Term]
id: CHEBI:78102 ! 1-tetradecyl-sn-glycero-3-phosphocholine
equivalent_to: SLM:000001362 ! 1-O-tetradecyl-sn-glycero-3-phosphocholine
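Because SwissLipids links into ChEBI through stanzas like this, a consumer can recover the ChEBI-to-SwissLipids mapping directly from the OBO file. A minimal line-based sketch with a hypothetical function name (not a full OBO parser): it tracks which stanza type it is in, strips trailing `! label` comments, and collects `(id, equivalent_to)` pairs from `[Term]` stanzas only.

```python
"""Sketch: collect (id, equivalent_to) pairs from [Term] stanzas of an OBO file."""


def equivalence_pairs(obo_text: str) -> list[tuple[str, str]]:
    pairs: list[tuple[str, str]] = []
    in_term = False
    term_id = None
    for raw in obo_text.splitlines():
        # Drop a trailing "! label" comment; naive, would also cut quoted text containing " ! ".
        line = raw.split(" ! ", 1)[0].strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest.
            in_term = line == "[Term]"
            term_id = None
            continue
        if not in_term:
            continue
        if line.startswith("id:"):
            term_id = line[len("id:"):].strip()
        elif line.startswith("equivalent_to:") and term_id:
            pairs.append((term_id, line[len("equivalent_to:"):].strip()))
    return pairs


if __name__ == "__main__":
    stanza = """[Term]
id: CHEBI:78102 ! 1-tetradecyl-sn-glycero-3-phosphocholine
equivalent_to: SLM:000001362 ! 1-O-tetradecyl-sn-glycero-3-phosphocholine
"""
    print(equivalence_pairs(stanza))  # prints [('CHEBI:78102', 'SLM:000001362')]
```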

@dosumis

dosumis commented Mar 2, 2023

Hi all - I think all we need is:

(a) a reliable source of a SwissLipids ontology file
(b) stable IRIs (@cthoyt & @JervenBolleman - do you think it would be possible to at least agree on short_form IDs?)
(c) A class hierarchy that links to CHEBI (which shouldn't be hard given that "SwissLipids is a proper extension of ChEBI"). CHEBI IRIs should follow OBO standard.

All other details can evolve without breaking our use case.

It looks like the SwissLipids release might already do all of this (perhaps apart from the CHEBI IDs?). PyOBO may do so as well, but I haven't seen the SwissLipids ontology product from PyOBO yet. @cthoyt, would you be able to post a link or a recipe for generating it?

If we can get agreement on (b), we could switch between the PyOBO and SwissLipids versions if needed.
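Once short-form IDs are agreed on, switching between sources is mostly a matter of rewriting identifiers. A minimal sketch with hypothetical function names, assuming (not confirmed in this thread) that the SIB CURIE `SLM:000003492`, the SIB IRI `https://swisslipids.org/rdf/SLM_000003492`, and a Bioregistry-style CURIE `swisslipid:000003492` all share the same numeric local ID.

```python
"""Sketch: normalize a SwissLipids identifier to several assumed forms."""
import re

# Assumed identifier forms; only the SIB CURIE and IRI patterns appear verbatim in this thread.
SIB_IRI_BASE = "https://swisslipids.org/rdf/SLM_"
KNOWN_PREFIXES = (SIB_IRI_BASE, "SLM:", "swisslipid:")
NUMERIC = re.compile(r"\d+")


def local_id(identifier: str) -> str:
    """Return the numeric local ID from any known form, or raise ValueError."""
    for prefix in KNOWN_PREFIXES:
        if identifier.startswith(prefix):
            candidate = identifier[len(prefix):]
            if NUMERIC.fullmatch(candidate):
                return candidate
    raise ValueError(f"not a recognized SwissLipids identifier: {identifier!r}")


def identifier_forms(identifier: str) -> dict[str, str]:
    """Map one identifier to all three forms so either source can be swapped in."""
    lid = local_id(identifier)
    return {
        "sib_curie": f"SLM:{lid}",
        "sib_iri": f"{SIB_IRI_BASE}{lid}",
        "bioregistry_curie": f"swisslipid:{lid}",  # hypothetical form
    }


if __name__ == "__main__":
    print(identifier_forms("SLM:000003492")["sib_iri"])
```

Non-numeric local parts such as `SLM_Isomeric_Subspecies` (used for ranks) are rejected on purpose, since they name vocabulary terms rather than lipid classes.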

@cthoyt
Member

cthoyt commented Mar 2, 2023

Conversion code: https://github.com/pyobo/pyobo/blob/main/src/pyobo/sources/slm.py
Artifacts: https://github.com/biopragmatics/obo-db-ingest/tree/main/export/swisslipid

PyOBO will always follow the Bioregistry standard, so if you want to talk about changing the prefix we can do a discussion on the tracker there https://github.com/biopragmatics/bioregistry/issues

@cmungall
Author

cmungall commented Mar 2, 2023

I think we just need to agree on the ID prefix, and then the official SwissLipids file satisfies @dosumis's criteria (there are other things it would be good to iterate on, as per my original comment in this ticket, but this can come later).

Related to the ID discussion: should the ontology artefact be registered on OBO? Given that this is an extension to CHEBI and follows the same structure it seems reasonable. This might require having an obolibrary base to the PURLs, which may not be desirable to SIB (although there are some exceptions in OBO).

@JervenBolleman

JervenBolleman commented Mar 2, 2023

Using the SwissLipids beta SPARQL endpoint and ROBOT:

curl -L -H 'accept:text/turtle' 'https://beta.sparql.swisslipids.org/sparql/' \
  --data 'query=PREFIX+foaf%3a+%3chttp%3a%2f%2fxmlns.com%2ffoaf%2f0.1%2f%3e%0d%0aCONSTRUCT+%7b%0d%0a++%3fs+%3fp+%3fo+.%0d%0a%7d+WHERE+%7b%0d%0a++GRAPH+%3chttps%3a%2f%2fsparql.swisslipids.org%2fswisslipids%3e%7b%0d%0a++++%3fs+%3fp+%3fo+.%0d%0a++++FILTER(!sameTerm(%3fp%2c+foaf%3adepiction))%0d%0a%09%7d%0d%0a%7d' \
  -o swisslipids.ttl

robot convert --input swisslipids.ttl --output swisslipids.obo

We leave out the images (the foaf:depiction triples) because we don't want them in the OBO file. They are also large and would cause problems for ROBOT on normal hardware.
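The URL-encoded query in the `curl` call is hard to read. Decoded, it is a CONSTRUCT that copies every triple from the `https://sparql.swisslipids.org/swisslipids` graph except `foaf:depiction`. Below is a minimal standard-library Python sketch of the same request, with hypothetical function names; it assumes the beta endpoint still accepts a form-encoded POST with `Accept: text/turtle`, which is what the `curl` command relies on.

```python
"""Sketch: the curl download above, with the SPARQL query written out readably."""
import shutil
import urllib.parse
import urllib.request

ENDPOINT = "https://beta.sparql.swisslipids.org/sparql/"

# Decoded form of the URL-encoded query in the curl command; the FILTER drops image links.
QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT {
  ?s ?p ?o .
} WHERE {
  GRAPH <https://sparql.swisslipids.org/swisslipids> {
    ?s ?p ?o .
    FILTER(!sameTerm(?p, foaf:depiction))
  }
}"""


def build_body(query: str = QUERY) -> str:
    """Form-encode the query as a 'query=...' body, as curl's --data sends it."""
    return urllib.parse.urlencode({"query": query})


def download_turtle(path: str, endpoint: str = ENDPOINT, query: str = QUERY,
                    timeout: float = 600.0) -> None:
    """POST the CONSTRUCT query and stream the Turtle response to `path`."""
    request = urllib.request.Request(
        endpoint,
        data=build_body(query).encode("ascii"),  # providing data makes this a POST
        headers={
            "Accept": "text/turtle",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)  # stream to disk instead of holding it in memory
```

Calling `download_turtle("swisslipids.ttl")` should produce the input file for the `robot convert` step; the 600-second timeout is an arbitrary choice for a large export.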

@dosumis

a) We are looking into providing the OBO and the Turtle or RDF files, preconverted, at a stable location. This will take some time, as going from prototype to production always does.

b) IRIs are easier to agree on than CURIEs. I see no real reason why not, but this would be a bigger change that I would need to discuss with others, gathering feedback from SwissLipids users. At this point in time it would require a small postprocessing step on the ROBOT output, e.g.

sed -i 's|SLM:|swisslipid:|g' swisslipids.obo

but this might lead to issues in the OBO-to-OWL conversion with ROBOT, so it needs investigation.
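The `sed` one-liner replaces every `SLM:` occurrence, including property names such as `SLM:hasPart` and `SLM:rank`, not just class IDs. (Also, `sed -i` with no suffix argument is GNU sed syntax; BSD/macOS sed needs `sed -i ''`.) A probable source of the OBO-to-OWL issue: when an ID prefix has no `idspace` declaration, OBO-to-OWL translation expands `PREFIX:NNN` to `http://purl.obolibrary.org/obo/PREFIX_NNN`, so the class IRIs would no longer be the SIB ones. A narrower rewrite, sketched below with a hypothetical function name, only touches numeric class CURIEs; it does not address the IRI expansion, and whether the property names should be renamed too is not decided in this thread.

```python
"""Sketch: rename only numeric SLM class CURIEs, leaving SLM:hasPart / SLM:rank alone."""
import re

# Matches class IDs like SLM:000003492 but not property names like SLM:hasPart.
SLM_CLASS_CURIE = re.compile(r"\bSLM:(\d+)")


def rename_prefix(obo_text: str, new_prefix: str = "swisslipid") -> str:
    """Rewrite SLM:<digits> to <new_prefix>:<digits> throughout the text."""
    return SLM_CLASS_CURIE.sub(lambda m: f"{new_prefix}:{m.group(1)}", obo_text)


if __name__ == "__main__":
    print(rename_prefix("property_value: SLM:hasPart SLM:000000825"))
    # prints property_value: SLM:hasPart swisslipid:000000825
```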

c) This is already the case. See the stanza below, which I believe is correct OWL and OBO but unexpected for most OBO users:

[Term]
id: CHEBI:78102 ! 1-tetradecyl-sn-glycero-3-phosphocholine
equivalent_to: SLM:000001362 ! 1-O-tetradecyl-sn-glycero-3-phosphocholine

@cmungall Regarding SwissLipids joining the OBO Foundry etc.: that is a different commitment, which I will also need to discuss with the team. Let's move that off this issue.

@matentzn

matentzn commented Jun 3, 2023

It seems that we also need to coordinate with a resource called LIPID MAPS, which appears to cover some relevant lipids that are not covered by SwissLipids.

https://www.lipidmaps.org/resources/sparql

Unfortunately, their SPARQL endpoint is down.


5 participants