This repository has been archived by the owner on May 23, 2024. It is now read-only.

Use case: use CAM-KP-API to enhance edges #536

Open
gaurav opened this issue Jul 13, 2022 · 9 comments

@gaurav
Member

gaurav commented Jul 13, 2022

Given an edge, can CAM-KP API provide additional information on that edge, including:

  • Which Noctua/Reactome pathways include that edge
  • Where in the body/cell this pathway takes place
  • ...

Example: chemical-gene or gene-gene edge

@gaurav gaurav self-assigned this Jul 13, 2022
@gaurav
Member Author

gaurav commented Jul 14, 2022

We should probably try to get this to work before #537.

The hard part is to find some gene pairs that aren't working but should work, so perhaps what we need is a test file that's a list of genes and then we query them to see if we get the expected relationship.
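A minimal sketch of what such a test file and checker could look like. The tab-separated layout (`subject`, `object`, `expected_predicate`), the function names, and the `run_query` callable are all hypothetical; `run_query` stands in for whatever sends a TRAPI query to a CAM-KP instance. The check is an exact string match on the predicate and does not walk the Biolink predicate hierarchy.

```python
import csv
import io


def load_expected(tsv_text):
    """Parse a TSV (hypothetical layout) with columns:
    subject, object, expected_predicate."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def one_hop_query(subject_id, object_id, predicate="biolink:related_to"):
    """Build a one-hop TRAPI query graph between two identifiers."""
    return {"message": {"query_graph": {
        "nodes": {"n0": {"ids": [subject_id]}, "n1": {"ids": [object_id]}},
        "edges": {"e0": {"predicates": [predicate],
                         "subject": "n0", "object": "n1"}},
    }}}


def predicates_in_response(response):
    """Collect the predicates of all knowledge-graph edges in a response."""
    kg = response.get("message", {}).get("knowledge_graph") or {}
    return {edge.get("predicate") for edge in (kg.get("edges") or {}).values()}


def check_expected(rows, run_query):
    """Return the (subject, object, expected_predicate) rows that were NOT found.

    run_query is an assumed callable: it takes a TRAPI query dict and
    returns the TRAPI response dict.
    """
    failures = []
    for row in rows:
        response = run_query(one_hop_query(row["subject"], row["object"]))
        if row["expected_predicate"] not in predicates_in_response(response):
            failures.append(
                (row["subject"], row["object"], row["expected_predicate"]))
    return failures
```

Passing the query function in as an argument keeps the checker usable against a development instance, a production instance, or a stub in unit tests.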

Might be useful to add some exploration endpoints that are easier to work with (e.g. an endpoint that returns a list of models for a particular gene).

Question: can we say gene A and gene B are related if they are in the same model? Should we implement that?

  • Since we don't have that, we need to find specific relations for this task.

"Causes influences" could be the relation between two genes that tells if they are related to each other within a model. This is a broad match of biolink:causes, but we only use exact matches, so that might not be accessible from CAM-KP-API. However, there is a set of manual mappings in https://github.com/ExposuresProvider/cam-pipeline/blob/cc13ef6ac7f4d48e91f77a789c71dec344512e1b/biolink-local.ttl that we might be able to access.

@balhoff
Contributor

balhoff commented Jul 14, 2022

When testing TRAPI queries, we will need to make sure the RO relation we're inferring maps to a reasonable Biolink relation. Something confusing is that folks may search for causes but some relevant relations map to affects.

@karafecho

Here are two different ARAX queries that you can pull gene-chemical edges from, as described on slide 8 in this deck:

https://arax.ncats.io/?r=44679
https://arax.ncats.io/?r=52713

gaurav added a commit that referenced this issue Sep 23, 2022
@gaurav
Member Author

gaurav commented Oct 5, 2022

Sorry it's taken me so long to respond to this! These queries were super helpful for finding and fixing some bugs in CAM-KP, and I think there might be more bugs lurking there. Here are my results.

As far as I can tell, out of all the edges @karafecho provided to us, only the edge between UniProtKB:P51589 and UniProtKB:P08684 returns results with a one-hop query. This is the query:

{"message":{"query_graph":{"nodes":{"n0":{"ids":["UniProtKB:P51589"]},"n1":{"ids":["UniProtKB:P08684"]}},"edges":{"e0":{"predicates":["biolink:related_to"],"subject":"n0","object":"n1"}}}}}

Running this on our development instance returns 960 results, all of them being biolink:affects_activity_of edges from the model http://model.geneontology.org/R-HSA-5423646. I'm not sure why there are so many results, but I'm going to dig into this further to see what's going on here.
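One way to start digging into the result count is to compare the number of results with the number of distinct knowledge-graph edges they bind. The sketch below assumes a TRAPI 1.2/1.3-style response, in which each result carries a top-level `edge_bindings` map and `knowledge_graph.edges` is keyed by edge ID; the function name is hypothetical. If the number of results is far larger than the number of distinct edges, the same edges are being returned repeatedly.

```python
from collections import Counter


def summarize_results(response):
    """Summarize a TRAPI response (assumed TRAPI 1.2/1.3 result layout).

    Returns the number of results, the number of distinct knowledge-graph
    edges bound by those results, and a per-predicate tally of bindings.
    """
    message = response.get("message", {})
    kg_edges = (message.get("knowledge_graph") or {}).get("edges") or {}
    results = message.get("results") or []

    predicate_counts = Counter()
    distinct_edges = set()
    for result in results:
        for bindings in (result.get("edge_bindings") or {}).values():
            for binding in bindings:
                edge_id = binding["id"]
                distinct_edges.add(edge_id)
                # Edges missing from the knowledge graph are tallied as "unknown".
                predicate = kg_edges.get(edge_id, {}).get("predicate", "unknown")
                predicate_counts[predicate] += 1

    return {
        "results": len(results),
        "distinct_edges": len(distinct_edges),
        "predicates": dict(predicate_counts),
    }
```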

Two-hop queries do a bit better, with:

  • 360 results for CHEBI:34477-(?)-UniProtKB:P08684
  • 144 results for CHEBI:63840-(?)-UniProtKB:P08684
    • This has some interesting results, e.g. CHEBI:63840 ("5'-hydroxyomeprazole") biolink:participates_in GO:0006739 ("NADP metabolic process") biolink:caused_by NCBIGene:100861540
  • 1000+ results for (CHEBI:17996 or CHEBI:23114)-(?)-UniProtKB:P13569
  • 1000+ results for UniProtKB:O75795-(?)-UniProtKB:P08684
  • 1000+ results for UniProtKB:P16662-(?)-UniProtKB:P08684
  • 1000+ results for UniProtKB:P19224-(?)-UniProtKB:P08684
  • 1000+ results for UniProtKB:P22310-(?)-UniProtKB:P08684
  • 1000+ results for UniProtKB:P54855-(?)-UniProtKB:P08684
  • 1000+ results for UniProtKB:P24462-(?)-UniProtKB:P08684
  • 1000+ results for UniProtKB:Q9HB55-(?)-UniProtKB:P08684
  • 1000+ results for CHEBI:35703-(?)-UniProtKB:P08684

I used the query:

{"message":{"query_graph":{"nodes":{"n0":{"ids":["CHEBI:17996","CHEBI:23114"]},"n1":{},"n2":{"ids":["UniProtKB:P13569"]}},"edges":{"e0":{"predicates":["biolink:related_to"],"subject":"n0","object":"n1"},"e1":{"predicates":["biolink:related_to"],"subject":"n1","object":"n2"}}}}}

As you can see, UniProtKB:P08684 seems to be quite overrepresented in the results, and again it seems to me that we're seeing a lot more results than I would expect to see here.
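For repeating the two-hop query across the other identifier pairs, a small builder might be handy. This is a sketch only; the function name is hypothetical, and the structure it produces mirrors the query quoted above, with an unconstrained middle node `n1`.

```python
import json


def two_hop_query(start_ids, end_ids, predicate="biolink:related_to"):
    """Build a two-hop TRAPI query: start -(predicate)-> ? -(predicate)-> end."""
    return {"message": {"query_graph": {
        "nodes": {
            "n0": {"ids": list(start_ids)},
            "n1": {},  # unconstrained intermediate node
            "n2": {"ids": list(end_ids)},
        },
        "edges": {
            "e0": {"predicates": [predicate], "subject": "n0", "object": "n1"},
            "e1": {"predicates": [predicate], "subject": "n1", "object": "n2"},
        },
    }}}


if __name__ == "__main__":
    query = two_hop_query(["CHEBI:17996", "CHEBI:23114"], ["UniProtKB:P13569"])
    # Compact JSON, ready to paste into a TRAPI client.
    print(json.dumps(query, separators=(",", ":")))
```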

I wonder whether we should need multi-hop queries to get these results at all, or whether we should instead have some related_to triples connecting any two entities that have a relation with each other.

So, I think, next steps:

  • Dig into the one-hop results and figure out what's going on there.
  • Dig into the first two two-hop result sets, figure out if there's anything interesting in there, and if we should change our triplestore so that you can get these results with a one-hop query.

@karafecho

Thanks for your work on this, Gaurav.

The two-hop results indeed do look interesting, although I have not completed a deep dive.

@karafecho

Note: the updated TCDC workflow can be found on slide 10 in this deck.

@karafecho

Any updates, Gaurav? Happy to help if you point me in the right direction.

@gaurav
Member Author

gaurav commented Nov 2, 2022

Hi Kara! My work on this issue currently revolves around the new /lookup endpoint (#572). My goal is to have an endpoint that (1) normalizes input identifiers and (2) bypasses the main SPARQL query we currently use, querying the triplestore directly to return everything we know about a particular identifier, so that we can check whether the main SPARQL query is working correctly. This is primarily intended for debugging right now, but once that's done, I want to add the ability to filter by an object as well. We would then have an API endpoint that lets you query e.g. /lookup?subject=CHEBI:17685&object=GO:0019136&hopLimit=10 to find every relation between CHEBI:17685 and GO:0019136 across up to ten hops, after normalizing both of those identifiers. I think that'll give us everything we need to enhance edges and double-check our SPARQL queries at the same time. I've gotten sidetracked by some database issues, but I'm hoping to have the basic /lookup endpoint up by early next week, with support for filtering by an object identifier added soon thereafter. Happy to discuss any of this in a meeting if that would be useful!
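As a rough illustration of how a client might assemble such a request, here is a sketch that builds the URL from the parameters named above (`subject`, `object`, `hopLimit`). The helper name and the base URL are placeholders, and since the endpoint is described here as planned work, the final parameter names may differ.

```python
from urllib.parse import urlencode


def lookup_url(base_url, subject, obj=None, hop_limit=None):
    """Build a /lookup URL; object and hopLimit are optional filters."""
    params = {"subject": subject}
    if obj is not None:
        params["object"] = obj
    if hop_limit is not None:
        params["hopLimit"] = hop_limit
    # Keep ":" unescaped so CURIEs stay readable in the query string.
    return base_url.rstrip("/") + "/lookup?" + urlencode(params, safe=":")


if __name__ == "__main__":
    # "https://example.org" is a placeholder, not a real CAM-KP deployment.
    print(lookup_url("https://example.org", "CHEBI:17685", "GO:0019136", 10))
```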

@karafecho

This all sounds great, Gaurav! I very much appreciate the effort.

3 participants