Use case: use CAM-KP-API to enhance edges #536
Comments
We should probably try to get this to work before #537. The hard part is to find some gene pairs that aren't working but should work, so perhaps what we need is a test file that's a list of genes and then we query them to see if we get the expected relationship. Might be useful to add some exploration endpoints that are easier to work with (e.g. an endpoint that returns a list of models for a particular gene). Question: can we say gene A and gene B are related if they are in the same model? Should we implement that?
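A minimal sketch of what such a test file and checker might look like. The CSV column names (`subject`, `object`, `expected_predicate`) and the `run_query` callable are assumptions made up for illustration, not anything CAM-KP-API defines. The only structure relied on is the TRAPI convention that `message.knowledge_graph.edges` is a mapping from edge ID to an object with a `predicate` field.

```python
import csv
import io
from typing import Callable, Dict, List, Tuple


def load_pairs(text: str) -> List[Dict[str, str]]:
    """Parse the (hypothetical) test file: one gene pair per CSV row."""
    return list(csv.DictReader(io.StringIO(text)))


def one_hop_query(subject_id: str, object_id: str,
                  predicate: str = "biolink:related_to") -> dict:
    """Build a one-hop TRAPI query between two fixed identifiers."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {"ids": [subject_id]},
                    "n1": {"ids": [object_id]},
                },
                "edges": {
                    "e0": {
                        "predicates": [predicate],
                        "subject": "n0",
                        "object": "n1",
                    }
                },
            }
        }
    }


def predicates_found(response: dict) -> set:
    """Collect the predicates on all knowledge-graph edges in a response."""
    kg = (response.get("message") or {}).get("knowledge_graph") or {}
    edges = kg.get("edges") or {}
    return {e["predicate"] for e in edges.values() if "predicate" in e}


def check_pairs(rows: List[Dict[str, str]],
                run_query: Callable[[dict], dict]) -> List[Tuple[str, str, str]]:
    """Return the (subject, object, expected_predicate) rows that failed.

    `run_query` is an assumed callable that sends a TRAPI query somewhere
    and returns the parsed JSON response.
    """
    failures = []
    for row in rows:
        response = run_query(one_hop_query(row["subject"], row["object"]))
        if row["expected_predicate"] not in predicates_found(response):
            failures.append(
                (row["subject"], row["object"], row["expected_predicate"])
            )
    return failures
```

Passing `run_query` in as a parameter keeps the checker independent of any particular endpoint URL, so the same file could be run against a development or production instance, or against a stub in unit tests.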
"Causes influences" could be the relation between two genes that tells if they are related to each other within a model. This is a broad match of biolink:causes, but we only use exact matches, so that might not be accessible from CAM-KP-API. However, there is a set of manual mappings in https://github.com/ExposuresProvider/cam-pipeline/blob/cc13ef6ac7f4d48e91f77a789c71dec344512e1b/biolink-local.ttl that we might be able to access. |
When testing TRAPI queries, we will need to make sure the RO relation we're inferring maps to a reasonable Biolink relation. Something confusing is that folks may search for |
Here are two different ARAX queries that you can pull gene-chemical edges from, as described on slide 8 in this deck: https://arax.ncats.io/?r=44679 |
These are the two queries from #536 (comment)
Sorry it's taken me so long to respond to this! These queries were super helpful for finding and fixing some bugs in CAM-KP, and I think there might be more bugs lurking there. Here are my results.

As far as I can tell, out of all the edges @karafecho provides to us, only the edge between UniProtKB:P51589 and UniProtKB:P08684 returns results with a one-hop query. This is the following query:

{"message":{"query_graph":{"nodes":{"n0":{"ids":["UniProtKB:P51589"]},"n1":{"ids":["UniProtKB:P08684"]}},"edges":{"e0":{"predicates":["biolink:related_to"],"subject":"n0","object":"n1"}}}}}

Running this on our development instance returns 960 results, all of them being

Two-hop queries do a bit better, with:
I used the query:

{"message":{"query_graph":{"nodes":{"n0":{"ids":["CHEBI:17996","CHEBI:23114"]},"n1":{},"n2":{"ids":["UniProtKB:P13569"]}},"edges":{"e0":{"predicates":["biolink:related_to"],"subject":"n0","object":"n1"},"e1":{"predicates":["biolink:related_to"],"subject":"n1","object":"n2"}}}}}

As you can see, UniProtKB:P08684 seems to be quite overrepresented in the results, and again it seems to me that we're seeing a lot more results than I would expect to see here. I wonder if maybe we shouldn't need to do multihop queries to get these results -- whether we should have some

So, I think, next steps:
|
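For anyone who wants to reproduce or vary the one-hop and two-hop queries in the comment above, here is a rough sketch of a helper that builds them. The function name `path_query` is made up for this example and is not part of CAM-KP-API. It only generates the query JSON and does not send it anywhere.

```python
import json
from typing import List, Sequence


def path_query(node_ids: Sequence[Sequence[str]],
               predicate: str = "biolink:related_to") -> dict:
    """Build a linear TRAPI query graph n0 -> n1 -> ... -> nK.

    `node_ids` has one entry per node: a list of CURIEs to pin the node,
    or an empty list to leave it unconstrained (as n1 is in the two-hop
    query). Every edge uses the same predicate.
    """
    if len(node_ids) < 2:
        raise ValueError("a path query needs at least two nodes")

    nodes = {}
    for i, ids in enumerate(node_ids):
        # An unconstrained node is serialized as an empty object.
        nodes[f"n{i}"] = {"ids": list(ids)} if ids else {}

    edges = {
        f"e{i}": {
            "predicates": [predicate],
            "subject": f"n{i}",
            "object": f"n{i + 1}",
        }
        for i in range(len(node_ids) - 1)
    }
    return {"message": {"query_graph": {"nodes": nodes, "edges": edges}}}


# The one-hop query between the two proteins.
one_hop = path_query([["UniProtKB:P51589"], ["UniProtKB:P08684"]])

# The two-hop query from the two chemicals, through any node, to the protein.
two_hop = path_query([["CHEBI:17996", "CHEBI:23114"], [], ["UniProtKB:P13569"]])

print(json.dumps(two_hop, separators=(",", ":")))
```

Swapping in a narrower predicate than `biolink:related_to` is a one-argument change, which might help when checking whether the large result counts come from the broad predicate.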
Thanks for your work on this, Gaurav. The two-hop results indeed do look interesting, although I have not completed a deep dive. |
Note: updated TCDC workflow can be found in slide 10 in this deck. |
Any updates, Gaurav? Happy to help if you point me in the right direction. |
Hi Kara! My work on this issue currently revolves around the new |
This all sounds great, Gaurav! I very much appreciate the effort. |
Given an edge, can CAM-KP API provide additional information on that edge, including:
Example: chemical-gene or gene-gene edge