Merge edges from different KPs by primary_knowledge_source #2381

Open
amykglen opened this issue Sep 18, 2024 · 6 comments

@amykglen
Member

discussed in today's AHM

right now Expand does not merge edges from different KPs

but there is some duplicated information between KPs, which merging based on primary_knowledge_source may help eliminate

it may not be perfect (e.g., the clinical trials KP may list different primary sources vs. what KG2 lists for its ingested CTKP edges), but it should at least be an improvement! and we could refine the tricky merging over time...

@amykglen
Member Author

also see #1951

@sundareswarpullela self-assigned this Oct 15, 2024
@amykglen changed the title from "Merge edges from different KPs by primary_knowledge_source?" to "Merge edges from different KPs by primary_knowledge_source" Oct 15, 2024
@sundareswarpullela
Collaborator

Based on preliminary exploration, it looks like the edges are not being merged because the subject or object of the edges is not getting the preferred CURIE in the edge key.

@amykglen
Member Author

@sundareswarpullela - if you remember from the other day when we explored this, the edge keys ARAX assigns do use the preferred curies, but include the KP name instead of the primary knowledge source (which is why they're not being merged between KPs)

@sundareswarpullela
Collaborator

sundareswarpullela commented Oct 31, 2024

In the Example 1 acetaminophen test query:
The preferred object CURIE of the first edge and the second edge is the same, but the object CURIEs in their edge keys differ, so the edges do not get merged. Am I mistaken here @amykglen, or could this be a separate issue and I'm conflating the two?

Edge 1:
Object CURIE in edge key: UniProtKB:P05177

[Screenshot 2024-10-31 at 9 41 32 AM]

Edge 2:
Object CURIE in edge key: NCBIGene:1544 (also the preferred CURIE)
[Screenshot 2024-10-31 at 9 41 55 AM]

@amykglen
Member Author

ah, I see, ok. yes, you're right! thanks for the examples. for some reason I thought we had determined that the final edge keys were being assigned after canonicalization, but I guess that must not be true. interesting.

though it still is true that you'll also need to stop including the KP name in the edge keys (and instead include the primary KS). so both of those things will need to be addressed here.
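
A minimal sketch of the two changes described above, not ARAX's actual code: the key is built from preferred CURIEs and from the edge's primary knowledge source rather than the KP name. The function names, the `PREFERRED_CURIES` map, the key layout, the `EXAMPLE:drug` subject, and `infores:example-source` are all hypothetical; the only assumption about the data is that each edge carries a TRAPI-style `sources` list with `resource_id` and `resource_role` fields.

```python
from typing import Optional

# Hypothetical synonym map standing in for a real node-canonicalization lookup.
PREFERRED_CURIES = {"UniProtKB:P05177": "NCBIGene:1544"}


def primary_knowledge_source(edge: dict) -> Optional[str]:
    """Return the resource_id of the edge's primary knowledge source, if any."""
    for source in edge.get("sources", []):
        if source.get("resource_role") == "primary_knowledge_source":
            return source.get("resource_id")
    return None


def make_edge_key(edge: dict, kp_name: str) -> str:
    """Build a key from preferred CURIEs and the primary KS (not the KP name)."""
    subject = PREFERRED_CURIES.get(edge["subject"], edge["subject"])
    obj = PREFERRED_CURIES.get(edge["object"], edge["object"])
    # Assumption: fall back to the KP name only when no primary KS is listed.
    source = primary_knowledge_source(edge) or kp_name
    return f"{source}:{subject}--{edge['predicate']}--{obj}"


# Two illustrative edges from different KPs that describe the same fact.
shared_sources = [{"resource_id": "infores:example-source",
                   "resource_role": "primary_knowledge_source"}]
edge_from_kp_a = {"subject": "EXAMPLE:drug",
                  "predicate": "biolink:physically_interacts_with",
                  "object": "UniProtKB:P05177",
                  "sources": shared_sources}
edge_from_kp_b = {"subject": "EXAMPLE:drug",
                  "predicate": "biolink:physically_interacts_with",
                  "object": "NCBIGene:1544",
                  "sources": shared_sources}

key_a = make_edge_key(edge_from_kp_a, "kp-a")
key_b = make_edge_key(edge_from_kp_b, "kp-b")
```

With both changes in place, `key_a` and `key_b` come out identical, so the two edges would collide in a key-indexed edge dictionary and could be merged.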

@amykglen
Member Author

hey @sundareswarpullela - thanks for all the work on this! I was playing around with it on /test and had a couple questions. in comparing the same result to the CI version (without merging on primary KS), I noticed that when edges are merged it seems like only some of their sources and attributes are retained. for instance, for the PTGS1 result for our acetaminophen example query on CI, there are two physically_interacts_with edges from bindingdb - one from SPOKE and one from service provider:

here's the edge from SPOKE:
[Screenshot 2024-11-20 at 8 57 19 AM]

and here's the edge from service provider:
[Screenshot 2024-11-20 at 8 58 09 AM]

on /test, there is indeed only one such edge from bindingdb (yay), but the merged edge for bindingdb appears to only contain the sources and attributes from one of those KPs (SPOKE):

[Screenshot 2024-11-20 at 8 59 00 AM]

the sources from the service provider edge as well as the publications attribute from service provider are missing. maybe this is already on your agenda, but I think we'll want to merge those on the merged edge.
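
One way the merged edge could keep everything from both KPs is to union the `sources` and `attributes` lists instead of keeping only the first edge seen. This is a rough sketch under assumptions, not the code on /test: `merge_edges` is a hypothetical helper, the deduplication rules (sources by `resource_id` plus `resource_role`, attributes by `attribute_type_id` plus value) are one possible choice, and the infores IDs and the `PMID:0000000` value are placeholders rather than data from the actual result.

```python
import copy
import json


def merge_edges(existing: dict, incoming: dict) -> dict:
    """Return a copy of `existing` with sources/attributes from `incoming` added."""
    merged = copy.deepcopy(existing)

    # Union the sources, treating (resource_id, resource_role) as identity.
    sources = merged.setdefault("sources", [])
    seen_sources = {(s.get("resource_id"), s.get("resource_role")) for s in sources}
    for source in incoming.get("sources", []):
        ident = (source.get("resource_id"), source.get("resource_role"))
        if ident not in seen_sources:
            sources.append(copy.deepcopy(source))
            seen_sources.add(ident)

    # Union the attributes; values may be lists, so serialize them to compare.
    def attr_ident(attr: dict) -> tuple:
        return (attr.get("attribute_type_id"),
                json.dumps(attr.get("value"), sort_keys=True))

    attributes = merged.setdefault("attributes", [])
    seen_attrs = {attr_ident(a) for a in attributes}
    for attr in incoming.get("attributes", []):
        if attr_ident(attr) not in seen_attrs:
            attributes.append(copy.deepcopy(attr))
            seen_attrs.add(attr_ident(attr))

    return merged


primary = {"resource_id": "infores:bindingdb",
           "resource_role": "primary_knowledge_source"}
spoke_edge = {
    "sources": [primary,
                {"resource_id": "infores:spoke",
                 "resource_role": "aggregator_knowledge_source"}],
    "attributes": [],
}
service_provider_edge = {
    "sources": [primary,
                {"resource_id": "infores:service-provider-trapi",
                 "resource_role": "aggregator_knowledge_source"}],
    "attributes": [{"attribute_type_id": "biolink:publications",
                    "value": ["PMID:0000000"]}],
}

merged_edge = merge_edges(spoke_edge, service_provider_edge)
```

Here `merged_edge` ends up with the shared primary source listed once, both aggregator sources, and the publications attribute from the second edge. Whether aggregator chains from different KPs should simply be concatenated like this is a design question the sketch does not settle.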
