-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature: separate max edge distance from interaction radius throughout #504
Conversation
3d83096
to
6003cce
Compare
ceb1d52
to
95e8b82
Compare
a87e798
to
4c0bdfc
Compare
9bd4cab
to
84a09ce
Compare
d2bebeb
to
702f1ed
Compare
1ba344d
to
8c710bc
Compare
8c710bc
to
a1ccfc1
Compare
This PR is stale because it has been open for 14 days with no activity. |
29edf9a
to
27e3746
Compare
df6477b
to
c2151e7
Compare
2e8dc67
to
6df8f63
Compare
I've asked @cbaakman's review as well, only to verify that this PR doesn't break some logic that was meant for the old cutoffs. No need to go through the code in-depth, a conceptual confirmation would be more than enough :) |
deeprank2/query.py
Outdated
targets (dict[str, float]) = Name(s) (key) and target value(s) (value) associated with this query. | ||
interaction_radius (float | None): all residues within this radius from the variant residue or interacting interfaces | ||
will be included in the graph, irrespective of the chain they are on. | ||
max_edge_distance (float | None): the maximum distance between two nodes to generate an edge connecting them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest max_nodes_distance
(maximum distance between nodes for having an edge) or max_distance_edge
(same but making nodes implicit) to avoid confusion from the naming
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that edge distance is not the correct terminology. How about something relating to edge length instead? Like max_edge_length
or edge_length_cutoff
or cutoff_edge_length
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a bit tough because "length" and "distance" both have a different meaning than this in graph theory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
max_edge_length
deeprank2/utils/buildgraph.py
Outdated
|
||
Args: | ||
pdb_path (str): The path of the pdb file, that the structure was built from. | ||
structure (:class:`PDBStructure`): From which to take the residues. | ||
chain_id1 (str): First protein chain identifier. | ||
chain_id2 (str): Second protein chain identifier. | ||
distance_cutoff (float): Max distance between two interacting residues. | ||
interaction_radius (float): Maximum distance between residues to consider them as interacting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since edges represent interactions, then isn't interaction_radius
conceptually the same thing as max_edge_distance
? Maybe the radius doesn't have to be necessarily about interactions (thus could be radius
only), while edges should always be about interactions. Otherwise, what's the conceptual difference between these two variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edges don't necessarily equate to interactions. I think mostly this will be the case, and by default I would use the same value (and we do), but there could be situations where interaction_radius
is higher than max_edge_distance
. This could be the case 1) purely to decrease computational power/time, or 2) e.g. because the "secondary" interactions can be relevant. Even if two nodes don't have an edge between them, an intermediate node could be affected by both edges.
I can't off the top of my head think of a situation where max_edge_distance
would be larger than interaction_radius
, and am not sure that would even make sense to do. Maybe we can create a waning/error/cap in case the user sets cutoff > radius. What do you think?
Note that in interaction_radius
I am not necessarily talking about protein-protein-interactions (trans interactions), but could also be cis-interactions (so residues of the same protein interacting with each other). So also in a variant, if residues are too far away, they would not interact with/affect the variant residue so don't need to be considered.
I agree that the terminology is confusing, though. At the same time, I find radius
too generic, it doesn't really describe what the parameter is either. Let's see if we can come up with another nomenclature. Maybe something along the lines of sphere_of_influence
, although that sounds too fuzzy to me as well. Maybe influence_radius
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as above, "radius" also has a specific meaning in graph theory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
influence_radius
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, almost there :) I left few comments we need to reflect on before proceeding with the merging. Let me know what do you think about them
So we just need to agree on a nomenclature here, and then I can merge this PR. My newest suggestions are
|
Indeed, so as agreed I'd go for |
previously `max_edge_distance` and `interaction_radius`, respectively
I think this one is also best reviewed one commit at a time, even though it's only 3 commits.
The final commit is not really part of this issue, but I went through the entire code and fixed/simplified/added type hinting throughout the code base. I can move it to a separate PR if you think it's better, but I had done it in between this one and #506, which builds on this. That is why it is currently here.
Changes:
distance_cutoff
vsradius
across differentQuery
classes (see here)radius
becomesinfluence_radius
= the distance from the structure of interest (variant residue or any interface residue) to look for residues to includecutoff
becomesmax_edge_length
= the maximum distance between two nodes to include an edge between them.interaction_radius
parameter for PPI; previously only SRVs were using separate paramenters, while PPI was usingcutoff
to mean both things.edge_distance_cutoff
as aGraph
attribute, because it serves no purpose.closes #460