feature: separate max edge distance from interaction radius throughout #504

Merged
merged 4 commits into from
Nov 17, 2023

Conversation

DaniBodor
Collaborator

@DaniBodor DaniBodor commented Sep 22, 2023

I think this one is also best reviewed one commit at a time, even though it's only 3 commits.
The final commit is not really part of this issue: I went through the entire code base and fixed, simplified, and added type hints. I can move it to a separate PR if you think that's better, but I did it between this PR and #506, which builds on this one. That is why it is currently here.

Changes:

  • Clarify distance_cutoff vs radius across different Query classes (see here)
    • Rename the parameters to avoid confusion
      • radius becomes influence_radius = the distance from the structure of interest (variant residue or any interface residue) to look for residues to include
      • cutoff becomes max_edge_length = the maximum distance between two nodes to include an edge between them.
    • Add influence_radius parameter for PPI; previously only SRVs were using separate parameters, while PPI was using cutoff to mean both things.
    • Remove edge_distance_cutoff as a Graph attribute, because it serves no purpose.
  • Decided against allowing for setting these to 0 to mean infinity/include all
    • It's quite a bit of effort, especially with adding tests (which would also be slow).
    • It's very rare that someone should want to do this, so we should not make it "easy" to do so for users who don't know what they're doing.
    • If someone really wants to include everything, they can set these parameters to an extremely high value, which has the same effect.
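
To illustrate the distinction between the two parameters, here is a minimal sketch of the two-step idea described above: first select nodes around the structure of interest, then decide which of those nodes get an edge. This is not DeepRank2's implementation; `build_graph`, the `positions` dictionary, and the node names are hypothetical and only serve the example.

```python
import math
from itertools import combinations

Point = tuple[float, float, float]


def build_graph(
    positions: dict[str, Point],
    center: str,
    influence_radius: float,
    max_edge_length: float,
) -> tuple[set[str], set[tuple[str, str]]]:
    """Hypothetical helper for illustration only (not the DeepRank2 API)."""
    # Step 1: keep every node within influence_radius of the center node.
    nodes = {
        name
        for name, pos in positions.items()
        if math.dist(pos, positions[center]) <= influence_radius
    }
    # Step 2: connect kept nodes that are at most max_edge_length apart.
    edges = {
        (a, b)
        for a, b in combinations(sorted(nodes), 2)
        if math.dist(positions[a], positions[b]) <= max_edge_length
    }
    return nodes, edges


# Made-up coordinates along a line, in arbitrary distance units.
positions = {
    "variant": (0.0, 0.0, 0.0),
    "near": (4.0, 0.0, 0.0),
    "mid": (8.0, 0.0, 0.0),
    "far": (30.0, 0.0, 0.0),
}

nodes, edges = build_graph(
    positions, "variant", influence_radius=10.0, max_edge_length=4.5
)
# "far" is excluded by influence_radius; "mid" is included but has no
# direct edge to "variant", only an indirect connection through "near".
```

In this sketch, passing a very large value (or `math.inf`) for both parameters keeps every node and connects every pair, which is why a special meaning for 0 is not needed.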

closes #460

@DaniBodor DaniBodor force-pushed the 460_radius_vs_edgedistance_dbodor branch from 3d83096 to 6003cce Compare September 22, 2023 14:22
@DaniBodor DaniBodor mentioned this pull request Sep 22, 2023
@DaniBodor DaniBodor force-pushed the 460_radius_vs_edgedistance_dbodor branch from ceb1d52 to 95e8b82 Compare September 22, 2023 22:40
@DaniBodor DaniBodor marked this pull request as draft September 22, 2023 22:41
@DaniBodor DaniBodor force-pushed the 480_new branch 2 times, most recently from a87e798 to 4c0bdfc Compare September 23, 2023 11:29
@DaniBodor DaniBodor force-pushed the 460_radius_vs_edgedistance_dbodor branch 2 times, most recently from 9bd4cab to 84a09ce Compare September 23, 2023 11:33
@DaniBodor DaniBodor linked an issue Sep 23, 2023 that may be closed by this pull request
@DaniBodor DaniBodor force-pushed the 460_radius_vs_edgedistance_dbodor branch 2 times, most recently from d2bebeb to 702f1ed Compare September 23, 2023 15:16
@DaniBodor DaniBodor force-pushed the 460_radius_vs_edgedistance_dbodor branch from 1ba344d to 8c710bc Compare September 23, 2023 18:07
@DaniBodor DaniBodor force-pushed the 460_radius_vs_edgedistance_dbodor branch from 8c710bc to a1ccfc1 Compare September 23, 2023 18:23
@github-actions

github-actions bot commented Oct 9, 2023

This PR is stale because it has been open for 14 days with no activity.

@github-actions github-actions bot added the stale issue not touched from too much time label Oct 9, 2023
@DaniBodor DaniBodor force-pushed the 460_radius_vs_edgedistance_dbodor branch 2 times, most recently from 29edf9a to 27e3746 Compare October 10, 2023 07:54
@DaniBodor DaniBodor force-pushed the 480_new branch 4 times, most recently from df6477b to c2151e7 Compare November 3, 2023 14:54
@DaniBodor DaniBodor force-pushed the 460_radius_vs_edgedistance_dbodor branch 3 times, most recently from 2e8dc67 to 6df8f63 Compare November 3, 2023 16:54
Base automatically changed from 480_new to dev November 7, 2023 10:30
@DaniBodor DaniBodor marked this pull request as ready for review November 7, 2023 10:31
@DaniBodor DaniBodor requested a review from gcroci2 November 7, 2023 10:38
@DaniBodor DaniBodor removed the stale issue not touched from too much time label Nov 7, 2023
@gcroci2 gcroci2 requested a review from cbaakman November 8, 2023 17:32
@gcroci2
Collaborator

gcroci2 commented Nov 8, 2023

I've asked for @cbaakman's review as well, only to verify that this PR doesn't break some logic that was meant for the old cutoffs. No need to go through the code in-depth, a conceptual confirmation would be more than enough :)

targets (dict[str, float]) = Name(s) (key) and target value(s) (value) associated with this query.
interaction_radius (float | None): all residues within this radius from the variant residue or interacting interfaces
will be included in the graph, irrespective of the chain they are on.
max_edge_distance (float | None): the maximum distance between two nodes to generate an edge connecting them.
Collaborator

I'd suggest max_nodes_distance (maximum distance between nodes for having an edge) or max_distance_edge (same but making nodes implicit) to avoid confusion from the naming

Collaborator Author

@DaniBodor DaniBodor Nov 14, 2023

I agree that edge distance is not the correct terminology. How about something relating to edge length instead? Like max_edge_length or edge_length_cutoff or cutoff_edge_length?

Collaborator Author

It's a bit tough, because "length" and "distance" both already have a different meaning in graph theory.

Collaborator

max_edge_length


Args:
pdb_path (str): The path of the pdb file, that the structure was built from.
structure (:class:`PDBStructure`): From which to take the residues.
chain_id1 (str): First protein chain identifier.
chain_id2 (str): Second protein chain identifier.
distance_cutoff (float): Max distance between two interacting residues.
interaction_radius (float): Maximum distance between residues to consider them as interacting.
Collaborator

since edges represent interactions, then isn't interaction_radius conceptually the same thing as max_edge_distance? Maybe the radius doesn't have to be necessarily about interactions (thus could be radius only), while edges should always be about interactions. Otherwise, what's the conceptual difference between these two variables?

Collaborator Author

@DaniBodor DaniBodor Nov 14, 2023

Edges don't necessarily equate to interactions. I think mostly this will be the case, and by default I would use the same value (and we do), but there could be situations where interaction_radius is higher than max_edge_distance. This could be the case 1) purely to decrease computational power/time, or 2) e.g. because the "secondary" interactions can be relevant. Even if two nodes don't have an edge between them, an intermediate node could be affected by both edges.
I can't off the top of my head think of a situation where max_edge_distance would be larger than interaction_radius, and I'm not sure that would even make sense. Maybe we can create a warning/error/cap in case the user sets cutoff > radius. What do you think?

Note that in interaction_radius I am not necessarily talking about protein-protein-interactions (trans interactions), but could also be cis-interactions (so residues of the same protein interacting with each other). So also in a variant, if residues are too far away, they would not interact with/affect the variant residue so don't need to be considered.

I agree that the terminology is confusing, though. At the same time, I find radius too generic, it doesn't really describe what the parameter is either. Let's see if we can come up with another nomenclature. Maybe something along the lines of sphere_of_influence, although that sounds too fuzzy to me as well. Maybe influence_radius?
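
The warning suggested above for the case cutoff > radius could look roughly like the sketch below. It uses the names that were eventually agreed on in this PR (`influence_radius` and `max_edge_length`); the function `check_distance_parameters` itself is hypothetical and not part of DeepRank2, and whether a warning, an error, or a cap is the right behaviour is left open here.

```python
import warnings


def check_distance_parameters(
    influence_radius: float,
    max_edge_length: float,
) -> None:
    """Hypothetical validation helper; name and behaviour are assumptions."""
    # An edge cutoff larger than the node-selection radius is unusual,
    # so flag it without blocking the user from doing it on purpose.
    if max_edge_length > influence_radius:
        warnings.warn(
            f"max_edge_length ({max_edge_length}) is larger than "
            f"influence_radius ({influence_radius}); is this intended?",
            UserWarning,
            stacklevel=2,
        )


# The default-like case (equal values) stays silent.
check_distance_parameters(influence_radius=10.0, max_edge_length=10.0)
```

A warning rather than an error keeps the unusual configuration possible for users who really want it, in line with the reasoning about extremely high values in the PR description.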

Collaborator Author

As above, "radius" also has a specific meaning in graph theory.

Collaborator

influence_radius

Collaborator

@gcroci2 gcroci2 left a comment

Nice work, almost there :) I left a few comments we need to reflect on before proceeding with the merge. Let me know what you think about them.

@DaniBodor
Collaborator Author

So we just need to agree on a nomenclature here, and then I can merge this PR.

My newest suggestions are influence_radius and either max_edge_length or edge_length_cutoff (see below for what they represent). What do you think?

radius becomes interaction_radius = the distance from the structure of interest (variant residue or interface) to look for residues to include
cutoff becomes max_edge_distance = the maximum distance between two nodes to include an edge between them.

@gcroci2
Collaborator

gcroci2 commented Nov 17, 2023

So we just need to agree on a nomenclature here, and then I can merge this PR.

My newest suggestions are influence_radius and either max_edge_length or edge_length_cutoff (see below for what they represent). What do you think?

radius becomes interaction_radius = the distance from the structure of interest (variant residue or interface) to look for residues to include
cutoff becomes max_edge_distance = the maximum distance between two nodes to include an edge between them.

Indeed, so as agreed I'd go for max_edge_length and influence_radius

deeprank2/query.py (outdated, resolved)
previously `max_edge_distance` and `interaction_radius`, respectively
@DaniBodor DaniBodor merged commit e92ae10 into dev Nov 17, 2023
7 checks passed
@DaniBodor DaniBodor deleted the 460_radius_vs_edgedistance_dbodor branch November 17, 2023 22:58
@gcroci2 gcroci2 added the SS label Jan 10, 2024
Development

Successfully merging this pull request may close these issues.

Clarify distance_cutoff across different Query classes
2 participants