Questions about benchmark evaluation #14

Open
FarmerTao opened this issue Jan 25, 2025 · 1 comment
Labels
enhancement (New feature or request), question (Further information is requested)

Comments

FarmerTao commented Jan 25, 2025

Great work on the structure motif search algorithm! I have some questions regarding the benchmark process. In the provided answer.tsv file, only the UniProt ID and description are included, but there is no information about the exact location of the motif (e.g., specific residue positions).

Given this, how can we determine whether the algorithm correctly matches the motif at the right positions? Is the assumption here that every match is considered a good match, and therefore, we only need to focus on whether a match occurs or not?

Any clarification or additional insights would be greatly appreciated!

khb7840 added the enhancement and question labels Jan 26, 2025
khb7840 (Member) commented Jan 26, 2025

Thank you for your feedback!
The benchmark module primarily assesses whether folddisco can retrieve proteins known to have specific motifs (like zinc fingers or serine peptidases) in their structures. While I’ve mostly done position-based checks using external tools, incorporating such checks into folddisco would be useful.
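
As a rough illustration of this retrieval-level check, the sketch below scores a set of retrieved UniProt IDs against the IDs in an answer table. It is not folddisco's benchmark code: the two-column, tab-separated, header-less layout is an assumption based on the description of answer.tsv above, the function names `load_answer_ids` and `retrieval_metrics` are hypothetical, and the IDs are made up.

```python
import csv
import io


def load_answer_ids(tsv_text):
    """Collect the IDs in the first column of a tab-separated answer table.

    Assumed layout (not confirmed by the project): ID<TAB>description, no header.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {row[0].strip() for row in reader if row and row[0].strip()}


def retrieval_metrics(retrieved_ids, answer_ids):
    """Return (precision, recall) of the retrieved IDs against the answer IDs."""
    retrieved = set(retrieved_ids)
    true_positives = retrieved & answer_ids
    precision = len(true_positives) / len(retrieved) if retrieved else 0.0
    recall = len(true_positives) / len(answer_ids) if answer_ids else 0.0
    return precision, recall


# Made-up IDs for illustration only.
answer_text = "P00001\tzinc finger protein\nP00002\tzinc finger protein\n"
answers = load_answer_ids(answer_text)

# One retrieved ID is in the answer set, the other is not.
precision, recall = retrieval_metrics(["P00001", "P99999"], answers)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A check like this only says whether the right proteins were retrieved. It cannot tell whether the motif was matched at the right residues, which is why a position-based check needs residue-level annotations from another source.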

As for determining correct matches, folddisco currently evaluates match quality through node_count (the number of nodes covered by matched pairwise features) and RMSD. Because folddisco allows partial motif matches, a higher node_count and a lower RMSD generally indicate a better match.
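
One way to apply these two criteria is to sort matches by node_count first and break ties by RMSD. The sketch below assumes that ordering; the `Match` record, its field names, and the precedence of node_count over RMSD are illustrative choices, not folddisco's actual output format or ranking rule.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """Hypothetical record for one match (not folddisco's output schema)."""
    target_id: str
    node_count: int  # number of nodes covered by matched pairwise features
    rmsd: float      # lower means closer structural agreement


def rank_matches(matches):
    """Sort by node_count descending, then by RMSD ascending.

    Putting node_count ahead of RMSD is an assumption made for this sketch.
    """
    return sorted(matches, key=lambda m: (-m.node_count, m.rmsd))


# Made-up values for illustration only.
matches = [
    Match("A", node_count=3, rmsd=0.4),
    Match("B", node_count=5, rmsd=0.9),
    Match("C", node_count=5, rmsd=0.3),
]

ranked = rank_matches(matches)
for m in ranked:
    print(m.target_id, m.node_count, m.rmsd)
```

Here the two matches covering five nodes rank above the partial three-node match, and the lower RMSD decides between them. A different weighting may suit motifs where a tight partial match matters more than coverage.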
