T001: Frozen bioactivity dataset is not frozen #134

Closed
dominiquesydow opened this issue Sep 2, 2021 · 1 comment · Fixed by #74

@dominiquesydow
Collaborator

In T001, we thought we had frozen the bioactivity dataset (by checking for activity IDs in ChEMBL 27), but this does not seem to work.

Version on master branch:
Number of bioactivities queried for EGFR in this notebook: 7178
Number of bioactivities after ChEMBL 27 intersection: 7178

Running this notebook today:
Number of bioactivities queried for EGFR in this notebook: 8817
Number of bioactivities after ChEMBL 27 intersection: 8031 (I would expect 7178)

@jaimergp, I am sorry to bother you with this.
Do you understand why our intersection with the chembl27_activities.npz.zip does not produce stable results?
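For anyone picking this up later, here is a minimal sketch of the kind of ID intersection described above, plus a check for rows that pass the filter today but were not in the earlier result. It is not the notebook's actual code: the column name `activity_id`, the helper name `intersect_with_frozen_ids`, and the toy IDs are all assumptions for illustration.

```python
import pandas as pd


def intersect_with_frozen_ids(bioactivities_df, frozen_ids, id_column="activity_id"):
    """Keep only rows whose ID appears in the frozen ID collection.

    `id_column` is a hypothetical column name; adapt it to the real dataframe.
    """
    frozen = [int(i) for i in set(frozen_ids)]
    mask = bioactivities_df[id_column].astype(int).isin(frozen)
    return bioactivities_df[mask].reset_index(drop=True)


# Toy data only: IDs 1-3 stand in for the frozen release, 4 is newer.
frozen_release_ids = [1, 2, 3]
# IDs that the target query returned when the notebook was first run.
earlier_result_ids = {1, 2}
# What the same target query returns today.
queried_today = pd.DataFrame({"activity_id": [1, 2, 3, 4]})

filtered_today = intersect_with_frozen_ids(queried_today, frozen_release_ids)

# Rows that survive the filter but were absent from the earlier result.
unexpected_ids = set(filtered_today["activity_id"]) - earlier_result_ids
print(len(filtered_today), sorted(unexpected_ids))
```

In this toy case the filter keeps three rows although the earlier run produced two, which mirrors the 8031 vs. 7178 mismatch: the filter only guarantees that each ID exists in the frozen release, not that the same target query returned it before. Inspecting the equivalent of `unexpected_ids` on the real data might help narrow down what changed.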

If not: Since I do not have the time to debug this (and you probably do not either), my suggestion is to remove the chembl27_activities.npz.zip freezing step and instead freeze the final output dataset output_df from this notebook, to ensure stable outputs in all downstream talktorials (T002-T007).

@dominiquesydow dominiquesydow self-assigned this Sep 2, 2021
@dominiquesydow
Collaborator Author

The numbers we used to determine uniqueness may not be identifiers; thus, let's freeze the final output dataset to this version:
https://github.com/volkamerlab/teachopencadd/blob/ea055ef7b14d57eda20482c42c7b39da0fd8368a/teachopencadd/talktorials/T001_query_chembl/data/EGFR_compounds.csv

  • Rename file from EGFR_compounds.csv to EGFR_compounds_ea055ef.csv (added commit hash).
  • Add step to notebook where we overwrite output_df with the content of that frozen dataset (add note for users to disable that cell if they want to use the latest dataset).
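
The second step could look roughly like the cell below. This is only a sketch under assumptions: the flag name `USE_FROZEN_DATASET`, the column names, and the toy values are hypothetical, and an in-memory buffer stands in for the renamed CSV file so the snippet runs on its own. In the notebook, `pd.read_csv` would instead receive the path to `EGFR_compounds_ea055ef.csv` (possibly with `index_col=0`, if the file was written with an index column).

```python
import io

import pandas as pd

# Hypothetical switch: set to False to keep the freshly queried dataset.
USE_FROZEN_DATASET = True

# Stand-in for the dataframe computed earlier in the notebook (toy values).
output_df = pd.DataFrame(
    {
        "molecule_chembl_id": ["CHEMBL_A", "CHEMBL_B", "CHEMBL_C"],
        "pIC50": [6.0, 7.5, 8.1],
    }
)

# Stand-in for the frozen CSV file; replace with the real file path.
frozen_csv = io.StringIO("molecule_chembl_id,pIC50\nCHEMBL_A,6.0\nCHEMBL_B,7.5\n")

if USE_FROZEN_DATASET:
    # Overwrite the live query result so downstream notebooks see stable data.
    output_df = pd.read_csv(frozen_csv)

print(output_df.shape)
```

Keeping the switch in a single, clearly commented cell lets users who want the latest ChEMBL data disable the overwrite without touching the rest of the notebook.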

@dominiquesydow dominiquesydow linked a pull request Sep 8, 2021 that will close this issue
27 tasks
@dominiquesydow dominiquesydow added the bug Something isn't working label Sep 8, 2021