From 922ffc4fe462d85a090c8acc603b0db345f8fd62 Mon Sep 17 00:00:00 2001
From: dominiquesydow
Date: Thu, 16 Dec 2021 08:40:29 +0100
Subject: [PATCH 01/19] Create new branch

From ff4138f819b0a0263721677f547dbd68621a6e8a Mon Sep 17 00:00:00 2001
From: dominiquesydow
Date: Thu, 16 Dec 2021 09:21:13 +0100
Subject: [PATCH 02/19] Fix Raschka reference

---
 paper/paper.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/paper/paper.md b/paper/paper.md
index 401d7279..bb366b18 100644
--- a/paper/paper.md
+++ b/paper/paper.md
@@ -48,7 +48,7 @@ The KLIFS database offers a REST API compliant with the OpenAPI specification [@
 - A session is set up, which allows access to various KLIFS *data sources* by different *identifiers* with the API ``session.data_source.by_identifier``. *Data sources* currently include kinases, structures and annotated conformations, modified residues, pockets, ligands, drugs, and bioactivities; *identifiers* refer to kinase names, PDB IDs, KLIFS IDs, and more. For example, ``session.structures.by_kinase_name`` fetches information on all structures for a query kinase.
 - The same API is used for local and remote sessions.
 - The returned data follows the same schema regardless of the session type (local/remote); all results obtained with bravado are formatted as Pandas DataFrames with standardized column names, data types, and handling of missing data.
-- Files with the structural 3D coordinates deposited on KLIFS include full complexes or selections such as proteins, pockets, ligands, and more. These files can be downloaded to disc or loaded via biopandas [Raschka:2017] or RDKit [@rdkit].
+- Files with the structural 3D coordinates deposited on KLIFS include full complexes or selections such as proteins, pockets, ligands, and more. These files can be downloaded to disc or loaded via biopandas [@Raschka:2017] or RDKit [@rdkit].

 OpenCADD-KLIFS is especially convenient whenever users are interested in multiple or more complex queries such as "fetching all structures for the kinase EGFR in the DFG-in conformation" or "fetching the measured bioactivity profiles for all ligands that are structurally resolved in complex with EGFR". Formatting the output as DataFrames facilitates subsequent filtering steps and DataFrame merges in case multiple KLIFS datasets need to be combined.
 OpenCADD-KLIFS is currently used in several projects from the Volkamer Lab [@volkamerlab] including TeachOpenCADD [@teachopencadd], OpenCADD-pocket [@opencadd_pocket], KiSSim [@kissim], KinoML [@kinoml], and PLIPify [@plipify].

From be083eeb20b9bc41c2d43484408ef54d47b61566 Mon Sep 17 00:00:00 2001
From: dominiquesydow
Date: Thu, 16 Dec 2021 11:15:44 +0100
Subject: [PATCH 03/19] Reworked "Statement of need", "State of the field", and "Key Features"

---
 paper/paper.bib | 31 +++++++++++++++++++++++++++++++
 paper/paper.md | 41 ++++++++++++++++++++++++++++++++---------
 2 files changed, 63 insertions(+), 9 deletions(-)

diff --git a/paper/paper.bib b/paper/paper.bib
index d1ea7459..1104b8a4 100644
--- a/paper/paper.bib
+++ b/paper/paper.bib
@@ -62,6 +62,28 @@ @inproceedings{Kluyver:2016
  url = {https://eprints.soton.ac.uk/403913/},
 }

+@article{Mendez:2018,
+ author = {Mendez, David and Gaulton, Anna and Bento, A Patrícia and Chambers, Jon and De Veij, Marleen and Félix, Eloy and Magariños, María Paula and Mosquera, Juan F and Mutowo, Prudence and Nowotka, Michał and Gordillo-Marañón, María and Hunter, Fiona and Junco, Laura and Mugumbate, Grace and Rodriguez-Lopez, Milagros and Atkinson, Francis and Bosc, Nicolas and Radoux, Chris J and Segura-Cabrera, Aldo and Hersey, Anne and Leach, Andrew R},
+ title = "{ChEMBL: towards direct deposition of bioassay data}",
+ journal = {Nucleic Acids Research},
+ volume = {47},
+ number = {D1},
+ pages = {D930-D940},
+ year = {2018},
+ doi = {10.1093/nar/gky1075},
+}
+
+@article{Carles:2018,
+ author = {Carles, Fabrice and Bourg, St{\'{e}}phane and Meyer, Christophe and Bonnet, Pascal},
+ title = {{PKIDB: A Curated, Annotated and Updated Database of Protein Kinase Inhibitors in Clinical Trials}},
+ journal = {Molecules},
+ volume = {23},
+ number = {4},
+ pages = {908},
+ year = {2018},
+ doi = {10.3390/molecules23040908},
+}
+
 @misc{klifsswagger,
  author = {KLIFS},
  title = {{KLIFS OpenAPI}},
@@ -70,6 +92,15 @@ @misc{klifsswagger
  url = {https://dev.klifs.net/swagger_v2/},
 }

+@misc{requests,
+ author = {requests},
+ title = {{requests}},
+ year = 2021,
+ publisher = {GitHub},
+ journal = {GitHub repository},
+ url = {https://github.com/psf/requests},
+}
+
 @misc{bravado,
  author = {bravado},
  title = {{bravado}},
diff --git a/paper/paper.md b/paper/paper.md
index bb366b18..ff2c0682 100644
--- a/paper/paper.md
+++ b/paper/paper.md
@@ -23,34 +23,57 @@ bibliography: paper.bib

 # Summary

-Protein kinases are involved in most aspects of cell life due to their role in signal transduction. Dysregulated kinases can cause severe diseases such as cancer, inflammatory and neurodegenerative diseases, which has made them a frequent target in drug discovery for the last decades [@Cohen:2021].
+Protein kinases are involved in most aspects of cell life due to their role in signal transduction. Dysregulated kinases can cause severe diseases such as cancer, inflammation, and neurodegeneration, which has made them a frequent target in drug discovery for the last decades [@Cohen:2021].
 The immense research on kinases has led to an increasing amount of kinase resources [@Kooistra:2017].
 Among them is the KLIFS database, which focuses on storing and analyzing structural data on kinases and interacting drugs and other small molecules [@Kanev:2021].
 The OpenCADD-KLIFS Python module offers a convenient integration of the KLIFS data into workflows to facilitate computational kinase research.

+[OpenCADD-KLIFS](https://opencadd.readthedocs.io/en/latest/databases_klifs.html) (``opencadd.databases.klifs``) is a part of the [OpenCADD](https://opencadd.readthedocs.io/) package, a collection of Python modules for structural cheminformatics.
+
+![OpenCADD-KLIFS fetches KLIFS data [@Kanev:2021] offline from a KLIFS download or online from the KLIFS database and formats the output as user-friendly Pandas DataFrames [@pandas].\label{fig:opencadd_klifs_toc}](opencadd_klifs_toc.png)
+
 # Statement of need

+The KLIFS resource [@Kanev:2021] contains information about kinases, structures, ligands, interaction fingerprints, and bioactivities.
+KLIFS thereby focuses especially on the ATP binding site, defined as a set of 85 residues and aligned across all structures using a multiple sequence alignment (MSA) [@vanLinden:2014].
+Fetching, filtering, and integrating the KLIFS content on a larger scale into Python-based pipelines is currently not straight-forward, especially for users without a background in online queries. Effortless switching between data queries from a _local_ KLIFS download and the _remote_ KLIFS database is not possible.
+
 OpenCADD-KLIFS is aimed at current and future users of the KLIFS database who seek to
 integrate kinase resources into Python-based research projects.
-This module offers access to KLIFS data [@Kanev:2021] such as information about kinases, structures, ligands,
-interaction fingerprints, and bioactivities.
-KLIFS thereby focuses especially on the ATP binding site, defined as a set of 85 residues and aligned across all structures using a multiple sequence alignment (MSA) [@vanLinden:2014].
 With OpenCADD-KLIFS, KLIFS data can be queried either locally from a KLIFS download or remotely from the KLIFS webserver.
-The presented module provides identical APIs for the remote and local queries for KLIFS data and streamlines all output into
+The presented module provides identical APIs for the remote and local queries and streamlines all output into
 standardized Pandas DataFrames [@pandas] to allow for easy and quick downstream data analyses (\autoref{fig:opencadd_klifs_toc}). This Pandas-focused setup is ideal to work with in Jupyter notebooks [@Kluyver:2016].

-[OpenCADD-KLIFS](https://opencadd.readthedocs.io/en/latest/databases_klifs.html) (``opencadd.databases.klifs``) is a part of the [OpenCADD](https://opencadd.readthedocs.io/) package, a collection of Python modules for structural cheminformatics.

-![OpenCADD-KLIFS fetches KLIFS data [@Kanev:2021] offline from a KLIFS download or online from the KLIFS database and formats the output as user-friendly Pandas DataFrames [@pandas].\label{fig:opencadd_klifs_toc}](opencadd_klifs_toc.png)
+# State of the field
+
+The KLIFS database is unique in the structure-based kinase field in terms of integrating and annotating different data resources in a kinase- and pocket-focused manner. Kinases, structures, and ligands have unique identifiers in KLIFS, which makes it possible to fetch and filter cross-referenced information for a query kinase, structure, or ligand.
+
+- Kinase structures are fetched from the PDB, split by chains and alternate models, annotated with the KLIFS pocket of 85 residues, and aligned across the fully structurally covered kinome.
+- Kinase-ligand interactions seen in experimental structures are annotated for the 85 pocket residues in the form of the KLIFS interaction fingerprint (KLIFS IFP).
+- Bioactivity data measured against kinases are fetched from ChEMBL [@Mendez:2018] and linked to kinases, structures, and ligands available in KLIFS.
+- Kinase inhibitor metadata are fetched from the PKIDB [@Carles:2018] and linked to co-crystallized ligands available in KLIFS.
+
+The KLIFS data integrations and annotations can be accessed in different ways, which are all open-sourced:

-The KLIFS database offers a REST API compliant with the OpenAPI specification [@klifsswagger]. Our module OpenCADD-KLIFS uses bravado [@bravado] to dynamically generate a Python client based on the OpenAPI definitions and adds wrappers to enable the following functionalities:
+- Manually via the [KLIFS website](https://klifs.net/) interface: This mode is preferable when searching for information on a specific structure or smaller set of structures.
+- Automated via the [KLIFS KNIME nodes](https://github.com/3D-e-Chem/knime-klifs): This mode is extremely useful if the users' projects are embedded in KNIME; programming is not needed.
+- Programmatically using the REST API and KLIFS OpenAPI specifications: This mode is needed for users who seek to perform larger scale queries or integrate different queries into programmatic workflows. In the following, we will discuss this mode in context of Python-based projects in detail and explain how OpenCADD-KLIFS improves the user experience.
+
+The KLIFS database offers standardized URL schemes (REST API), which allows users to query data by defined URLs, using e.g. the Python package `requests` [@requests]. Instead of writing customized scripts to generate such KLIFS URLs, the KLIFS OpenAPI specifications — a document that defines the KLIFS REST API scheme — can be used to generate a Python client, using e.g. the Python package `bravado` [@bravado]. This client offers a Python API to send requests and receive responses.
+This setup is already extremely useful, however, it has a few drawbacks: the setup is technical, the output is not easily readable for humans and not ready for immediate down-stream integrations — requiring similar but not identical reformatting functions for different query results —, and switching from remote requests to local KLIFS download queries is not possible. Facilitating and streamlining these tasks is the purpose of OpenCADD-KLIFS as discussed in more detail in the next section.
+
+# Key Features
+
+The KLIFS database offers a REST API compliant with the OpenAPI specification [@klifsswagger]. Our module OpenCADD-KLIFS uses bravado to dynamically generate a Python client based on the OpenAPI definitions and adds wrappers to enable the following functionalities:

 - A session is set up, which allows access to various KLIFS *data sources* by different *identifiers* with the API ``session.data_source.by_identifier``. *Data sources* currently include kinases, structures and annotated conformations, modified residues, pockets, ligands, drugs, and bioactivities; *identifiers* refer to kinase names, PDB IDs, KLIFS IDs, and more. For example, ``session.structures.by_kinase_name`` fetches information on all structures for a query kinase.
-- The same API is used for local and remote sessions.
+- The same API is used for local and remote sessions, i.e. interacting with data from a KLIFS download folder and from the KLIFS website, respectively.
 - The returned data follows the same schema regardless of the session type (local/remote); all results obtained with bravado are formatted as Pandas DataFrames with standardized column names, data types, and handling of missing data.
 - Files with the structural 3D coordinates deposited on KLIFS include full complexes or selections such as proteins, pockets, ligands, and more. These files can be downloaded to disc or loaded via biopandas [@Raschka:2017] or RDKit [@rdkit].

 OpenCADD-KLIFS is especially convenient whenever users are interested in multiple or more complex queries such as "fetching all structures for the kinase EGFR in the DFG-in conformation" or "fetching the measured bioactivity profiles for all ligands that are structurally resolved in complex with EGFR". Formatting the output as DataFrames facilitates subsequent filtering steps and DataFrame merges in case multiple KLIFS datasets need to be combined.
+
 OpenCADD-KLIFS is currently used in several projects from the Volkamer Lab [@volkamerlab] including TeachOpenCADD [@teachopencadd], OpenCADD-pocket [@opencadd_pocket], KiSSim [@kissim], KinoML [@kinoml], and PLIPify [@plipify].
 For example, OpenCADD-KLIFS is applied in a [TeachOpenCADD tutorial](https://projects.volkamerlab.org/teachopencadd/talktorials/T012_query_klifs.html) to demonstrate how to fetch all kinase-ligand interaction profiles for all available EGFR kinase structures to visualize the per-residue interaction types and frequencies with only a few lines of code.


From 9585555ba154991c8ddbd6287f4939a9b3971b13 Mon Sep 17 00:00:00 2001
From: dominiquesydow
Date: Thu, 16 Dec 2021 11:19:04 +0100
Subject: [PATCH 04/19] Move figure back to "Statement of need"

---
 paper/paper.md | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/paper/paper.md b/paper/paper.md
index ff2c0682..d32e1543 100644
--- a/paper/paper.md
+++ b/paper/paper.md
@@ -28,10 +28,6 @@ The immense research on kinases has led to an increasing amount of kinase resour
 Among them is the KLIFS database, which focuses on storing and analyzing structural data on kinases and interacting drugs and other small molecules [@Kanev:2021].
 The OpenCADD-KLIFS Python module offers a convenient integration of the KLIFS data into workflows to facilitate computational kinase research.

-[OpenCADD-KLIFS](https://opencadd.readthedocs.io/en/latest/databases_klifs.html) (``opencadd.databases.klifs``) is a part of the [OpenCADD](https://opencadd.readthedocs.io/) package, a collection of Python modules for structural cheminformatics.
-
-![OpenCADD-KLIFS fetches KLIFS data [@Kanev:2021] offline from a KLIFS download or online from the KLIFS database and formats the output as user-friendly Pandas DataFrames [@pandas].\label{fig:opencadd_klifs_toc}](opencadd_klifs_toc.png)
-
 # Statement of need

 The KLIFS resource [@Kanev:2021] contains information about kinases, structures, ligands, interaction fingerprints, and bioactivities.
@@ -44,6 +40,9 @@ With OpenCADD-KLIFS, KLIFS data can be queried either locally from a KLIFS downl
 The presented module provides identical APIs for the remote and local queries and streamlines all output into
 standardized Pandas DataFrames [@pandas] to allow for easy and quick downstream data analyses (\autoref{fig:opencadd_klifs_toc}). This Pandas-focused setup is ideal to work with in Jupyter notebooks [@Kluyver:2016].

+[OpenCADD-KLIFS](https://opencadd.readthedocs.io/en/latest/databases_klifs.html) (``opencadd.databases.klifs``) is a part of the [OpenCADD](https://opencadd.readthedocs.io/) package, a collection of Python modules for structural cheminformatics.
+
+![OpenCADD-KLIFS fetches KLIFS data [@Kanev:2021] offline from a KLIFS download or online from the KLIFS database and formats the output as user-friendly Pandas DataFrames [@pandas].\label{fig:opencadd_klifs_toc}](opencadd_klifs_toc.png)

 # State of the field


From e277faea2327d1ced9b195c2c89ff535dfdd99f0 Mon Sep 17 00:00:00 2001
From: dominiquesydow
Date: Thu, 16 Dec 2021 11:20:41 +0100
Subject: [PATCH 05/19] Move OpenCADD comment to Summary

---
 paper/paper.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/paper/paper.md b/paper/paper.md
index d32e1543..7688793e 100644
--- a/paper/paper.md
+++ b/paper/paper.md
@@ -28,6 +28,8 @@ The immense research on kinases has led to an increasing amount of kinase resour
 Among them is the KLIFS database, which focuses on storing and analyzing structural data on kinases and interacting drugs and other small molecules [@Kanev:2021].
 The OpenCADD-KLIFS Python module offers a convenient integration of the KLIFS data into workflows to facilitate computational kinase research.

+[OpenCADD-KLIFS](https://opencadd.readthedocs.io/en/latest/databases_klifs.html) (``opencadd.databases.klifs``) is a part of the [OpenCADD](https://opencadd.readthedocs.io/) package, a collection of Python modules for structural cheminformatics.
+
 # Statement of need

 The KLIFS resource [@Kanev:2021] contains information about kinases, structures, ligands, interaction fingerprints, and bioactivities.
@@ -40,8 +42,6 @@ With OpenCADD-KLIFS, KLIFS data can be queried either locally from a KLIFS downl
 The presented module provides identical APIs for the remote and local queries and streamlines all output into
 standardized Pandas DataFrames [@pandas] to allow for easy and quick downstream data analyses (\autoref{fig:opencadd_klifs_toc}). This Pandas-focused setup is ideal to work with in Jupyter notebooks [@Kluyver:2016].

-[OpenCADD-KLIFS](https://opencadd.readthedocs.io/en/latest/databases_klifs.html) (``opencadd.databases.klifs``) is a part of the [OpenCADD](https://opencadd.readthedocs.io/) package, a collection of Python modules for structural cheminformatics.
-
 ![OpenCADD-KLIFS fetches KLIFS data [@Kanev:2021] offline from a KLIFS download or online from the KLIFS database and formats the output as user-friendly Pandas DataFrames [@pandas].\label{fig:opencadd_klifs_toc}](opencadd_klifs_toc.png)

 # State of the field

From 72890227123404969f8dbae682aff9b52a1349d0 Mon Sep 17 00:00:00 2001
From: dominiquesydow
Date: Thu, 16 Dec 2021 11:32:29 +0100
Subject: [PATCH 06/19] Add references

---
 paper/paper.bib | 22 ++++++++++++++++++++++
 paper/paper.md | 2 +-
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/paper/paper.bib b/paper/paper.bib
index 1104b8a4..4a95dea9 100644
--- a/paper/paper.bib
+++ b/paper/paper.bib
@@ -84,6 +84,28 @@ @article{Carles:2018
  doi = {10.3390/molecules23040908},
 }

+@article{McGuire:2017,
+ author = {McGuire, Ross and Verhoeven, Stefan and Vass, Márton and Vriend, Gerrit and de Esch, Iwan J. P. and Lusher, Scott J. and Leurs, Rob and Ridder, Lars and Kooistra, Albert J. and Ritschel, Tina and de Graaf, Chris},
+ title = {3D-e-Chem-VM: Structural Cheminformatics Research Infrastructure in a Freely Available Virtual Machine},
+ journal = {Journal of Chemical Information and Modeling},
+ volume = {57},
+ number = {2},
+ pages = {115-121},
+ year = {2017},
+ doi = {10.1021/acs.jcim.6b00686},
+}
+
+@article{Kooistra:2018,
+ author = {Kooistra, A. J. and Vass, M. and McGuire, R. and Leurs, R. and de Esch, I. J. P. and Vriend, G. and Verhoeven, S. and de Graaf, C. },
+ title = {{3{D}-e-{C}hem: {S}tructural {C}heminformatics {W}orkflows for {C}omputer-{A}ided {D}rug {D}iscovery}},
+ journal = {ChemMedChem},
+ volume = {13},
+ number = {6},
+ pages = {614--626},
+ year = {2018},
+ doi = {10.1002/cmdc.201700754},
+}
+
 @misc{klifsswagger,
  author = {KLIFS},
  title = {{KLIFS OpenAPI}},
diff --git a/paper/paper.md b/paper/paper.md
index 7688793e..053200a5 100644
--- a/paper/paper.md
+++ b/paper/paper.md
@@ -56,7 +56,7 @@ The KLIFS database is unique in the structure-based kinase field in terms of int
 The KLIFS data integrations and annotations can be accessed in different ways, which are all open-sourced:

 - Manually via the [KLIFS website](https://klifs.net/) interface: This mode is preferable when searching for information on a specific structure or smaller set of structures.
-- Automated via the [KLIFS KNIME nodes](https://github.com/3D-e-Chem/knime-klifs): This mode is extremely useful if the users' projects are embedded in KNIME; programming is not needed.
+- Automated via the [KLIFS KNIME](https://github.com/3D-e-Chem/knime-klifs) nodes [@McGuire:2017; @Kooistra:2018]: This mode is extremely useful if the users' projects are embedded in KNIME; programming is not needed.
 - Programmatically using the REST API and KLIFS OpenAPI specifications: This mode is needed for users who seek to perform larger scale queries or integrate different queries into programmatic workflows. In the following, we will discuss this mode in context of Python-based projects in detail and explain how OpenCADD-KLIFS improves the user experience.

 The KLIFS database offers standardized URL schemes (REST API), which allows users to query data by defined URLs, using e.g. the Python package `requests` [@requests]. Instead of writing customized scripts to generate such KLIFS URLs, the KLIFS OpenAPI specifications — a document that defines the KLIFS REST API scheme — can be used to generate a Python client, using e.g. the Python package `bravado` [@bravado]. This client offers a Python API to send requests and receive responses.

From 6ca8b0822d063c7128ff0833eccee4e361784eac Mon Sep 17 00:00:00 2001
From: dominiquesydow
Date: Thu, 16 Dec 2021 11:45:55 +0100
Subject: [PATCH 07/19] No code formatting for packages

---
 paper/paper.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/paper/paper.md b/paper/paper.md
index 053200a5..abe759d2 100644
--- a/paper/paper.md
+++ b/paper/paper.md
@@ -57,9 +57,9 @@ The KLIFS data integrations and annotations can be accessed in different ways, w

 - Manually via the [KLIFS website](https://klifs.net/) interface: This mode is preferable when searching for information on a specific structure or smaller set of structures.
 - Automated via the [KLIFS KNIME](https://github.com/3D-e-Chem/knime-klifs) nodes [@McGuire:2017; @Kooistra:2018]: This mode is extremely useful if the users' projects are embedded in KNIME; programming is not needed.
-- Programmatically using the REST API and KLIFS OpenAPI specifications: This mode is needed for users who seek to perform larger scale queries or integrate different queries into programmatic workflows. In the following, we will discuss this mode in context of Python-based projects in detail and explain how OpenCADD-KLIFS improves the user experience.
+- Programmatically using the REST API and KLIFS OpenAPI specifications: This mode is needed for users who seek to perform larger scale queries or integrate different queries into programmatic workflows. In the following, we will discuss this mode in context of Python-based projects and explain how OpenCADD-KLIFS improves the user experience.

-The KLIFS database offers standardized URL schemes (REST API), which allows users to query data by defined URLs, using e.g. the Python package `requests` [@requests]. Instead of writing customized scripts to generate such KLIFS URLs, the KLIFS OpenAPI specifications — a document that defines the KLIFS REST API scheme — can be used to generate a Python client, using e.g. the Python package `bravado` [@bravado]. This client offers a Python API to send requests and receive responses.
+The KLIFS database offers standardized URL schemes (REST API), which allows users to query data by defined URLs, using e.g. the Python package requests [@requests]. Instead of writing customized scripts to generate such KLIFS URLs, the KLIFS OpenAPI specifications — a document that defines the KLIFS REST API scheme — can be used to generate a Python client, using e.g. the Python package bravado [@bravado]. This client offers a Python API to send requests and receive responses.
 This setup is already extremely useful, however, it has a few drawbacks: the setup is technical, the output is not easily readable for humans and not ready for immediate down-stream integrations — requiring similar but not identical reformatting functions for different query results —, and switching from remote requests to local KLIFS download queries is not possible. Facilitating and streamlining these tasks is the purpose of OpenCADD-KLIFS as discussed in more detail in the next section.

 # Key Features

From 902bb69e3b9ad1662910a49ea6a69d01b0bd0bae Mon Sep 17 00:00:00 2001
From: dominiquesydow
Date: Thu, 16 Dec 2021 11:46:10 +0100
Subject: [PATCH 08/19] Clean up refs

---
 paper/paper.bib | 39 +++++++++++++++++++++------------------
 1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/paper/paper.bib b/paper/paper.bib
index 4a95dea9..7ca5e85c 100644
--- a/paper/paper.bib
+++ b/paper/paper.bib
@@ -9,14 +9,28 @@ @article{Cohen:2021
  doi={10.1038/s41573-021-00195-4},
 }

-@article{Kooistra:2017,
- author = {Kooistra, Albert J. and Volkamer, Andrea},
- title = {{Kinase-Centric Computational Drug Development}},
- journal = {Annu. Rep. Med. Chem.},
+@incollection{Kooistra:2017,
+ booktitle = {Platform Technologies in Drug Discovery and Validation},
+ series = {Annual Reports in Medicinal Chemistry},
+ editor = {Robert A. Goodnow},
+ title = {Chapter Six - Kinase-Centric Computational Drug Development},
+ author = {Albert J. Kooistra and Andrea Volkamer},
+ publisher = {Academic Press},
  volume = {50},
- pages = {197--236},
+ pages = {197-236},
  year = {2017},
- doi = {10.1016/BS.ARMC.2017.08.001},
+ doi = {10.1016/bs.armc.2017.08.001},
+}
+
+@inproceedings{Kluyver:2016,
+ booktitle = {Positioning and Power in Academic Publishing: Players, Agents and Agendas},
+ editor = {Fernando Loizides and Birgit Scmidt},
+ title = {Jupyter Notebooks - a publishing format for reproducible computational workflows},
+ author = {Thomas Kluyver and Benjamin Ragan-Kelley and Fernando P{\'e}rez and Brian Granger and Matthias Bussonnier and Jonathan Frederic and Kyle Kelley and Jessica Hamrick and Jason Grout and Sylvain Corlay and Paul Ivanov and Dami{\'a}n Avila and Safia Abdalla and Carol Willing and Jupyter development team},
+ publisher = {IOS Press},
+ year = {2016},
+ pages = {87--90},
+ url = {https://eprints.soton.ac.uk/403913/},
 }

 @article{Kanev:2021,
@@ -51,17 +65,6 @@ @article{Raschka:2017
  doi = {10.21105/joss.00279},
 }

-@inproceedings{Kluyver:2016,
- booktitle = {Positioning and Power in Academic Publishing: Players, Agents and Agendas},
- editor = {Fernando Loizides and Birgit Scmidt},
- title = {Jupyter Notebooks - a publishing format for reproducible computational workflows},
- author = {Thomas Kluyver and Benjamin Ragan-Kelley and Fernando P{\'e}rez and Brian Granger and Matthias Bussonnier and Jonathan Frederic and Kyle Kelley and Jessica Hamrick and Jason Grout and Sylvain Corlay and Paul Ivanov and Dami{\'a}n Avila and Safia Abdalla and Carol Willing and Jupyter development team},
- publisher = {IOS Press},
- year = {2016},
- pages = {87--90},
- url = {https://eprints.soton.ac.uk/403913/},
-}
-
 @article{Mendez:2018,
  author = {Mendez, David and Gaulton, Anna and Bento, A Patrícia and Chambers, Jon and De Veij, Marleen and Félix, Eloy and Magariños, María Paula and Mosquera, Juan F and Mutowo, Prudence and Nowotka, Michał and Gordillo-Marañón, María and Hunter, Fiona and Junco, Laura and Mugumbate, Grace and Rodriguez-Lopez, Milagros and Atkinson, Francis and Bosc, Nicolas and Radoux, Chris J and Segura-Cabrera, Aldo and Hersey, Anne and Leach, Andrew R},
  title = "{ChEMBL: towards direct deposition of bioassay data}",
@@ -96,7 +99,7 @@ @article{McGuire:2017
 }

 @article{Kooistra:2018,
- author = {Kooistra, A. J. and Vass, M. and McGuire, R. and Leurs, R. and de Esch, I. J. P. and Vriend, G. and Verhoeven, S. and de Graaf, C. },
+ author = {Kooistra, A. J. and Vass, M. and McGuire, R. and Leurs, R. and de Esch, I. J. P. and Vriend, G. and Verhoeven, S. and de Graaf, C. },
  title = {{3{D}-e-{C}hem: {S}tructural {C}heminformatics {W}orkflows for {C}omputer-{A}ided {D}rug {D}iscovery}},
  journal = {ChemMedChem},
  volume = {13},

From d34a1bd065a003abc07a7b9453a49ef976217c20 Mon Sep 17 00:00:00 2001
From: Dominique Sydow
Date: Thu, 16 Dec 2021 21:18:56 +0100
Subject: [PATCH 09/19] Update paper/paper.md

Co-authored-by: AndreaVolkamer
---
 paper/paper.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/paper/paper.md b/paper/paper.md
index abe759d2..b4a73cec 100644
--- a/paper/paper.md
+++ b/paper/paper.md
@@ -25,7 +25,7 @@ bibliography: paper.bib

 Protein kinases are involved in most aspects of cell life due to their role in signal transduction. Dysregulated kinases can cause severe diseases such as cancer, inflammation, and neurodegeneration, which has made them a frequent target in drug discovery for the last decades [@Cohen:2021].
 The immense research on kinases has led to an increasing amount of kinase resources [@Kooistra:2017].
-Among them is the KLIFS database, which focuses on storing and analyzing structural data on kinases and interacting drugs and other small molecules [@Kanev:2021].
+Among them is the KLIFS database, which focuses on storing and analyzing structural data on kinases and interacting ligands [@Kanev:2021].
 The OpenCADD-KLIFS Python module offers a convenient integration of the KLIFS data into workflows to facilitate computational kinase research.

 [OpenCADD-KLIFS](https://opencadd.readthedocs.io/en/latest/databases_klifs.html) (``opencadd.databases.klifs``) is a part of the [OpenCADD](https://opencadd.readthedocs.io/) package, a collection of Python modules for structural cheminformatics.

From adc42dba389495de50ebd95ffa26d4729e124f91 Mon Sep 17 00:00:00 2001
From: Dominique Sydow
Date: Thu, 16 Dec 2021 21:19:10 +0100
Subject: [PATCH 10/19] Update paper/paper.md

Co-authored-by: AndreaVolkamer
---
 paper/paper.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/paper/paper.md b/paper/paper.md
index b4a73cec..3cfe98f9 100644
--- a/paper/paper.md
+++ b/paper/paper.md
@@ -28,7 +28,7 @@ The immense research on kinases has led to an increasing amount of kinase resour
 Among them is the KLIFS database, which focuses on storing and analyzing structural data on kinases and interacting ligands [@Kanev:2021].
 The OpenCADD-KLIFS Python module offers a convenient integration of the KLIFS data into workflows to facilitate computational kinase research.

-[OpenCADD-KLIFS](https://opencadd.readthedocs.io/en/latest/databases_klifs.html) (``opencadd.databases.klifs``) is a part of the [OpenCADD](https://opencadd.readthedocs.io/) package, a collection of Python modules for structural cheminformatics.
+[OpenCADD-KLIFS](https://opencadd.readthedocs.io/en/latest/databases_klifs.html) (``opencadd.databases.klifs``) is part of the [OpenCADD](https://opencadd.readthedocs.io/) package, a collection of Python modules for structural cheminformatics.

 # Statement of need


From 86db200e042dd8add25753a6f2bbbefe1003c2e9 Mon Sep 17 00:00:00 2001
From: Dominique Sydow
Date: Thu, 16 Dec 2021 21:19:30 +0100
Subject: [PATCH 11/19] Update paper/paper.md

Co-authored-by: AndreaVolkamer
---
 paper/paper.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/paper/paper.md b/paper/paper.md
index 3cfe98f9..375e4216 100644
--- a/paper/paper.md
+++ b/paper/paper.md
@@ -33,7 +33,7 @@ The OpenCADD-KLIFS Python module offers a convenient integration of the KLIFS da
 # Statement of need

 The KLIFS resource [@Kanev:2021] contains information about kinases, structures, ligands, interaction fingerprints, and bioactivities.
-KLIFS thereby focuses especially on the ATP binding site, defined as a set of 85 residues and aligned across all structures using a multiple sequence alignment (MSA) [@vanLinden:2014].
+KLIFS thereby focuses especially on the ATP binding site, defined as a set of 85 residues and aligned across all structures using a multiple sequence alignment [@vanLinden:2014].
 Fetching, filtering, and integrating the KLIFS content on a larger scale into Python-based pipelines is currently not straight-forward, especially for users without a background in online queries. Effortless switching between data queries from a _local_ KLIFS download and the _remote_ KLIFS database is not possible.

OpenCADD-KLIFS is aimed at current and future users of the KLIFS database who seek to From 98b777354703bf2b86b355e02c51757795b55468 Mon Sep 17 00:00:00 2001 From: Dominique Sydow Date: Thu, 16 Dec 2021 21:20:14 +0100 Subject: [PATCH 12/19] Update paper/paper.md Co-authored-by: AndreaVolkamer --- paper/paper.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/paper/paper.md b/paper/paper.md index 375e4216..23cea19d 100644 --- a/paper/paper.md +++ b/paper/paper.md @@ -40,7 +40,7 @@ OpenCADD-KLIFS is aimed at current and future users of the KLIFS database who se integrate kinase resources into Python-based research projects. With OpenCADD-KLIFS, KLIFS data can be queried either locally from a KLIFS download or remotely from the KLIFS webserver. The presented module provides identical APIs for the remote and local queries and streamlines all output into -standardized Pandas DataFrames [@pandas] to allow for easy and quick downstream data analyses (\autoref{fig:opencadd_klifs_toc}). This Pandas-focused setup is ideal to work with in Jupyter notebooks [@Kluyver:2016]. +standardized Pandas DataFrames [@pandas] to allow for easy and quick downstream data analyses (\autoref{fig:opencadd_klifs_toc}). This Pandas-focused setup is ideal if you work with Jupyter notebooks [@Kluyver:2016]. 
![OpenCADD-KLIFS fetches KLIFS data [@Kanev:2021] offline from a KLIFS download or online from the KLIFS database and formats the output as user-friendly Pandas DataFrames [@pandas].\label{fig:opencadd_klifs_toc}](opencadd_klifs_toc.png) From 4cfb671349d12d00ec8b2a355b7d41400300b46c Mon Sep 17 00:00:00 2001 From: Dominique Sydow Date: Thu, 16 Dec 2021 21:21:41 +0100 Subject: [PATCH 13/19] Update paper/paper.md Co-authored-by: AndreaVolkamer --- paper/paper.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/paper/paper.md b/paper/paper.md index 23cea19d..065db370 100644 --- a/paper/paper.md +++ b/paper/paper.md @@ -42,7 +42,7 @@ With OpenCADD-KLIFS, KLIFS data can be queried either locally from a KLIFS downl The presented module provides identical APIs for the remote and local queries and streamlines all output into standardized Pandas DataFrames [@pandas] to allow for easy and quick downstream data analyses (\autoref{fig:opencadd_klifs_toc}). This Pandas-focused setup is ideal if you work with Jupyter notebooks [@Kluyver:2016]. 
-![OpenCADD-KLIFS fetches KLIFS data [@Kanev:2021] offline from a KLIFS download or online from the KLIFS database and formats the output as user-friendly Pandas DataFrames [@pandas].\label{fig:opencadd_klifs_toc}](opencadd_klifs_toc.png) +![OpenCADD-KLIFS fetches KLIFS data [@Kanev:2021] offline from a local KLIFS download or online from the KLIFS database and formats the output as user-friendly Pandas DataFrames [@pandas].\label{fig:opencadd_klifs_toc}](opencadd_klifs_toc.png) # State of the field From c6659a162000294ffd6bc4d077cf2a48b655d2f9 Mon Sep 17 00:00:00 2001 From: Dominique Sydow Date: Thu, 16 Dec 2021 21:21:54 +0100 Subject: [PATCH 14/19] Update paper/paper.md Co-authored-by: AndreaVolkamer --- paper/paper.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/paper/paper.md b/paper/paper.md index 065db370..4f446df8 100644 --- a/paper/paper.md +++ b/paper/paper.md @@ -56,7 +56,7 @@ The KLIFS database is unique in the structure-based kinase field in terms of int The KLIFS data integrations and annotations can be accessed in different ways, which are all open-sourced: - Manually via the [KLIFS website](https://klifs.net/) interface: This mode is preferable when searching for information on a specific structure or smaller set of structures. -- Automated via the [KLIFS KNIME](https://github.com/3D-e-Chem/knime-klifs) nodes [@McGuire:2017; @Kooistra:2018]: This mode is extremely useful if the users' projects are embedded in KNIME; programming is not needed. +- Automated via the [KLIFS KNIME](https://github.com/3D-e-Chem/knime-klifs) nodes [@McGuire:2017; @Kooistra:2018]: This mode is extremely useful if the users' projects are embedded in KNIME workflows; programming is not needed. - Programmatically using the REST API and KLIFS OpenAPI specifications: This mode is needed for users who seek to perform larger scale queries or integrate different queries into programmatic workflows. 
In the following, we will discuss this mode in context of Python-based projects and explain how OpenCADD-KLIFS improves the user experience. The KLIFS database offers standardized URL schemes (REST API), which allows users to query data by defined URLs, using e.g. the Python package requests [@requests]. Instead of writing customized scripts to generate such KLIFS URLs, the KLIFS OpenAPI specifications — a document that defines the KLIFS REST API scheme — can be used to generate a Python client, using e.g. the Python package bravado [@bravado]. This client offers a Python API to send requests and receive responses. From 196094d8c3a5a9f1ae951471f7ea83094e8b8d65 Mon Sep 17 00:00:00 2001 From: Dominique Sydow Date: Thu, 16 Dec 2021 21:22:13 +0100 Subject: [PATCH 15/19] Update paper/paper.md Co-authored-by: AndreaVolkamer --- paper/paper.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/paper/paper.md b/paper/paper.md index 4f446df8..30a53ebf 100644 --- a/paper/paper.md +++ b/paper/paper.md @@ -57,7 +57,7 @@ The KLIFS data integrations and annotations can be accessed in different ways, w - Manually via the [KLIFS website](https://klifs.net/) interface: This mode is preferable when searching for information on a specific structure or smaller set of structures. - Automated via the [KLIFS KNIME](https://github.com/3D-e-Chem/knime-klifs) nodes [@McGuire:2017; @Kooistra:2018]: This mode is extremely useful if the users' projects are embedded in KNIME workflows; programming is not needed. -- Programmatically using the REST API and KLIFS OpenAPI specifications: This mode is needed for users who seek to perform larger scale queries or integrate different queries into programmatic workflows. In the following, we will discuss this mode in context of Python-based projects and explain how OpenCADD-KLIFS improves the user experience. 
+- Programmatically using the REST API and KLIFS OpenAPI specifications: This mode is needed for users who seek to perform larger scale queries or to integrate different queries into programmatic workflows. In the following, we will discuss this mode in context of Python-based projects and explain how OpenCADD-KLIFS improves the user experience. The KLIFS database offers standardized URL schemes (REST API), which allows users to query data by defined URLs, using e.g. the Python package requests [@requests]. Instead of writing customized scripts to generate such KLIFS URLs, the KLIFS OpenAPI specifications — a document that defines the KLIFS REST API scheme — can be used to generate a Python client, using e.g. the Python package bravado [@bravado]. This client offers a Python API to send requests and receive responses. This setup is already extremely useful, however, it has a few drawbacks: the setup is technical, the output is not easily readable for humans and not ready for immediate down-stream integrations — requiring similar but not identical reformatting functions for different query results —, and switching from remote requests to local KLIFS download queries is not possible. Facilitating and streamlining these tasks is the purpose of OpenCADD-KLIFS as discussed in more detail in the next section. From 7a092bbe66aa128f80a2274bf95dfe5af590fa57 Mon Sep 17 00:00:00 2001 From: dominiquesydow Date: Thu, 16 Dec 2021 21:31:04 +0100 Subject: [PATCH 16/19] Implement more of Andrea's text edits --- paper/paper.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/paper/paper.md b/paper/paper.md index 30a53ebf..84cf9437 100644 --- a/paper/paper.md +++ b/paper/paper.md @@ -34,7 +34,7 @@ The OpenCADD-KLIFS Python module offers a convenient integration of the KLIFS da The KLIFS resource [@Kanev:2021] contains information about kinases, structures, ligands, interaction fingerprints, and bioactivities. 
KLIFS thereby focuses especially on the ATP binding site, defined as a set of 85 residues and aligned across all structures using a multiple sequence alignment [@vanLinden:2014]. -Fetching, filtering, and integrating the KLIFS content on a larger scale into Python-based pipelines is currently not straight-forward, especially for users without a background in online queries. Effortless switching between data queries from a _local_ KLIFS download and the _remote_ KLIFS database is not possible. +Fetching, filtering, and integrating the KLIFS content on a larger scale into Python-based pipelines is currently not straight-forward, especially for users without a background in online queries. Furthermore, switching between data queries from a _local_ KLIFS download and the _remote_ KLIFS database is not readily possible. OpenCADD-KLIFS is aimed at current and future users of the KLIFS database who seek to integrate kinase resources into Python-based research projects. @@ -53,20 +53,20 @@ The KLIFS database is unique in the structure-based kinase field in terms of int - Bioactivity data measured against kinases are fetched from ChEMBL [@Mendez:2018] and linked to kinases, structures, and ligands available in KLIFS. - Kinase inhibitor metadata are fetched from the PKIDB [@Carles:2018] and linked to co-crystallized ligands available in KLIFS. -The KLIFS data integrations and annotations can be accessed in different ways, which are all open-sourced: +The KLIFS data integrations and annotations can be accessed in different ways, which are all open source: - Manually via the [KLIFS website](https://klifs.net/) interface: This mode is preferable when searching for information on a specific structure or smaller set of structures. - Automated via the [KLIFS KNIME](https://github.com/3D-e-Chem/knime-klifs) nodes [@McGuire:2017; @Kooistra:2018]: This mode is extremely useful if the users' projects are embedded in KNIME workflows; programming is not needed. 
- Programmatically using the REST API and KLIFS OpenAPI specifications: This mode is needed for users who seek to perform larger scale queries or to integrate different queries into programmatic workflows. In the following, we will discuss this mode in context of Python-based projects and explain how OpenCADD-KLIFS improves the user experience. The KLIFS database offers standardized URL schemes (REST API), which allows users to query data by defined URLs, using e.g. the Python package requests [@requests]. Instead of writing customized scripts to generate such KLIFS URLs, the KLIFS OpenAPI specifications — a document that defines the KLIFS REST API scheme — can be used to generate a Python client, using e.g. the Python package bravado [@bravado]. This client offers a Python API to send requests and receive responses. -This setup is already extremely useful, however, it has a few drawbacks: the setup is technical, the output is not easily readable for humans and not ready for immediate down-stream integrations — requiring similar but not identical reformatting functions for different query results —, and switching from remote requests to local KLIFS download queries is not possible. Facilitating and streamlining these tasks is the purpose of OpenCADD-KLIFS as discussed in more detail in the next section. +This setup is already extremely useful, however, it has a few drawbacks: the setup is technical, the output is not easily readable for humans and not ready for immediate downstream integrations — requiring similar but not identical reformatting functions for different query results —, and switching from remote requests to local KLIFS download queries is not possible. Facilitating and streamlining these tasks is the purpose of OpenCADD-KLIFS as discussed in more detail in the next section. # Key Features The KLIFS database offers a REST API compliant with the OpenAPI specification [@klifsswagger]. 
Our module OpenCADD-KLIFS uses bravado to dynamically generate a Python client based on the OpenAPI definitions and adds wrappers to enable the following functionalities: -- A session is set up, which allows access to various KLIFS *data sources* by different *identifiers* with the API ``session.data_source.by_identifier``. *Data sources* currently include kinases, structures and annotated conformations, modified residues, pockets, ligands, drugs, and bioactivities; *identifiers* refer to kinase names, PDB IDs, KLIFS IDs, and more. For example, ``session.structures.by_kinase_name`` fetches information on all structures for a query kinase. +- A session is set up automatically, which allows access to various KLIFS *data sources* by different *identifiers* with the API ``session.data_source.by_identifier``. *Data sources* currently include kinases, structures and annotated conformations, modified residues, pockets, ligands, drugs, and bioactivities; *identifiers* refer to kinase names, PDB IDs, KLIFS IDs, and more. For example, ``session.structures.by_kinase_name`` fetches information on all structures for a query kinase. - The same API is used for local and remote sessions, i.e. interacting with data from a KLIFS download folder and from the KLIFS website, respectively. - The returned data follows the same schema regardless of the session type (local/remote); all results obtained with bravado are formatted as Pandas DataFrames with standardized column names, data types, and handling of missing data. - Files with the structural 3D coordinates deposited on KLIFS include full complexes or selections such as proteins, pockets, ligands, and more. These files can be downloaded to disc or loaded via biopandas [@Raschka:2017] or RDKit [@rdkit]. 
From 38f704a3f44510fe63db3404077e56651b777b49 Mon Sep 17 00:00:00 2001 From: dominiquesydow Date: Thu, 16 Dec 2021 21:32:17 +0100 Subject: [PATCH 17/19] Add Andrea as corresponding author --- paper/paper.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/paper/paper.md b/paper/paper.md index 84cf9437..b48cdaaf 100644 --- a/paper/paper.md +++ b/paper/paper.md @@ -11,7 +11,7 @@ authors: - name: Jaime Rodríguez-Guerra orcid: 0000-0001-8974-1566 affiliation: 1 - - name: Andrea Volkamer + - name: Andrea Volkamer^[corresponding author] affiliation: 1 orcid: 0000-0002-3760-580X affiliations: From fb1158c2b19f275538e2c5f9e35f290b35b85c9c Mon Sep 17 00:00:00 2001 From: dominiquesydow Date: Thu, 16 Dec 2021 21:37:12 +0100 Subject: [PATCH 18/19] Update bibtex to fix ref rendering --- paper/paper.bib | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/paper/paper.bib b/paper/paper.bib index 7ca5e85c..7b03117e 100644 --- a/paper/paper.bib +++ b/paper/paper.bib @@ -14,7 +14,7 @@ @incollection{Kooistra:2017 series = {Annual Reports in Medicinal Chemistry}, editor = {Robert A. Goodnow}, title = {Chapter Six - Kinase-Centric Computational Drug Development}, - author = {Albert J. Kooistra and Andrea Volkamer}, + author = {Kooistra, {A. J.} and Volkamer, A.}, publisher = {Academic Press}, volume = {50}, pages = {197-236}, @@ -45,7 +45,7 @@ @article{Kanev:2021 } @article{vanLinden:2014, - author={van Linden, Oscar P. J. and Kooistra, Albert J. and Leurs, Rob and de Esch, Iwan J. P. and de Graaf, Chris}, + author={{van Linden}, Oscar P. J. and Kooistra, Albert J. and Leurs, Rob and de Esch, Iwan J. P. and de Graaf, Chris}, title={KLIFS: A Knowledge-Based Structural Database To Navigate Kinase--Ligand Interaction Space}, journal={Journal of Medicinal Chemistry}, volume={57}, @@ -99,7 +99,7 @@ @article{McGuire:2017 } @article{Kooistra:2018, - author = {Kooistra, A. J. and Vass, M. and McGuire, R. and Leurs, R. and de Esch, I. J. P. and Vriend, G. 
and Verhoeven, S. and de Graaf, C. }, + author = {Kooistra, {A. J.} and Vass, M. and McGuire, R. and Leurs, R. and de Esch, I. J. P. and Vriend, G. and Verhoeven, S. and de Graaf, C. }, title = {{3{D}-e-{C}hem: {S}tructural {C}heminformatics {W}orkflows for {C}omputer-{A}ided {D}rug {D}iscovery}}, journal = {ChemMedChem}, volume = {13}, From d5c41498025e8b51764e6210b05caf2d2c97e71a Mon Sep 17 00:00:00 2001 From: dominiquesydow Date: Tue, 21 Dec 2021 10:37:23 +0100 Subject: [PATCH 19/19] Sync statement of need with OpenCADD-KLIFS paper --- docs/databases_klifs_statement_of_need.rst | 75 +++++----------------- 1 file changed, 17 insertions(+), 58 deletions(-) diff --git a/docs/databases_klifs_statement_of_need.rst b/docs/databases_klifs_statement_of_need.rst index 95fb15a3..1ae26cd8 100644 --- a/docs/databases_klifs_statement_of_need.rst +++ b/docs/databases_klifs_statement_of_need.rst @@ -1,24 +1,25 @@ Statement of need ================= -OpenCADD-KLIFS is aimed at current and future users of the KLIFS database who seek to -integrate kinase resources into Python-based research projects. -This module offers access to KLIFS data [Kanev_2021]_ such as information about kinases, -structures, ligands, +The KLIFS resource [Kanev_2021]_ contains information about kinases, structures, ligands, interaction fingerprints, and bioactivities. KLIFS thereby focuses especially on the ATP binding site, defined as a set of 85 residues and -aligned across all structures using a multiple sequence alignment (MSA) [vanLinden_2014]_. -With OpenCADD-KLIFS, KLIFS data can be queried either locally from a KLIFS download or remotely -from the KLIFS webserver. -The presented module provides identical APIs for the remote and local queries for KLIFS data and -streamlines all output into -standardized `Pandas `_ DataFrames to allow for easy and -quick downstream data analyses (Figure 1). This Pandas-focused setup is ideal to work with in -Jupyter notebooks [Kluyver_2016]_. 
+aligned across all structures using a multiple sequence alignment [vanLinden_2014]_. +Fetching, filtering, and integrating the KLIFS content on a larger scale into Python-based +pipelines is currently not straight-forward, especially for users without a background in +online queries. +Furthermore, switching between data queries from a *local* KLIFS download and +the *remote* KLIFS database is not readily possible. -`OpenCADD-KLIFS `_ -(``opencadd.databases.klifs``) is a part of the `OpenCADD `_ -package, a collection of Python modules for structural cheminformatics. +OpenCADD-KLIFS is aimed at current and future users of the KLIFS database who seek to +integrate kinase resources into Python-based research projects. +With OpenCADD-KLIFS, KLIFS data can be queried either locally from a KLIFS download or +remotely from the KLIFS webserver. +The presented module provides identical APIs for the remote and local queries and +streamlines all output into standardized Pandas DataFrames +`Pandas `_ to allow for easy and quick +downstream data analyses (Figure 1). +This Pandas-focused setup is ideal if you work with Jupyter notebooks [Kluyver_2016]_. .. raw:: html @@ -29,45 +30,6 @@ package, a collection of Python modules for structural cheminformatics. *Figure 1*: OpenCADD-KLIFS fetches KLIFS data offline from a KLIFS download or online from the KLIFS database and formats the output as user-friendly Pandas DataFrames. -The KLIFS database offers a REST API compliant with the OpenAPI specification -(`KLIFS OpenAPI `_). -Our module OpenCADD-KLIFS uses `bravado `_ to dynamically -generate a Python client based on the OpenAPI definitions and adds wrappers to enable the -following functionalities: - -- A session is set up, which allows access to various KLIFS *data sources* by different - *identifiers* with the API ``session.data_source.by_identifier``. 
*Data sources* currently - include kinases, structures and annotated conformations, modified residues, pockets, ligands, - drugs, and bioactivities; *identifiers* refer to kinase names, PDB IDs, KLIFS IDs, and more. - For example, ``session.structures.by_kinase_name`` fetches information on all structures for a - query kinase. -- The same API is used for local and remote sessions. -- The returned data follows the same schema regardless of the session type (local/remote); all - results obtained with bravado are formatted as Pandas DataFrames with standardized column names, - data types, and handling of missing data. -- Files with the structural 3D coordinates deposited on KLIFS include full complexes or selections - such as proteins, pockets, ligands, and more. These files can be downloaded to disc or loaded - via biopandas [Raschka_2017]_ or `RDKit `_. - -OpenCADD-KLIFS is especially convenient whenever users are interested in multiple or more -complex queries such as "fetching all structures for the kinase EGFR in the DFG-in conformation" -or "fetching the measured bioactivity profiles for all ligands that are structurally resolved in -complex with EGFR". Formatting the output as DataFrames facilitates subsequent filtering steps -and DataFrame merges in case multiple KLIFS datasets need to be combined. -OpenCADD-KLIFS is currently used in several projects -from the `Volkamer Lab `_ -including -`TeachOpenCADD `_, -`OpenCADD-pocket `_, -`KiSSim `_, -`KinoML `_, and -`PLIPify `_. -For example, OpenCADD-KLIFS is applied in a -`TeachOpenCADD tutorial `_ -to demonstrate how to fetch all kinase-ligand interaction profiles for all available EGFR kinase -structures to visualize the per-residue interaction types and frequencies with only a few -lines of code. - .. [Kanev_2021] Kanev et al., (2021), KLIFS: an overhaul after the first 5 years of supporting kinase research, Nucleic Acids Research, @@ -80,7 +42,4 @@ lines of code. .. 
[Kluyver_2016] Kluyver et al., (2016), Jupyter Notebooks – a publishing format for reproducible computational workflows, In Positioning and Power in Academic Publishing: Players, Agents and Agendas. IOS Press. pp. 87-90, - doi:10.3233/978-1-61499-649-1-87. -.. [Raschka_2017] Raschka, (2017), - BioPandas: Working with molecular structures in pandas DataFrames, Journal of Open Source Software, - 2(14), 279, doi:10.21105/joss.00279. \ No newline at end of file + doi:10.3233/978-1-61499-649-1-87. \ No newline at end of file