Merge pull request #135 from volkamerlab/joss-review-issues-133-134
Revise OpenCADD-KLIFS manuscript (issues #133 and #134)
dominiquesydow authored Dec 21, 2021
2 parents ced63ec + d5c4149 commit 37714e6
Showing 3 changed files with 125 additions and 88 deletions.
75 changes: 17 additions & 58 deletions docs/databases_klifs_statement_of_need.rst
@@ -1,24 +1,25 @@
Statement of need
=================

OpenCADD-KLIFS is aimed at current and future users of the KLIFS database who seek to
integrate kinase resources into Python-based research projects.
This module offers access to KLIFS data [Kanev_2021]_ such as information about kinases,
structures, ligands,
The KLIFS resource [Kanev_2021]_ contains information about kinases, structures, ligands,
interaction fingerprints, and bioactivities.
KLIFS thereby focuses especially on the ATP binding site, defined as a set of 85 residues and
aligned across all structures using a multiple sequence alignment (MSA) [vanLinden_2014]_.
With OpenCADD-KLIFS, KLIFS data can be queried either locally from a KLIFS download or remotely
from the KLIFS webserver.
The presented module provides identical APIs for the remote and local queries for KLIFS data and
streamlines all output into
standardized `Pandas <https://doi.org/10.5281/zenodo.5574486>`_ DataFrames to allow for easy and
quick downstream data analyses (Figure 1). This Pandas-focused setup is ideal to work with in
Jupyter notebooks [Kluyver_2016]_.
aligned across all structures using a multiple sequence alignment [vanLinden_2014]_.
Fetching, filtering, and integrating the KLIFS content on a larger scale into Python-based
pipelines is currently not straight-forward, especially for users without a background in
online queries.
Furthermore, switching between data queries from a *local* KLIFS download and
the *remote* KLIFS database is not readily possible.

`OpenCADD-KLIFS <https://opencadd.readthedocs.io/en/latest/databases_klifs.html>`_
(``opencadd.databases.klifs``) is a part of the `OpenCADD <https://opencadd.readthedocs.io/>`_
package, a collection of Python modules for structural cheminformatics.
OpenCADD-KLIFS is aimed at current and future users of the KLIFS database who seek to
integrate kinase resources into Python-based research projects.
With OpenCADD-KLIFS, KLIFS data can be queried either locally from a KLIFS download or
remotely from the KLIFS webserver.
The presented module provides identical APIs for the remote and local queries and
streamlines all output into standardized Pandas DataFrames
`Pandas <https://doi.org/10.5281/zenodo.5574486>`_ to allow for easy and quick
downstream data analyses (Figure 1).
This Pandas-focused setup is ideal if you work with Jupyter notebooks [Kluyver_2016]_.

.. raw:: html

@@ -29,45 +30,6 @@ package, a collection of Python modules for structural cheminformatics.
*Figure 1*: OpenCADD-KLIFS fetches KLIFS data offline from a KLIFS download or
online from the KLIFS database and formats the output as user-friendly Pandas DataFrames.

The KLIFS database offers a REST API compliant with the OpenAPI specification
(`KLIFS OpenAPI <https://dev.klifs.net/swagger_v2/>`_).
Our module OpenCADD-KLIFS uses `bravado <https://github.com/Yelp/bravado>`_ to dynamically
generate a Python client based on the OpenAPI definitions and adds wrappers to enable the
following functionalities:

- A session is set up, which allows access to various KLIFS *data sources* by different
*identifiers* with the API ``session.data_source.by_identifier``. *Data sources* currently
include kinases, structures and annotated conformations, modified residues, pockets, ligands,
drugs, and bioactivities; *identifiers* refer to kinase names, PDB IDs, KLIFS IDs, and more.
For example, ``session.structures.by_kinase_name`` fetches information on all structures for a
query kinase.
- The same API is used for local and remote sessions.
- The returned data follows the same schema regardless of the session type (local/remote); all
results obtained with bravado are formatted as Pandas DataFrames with standardized column names,
data types, and handling of missing data.
- Files with the structural 3D coordinates deposited on KLIFS include full complexes or selections
such as proteins, pockets, ligands, and more. These files can be downloaded to disc or loaded
via biopandas [Raschka_2017]_ or `RDKit <http://www.rdkit.org>`_.
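
The ``session.data_source.by_identifier`` pattern described in the list above can be illustrated with a small, self-contained sketch. This is not the OpenCADD-KLIFS implementation: the classes ``Session`` and ``Structures``, the helpers ``setup_toy_local`` and ``setup_toy_remote``, the field names, and the rows are all hypothetical placeholders. Plain lists of dictionaries stand in for the Pandas DataFrames that the module returns. The sketch only shows how two backends can sit behind one shared API and return data in the same schema.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]

# Made-up placeholder rows (assumption); these are not real KLIFS entries.
TOY_ROWS: List[Record] = [
    {"structure_id": 1, "kinase_name": "EGFR", "dfg": "in"},
    {"structure_id": 2, "kinase_name": "EGFR", "dfg": "out"},
    {"structure_id": 3, "kinase_name": "BRAF", "dfg": "in"},
]


class Structures:
    """One data source; each method queries it by one identifier."""

    def __init__(self, fetch_all: Callable[[], List[Record]]) -> None:
        # The backend-specific part is hidden behind a single callable.
        self._fetch_all = fetch_all

    def by_kinase_name(self, kinase_name: str) -> List[Record]:
        # Return all rows whose kinase name matches the query.
        return [row for row in self._fetch_all() if row["kinase_name"] == kinase_name]


class Session:
    """Bundles data sources so that calls read session.structures.by_..."""

    def __init__(self, fetch_all: Callable[[], List[Record]]) -> None:
        self.structures = Structures(fetch_all)


def setup_toy_local() -> Session:
    # Hypothetical stand-in for reading from a local download.
    return Session(lambda: list(TOY_ROWS))


def setup_toy_remote() -> Session:
    # Hypothetical stand-in for a web request; it returns copies of the same rows.
    return Session(lambda: [dict(row) for row in TOY_ROWS])


local = setup_toy_local()
remote = setup_toy_remote()

# The same call works on both session types and yields the same schema.
egfr_local = local.structures.by_kinase_name("EGFR")
egfr_remote = remote.structures.by_kinase_name("EGFR")

# A follow-up filter, analogous to selecting the DFG-in conformation.
egfr_dfg_in = [row for row in egfr_remote if row["dfg"] == "in"]
```

Because the backend is injected as a callable, the query methods are written once and cannot drift apart between the local and remote sessions.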

OpenCADD-KLIFS is especially convenient whenever users are interested in multiple or more
complex queries such as "fetching all structures for the kinase EGFR in the DFG-in conformation"
or "fetching the measured bioactivity profiles for all ligands that are structurally resolved in
complex with EGFR". Formatting the output as DataFrames facilitates subsequent filtering steps
and DataFrame merges in case multiple KLIFS datasets need to be combined.
OpenCADD-KLIFS is currently used in several projects
from the `Volkamer Lab <https://volkamerlab.org/>`_
including
`TeachOpenCADD <https://github.com/volkamerlab/teachopencadd>`_,
`OpenCADD-pocket <https://github.com/volkamerlab/opencadd>`_,
`KiSSim <https://github.com/volkamerlab/kissim>`_,
`KinoML <https://github.com/openkinome/kinoml>`_, and
`PLIPify <https://github.com/volkamerlab/plipify>`_.
For example, OpenCADD-KLIFS is applied in a
`TeachOpenCADD tutorial <https://projects.volkamerlab.org/teachopencadd/talktorials/T012_query_klifs.html>`_
to demonstrate how to fetch all kinase-ligand interaction profiles for all available EGFR kinase
structures to visualize the per-residue interaction types and frequencies with only a few
lines of code.

.. [Kanev_2021] Kanev et al., (2021),
KLIFS: an overhaul after the first 5 years of supporting kinase research,
Nucleic Acids Research,
@@ -80,7 +42,4 @@ lines of code.
.. [Kluyver_2016] Kluyver et al., (2016),
Jupyter Notebooks – a publishing format for reproducible computational workflows,
In Positioning and Power in Academic Publishing: Players, Agents and Agendas. IOS Press. pp. 87-90,
doi:10.3233/978-1-61499-649-1-87.
.. [Raschka_2017] Raschka, (2017),
BioPandas: Working with molecular structures in pandas DataFrames, Journal of Open Source Software,
2(14), 279, doi:10.21105/joss.00279.
doi:10.3233/978-1-61499-649-1-87.
88 changes: 72 additions & 16 deletions paper/paper.bib
@@ -9,14 +9,28 @@ @article{Cohen:2021
doi={10.1038/s41573-021-00195-4},
}

@article{Kooistra:2017,
author = {Kooistra, Albert J. and Volkamer, Andrea},
title = {{Kinase-Centric Computational Drug Development}},
journal = {Annu. Rep. Med. Chem.},
@incollection{Kooistra:2017,
booktitle = {Platform Technologies in Drug Discovery and Validation},
series = {Annual Reports in Medicinal Chemistry},
editor = {Robert A. Goodnow},
title = {Chapter Six - Kinase-Centric Computational Drug Development},
author = {Kooistra, {A. J.} and Volkamer, A.},
publisher = {Academic Press},
volume = {50},
pages = {197--236},
pages = {197-236},
year = {2017},
doi = {10.1016/BS.ARMC.2017.08.001},
doi = {10.1016/bs.armc.2017.08.001},
}

@inproceedings{Kluyver:2016,
booktitle = {Positioning and Power in Academic Publishing: Players, Agents and Agendas},
editor = {Fernando Loizides and Birgit Scmidt},
title = {Jupyter Notebooks - a publishing format for reproducible computational workflows},
author = {Thomas Kluyver and Benjamin Ragan-Kelley and Fernando P{\'e}rez and Brian Granger and Matthias Bussonnier and Jonathan Frederic and Kyle Kelley and Jessica Hamrick and Jason Grout and Sylvain Corlay and Paul Ivanov and Dami{\'a}n Avila and Safia Abdalla and Carol Willing and Jupyter development team},
publisher = {IOS Press},
year = {2016},
pages = {87--90},
url = {https://eprints.soton.ac.uk/403913/},
}

@article{Kanev:2021,
@@ -31,7 +45,7 @@ @article{Kanev:2021
}

@article{vanLinden:2014,
author={van Linden, Oscar P. J. and Kooistra, Albert J. and Leurs, Rob and de Esch, Iwan J. P. and de Graaf, Chris},
author={{van Linden}, Oscar P. J. and Kooistra, Albert J. and Leurs, Rob and de Esch, Iwan J. P. and de Graaf, Chris},
title={KLIFS: A Knowledge-Based Structural Database To Navigate Kinase--Ligand Interaction Space},
journal={Journal of Medicinal Chemistry},
volume={57},
@@ -51,15 +65,48 @@ @article{Raschka:2017
doi = {10.21105/joss.00279},
}

@inproceedings{Kluyver:2016,
booktitle = {Positioning and Power in Academic Publishing: Players, Agents and Agendas},
editor = {Fernando Loizides and Birgit Scmidt},
title = {Jupyter Notebooks - a publishing format for reproducible computational workflows},
author = {Thomas Kluyver and Benjamin Ragan-Kelley and Fernando P{\'e}rez and Brian Granger and Matthias Bussonnier and Jonathan Frederic and Kyle Kelley and Jessica Hamrick and Jason Grout and Sylvain Corlay and Paul Ivanov and Dami{\'a}n Avila and Safia Abdalla and Carol Willing and Jupyter development team},
publisher = {IOS Press},
year = {2016},
pages = {87--90},
url = {https://eprints.soton.ac.uk/403913/},
@article{Mendez:2018,
author = {Mendez, David and Gaulton, Anna and Bento, A Patrícia and Chambers, Jon and De Veij, Marleen and Félix, Eloy and Magariños, María Paula and Mosquera, Juan F and Mutowo, Prudence and Nowotka, Michał and Gordillo-Marañón, María and Hunter, Fiona and Junco, Laura and Mugumbate, Grace and Rodriguez-Lopez, Milagros and Atkinson, Francis and Bosc, Nicolas and Radoux, Chris J and Segura-Cabrera, Aldo and Hersey, Anne and Leach, Andrew R},
title = "{ChEMBL: towards direct deposition of bioassay data}",
journal = {Nucleic Acids Research},
volume = {47},
number = {D1},
pages = {D930-D940},
year = {2018},
doi = {10.1093/nar/gky1075},
}

@article{Carles:2018,
author = {Carles, Fabrice and Bourg, St{\'{e}}phane and Meyer, Christophe and Bonnet, Pascal},
title = {{PKIDB: A Curated, Annotated and Updated Database of Protein Kinase Inhibitors in Clinical Trials}},
journal = {Molecules},
volume = {23},
number = {4},
pages = {908},
year = {2018},
doi = {10.3390/molecules23040908},
}

@article{McGuire:2017,
author = {McGuire, Ross and Verhoeven, Stefan and Vass, Márton and Vriend, Gerrit and de Esch, Iwan J. P. and Lusher, Scott J. and Leurs, Rob and Ridder, Lars and Kooistra, Albert J. and Ritschel, Tina and de Graaf, Chris},
title = {3D-e-Chem-VM: Structural Cheminformatics Research Infrastructure in a Freely Available Virtual Machine},
journal = {Journal of Chemical Information and Modeling},
volume = {57},
number = {2},
pages = {115-121},
year = {2017},
doi = {10.1021/acs.jcim.6b00686},
}

@article{Kooistra:2018,
author = {Kooistra, {A. J.} and Vass, M. and McGuire, R. and Leurs, R. and de Esch, I. J. P. and Vriend, G. and Verhoeven, S. and de Graaf, C. },
title = {{3{D}-e-{C}hem: {S}tructural {C}heminformatics {W}orkflows for {C}omputer-{A}ided {D}rug {D}iscovery}},
journal = {ChemMedChem},
volume = {13},
number = {6},
pages = {614--626},
year = {2018},
doi = {10.1002/cmdc.201700754},
}

@misc{klifsswagger,
@@ -70,6 +117,15 @@ @misc{klifsswagger
url = {https://dev.klifs.net/swagger_v2/},
}

@misc{requests,
author = {requests},
title = {{requests}},
year = 2021,
publisher = {GitHub},
journal = {GitHub repository},
url = {https://github.com/psf/requests},
}

@misc{bravado,
author = {bravado},
title = {{bravado}},