Version 3.4.5 See merge request cdd/DrugEx!114
Showing 60 changed files with 35,583 additions and 21,555 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,18 +1,30 @@ | ||
# Change Log | ||
From v3.4.3 to v3.4.4 | ||
From v3.4.4 to v3.4.5 | ||
|
||
## Fixes | ||
|
||
- Fixed a bug that may have caused the standardizer to return molecules failing in standardization in their original form instead of removing them (14fd58dc758cb882c2a24e4a481a9064318927f1). | ||
- Fixed a bug in calculation of the Pareto fronts (fronts are now calculated for maximization of objectives instead of objective minimization). | ||
- Patched a bug that caused a crash when an invalid SMILES was encountered in the fragment generation step. This | ||
bug was introduced in v3.4.4; invalid SMILES are now skipped and a warning is printed to the log. | ||
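The Pareto front fix above concerns the direction of optimization: under maximization, one solution dominates another when it is at least as good on every objective and strictly better on at least one. The following is a minimal pure-Python sketch of front assignment under that convention. The function names `dominates_max` and `pareto_fronts_max` are hypothetical and chosen for illustration only; this is not the DrugEx implementation of `get_Pareto_fronts`, whose signature is not shown in this change log.

```python
from typing import List, Sequence


def dominates_max(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` dominates `b` when all objectives are maximized."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_fronts_max(scores: List[Sequence[float]]) -> List[List[int]]:
    """Split solution indices into successive non-dominated fronts.

    Hypothetical helper: the first front holds the solutions that no other
    solution dominates, the second front holds those dominated only by
    members of the first front, and so on.
    """
    remaining = set(range(len(scores)))
    fronts: List[List[int]] = []
    while remaining:
        # A solution belongs to the current front if nothing left dominates it.
        front = sorted(
            i for i in remaining
            if not any(dominates_max(scores[j], scores[i])
                       for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


# Two objectives, higher is better for both.
example = [(0.9, 0.9), (0.5, 0.95), (0.4, 0.4), (0.1, 0.1)]
print(pareto_fronts_max(example))  # [[0, 1], [2], [3]]
```

This quadratic-per-front approach is meant only to show the dominance convention; a production implementation would typically use a faster non-dominated sort.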
|
||
## Changes | ||
|
||
None. | ||
- Installation of pip package with pyproject.toml instead of setup.cfg. | ||
- Methods `cpu_non_dominated_sort` and `gpu_non_dominated_sort` have been replaced by `get_Pareto_fronts`. | ||
- Improved calculation of crowding distance. | ||
- The rewards module was refactored and the `RankingStrategy` class was replaced by the `ParetoRankingScheme` class. | ||
- The final reward calculation for `ParetoRankingScheme`-based methods is now directly the scaled rank of the molecules. | ||
- The `ParetoTanimotoDistance` now has an attribute `distance_metric`, which can be "min", "mean" or "mutual", instead of the attribute `ranking`. | ||
- DrugEx is now compatible with the latest version of qsprpred, v2.0.1; previous versions of qsprpred are no longer supported. | ||
- `drugex.generate` CLI environment arguments are no longer overwritten by environment variables from generator. | ||
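The change log mentions crowding distance without describing how it is computed. As background, the sketch below shows the textbook crowding distance from NSGA-II: within one front, each solution accumulates, per objective, the normalized gap between its two nearest neighbours, and boundary solutions receive an infinite distance. This is an assumption about the general technique, not a description of the improved DrugEx calculation, and the function name `crowding_distance` is hypothetical.

```python
from typing import List, Sequence


def crowding_distance(front: List[Sequence[float]]) -> List[float]:
    """Textbook NSGA-II crowding distance for the solutions of one front.

    Illustrative only: larger values mean the solution sits in a less
    crowded region of objective space.
    """
    n = len(front)
    if n == 0:
        return []
    distances = [0.0] * n
    n_objectives = len(front[0])
    for m in range(n_objectives):
        # Sort solution indices by their value on objective m.
        order = sorted(range(n), key=lambda i: front[i][m])
        lowest = front[order[0]][m]
        highest = front[order[-1]][m]
        # Boundary solutions are always kept, so they get infinite distance.
        distances[order[0]] = float("inf")
        distances[order[-1]] = float("inf")
        if highest == lowest:
            continue  # all values equal, this objective adds no spread
        for k in range(1, n - 1):
            gap = front[order[k + 1]][m] - front[order[k - 1]][m]
            distances[order[k]] += gap / (highest - lowest)
    return distances


# Three solutions on a straight trade-off line between two objectives.
print(crowding_distance([(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]))
```

Within a front, solutions with a larger crowding distance are usually preferred because they preserve diversity along the trade-off surface.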
|
||
## Removed Features | ||
|
||
None. | ||
None. | ||
|
||
## New Features | ||
|
||
None. | ||
- When installing package with pip, the commit hash and date of the installation is saved into `qsprpred._version` | ||
- Added an automated Docker runner for tests that can run on GPUs. See [testing/runner/README.md](testing/runner/README.md) for more information. | ||
- When installing package with pip, the commit hash and date of the installation is saved into `drugex._version`. This information is also used as a basis of a new dynamic versioning scheme for the package. The version number is generated automatically upon installation of the package and saved to `drugex.__version__`. | ||
- QSPRPred is now available as an optional dependency that can be installed with DrugEx using the `[qsprpred]` option. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
recursive-include drugex * = test_files/*.* | ||
recursive-include drugex test_data/*.* | ||
recursive-include drugex test_data/A2AR_RandomForestClassifier/*.* | ||
recursive-include drugex *.pkl.gz |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,13 +1,35 @@ | ||
DrugEx | ||
==================== | ||
# DrugEx | ||
|
||
<img src='figures/logo.png' width=20% align=right> | ||
<p align=left width=70%> | ||
DrugEx is open-source software library for <i>de novo</i> design of small molecules with deep learning generative models in a multi-objective reinforcement learning framework. This toolkit is a continuation of the original and incremental work of Liu et al.'s DrugEx [<a href="liu_drugex1">1</a>, <a href="liu_drugex2">2</a>, <a href="liu_drugex3">3</a>] and is currently developed by Gerard van Westen's Computational Drug Discovery group. | ||
DrugEx is an open-source software library for <i>de novo</i> design of small molecules with deep learning generative models in a multi-objective reinforcement learning framework. The package contains multiple generator architectures and a variety of scoring tools and multi-objective optimisation methods. It has a flexible application programming interface and can readily be used via the command line interface [<a href="sicho_drugex">4</a>] (see [Quick Start](#quick-start) to get to work right away). | ||
|
||
The package contains multiple generator architectures and a variety of scoring tools and multi-objective optimisation methods. It has a flexible application programming interface and can readily be used via the command line interface [<a href="sicho_drugex">4</a>]. | ||
## History | ||
|
||
Quick Start | ||
=========== | ||
This software is a continuation of the original and incremental work of Liu et al.'s DrugEx [<a href="liu_drugex1">1</a>, <a href="liu_drugex2">2</a>, <a href="liu_drugex3">3</a>] and is currently developed by [Gerard van Westen's Computational Drug Discovery](https://twitter.com/cddleiden) group in Leiden, Netherlands. The first version of DrugEx [<a href="liu_drugex1">1</a>] consisted of a recurrent neural network (RNN) single-task agent of gated recurrent units (GRU), which were updated to long short-term memory (LSTM) units in the second version [<a href="liu_drugex2">2</a>]; the second version also introduced MOO-based RL and an updated exploitation-exploration strategy. In its third version [<a href="liu_drugex3">3</a>], generators based on a variant of the transformer and a novel graph-based encoding allowing for the sampling of molecules with specific substructures were introduced. This package builds on these works and provides a unified API that is easier to use and flexible enough for customization. New features are being added as well [<a href="sicho_drugex">4</a>]. Furthermore, the development and training of QSAR models, used to score molecules during reinforcement learning, has been moved to a separate [QSPRpred](https://github.com/CDDLeiden/QSPRPred) package, which became a useful library in its own right. | ||
|
||
|
||
## Workflow | ||
|
||
The DrugEx package provides classes to standardize, clean and encode molecules for the various deep learning algorithms provided in the package as well as features to set up and monitor training and optimization. The resulting models can be used readily for generation of focused libraries and are easily transferable. | ||
|
||
![Fig1](figures/TOC_figure.png) | ||
|
||
<!-- Introduction | ||
============= | ||
Due to the large drug-like chemical space available to search for feasible drug-like molecules, rational drug design often starts from specific scaffolds to which side chains/substituents are added or modified. With the rapid growth of the application of deep learning in drug discovery, a variety of effective approaches have been developed for de novo drug design. In previous work, we proposed a method named DrugEx, which can be applied in polypharmacology based on multi-objective deep reinforcement learning. However, the previous version is trained under fixed objectives similar to other known methods and does not allow users to input any prior information (i.e. a desired scaffold). In order to improve the general applicability, we updated DrugEx to design drug molecules based on scaffolds which consist of multiple fragments provided by users. In this work, the Transformer model was employed to generate molecular structures. The Transformer is a multi-head self-attention deep learning model containing an encoder to receive scaffolds as input and a decoder to generate molecules as output. In order to deal with the graph representation of molecules we proposed a novel positional encoding for each atom and bond based on an adjacency matrix to extend the architecture of the Transformer. Each molecule was generated by growing and connecting procedures for the fragments in the given scaffold that were unified into one model. Moreover, we trained this generator under a reinforcement learning framework to increase the number of desired ligands. As a proof of concept, our proposed method was applied to design ligands for the adenosine A2A receptor (A2AAR) and compared with SMILES-based methods. The results demonstrated the effectiveness of our method in that 100% of the generated molecules are valid and most of them had a high predicted affinity value towards A2AAR with given scaffolds. --> | ||
<!-- <b>Keywords</b>: deep learning, reinforcement learning, policy gradient, drug design, Transformer, multi-objective optimization</p> --> | ||
|
||
<!-- Deep learning Architectures | ||
==================== | ||
![Fig2](figures/fig_2.png) | ||
Examples | ||
========= | ||
![Fig3](figures/fig_3.png) --> | ||
|
||
# Quick Start | ||
|
||
> A small step for exploring the drug space in need, a giant leap for exploiting a healthy state indeed. | ||
|
@@ -22,9 +44,11 @@ pip install git+https://github.com/CDDLeiden/DrugEx.git@master | |
|
||
### Optional Dependencies | ||
|
||
**[QSPRPred](https://github.com/CDDLeiden/QSPRPred.git)** - Optional package to install if you want to use the command line interface of DrugEx, which requires the models to be serialized with this package. It is also used by some examples in the tutorial. | ||
**[QSPRPred](https://github.com/CDDLeiden/QSPRPred.git)** - Optional package to install if you want to use the command line interface of DrugEx, which requires the models to be serialized with this package. It is also used by some examples in the tutorial. Install DrugEx with the following command if you want these features: | ||
|
||
```bash | ||
pip install git+https://github.com/CDDLeiden/QSPRPred.git@v1.3.1 | ||
pip install "drugex[qsprpred] @ git+https://github.com/CDDLeiden/DrugEx.git@master" | ||
``` | ||
|
||
**[RAscore](https://github.com/reymond-group/RAscore)** - If you want to use the Retrosynthesis Accessibility Score in the desirability function. | ||
|
@@ -95,50 +119,25 @@ The DrugEx toolkit offers a variety of models with varying complexities, each wi | |
|
||
It is noteworthy, however, that even on a suboptimal configuration, it should be possible to fine-tune and optimize the basic sequential RNN model using reinforcement learning if a pretrained model is used. For the two transformers, we recommend using multiple GPUs to increase throughput via parallelization, which the DrugEx package automates. This technique divides the model's workload across multiple GPUs, so the system can process larger volumes of data faster than on a single GPU. | ||
|
||
History | ||
======= | ||
|
||
The first version of DrugEx [<a href="liu_drugex1">1</a>] consisted of a recurrent neural network (RNN) single-task agent of gated recurrent units (GRU), which were updated to long short-term memory (LSTM) units in the second version [<a href="liu_drugex2">2</a>]; the second version also introduced MOO-based RL and an updated exploitation-exploration strategy. In its third version [<a href="liu_drugex3">3</a>], generators based on a variant of the transformer and a novel graph-based encoding allowing for the sampling of molecules with specific substructures were introduced. This package builds on these works to provide a user-friendly but also easily customisable toolkit for DNDD, with the development of an API and a command line interface and the addition of new features [<a href="sicho_drugex">4</a>]. Furthermore, the development and training of QSAR models, used to score molecules during reinforcement learning, has been moved to a separate [QSPRpred](https://github.com/CDDLeiden/QSPRPred) package. | ||
|
||
<!-- Introduction | ||
============= | ||
Due to the large drug-like chemical space available to search for feasible drug-like molecules, rational drug design often starts from specific scaffolds to which side chains/substituents are added or modified. With the rapid growth of the application of deep learning in drug discovery, a variety of effective approaches have been developed for de novo drug design. In previous work, we proposed a method named DrugEx, which can be applied in polypharmacology based on multi-objective deep reinforcement learning. However, the previous version is trained under fixed objectives similar to other known methods and does not allow users to input any prior information (i.e. a desired scaffold). In order to improve the general applicability, we updated DrugEx to design drug molecules based on scaffolds which consist of multiple fragments provided by users. In this work, the Transformer model was employed to generate molecular structures. The Transformer is a multi-head self-attention deep learning model containing an encoder to receive scaffolds as input and a decoder to generate molecules as output. In order to deal with the graph representation of molecules we proposed a novel positional encoding for each atom and bond based on an adjacency matrix to extend the architecture of the Transformer. Each molecule was generated by growing and connecting procedures for the fragments in the given scaffold that were unified into one model. Moreover, we trained this generator under a reinforcement learning framework to increase the number of desired ligands. As a proof of concept, our proposed method was applied to design ligands for the adenosine A2A receptor (A2AAR) and compared with SMILES-based methods. The results demonstrated the effectiveness of our method in that 100% of the generated molecules are valid and most of them had a high predicted affinity value towards A2AAR with given scaffolds. --> | ||
<!-- <b>Keywords</b>: deep learning, reinforcement learning, policy gradient, drug design, Transformer, multi-objective optimization</p> --> | ||
|
||
Workflow | ||
======== | ||
![Fig1](figures/TOC_figure.png) | ||
|
||
<!-- Deep learning Architectures | ||
==================== | ||
![Fig2](figures/fig_2.png) | ||
Examples | ||
========= | ||
![Fig3](figures/fig_3.png) --> | ||
# License | ||
|
||
License | ||
======= | ||
Please see the LICENSE file for the license terms for the software. Basically it's free to academic users. If you do wish to sell the software or use it in a commercial product, then please contact Gerard J.P. van Westen: | ||
The software is licensed under the standard MIT license, which means it is free to use, including in commercial applications, as long as the copyright terms of the license are preserved. You can view the [LICENSE](./LICENSE) file for the full terms. If you have questions about the license or the use of the software in your organization, please contact Gerard J.P. van Westen: | ||
|
||
[Gerard J.P. van Westen](mailto:[email protected]): [email protected] | ||
|
||
Current Development Team | ||
======================== | ||
# Current Development Team | ||
|
||
- [M. Sicho](https://github.com/martin-sicho) | ||
- [S. Luukkonen](https://github.com/sohviluukkonen) | ||
- [H. van den Maagdenberg](https://github.com/HellevdM) | ||
- [L. Schoenmaker](https://github.com/LindeSchoenmaker) | ||
- [O. Béquignon](https://github.com/OlivierBeq) | ||
|
||
Contributions | ||
============= | ||
# Contributions | ||
|
||
If you find that something is missing, have a question, or just want to contribute a new model or feature, please feel free to open an issue to start a discussion. We are more than happy to improve the package with your contributions, bug reports and ideas. After the feature is discussed in its designated issue, the best way to contribute is to fork the repository, make your changes and then create a pull request. We will then review your changes and merge them into the main repository. Alternatively, you can contact us directly via [email](mailto:[email protected]). | ||
|
||
Acknowledgements | ||
================ | ||
# Acknowledgements | ||
|
||
We would like to thank the following people for significant contributions: | ||
|
||
|
@@ -151,8 +150,7 @@ We also thank the following Git repositories that gave Xuhan a lot of inspiratio | |
2. [ORGAN](https://github.com/gablg1/ORGAN) | ||
3. [SeqGAN](https://github.com/LantaoYu/SeqGAN) | ||
|
||
References | ||
========== | ||
# References | ||
|
||
<a name="liu_drugex1"></a> [1] [Liu X., Ye K., van Vlijmen H.W.T, IJzerman A.P., van Westen G.J.P. An exploration strategy improves the diversity of de novo ligands using deep reinforcement learning: a case for the adenosine A2A receptor. Journal of cheminformatics. 2019;11(1):35.](https://jcheminf.biomedcentral.com/articles/10.1186/s13321-019-0355-6) | ||
|
||
|