This package extends AiZynthFinder to seamlessly integrate various single-step retrosynthesis models, as illustrated in our papers Models Matter: The Impact of Single-Step Models on Synthesis Prediction and Mind the Retrosynthesis Gap: Bridging the divide between Single-step and Multi-step Retrosynthesis Prediction
Models Matter uses the default AiZynthFinder implementation and introduces the following enhancements:
- Smiles-based Expansion Strategy: This feature allows the integration of any retrosynthesis model operating at smiles-level with AiZynthFinder, not depending on the underlying search algorithm.
- ModelZoo Integration: The package includes ModelZoo, a dedicated framework that defines the single-step retrosynthesis approach within the smiles-based expansion strategy. Currently supported implementations are Chemformer, MHNreact, and LocalRetro.
To install the package, follow the sequential steps below:
# Clone this repository and all submodules
git clone --recursive https://github.com/AlanHassen/modelsmatter
# Create and initialize a conda environment
conda env create -f environments/environment.yml -n ssbenchmark
conda activate ssbenchmark
# Transition to the SSBenchmark directory
cd external/modelsmatter_modelzoo/
# First the installation of the ModelZoo
poetry install
# Navigate back to the models matter directory
cd ../..
# Finalize the installation process
poetry install
# Subsequently, install the appropriate single-step models or the necessary libraries.
Adaptations to AiZynthFinder config are necessary to accommodate the different single-step models (examples provided in config/
). The configurable settings include:
-
gpu_mode: Enable GPU mode for accelerated inference.
-
module_path: The path of the single-step retrosynthesis model repository.
-
model_path: The location of the trained model.
-
Additional parameters can be set, such as specifying the vocabulary for Chemformer via
vocab_path
.
Utilize the extended functionalities of AiZynthFinder through Models Matter for a comprehensive synthesis prediction experience.
Datasets and models that are not proprietary are available in the Models Matter figshare repository.
This study was partially funded by the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie Innovative Training Network European Industrial Doctorate grant agreement No. 956832 “Advanced machine learning for Innovative Drug Discovery”