TAGMol: Target-Aware Gradient-guided Molecule Generation

Environment Setup

The code has been tested in the following environment:

conda create -n tagmol python=3.8.17
conda activate tagmol
conda install pytorch=1.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia
conda install pyg=2.2.0 -c pyg
conda install rdkit=2022.03.2 openbabel=3.1.1 tensorboard=2.13.0 pyyaml=6.0 easydict=1.9 python-lmdb=1.4.1 -c conda-forge

# For Vina Docking
pip install meeko==0.1.dev3 scipy pdb2pqr vina==1.2.2 
python -m pip install git+https://github.com/Valdes-Tresanco-MS/AutoDockTools_py3

IMPORTANT NOTE: If the scripts fail with import errors, you may need to add the root working directory to PYTHONPATH. Run the following from the repository root:

export PYTHONPATH=".":$PYTHONPATH
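
Where exporting the variable is inconvenient (for example in a notebook), a similar effect can be obtained from within Python. The following is a minimal sketch, assuming the process is started from the repository root; the helper name add_repo_root is hypothetical and not part of the codebase. Unlike the export, it affects only the current process.

```python
import os
import sys


def add_repo_root(root="."):
    """Add `root` to sys.path (at the front) if it is not already present.

    Hypothetical helper, not part of TAGMol. It roughly mirrors
    `export PYTHONPATH=".":$PYTHONPATH`, but only for the current process.
    """
    root = os.path.abspath(root)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


if __name__ == "__main__":
    # Run from the repository root so that "." resolves to it.
    print(add_repo_root("."))
```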

Data and Checkpoints

The resources can be found here. The data are inside the data directory, the backbone model is inside pretrained_models, and the guide checkpoints are inside logs.

Training

Training Diffusion model from scratch

python scripts/train_diffusion.py configs/training.yml

Training Guide model from scratch

BA

python scripts/train_dock_guide.py configs/training_dock_guide.yml

QED

python scripts/train_dock_guide.py configs/training_dock_guide_qed.yml

SA

python scripts/train_dock_guide.py configs/training_dock_guide_sa.yml

NOTE: The outputs are saved in logs/ by default.

Sampling

Sampling for pockets in the testset

BackBone

python scripts/sample_diffusion.py configs/sampling.yml --data_id {i} # Replace {i} with the index of the data. i should be between 0 and 99 for the testset.

A bash script is provided that runs inference for the entire test set in a loop:

bash scripts/batch_sample_diffusion.sh configs/sampling.yml backbone

The output will be stored in experiments/backbone. The variables BATCH_SIZE, NODE_ALL, NODE_THIS, and START_IDX can be modified in the script file if required.
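
The script itself is not reproduced here, so the following is only a sketch of one plausible reading of these variables: NODE_ALL as the number of machines sharing the work, NODE_THIS as the index of the current machine, and START_IDX as the first test-set index to process. Under that assumption (round-robin assignment by index modulo NODE_ALL; check the script for the actual rule), the pockets handled by one node could be listed as below. BATCH_SIZE is left out because it does not affect which pockets are processed. The function name pockets_for_node is hypothetical.

```python
def pockets_for_node(node_this, node_all, start_idx=0, total=100):
    """Return the data ids one node would sample under round-robin sharding.

    Assumed semantics (not confirmed by the README): index i is handled by
    the node for which i % node_all == node_this, starting from start_idx.
    """
    if not 0 <= node_this < node_all:
        raise ValueError("node_this must satisfy 0 <= node_this < node_all")
    return [i for i in range(start_idx, total) if i % node_all == node_this]


if __name__ == "__main__":
    # With two nodes, node 0 takes the even ids and node 1 the odd ids.
    print(pockets_for_node(0, 2)[:5])  # [0, 2, 4, 6, 8]
    print(pockets_for_node(1, 2)[:5])  # [1, 3, 5, 7, 9]
```

With a single node (NODE_ALL=1, NODE_THIS=0) this reduces to every index from START_IDX to 99, which is useful for resuming an interrupted run.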

BackBone + Gradient Guidance

python scripts/sample_multi_guided_diffusion.py [path-to-config.yml] --data_id {i} # Replace {i} with the index of the data. i should be between 0 and 99 for the testset.

To run inference on all 100 targets in the test set:

bash scripts/batch_sample_multi_guided_diffusion.sh [path-to-config.yml] [output-dir-name]

The outputs are stored in experiments_multi/[output-dir-name] when run using the bash file. The config files are available in configs/noise_guide_multi.

Single-objective guidance
- BA: sampling_guided_ba_1.yml
- QED: sampling_guided_qed_1.yml
- SA: sampling_guided_sa_1.yml
Dual-objective guidance
- QED + BA: sampling_guided_qed_0.5_ba_0.5.yml
- SA + BA: sampling_guided_sa_0.5_ba_0.5.yml
- QED + SA: sampling_guided_qed_0.5_sa_0.5.yml
Multi-objective guidance (our main model)
- QED + SA + BA: sampling_guided_qed_0.33_sa_0.33_ba_0.34.yml

For example, to run the multi-objective setting (i.e., our model):

bash scripts/batch_sample_multi_guided_diffusion.sh configs/noise_guide_multi/sampling_guided_qed_0.33_sa_0.33_ba_0.34.yml qed_0.33_sa_0.33_ba_0.34
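
The config file names listed above appear to encode each objective together with its relative weight (an inference from the naming pattern; the README does not state it explicitly). As a small sketch, the pairs can be recovered from a file name, for instance to derive an output directory name or to check that the weights sum to 1. The function guidance_weights is hypothetical and does not read the YAML file itself.

```python
import re


def guidance_weights(config_name):
    """Parse {objective: weight} from a name like sampling_guided_qed_0.5_ba_0.5.yml.

    Relies only on the file-naming pattern; the internal keys of the YAML
    file are not documented in the README and are not touched here.
    """
    stem = config_name.rsplit("/", 1)[-1]
    prefix, suffix = "sampling_guided_", ".yml"
    if not (stem.startswith(prefix) and stem.endswith(suffix)):
        raise ValueError(f"unexpected config name: {config_name}")
    body = stem[len(prefix):-len(suffix)]
    pairs = re.findall(r"([a-z]+)_(\d+(?:\.\d+)?)", body)
    return {name: float(weight) for name, weight in pairs}


if __name__ == "__main__":
    path = "configs/noise_guide_multi/sampling_guided_qed_0.33_sa_0.33_ba_0.34.yml"
    weights = guidance_weights(path)
    print(weights)  # {'qed': 0.33, 'sa': 0.33, 'ba': 0.34}
    print(abs(sum(weights.values()) - 1.0) < 1e-9)
```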

Evaluation

Evaluating Guide models

python scripts/eval_dock_guide.py --ckpt_path [path-to-checkpoint.pt]

Evaluation from sampling results

python scripts/evaluate_diffusion.py {OUTPUT_DIR} --docking_mode vina_score --protein_root data/test_set

The docking mode can be chosen from {qvina, vina_score, vina_dock, none}

NOTE: The first time you run the evaluation code with the vina_score or vina_dock docking mode, it will take some time to prepare the pdbqt and pqr files.
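
When evaluating several runs or docking modes, it can help to assemble the command programmatically and reject an unsupported docking mode before the script starts. The following is a minimal sketch that uses only the arguments shown above; the function name evaluation_command is hypothetical, and experiments/backbone is used as the output directory on the assumption that it holds the backbone sampling results.

```python
import sys

DOCKING_MODES = ("qvina", "vina_score", "vina_dock", "none")


def evaluation_command(output_dir, docking_mode="vina_score",
                       protein_root="data/test_set"):
    """Build the argv list for scripts/evaluate_diffusion.py.

    Uses only the arguments shown in the README and raises early on an
    unknown docking mode instead of failing inside the script.
    """
    if docking_mode not in DOCKING_MODES:
        raise ValueError(f"docking_mode must be one of {DOCKING_MODES}")
    return [sys.executable, "scripts/evaluate_diffusion.py", output_dir,
            "--docking_mode", docking_mode, "--protein_root", protein_root]


if __name__ == "__main__":
    # Only prints the commands; pass each list to
    # subprocess.run(cmd, check=True) to actually execute it.
    for mode in ("vina_score", "vina_dock"):
        print(" ".join(evaluation_command("experiments/backbone", mode)))
```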

Results

Methods	Vina Score Avg. (↓)	Vina Score Med. (↓)	Vina Min Avg. (↓)	Vina Min Med. (↓)	Vina Dock Avg. (↓)	Vina Dock Med. (↓)	High Affinity Avg. (↑)	High Affinity Med. (↑)	QED Avg. (↑)	QED Med. (↑)	SA Avg. (↑)	SA Med. (↑)	Diversity Avg. (↑)	Diversity Med. (↑)	Hit Rate % (↑)
Reference	-6.36	-6.46	-6.71	-6.49	-7.45	-7.26	-	-	0.48	0.47	0.73	0.74	-	-	21
liGAN	-	-	-	-	-6.33	-6.20	21.1%	11.1%	0.39	0.39	0.59	0.57	0.66	0.67	13.2
AR	-5.75	-5.64	-6.18	-5.88	-6.75	-6.62	37.9%	31.0%	0.51	0.50	0.63	0.63	0.70	0.70	12.9
Pocket2Mol	-5.14	-4.70	-6.42	-5.82	-7.15	-6.79	48.4%	51.0%	0.56	0.57	0.74	0.75	0.69	0.71	24.3
TargetDiff	-5.47	-6.30	-6.64	-6.83	-7.80	-7.91	58.1%	59.1%	0.48	0.48	0.58	0.58	0.72	0.71	20.5
DecompDiff	-4.85	-6.03	-6.76	-7.09	-8.48	-8.50	64.8%	78.6%	0.44	0.41	0.59	0.59	0.63	0.62	24.9
TAGMol	-7.02	-7.77	-7.95	-8.07	-8.59	-8.69	69.8%	76.4%	0.55	0.56	0.56	0.56	0.69	0.70	27.7

Due to space constraints, we share only the eval_results folder generated by the evaluation script. It can be found at the same link as the other resources, inside the results directory.

Citation

@article{dorna2024tagmol,
  title={TAGMol: Target-Aware Gradient-guided Molecule Generation},
  author={Vineeth Dorna and D. Subhalingam and Keshav Kolluru and Shreshth Tuli and Mrityunjay Singh and Saurabh Singal and N. M. Anoop Krishnan and Sayan Ranu},
  journal={arXiv preprint arXiv:2406.01650},
  year={2024}
}

Acknowledgements

This codebase was built on top of TargetDiff.
