PepINVENT: Generative Peptide Design Beyond Natural Amino Acids

Overview

PepINVENT is a reinforcement learning (RL) framework for generative peptide design that covers a peptide space extending beyond the 20 standard natural amino acids. It supports the discovery and optimization of novel peptides with desired properties through multi-parameter optimization (MPO), which makes it applicable to peptide-based drug design and development projects. Given an input peptide with query positions, PepINVENT proposes amino acids to fill those positions, drawing on the peptide landscape unlocked by an atomic representation of the peptide molecules.

Some features include:

Novel peptide generation: generates peptides containing natural and non-natural amino acids.
User-defined objectives: the user can specify a set of objectives, or integrate new ones, to steer the generative model toward peptides with desired properties.
Explorative sampling: leverages the generative models to explore diverse designs from the extended peptide space.
Integration with RDKit: allows common physicochemical features to be incorporated into the objectives of generative runs.

Getting Started

Prerequisites

Python: Version 3.8 or higher
CUDA-enabled GPU
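
The two prerequisites above can be checked before installation with a short script. This is a minimal sketch, not part of PepINVENT: the function name `check_prerequisites` is hypothetical, and looking for `nvidia-smi` on the PATH is only a heuristic for a CUDA-enabled GPU, assuming an NVIDIA driver installation that provides that tool.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version listed in the prerequisites


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of problems found; an empty list means none were detected.

    Hypothetical helper for illustration, not part of the PepINVENT codebase.
    """
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            "Python %d.%d or higher is required" % MIN_PYTHON
        )
    # Heuristic only: nvidia-smi on the PATH suggests an NVIDIA driver is present.
    if which("nvidia-smi") is None:
        problems.append("nvidia-smi not found; a CUDA-enabled GPU may be missing")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

An empty result does not guarantee that the deep learning libraries in the Conda environment can actually use the GPU; it only rules out the most common setup problems.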

Installation

Clone the repository:

```
git clone https://github.com/MolecularAI/PepINVENT.git
cd PepINVENT
```

Create the Conda environment:
```
conda env create -f pepinvent_env.yml
```
Activate the environment:
```
conda activate pepinvent
```

Now you can use the tool!

Repository Structure

test_data/: Test datasets used in the sampling experiments of the manuscript. A new CSV file supplied for sampling must contain a Source_Mol column holding the masked peptides.
models/: Pre-trained generative models and the predictive model for macrocyclic peptide permeability.
pepinvent/: Core code for peptide generation and optimization.
reinvent_models/: The reinvent_models codebase, extracted from https://github.com/MolecularAI/transformer_rl/
experiment_configurations/: Configuration files for sampling peptides from the pre-trained generative model or for running RL-based generative runs. The JSON files describe the run setups for the experiments conducted in the manuscript.
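
Because a new CSV file for sampling needs a Source_Mol column, it can help to validate the file before starting a run. The following is a minimal sketch using only the Python standard library. The function name `find_missing_source_mol`, the `id` column, and the placeholder values are hypothetical; the sketch checks only that the column exists and is filled in, not that the masked peptides are well formed.

```python
import csv
import io

REQUIRED_COLUMN = "Source_Mol"  # column name required for sampling input


def find_missing_source_mol(csv_file):
    """Return 1-based data-row numbers whose Source_Mol value is empty.

    Raises ValueError if the Source_Mol column is absent.
    Hypothetical helper for illustration, not part of the PepINVENT codebase.
    """
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or REQUIRED_COLUMN not in reader.fieldnames:
        raise ValueError("CSV file has no %r column" % REQUIRED_COLUMN)
    return [
        row_number
        for row_number, row in enumerate(reader, start=1)
        if not (row[REQUIRED_COLUMN] or "").strip()
    ]


if __name__ == "__main__":
    # In-memory stand-in for a real file; the values are placeholders,
    # not valid masked peptides.
    sample = io.StringIO("id,Source_Mol\n1,PLACEHOLDER_A\n2,\n")
    print(find_missing_source_mol(sample))
```

For a file on disk, the same function can be called as `find_missing_source_mol(open(path, newline=""))`, where `path` points to the CSV file.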

Usage

For a standard run, only the file and folder paths need to be modified. The generative runs produce logs that can be monitored with TensorBoard.

1. Sample Peptides from the generative model: Sample peptides with new amino acids in the query positions using the pre-trained generative model.

```
python input_to_sampling.py some_sampling_parameters.json
```

2. Generate Peptides with RL: Design peptides in the RL loop, optimized for user-defined criteria.

```
python input_to_reinforcement_learning.py some_generative_run_setup.json
```

Configuration

Model parameters and settings for sampling and RL-based generation are managed through JSON configuration files in the experiment_configurations/ directory. Each file contains adjustable options to tailor the workflow to different peptide design goals.
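
Adapting a configuration file to a new machine mostly means updating file and folder paths. The sketch below shows one way to do this without knowing the configuration schema: it walks any parsed JSON value and swaps a path prefix in every string it finds. The function `replace_path_prefix` and the keys in the demo dictionary (`model_path`, `logging`, `result_path`, `num_samples`) are hypothetical and are not taken from the PepINVENT configuration files.

```python
import json


def replace_path_prefix(node, old_prefix, new_prefix):
    """Return a copy of a parsed JSON value with old_prefix replaced by
    new_prefix at the start of every string value.

    Hypothetical helper for illustration, not part of the PepINVENT codebase.
    """
    if isinstance(node, dict):
        return {
            key: replace_path_prefix(value, old_prefix, new_prefix)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [replace_path_prefix(item, old_prefix, new_prefix) for item in node]
    if isinstance(node, str) and node.startswith(old_prefix):
        return new_prefix + node[len(old_prefix):]
    return node  # numbers, booleans, None, and unrelated strings are kept as-is


if __name__ == "__main__":
    # Hypothetical configuration; the real files may use different keys.
    config = {
        "model_path": "/old/root/models/model.ckpt",
        "logging": {"result_path": "/old/root/results"},
        "num_samples": 10,
    }
    updated = replace_path_prefix(config, "/old/root", "/home/user/PepINVENT")
    print(json.dumps(updated, indent=2))
```

To apply it to a real file, load the file with `json.load`, pass the result through the function, and write it back with `json.dump` under a new file name so the original configuration stays untouched.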

Citation

If you use PepINVENT in your research, please cite the preprint:

@misc{geylan2024pepinventgenerativepeptidedesign,
      title={PepINVENT: Generative peptide design beyond the natural amino acids}, 
      author={Gökçe Geylan and Jon Paul Janet and Alessandro Tibo and Jiazhen He and Atanas Patronov and Mikhail Kabeshov and Florian David and Werngard Czechtizky and Ola Engkvist and Leonardo De Maria},
      year={2024},
      eprint={2409.14040},
      archivePrefix={arXiv},
      primaryClass={q-bio.BM},
      url={https://arxiv.org/abs/2409.14040}, 
}

License

This project is licensed under the Apache 2.0 License. See the LICENSE file for more details.
