Explore our research paper: LEP-AD on bioRxiv
Welcome to the GitHub repository for our paper, "LEP-AD: Language Embedding of Proteins and Attention to Drugs Predicts Drug Target Interactions." The work repurposes ESM pretrained models for Drug-Target Interaction (DTI) prediction and was presented at the Machine Learning for Drug Discovery (MLDD) workshop at ICLR'23.
- Setup ESM-2 Repository
- Environment Setup
- Protein Representation with ESM
- LEP-AD for Drug-Target Interaction
- Automated Setup Script
Begin by cloning the ESM-2 repository:
After cloning, navigate to the esm directory. Here, you'll need to create a directory for data storage:
mkdir data
Next, download the required datasets from the provided link and ensure they are stored in the data directory you just created:
Download Data
For optimal performance, we recommend CUDA 11.4. To set up the ESM environment, run the following commands:
conda env create -f environment.yml
conda activate esm2
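Before creating the environment, it can help to confirm which CUDA toolkit is installed. The following is a minimal sketch, assuming `nvcc` is on the PATH and prints its usual `release X.Y` line; the helper names `parse_cuda_release` and `installed_cuda_release` are hypothetical and not part of this repository.

```python
import re
import shutil
import subprocess


def parse_cuda_release(nvcc_output):
    """Return (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Query nvcc if it is available; return None when it cannot be found."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_cuda_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    if release is None:
        print("nvcc not found; cannot determine the CUDA toolkit version")
    elif release != (11, 4):
        print(f"Found CUDA {release[0]}.{release[1]}; this README recommends 11.4")
    else:
        print("Found CUDA 11.4")
```

A version other than 11.4 is reported rather than treated as an error, since the README states a recommendation and not a hard requirement.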
To derive protein representations from ESM, use the provided notebook, which extracts the unique proteins and runs inference with the ESM-2 model:
Execute the data_protein_esm.ipynb notebook to generate protein representations from ESM-2.
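Because the same protein typically appears in many drug-target pairs, running ESM-2 once per unique sequence avoids repeated inference. The sketch below illustrates that deduplication step in plain Python. It is not the notebook's actual code; the record layout (`drug`, `protein`, `label` keys), the toy sequences, and the function name `unique_proteins` are assumptions for illustration.

```python
def unique_proteins(records, key="protein"):
    """Return the distinct protein sequences in first-seen order,
    plus a map from each sequence to its index in that list."""
    index_of = {}
    for record in records:
        sequence = record[key]
        if sequence not in index_of:
            index_of[sequence] = len(index_of)
    # dicts preserve insertion order, so the keys are already in first-seen order
    return list(index_of), index_of


# Toy records; real datasets would be loaded from the data directory.
pairs = [
    {"drug": "CCO", "protein": "MKTAYIAK", "label": 1},
    {"drug": "CCN", "protein": "MKTAYIAK", "label": 0},
    {"drug": "CCO", "protein": "GSHMLEDP", "label": 1},
]

sequences, index_of = unique_proteins(pairs)
# Each pair can then look up its protein's embedding row by index.
rows = [index_of[p["protein"]] for p in pairs]
print(sequences, rows)
```

With this layout, embeddings only need to be computed for `sequences`, and every drug-target pair refers back to its protein's row through `rows`.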
With the protein representations from ESM in place, you're set to use LEP-AD for Drug-Target Interaction. To ensure there's no interference with the previous environment, we'll establish a new one:
conda env create -f environment_LEP_AD.yml
conda activate LEP-AD
To reproduce the results for each dataset, run the LEP-AD.ipynb notebook. Alternatively, run the automated setup script from the command line:
chmod +x setup_and_run.sh
./setup_and_run.sh