BIDpred

Installation (In Ubuntu 18.04, 20.04)

Generate the conda environment

conda env create --file env.yaml --name BID
conda activate BID

Install related packages

pip install wget
pip install biopython
pip install biotite
pip install fair-esm  
sudo apt-get install dssp  # for generating RSA using Biopython

Install models

# the first run may take some time because the ESM-2 and ESM-IF models are downloaded
import torch
_, _ = torch.hub.load("facebookresearch/esm:main", "esm2_t33_650M_UR50D") # load esm-2 model

import esm
_, _ = esm.pretrained.esm_if1_gvp4_t16_142M_UR50() # load esm-if model

BUG FIX from esm

ImportError: cannot import name 'esmfold_structure_module_only_8M' from 'esm.pretrained' (/home/{user}/anaconda3/envs/Bepitope/lib/python3.8/site-packages/esm/pretrained.py)

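If you see the error above, copy the functions starting from esmfold_structure_module_only_8M into the pretrained.py of your installed esm package (source: https://github.com/facebookresearch/esm/blob/2b369911bb5b4b0dda914521b9475cad1656b2ac/esm/pretrained.py#L274)

or overwrite the installed file with the pretrained.py shipped in this repository (replace {user} and the environment name, Bepitope here, with your own):

cp pretrained.py /home/{user}/anaconda3/envs/Bepitope/lib/python3.8/site-packages/esm/pretrained.py

After either fix, the import should succeed.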
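
Because the site-packages path depends on your user name and conda environment, it can help to locate the installed file programmatically before copying over it. The following is a minimal sketch using only the standard library; the helper names find_package_file and defines_function are illustrative and not part of BIDpred or esm.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_package_file(package: str, filename: str) -> Optional[Path]:
    """Return the path of `filename` inside an installed package, or None."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None  # package not installed, or it is a single-file module
    for location in spec.submodule_search_locations:
        candidate = Path(location) / filename
        if candidate.is_file():
            return candidate
    return None


def defines_function(path: Path, name: str) -> bool:
    """Check whether the file contains a top-level `def name(` line."""
    text = path.read_text(encoding="utf-8")
    return any(line.startswith(f"def {name}(") for line in text.splitlines())


# Locate esm/pretrained.py in the active environment and report its state.
target = find_package_file("esm", "pretrained.py")
if target is None:
    print("esm is not installed in this environment")
else:
    patched = defines_function(target, "esmfold_structure_module_only_8M")
    print(f"{target}: {'function present' if patched else 'needs the fix'}")
```

The printed path is the destination to use in the cp command above.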

Dataset

The cluster_msa_annotation directory contains the training and test datasets. Each file is named (PDB_ID)_(Antigen_chain)_(Antibody_Hchain_Lchain).phy, for example 1eo8_A_HL.phy and 3pnw_R_QP.phy.
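
The naming scheme can be decoded with a few lines of Python. This is a sketch, assuming the antibody field always holds exactly two single-character chain IDs with the heavy chain first, as in the two examples; the ComplexName type and parse_msa_filename function are hypothetical helpers, not part of the repository.

```python
from pathlib import Path
from typing import NamedTuple


class ComplexName(NamedTuple):
    pdb_id: str
    antigen_chain: str
    heavy_chain: str
    light_chain: str


def parse_msa_filename(path: str) -> ComplexName:
    """Split e.g. '1eo8_A_HL.phy' into PDB ID, antigen chain, and H/L chains."""
    parts = Path(path).stem.split("_")
    # Assumption: three underscore-separated fields, the last one two letters.
    if len(parts) != 3 or len(parts[2]) != 2:
        raise ValueError(f"unexpected file name: {path}")
    pdb_id, antigen_chain, antibody = parts
    return ComplexName(pdb_id, antigen_chain, antibody[0], antibody[1])


print(parse_msa_filename("1eo8_A_HL.phy"))
print(parse_msa_filename("cluster_msa_annotation/3pnw_R_QP.phy"))
```

File names that do not fit the assumed pattern raise a ValueError rather than being silently misparsed.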

Data curation

Antigen sequences were clustered with MMseqs2 at 70% sequence identity.
Within each cluster of at least 4 sequences, a multiple sequence alignment (MSA) was generated with ClustalW.
Epitopes were annotated in the MSA from antigen-antibody complex structures using a 6 Å distance cutoff.

*The representative sequence is in the first row.
*Epitope residues are written as capital letters; non-epitope residues are lowercase.
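
To make the 6 Å annotation step in the curation above concrete, here is a minimal sketch of a distance-based epitope call. It assumes that a residue counts as an epitope when any of its atoms lies within the cutoff of any antibody atom; the README states only the cutoff, so the exact criterion used by the authors may differ. The function name annotate_epitope and the toy coordinates are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def annotate_epitope(
    antigen: Dict[int, List[Coord]],
    antibody_atoms: Sequence[Coord],
    cutoff: float = 6.0,
) -> List[int]:
    """Return antigen residue numbers with any atom within `cutoff` Å
    of any antibody atom (assumed criterion, see the text above)."""
    epitope = []
    for res_id, atoms in sorted(antigen.items()):
        if any(math.dist(a, b) <= cutoff for a in atoms for b in antibody_atoms):
            epitope.append(res_id)
    return epitope


# Toy coordinates: residue number -> list of atom positions (in Å).
antigen = {1: [(0.0, 0.0, 0.0)], 2: [(10.0, 0.0, 0.0)], 3: [(20.0, 0.0, 0.0)]}
antibody = [(0.0, 5.0, 0.0), (12.0, 0.0, 0.0)]
print(annotate_epitope(antigen, antibody))  # residues 1 and 2 are within 6 Å
```

The brute-force pairwise loop is fine for a single complex; a spatial index would be the usual choice for larger batches.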

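You can read the data with Biopython:

from Bio import AlignIO

# any .phy file from cluster_msa_annotation, path relative to the repository root
file_path = "cluster_msa_annotation/1eo8_A_HL.phy"
align = AlignIO.read(file_path, "phylip")

print(align[0].id)   # ID of the first (representative) sequence in the alignment
print(align[0].seq)  # amino acid sequence of the first record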
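
Once a record is loaded, the capital-letter convention can be turned into explicit epitope labels. The sketch below works on a plain string such as str(align[0].seq) and assumes that alignment gaps are written as "-"; the function names and the example sequence are made up for illustration.

```python
from typing import List


def epitope_positions(aligned_seq: str) -> List[int]:
    """0-based positions, in the ungapped sequence, of capital-letter residues."""
    positions = []
    index = 0
    for char in aligned_seq:
        if char == "-":  # assumed gap character; gaps do not consume a position
            continue
        if char.isupper():
            positions.append(index)
        index += 1
    return positions


def epitope_mask(aligned_seq: str) -> str:
    """Per-residue label string: '1' for epitope, '0' otherwise (gaps dropped)."""
    return "".join("1" if c.isupper() else "0" for c in aligned_seq if c != "-")


example = "mk-TAyl--Gq"  # made-up aligned sequence, not taken from the dataset
print(epitope_positions(example))  # [2, 3, 6]
print(epitope_mask(example))       # 00110010
```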

The Rep_Antigen_PDB directory contains the PDB file of the representative sequence of each MSA. Each file is named (PDB_ID)_(Antigen_chain).pdb

The csv directory contains the following files:

filtered_dataset.csv: data collected from SAbDab after applying the filtering cutoff
epitope_annotation.csv: epitopes annotated from Ab-Ag complexes with a 6 Å distance cutoff
train_csv: immunodominance annotations of the training set (92 sets)
test_csv: immunodominance annotations of the test set (24 sets)

Prediction

# predict the epitopes from pdb (fetched)
python inference.py --pdb 1cfi

# to run inference on several PDB IDs in bash
for pdb in pdb1 pdb2 pdb3 pdb4 pdb5 pdb6 ... pdb10
do
    python inference.py --pdb "$pdb"
done

# to run inference on a PDB file stored on your local computer:
# the PDB file must be located in the Custom_PDB directory (the default)
# or in any directory you pass with --pdb_path
python inference_CustomPDB.py --pdb 6FNZ
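
The bash loop above can also be driven from Python, which is convenient when the PDB IDs come from a file or another script. This is a sketch built only on the --pdb and --pdb_path options mentioned above; build_command and run_batch are hypothetical helpers, and --pdb_path is assumed to apply only to inference_CustomPDB.py. It prints the commands by default (dry_run=True) and executes them only when dry_run=False.

```python
import subprocess
import sys
from typing import Iterable, List, Optional


def build_command(
    pdb_id: str,
    script: str = "inference.py",
    pdb_path: Optional[str] = None,
) -> List[str]:
    """Build the argument list for one inference call."""
    cmd = [sys.executable, script, "--pdb", pdb_id]
    if pdb_path is not None:  # assumed to be meaningful for inference_CustomPDB.py only
        cmd += ["--pdb_path", pdb_path]
    return cmd


def run_batch(pdb_ids: Iterable[str], dry_run: bool = True, **kwargs) -> List[List[str]]:
    """Print (dry run) or execute one inference command per PDB ID."""
    commands = [build_command(pdb_id, **kwargs) for pdb_id in pdb_ids]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing run
    return commands


# Dry run: only prints the commands that would be executed.
run_batch(["1cfi", "1eo8", "3pnw"])
```

Call run_batch(ids, dry_run=False) from the repository root to actually launch the runs.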

Replication

# evaluate the trained model (in the checkpoint directory) on the epitope3d test set (45 PDBs)
python evaluate.py

# train the models and save them
# a GPU is recommended for training; running on CPU is quite slow
python train.py --model_save_dir models

# by default, evaluate.py evaluates the models saved in the checkpoint directory;
# pass a different directory with --model_checkpoint
python evaluate.py --model_checkpoint models

Each Python program accepts more arguments. You can list them with the -h option.

e.g. python inference.py -h (likewise for inference_CustomPDB.py, train.py, and evaluate.py)
