We implemented three GCN models for efficacy prediction.
Dependencies:
- numpy
- pandas
- Python >= 3.7
- PyTorch >= 1.5
- PyTorch Geometric >= 1.7
- RDKit
- optuna: (optional) hyperparameter search
At least one input file is required:
- a CSV file with
  - first column: SMILES
  - second through last columns: float (regression) or int (classification) targets
- (optional) a pickle file containing a tuple (train, test, val) of row indices (splits).
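The input layout described above can be sketched as follows. This is a minimal sketch using only the standard library. The file names `toy.csv` and `toy_splits.pkl`, the column names, the 80/10/10 ratio, and the helper names `write_dataset` and `make_splits` are assumptions for illustration. The tuple order (train, test, val) follows the description above. How the pickle path is passed to the training script is not covered here.

```python
import csv
import pickle
import random
import tempfile
from pathlib import Path


def write_dataset(path, smiles, targets):
    """Write a CSV with SMILES in the first column and targets in the rest."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        # Column names are hypothetical; only the column order matters here.
        writer.writerow(["smiles", "target_1", "target_2"])
        for smi, row in zip(smiles, targets):
            writer.writerow([smi, *row])


def make_splits(n_rows, frac_train=0.8, frac_test=0.1, seed=0):
    """Return a (train, test, val) tuple of row-index lists (ratios are assumed)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = int(frac_train * n_rows)
    n_test = int(frac_test * n_rows)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    val = indices[n_train + n_test:]
    return train, test, val


# Toy data: ten small molecules with two regression targets each.
smiles = ["C", "CC", "CCC", "CCO", "CCN", "C=C", "C#C", "CO", "CC(=O)O", "c1ccccc1"]
targets = [(0.1 * i, -0.5 * i) for i in range(len(smiles))]
splits = make_splits(len(smiles))

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "toy.csv"
    split_path = Path(tmp) / "toy_splits.pkl"
    write_dataset(csv_path, smiles, targets)
    with open(split_path, "wb") as fh:
        pickle.dump(splits, fh)
    # Read the pickle back to confirm the tuple survives the round trip.
    with open(split_path, "rb") as fh:
        reloaded = pickle.load(fh)
```

Every row index appears in exactly one of the three lists, so the splits never overlap.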
Train:
python drug_gnn/train.py --data_path ${data} \
    --task ${regression} \
    --gnn_type dmpnn --log_dir checkpoints/dmpnn
Predict:
python drug_gnn/predict.py --data_path ${data} \
    --task ${regression} \
    --gnn_type dmpnn --log_dir checkpoints/dmpnn
Hyperparameter search (optional, requires optuna):
python drug_gnn/hyperopt.py --data_path ${data} --task ${regression} \
    --gnn_type dmpnn \
    --hyperopt_dir hyper_dmpnn
Train your model using LINCS 2020 data
- input data format:
  - shape: (num_smiles, num_landmark_genes)
  - the first column contains SMILES strings
  - the remaining columns contain expression values
  - column names should be Entrez IDs
- save: best_model
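The input format above can be checked before training. The following is a minimal sketch of such a check, not part of this repository: the function name `check_lincs_table`, the toy header, and the integer IDs in it are assumptions for illustration. It relies only on the fact that Entrez IDs are positive integers.

```python
import csv
import io


def check_lincs_table(handle):
    """Validate a SMILES-by-gene table and return (num_smiles, num_genes)."""
    reader = csv.reader(handle)
    header = next(reader)
    gene_ids = header[1:]  # every column after the first should be an Entrez ID
    bad_ids = [g for g in gene_ids if not g.isdigit()]
    if bad_ids:
        raise ValueError(f"column names are not Entrez IDs: {bad_ids}")
    n_rows = 0
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            raise ValueError(f"line {line_no}: expected {len(header)} fields")
        if not row[0].strip():
            raise ValueError(f"line {line_no}: empty SMILES string")
        for value in row[1:]:
            float(value)  # raises ValueError if an expression value is not numeric
        n_rows += 1
    return n_rows, len(gene_ids)


# Toy table: two molecules, three gene columns (IDs chosen only as examples).
toy = "smiles,5720,466,6009\nCCO,0.12,-1.30,0.05\nc1ccccc1,1.10,0.40,-0.72\n"
shape = check_lincs_table(io.StringIO(toy))
```

For a file on disk, the same function can be called with `open(path, newline="")` as the handle.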
The prediction step generates two output files:
- Embeddings for each molecule: xxx.embeddings.npy
- Predicted landmark gene expression: xxx.pred.exprs.csv
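The two outputs can be loaded for downstream analysis roughly as follows. This is a sketch under assumptions: the helper `load_prediction_outputs` is hypothetical, and the CSV layout used here (a header row, one row per molecule, SMILES in the first column) is assumed rather than confirmed by this README. The toy files are written only so that the example is self-contained.

```python
import csv
import tempfile
from pathlib import Path

import numpy as np


def load_prediction_outputs(prefix):
    """Load <prefix>.embeddings.npy and <prefix>.pred.exprs.csv."""
    embeddings = np.load(f"{prefix}.embeddings.npy")
    with open(f"{prefix}.pred.exprs.csv", newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader)  # assumed: the first line is a header row
        rows = list(reader)
    return embeddings, header, rows


with tempfile.TemporaryDirectory() as tmp:
    prefix = str(Path(tmp) / "xxx")
    # Stand-in files with the assumed layout: 2 molecules, 4-dim embeddings.
    np.save(f"{prefix}.embeddings.npy", np.zeros((2, 4)))
    with open(f"{prefix}.pred.exprs.csv", "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["smiles", "5720", "466"])
        writer.writerow(["CCO", 0.12, -1.30])
        writer.writerow(["c1ccccc1", 1.10, 0.40])
    embeddings, header, rows = load_prediction_outputs(prefix)
```

If the rows of the two files are in the same order, `embeddings[i]` and `rows[i]` describe the same molecule; that alignment is also an assumption and worth verifying on real output.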
Efficacy Score:
- Prepare up- and down-regulated gene signatures (Entrez IDs only): up.txt, down.txt
- Get transform matrix: GSE92743_Broad_OLS_WEIGHTS_n979x11350.gctx
# convert to pandas DataFrame
from cmapPy.pandasGEXpress.parse import parse
weight = parse('GSE92743_Broad_OLS_WEIGHTS_n979x11350.gctx').data_df
weight.to_csv("GSE92743_Broad_OLS_WEIGHTS_n979x11350.csv")
- Use the predicted expression file from the prediction step: xxx.pred.exprs.csv
- Run:
python efficacy.py --weights GSE92743_Broad_OLS_WEIGHTS_n979x11350.csv \
    --predicts xxx.pred.exprs.csv \
    --up up.txt \
    --down down.txt \
    --output efficacy.csv
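To make the role of the inputs above more concrete, here is a toy sketch of two preparatory pieces: reading a signature file and applying a linear weight matrix to landmark-gene predictions. Several things are assumptions, not statements about `efficacy.py`: that signature files hold one Entrez ID per line, that the OLS weights consist of one offset plus one coefficient per landmark gene for each inferred gene, and the helper names `parse_gene_list` and `infer_genes`. How the actual score is computed from these pieces is not described in this README and is not reproduced here.

```python
def parse_gene_list(text):
    """Parse Entrez IDs, assuming one ID per line; blank lines are skipped."""
    ids = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if not line.isdigit():
            raise ValueError(f"not an Entrez ID: {line!r}")
        ids.append(line)
    return ids


def infer_genes(landmark_expr, weights):
    """Apply offset + sum(coef * landmark value) for each inferred gene.

    landmark_expr: {landmark_id: value}
    weights: {inferred_id: (offset, {landmark_id: coef})}  (assumed structure)
    """
    inferred = {}
    for target, (offset, coefs) in weights.items():
        inferred[target] = offset + sum(
            coef * landmark_expr[lm] for lm, coef in coefs.items()
        )
    return inferred


# Toy numbers only; real weights come from the converted CSV.
landmark_expr = {"5720": 1.0, "466": -2.0}
weights = {
    "100": (0.5, {"5720": 2.0, "466": 1.0}),
    "200": (0.0, {"5720": -1.0, "466": 0.5}),
}
full_profile = {**landmark_expr, **infer_genes(landmark_expr, weights)}

up_genes = parse_gene_list("100\n5720\n")
down_genes = parse_gene_list("200\n\n466\n")
up_values = [full_profile[g] for g in up_genes]
down_values = [full_profile[g] for g in down_genes]
```

With the real data, the same arithmetic would normally be done as a matrix product in numpy or pandas; the dictionary form is used here only to keep the orientation of the weight matrix out of the example.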
Average Pearson correlation (AUC-like plot) shows that the GNN predicts transcriptional profiles well
t-SNE plot of drug embeddings
- Distribution of Pearson correlation coefficients
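For reference, the Pearson correlation between a predicted and a measured expression profile can be computed as in the sketch below. This is not the evaluation script behind the plots above. Whether correlations are taken per molecule or per gene is not stated here, so the example simply averages over paired profiles; the function names and toy values are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def mean_profile_correlation(predicted, measured):
    """Average the correlation over paired profiles (one pair per row)."""
    scores = [pearson(p, m) for p, m in zip(predicted, measured)]
    return sum(scores) / len(scores)


# Toy profiles: the first pair is perfectly correlated, the second anti-correlated.
predicted = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
measured = [[2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
average = mean_profile_correlation(predicted, measured)
```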
Zhuoqing Fang: [email protected]
This project is based on chemprop and chiral_gnn.