Pharmacokinetic_prediction

Prediction of intravenous pharmacokinetic parameters, including fu, MRT, t1/2, VD and CL, by training on 1352 compounds.

1.Paper and dataset

paper: http://dmd.aspetjournals.org/content/suppl/2018/08/16/dmd.118.082966.DC1

dataset: dataset.xlsx (download from supporting information)

2.Data flow

(Figure: data flow diagram)

3.Description

dataset.xlsx

| Column | Description |
| --- | --- |
| SMILES | SMILES string of the compound |
| fu | fraction of the drug unbound in plasma |
| MRT | mean residence time of the drug in the human body |
| t1/2 | half-life of the drug |
| VD | volume of distribution |
| CL | clearance |

Training

1.Feature extraction

<function extract_features()>

Each molecule is represented by a Morgan fingerprint (radius=2, length=2048) and 200 descriptors generated by RDKit.
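The following is a minimal sketch of this kind of featurization, not the repository's `extract_features()` itself. The helper name `featurize` is hypothetical, and it assumes the fingerprint bits and descriptor values are simply concatenated into one vector. The number of entries in `Descriptors.descList` depends on the installed RDKit version, so it may not be exactly 200.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem, Descriptors


def featurize(smiles, radius=2, n_bits=2048):
    """Hypothetical helper: fingerprint bits followed by RDKit descriptor values."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")

    # Morgan fingerprint as a fixed-length bit vector (0/1 values).
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    bits = np.array(list(fp), dtype=float)

    # descList holds (name, function) pairs; its length varies with the RDKit version.
    desc = np.array([fn(mol) for _, fn in Descriptors.descList], dtype=float)

    # Assumption: the two blocks are concatenated into a single feature vector.
    return np.concatenate([bits, desc])


features = featurize("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
print(features.shape)
```

Some descriptors can return very large or non-finite values for unusual molecules, so a real pipeline would likely need to clean or scale the descriptor block before modeling.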

2.Splits of training and testing data

<function stratified_split()>

The whole data set is divided into training and test sets in a ratio of approximately 7:3 using stratified sampling.
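This README does not say what the sampling is stratified on. One common choice for a continuous target is to bin the values by quantile and sample the same fraction from each bin. The sketch below illustrates that idea in plain Python. It is an assumption about the approach, and the signature and parameters (`n_bins`, `test_fraction`, `seed`) are illustrative rather than those of the repository's function.

```python
import random


def stratified_split(values, test_fraction=0.3, n_bins=5, seed=0):
    """Illustrative split: each quantile bin of `values` gives ~test_fraction to the test set."""
    rng = random.Random(seed)
    # Rank samples by target value, then cut the ranking into equal-sized bins.
    order = sorted(range(len(values)), key=lambda i: values[i])
    train_idx, test_idx = [], []
    for b in range(n_bins):
        start = b * len(order) // n_bins
        stop = (b + 1) * len(order) // n_bins
        members = order[start:stop]
        rng.shuffle(members)
        # Take the same fraction of every bin for testing.
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


y = [0.1 * i for i in range(100)]  # toy continuous target
train_idx, test_idx = stratified_split(y)
print(len(train_idx), len(test_idx))
```

Binning before sampling keeps the low and high ends of the target range represented in both sets, which a purely random split does not guarantee for small data sets.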

3.Modeling

<Class auto_gbdt()>

GBDT (gradient-boosted decision trees) is fitted to the training set. The hyperparameters are optimized automatically with GridSearchCV. RMSD is used as the criterion to evaluate model performance on the test set.
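As a rough illustration of this step, the sketch below tunes scikit-learn's `GradientBoostingRegressor` with `GridSearchCV` and reports RMSD on a held-out set. The synthetic data, the parameter grid, and the 3-fold setting are all assumptions made for the example. The grid actually searched by `auto_gbdt` is not described here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data; real inputs would be the fingerprint/descriptor matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Illustrative grid only (assumption); not the grid used by auto_gbdt.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",  # ranks candidates the same way as RMSD
    cv=3,
)
search.fit(X_train, y_train)

# RMSD on the test set, using the best model refitted on all training data.
pred = search.predict(X_test)
rmsd = float(np.sqrt(np.mean((pred - y_test) ** 2)))
print(search.best_params_, rmsd)
```

Because `GridSearchCV` selects hyperparameters by cross-validation on the training set only, the test-set RMSD remains an estimate on data the search never saw.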

Prediction

1.SDF to DataFrame

<function smiles_from_lib()>

Convert the new data (SDF format) to a DataFrame containing SMILES, name, synonyms, etc.
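A conversion of this kind could look roughly like the sketch below, which reads an SDF file with RDKit and collects one row per molecule. The helper name `sdf_to_dataframe` and the `synonyms` field name are hypothetical. The internals of `smiles_from_lib()` and the field names in a real library file may differ.

```python
import os
import tempfile

import pandas as pd
from rdkit import Chem


def sdf_to_dataframe(sdf_path):
    """Hypothetical helper: one row per readable molecule in the SDF file."""
    rows = []
    for mol in Chem.SDMolSupplier(sdf_path):
        if mol is None:  # RDKit yields None for records it cannot parse
            continue
        row = {
            "SMILES": Chem.MolToSmiles(mol),
            "name": mol.GetProp("_Name") if mol.HasProp("_Name") else "",
        }
        # Remaining SD data fields (e.g. synonyms) become extra columns.
        row.update(mol.GetPropsAsDict())
        rows.append(row)
    return pd.DataFrame(rows)


# Build a tiny two-record SDF so the sketch runs without external files.
tmp_dir = tempfile.mkdtemp()
sdf_path = os.path.join(tmp_dir, "demo.sdf")
writer = Chem.SDWriter(sdf_path)
records = [("CCO", "ethanol", "ethyl alcohol"), ("c1ccccc1", "benzene", "benzol")]
for smiles, name, synonyms in records:
    mol = Chem.MolFromSmiles(smiles)
    mol.SetProp("_Name", name)
    mol.SetProp("synonyms", synonyms)  # assumed field name, for illustration
    writer.write(mol)
writer.close()

df = sdf_to_dataframe(sdf_path)
print(df)
```

Skipping unparsable records keeps the conversion from failing on a single bad entry, at the cost of silently dropping it. Logging the skipped records would be a reasonable addition.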

2.Feature extraction

<function extract_features()>

Almost the same as in the training process.

3.Prediction

<function predict()>

Predict the target value (y) from the features of the new compounds.