The Novartis DSAI Challenge 2019 aimed to predict the Probability of Success (PoS) of a clinical candidate for a given indication, based on data collected in Phase II trials. This repository contains the solution submitted by team E2C, whose members are from the Genomics Institute of the Novartis Research Foundation.
The training data was based on two proprietary pharmaceutical pipeline databases provided by Informa© (Pharmaprojects and Trialtrove), and we therefore do not provide the raw or processed training and inference data in this repository.
As noted above, the code here does not run as is, because it depends on the training datasets and on other settings provided by the competition's model-evaluation Docker environment. Nevertheless, the syntax below can help you understand the entry points in the main analysis code, EDA_v4.py.
NLTK should be preinstalled.
export PYTHONPATH=./lib:$PYTHONPATH
export NLTK_DATA="./nltk_data"
usage: EDA_v4.py [-h] [-p] [-e] [-i] [-r] [-d] [-s] [-k]
Phase2-Approval Prediction
optional arguments:
-h, --help show this help message and exit
-p, --hyperparameter hyperparameter tuning
-e, --estimator estimator tuning
-r, --recreate recreate feature matrix
-s, --sort sort features
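For readers unfamiliar with this style of command-line interface, the following is a minimal sketch of how the documented switches could be declared with Python's standard `argparse` module. It is not taken from EDA_v4.py: it covers only the four flags described above (`-p`, `-e`, `-r`, `-s`), assumes they are simple on/off switches, and leaves out `-i`, `-d` and `-k` because their meaning is not documented here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the four documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="EDA_v4.py",
        description="Phase2-Approval Prediction",
    )
    # Assumption: each flag is a boolean switch that selects a pipeline step.
    parser.add_argument("-p", "--hyperparameter", action="store_true",
                        help="hyperparameter tuning")
    parser.add_argument("-e", "--estimator", action="store_true",
                        help="estimator tuning")
    parser.add_argument("-r", "--recreate", action="store_true",
                        help="recreate feature matrix")
    parser.add_argument("-s", "--sort", action="store_true",
                        help="sort features")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is
# reproducible; this mimics running "EDA_v4.py -r".
args = build_parser().parse_args(["-r"])
print(args.recreate, args.hyperparameter)
```

With switches declared this way, running the script with no flags leaves every option set to `False`, so a default action can be selected when nothing is requested.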
configjson.py
EDA_v4.py -r
This creates feature_matrix.pkl.gz
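The `.pkl.gz` extension suggests a gzip-compressed pickle. The sketch below shows one common way to write and read such a file using only the standard library. The helper names `save_pkl_gz` and `load_pkl_gz` and the toy object are hypothetical, and the actual serialization used by EDA_v4.py may differ (for example, it could rely on a pandas or joblib helper instead).

```python
import gzip
import os
import pickle
import tempfile


def save_pkl_gz(obj, path):
    """Pickle obj into a gzip-compressed file (hypothetical helper)."""
    with gzip.open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_pkl_gz(path):
    """Load an object from a gzip-compressed pickle (hypothetical helper)."""
    with gzip.open(path, "rb") as fh:
        return pickle.load(fh)


# Toy stand-in for a feature matrix; the real one is not distributed.
toy = {"columns": ["f1", "f2"], "rows": [[0.1, 1.0], [0.3, 0.0]]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "feature_matrix.pkl.gz")
    save_pkl_gz(toy, path)
    restored = load_pkl_gz(path)

print(restored == toy)
```

Unpickling executes code embedded in the file, so files of this kind should only be loaded when they come from a trusted source.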
Define hyperparameter search grid in params/hyperparameter.range.json
EDA_v4.py -p
This creates params/hyperparameter.best.json and params/hyperparameter.estimator.json
Edit params/hyperparameter.range.json and run
EDA_v4.py -p
to refine the number of estimators and the learning rate. You may repeat this tuning process.
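To make the idea of a search grid concrete, here is a minimal sketch of how a grid stored as JSON could be expanded into individual candidate settings. The schema of params/hyperparameter.range.json is not documented here, so the key names (`learning_rate`, `n_estimators`, `max_depth`), the values, and the `expand_grid` helper are all assumptions chosen for illustration.

```python
import itertools
import json
import os
import tempfile

# Hypothetical grid: each key maps to the list of values to try.
grid = {
    "learning_rate": [0.01, 0.05, 0.1],
    "n_estimators": [200, 500],
    "max_depth": [3, 5],
}


def expand_grid(grid):
    """Return every combination of the grid values as a list of dicts."""
    keys = sorted(grid)  # fixed key order keeps the output deterministic
    value_lists = (grid[key] for key in keys)
    return [dict(zip(keys, values)) for values in itertools.product(*value_lists)]


# Round-trip through a JSON file to mimic editing a range file on disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hyperparameter.range.json")
    with open(path, "w") as fh:
        json.dump(grid, fh, indent=2)
    with open(path) as fh:
        loaded = json.load(fh)

candidates = expand_grid(loaded)
print(len(candidates))  # 3 * 2 * 2 = 12 combinations
```

Because the number of candidates is the product of the list lengths, narrowing each list around the previous best values between runs keeps repeated tuning rounds affordable.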
Final hyperparameters in params/hyperparameter.best.json
EDA_v4.py -e
This trains a model using all training data based on params/hyperparameter.best.json
The model file is model.pkl.gz
EDA_v4.py
You can find more about DSAI 2019 and another winning model by Team Insight-Out.
- Yang Zhong (yang.zhong at novartis dot com)
- Bin Zhou (bin.zhou at novartis dot com)
- Shifeng Pan (span at gnf dot org)
- Yingyao Zhou (yingyao.zhou at novartis dot com)