Prerequisite: Python 3.x must be installed. All scripts were tested on Ubuntu 18.04.3 LTS.
- Install the DeepLinkPrediction package (for use of the DLP model and manipulation of interaction networks):
    cd DeepLinkPrediction
    python setup.py install
- Clone and install OpenNE (for the methods GraRep, DeepWalk, node2vec and LINE):
    cd OpenNE/src
    python setup.py install
- Install the forked EvalNE repository (for baseline methods and the benchmarking framework). Clone the repository, then:
    cd EvalNE
    python setup.py install

As a demonstration of how the DLP model can be called outside of the EvalNE framework, you will find a script called DLP_demo.py inside the demo folder. All necessary data is in this demo folder, and the script runs in its entirety in ~298 seconds on a server (hardware: 48-core CPU and 189GB RAM) and ~336 seconds locally (hardware: 4-core CPU and 16GB RAM).

- Construct a heterogeneous network by integrating the loss-of-function (LOF) screening data of a certain cancer type (--disease) with a functional interaction scaffold (--ppi_scaffold) using scripts/construct_train_test_edges_separate_args.py.
- Construct train/test/validation edges using construct_train_test_edges_separately.py.
- Run the EvalNE API scripts:
    - Run EvalNE_General_Performance/EvalNE_General_Performance_API.py to calculate the general performance (on gene-gene and gene-cell line interactions) of every method.
    - Run EvalNE_CellLine_specific_Performance/EvalNE_CellLine_specific_performance_API.py to calculate the dependency-specific performance (gene-cell line interactions only) of every method.
    - Run EvalNE_CellLine_specific_total_predictions/EvalNE_CellLine_Specific_totalprediction_API.py to predict a probability for each possible gene-cell line combination.
- Run the DLP models with the pretrained DeepWalk embeddings:
    - Run DLP_baseline_initializer_Performance/DLP_baseline_initializer_performance_API.py twice to calculate the general and the dependency-specific performance, respectively.
    - Run DLP_baseline_initializer_Target_prediction/DLP_baseline_initializer_targetprediction_API.py to predict a probability for each possible gene-cell line combination.
- Run scripts/construct_combined_prediction_df.py to construct a combined pandas dataframe which averages the probabilities of each interaction over the three runs.

These 5 steps are automated in run_single_cancertype.sh for a single cancer type. To iterate over several cancer types, use run_several_cancertypes.sh.
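
The averaging performed in the last of these steps can be illustrated with a minimal, dependency-free sketch. This is not the code of scripts/construct_combined_prediction_df.py (which builds a pandas dataframe); the function name average_runs, the (gene, cell line) tuple keys and the probabilities below are all made up for illustration, and dropping interactions that are missing from a run is an assumption of this sketch.

```python
from collections import defaultdict
from statistics import mean


def average_runs(runs):
    """Average the predicted probability of each interaction over several runs.

    `runs` is a list of dicts mapping a (gene, cell_line) pair to a probability.
    Hypothetical helper: only interactions present in every run are kept.
    """
    collected = defaultdict(list)
    for run in runs:
        for interaction, probability in run.items():
            collected[interaction].append(probability)
    return {
        interaction: mean(probabilities)
        for interaction, probabilities in collected.items()
        if len(probabilities) == len(runs)
    }


# Made-up probabilities for three runs (illustrative values only).
runs = [
    {("GENE_A", "CELL_1"): 0.9, ("GENE_B", "CELL_1"): 0.2},
    {("GENE_A", "CELL_1"): 0.7, ("GENE_B", "CELL_1"): 0.4},
    {("GENE_A", "CELL_1"): 0.8, ("GENE_B", "CELL_1"): 0.3},
]
combined = average_runs(runs)
```

Averaging over runs smooths out the run-to-run variation of a single trained model, so the combined score of an interaction is less sensitive to one unlucky initialization.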

- For the method GraRep, the embedding dimension needs to be a multiple of the parameter k-step (that is, k-step must divide the embedding dimension evenly).
- The parameter order of the method AROPE needs to be equal to the dimension of the weights parameter.
- The EvalNE API scripts are placed in separate directories for the following reason: the version of EvalNE used in this work does not allow the evaluation of different test sets on the same trained model. Hence, predicting on a separate test set and also on a separate prediction set (e.g. all possible gene-cell line combinations) requires 2 separate EvalNE runs. Additionally, EvalNE constructs tmp files on each run, which makes it impossible to run EvalNE on 2 separate test sets at the same time unless each API call to EvalNE is situated in its own separate directory.
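
The one-directory-per-call layout can also be driven from Python by launching each script with its own working directory. The following is a minimal sketch under assumptions: run_in_own_directory is a hypothetical helper that is not part of EvalNE or this repository, and it assumes that each API script can be started without arguments and writes its temporary files relative to the current working directory.

```python
import subprocess
import sys
from pathlib import Path


def run_in_own_directory(script_paths):
    """Start every script concurrently, each with its own folder as cwd.

    Hypothetical helper: relative tmp files written by one process cannot
    collide with those of another. Returns the exit code of each script.
    """
    processes = []
    for script in map(Path, script_paths):
        script = script.resolve()
        # cwd=script.parent keeps relative output inside the script's folder.
        processes.append(
            subprocess.Popen([sys.executable, script.name], cwd=script.parent)
        )
    # Wait for all scripts to finish and collect their exit codes in order.
    return [process.wait() for process in processes]
```

Called with, for instance, the paths of EvalNE_CellLine_specific_performance_API.py and EvalNE_CellLine_Specific_totalprediction_API.py, the two runs would execute side by side while each keeps its temporary files in its own folder.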