DeepCOP - Deep Learning-Based Approach to Predict Gene Regulating Effects of Small Molecules

benanbardak/DeepCOP-1

DeepCOP - Deep gene COmpound Profiler

This is the codebase that was used to obtain the results in the corresponding paper: https://www.ncbi.nlm.nih.gov/pubmed/31504186.

Use steps 1-4 to train and validate an MLP model that predicts the expression of a gene, given a molecule, in a particular cell line of the LINCS L1000 dataset. Use step 5 to evaluate the trained models against actual RNA-Seq values.

Data Preparation

  1. Download and uncompress the level 5 gctx data files and the experiment metadata from GEO.

  2. The files in the Data folder are large, so they were compressed into rar archives before being uploaded to GitHub. Uncompress these archives in the same folder.

  3. Use get_xy.py or get_xy_phase2.py to collect the training data and labels.

    • Set LINCS_data_path to the folder where you uncompressed the gctx data files.
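Before running the data collection scripts, it can help to confirm that the configured folder really contains the uncompressed gctx files. The following is a minimal sketch and not part of the repository: the helper `find_gctx_files` and the example path are hypothetical, and only the variable name LINCS_data_path comes from the step above.

```python
from pathlib import Path

# Hypothetical location; replace with the folder holding your uncompressed GEO files.
LINCS_data_path = Path("/path/to/LINCS")


def find_gctx_files(data_path):
    """Return the sorted .gctx files directly inside data_path, or raise if none exist."""
    data_path = Path(data_path)
    if not data_path.is_dir():
        raise FileNotFoundError(f"LINCS data folder not found: {data_path}")
    gctx_files = sorted(data_path.glob("*.gctx"))
    if not gctx_files:
        raise FileNotFoundError(f"No .gctx files in {data_path}")
    return gctx_files
```

Calling `find_gctx_files(LINCS_data_path)` fails early with a readable message if the archives were not uncompressed or the path is wrong.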

Train and Validate MLP Models

  4. Use internal_validation.py to run 10-fold cross-validation on the training data.
    • This step saves the trained models and the cutoff values to the SavedModels folder. The cutoff values are saved once all 10 folds have been evaluated. The cutoffs are used in step 5.
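For readers unfamiliar with 10-fold cross-validation, the splitting logic can be sketched in plain Python. This is an illustration of the general technique, not the code in internal_validation.py; the function name `k_fold_indices` and its parameters are assumptions made for this example.

```python
import random


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is used for validation exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding through the shuffled indices spreads samples evenly across folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        val_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, val_idx


# Each iteration would train one model on train_idx and evaluate it on val_idx.
for fold_number, (train_idx, val_idx) in enumerate(k_fold_indices(100), start=1):
    print(fold_number, len(train_idx), len(val_idx))
```

Each of the 10 iterations holds out a different tenth of the data, which is why results that depend on all folds are only available after the last fold finishes.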

Evaluate Trained Model Predictions against Actual RNA-Seq Values

  5. Use external_validation.py to evaluate predictions from the models trained in step 4 on external RNA-Seq values.
    • When evaluating the predictions, the cutoff values saved in step 4 are used.
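How a saved cutoff might be applied to predictions can be sketched as follows. This assumes, purely for illustration, that a cutoff is a threshold on a predicted score and that the RNA-Seq reference has been reduced to binary labels; the repository's actual cutoff format and metrics may differ. The functions `apply_cutoff` and `precision` and all numbers below are hypothetical.

```python
def apply_cutoff(scores, cutoff):
    """Label each predicted score 1 if it reaches the cutoff, else 0."""
    return [1 if s >= cutoff else 0 for s in scores]


def precision(predicted, actual):
    """Fraction of predicted positives that are also positive in the reference labels."""
    true_pos = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    pred_pos = sum(predicted)
    return true_pos / pred_pos if pred_pos else 0.0


# Made-up values for illustration only; not taken from the paper or the data.
scores = [0.91, 0.40, 0.75, 0.10]
rna_seq_labels = [1, 0, 0, 0]
calls = apply_cutoff(scores, cutoff=0.7)
print(calls, precision(calls, rna_seq_labels))
```

Fixing the cutoff during internal validation and reusing it unchanged on external data keeps the external evaluation independent of the RNA-Seq values being tested.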

Extra
