Reference: Application of transfer learning for cancer drug sensitivity prediction.
Predictive modeling of drug sensitivity is an important step in precision therapy design, but suitable data for modeling is often scarce. We therefore combine data from multiple sources, which the recent advent of large-scale pharmacogenomic studies makes possible.
We use in vitro transcriptomic and drug sensitivity information from the administration of various anticancer drugs on multiple cancer cell lines of different subtypes. These data were obtained from two renowned large-scale pharmacogenomic studies -
- Cancer Cell Line Encyclopedia (CCLE). A collaboration between Broad Institute and Novartis Institutes for Biomedical Research.
- Genomics of Drug Sensitivity in Cancer (GDSC). A collaboration between Wellcome Sanger Institute (UK) and Center for Molecular Therapeutics at Massachusetts General Hospital Cancer Center (USA).
There is considerable overlap between the two studies, and we explored the inconsistencies between the datasets in our previous work in Briefings in Bioinformatics: Evaluating the consistency of large-scale pharmacogenomic studies.
We combined the CCLE and GDSC datasets through Transfer Learning (TL), since samples from two different sources cannot be used together directly. To address the distribution shift between the two sets, we implemented two different TL approaches -
In the Latent Variable Cost Optimization approach, we use a latent variable space to model the underlying similarities between the genomic and sensitivity datasets and attempt to minimize the discrepancies through cost optimization.
Here, zp and zs denote the primary (target) and secondary (source) sets, and w is the underlying latent variable.
We have implemented three different optimization-based approaches -
- Latent regression prediction
- Latent-latent prediction
- Combined latent prediction
In the Domain Transfer approach, we implement a one-to-one feature mapping between the samples in the primary and secondary datasets, using a polynomial regression mapping to transfer the primary data to the secondary space, and perform prediction using the larger datasets available in the secondary space. Note that this approach assumes the existence of a set of matched samples between the two sets.
Here, zp,k and zs,k denote the k-th feature (gene/drug) in the primary and secondary sets, and γ denotes the polynomial mapping coefficients.
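As a rough illustration of the feature-wise mapping described above, the following NumPy sketch fits one polynomial per feature on matched samples and then maps new primary samples into the secondary space. It is written in Python rather than the repository's MATLAB, and the function names (`fit_feature_maps`, `apply_feature_maps`), the polynomial degree of 2, and the synthetic data are all assumptions made for this example, not the settings used in the paper.

```python
import numpy as np


def fit_feature_maps(Zp, Zs, degree=2):
    """Fit one polynomial per feature mapping primary values to secondary values.

    Zp, Zs: (n_matched_samples, n_features) arrays with rows aligned by sample.
    Returns an (n_features, degree + 1) array of coefficients, highest power first.
    """
    Zp = np.asarray(Zp, dtype=float)
    Zs = np.asarray(Zs, dtype=float)
    if Zp.shape != Zs.shape:
        raise ValueError("matched primary and secondary sets must have the same shape")
    # np.polyfit solves an ordinary least-squares problem for each feature k.
    return np.array([np.polyfit(Zp[:, k], Zs[:, k], degree) for k in range(Zp.shape[1])])


def apply_feature_maps(coefs, Zp_new):
    """Map new primary samples into the secondary space, feature by feature."""
    Zp_new = np.asarray(Zp_new, dtype=float)
    return np.column_stack(
        [np.polyval(coefs[k], Zp_new[:, k]) for k in range(coefs.shape[0])]
    )


# Synthetic matched data (hypothetical): secondary = 1 + 2*z - 0.5*z^2 per feature.
rng = np.random.default_rng(0)
Zp_matched = rng.normal(size=(30, 3))
Zs_matched = 1.0 + 2.0 * Zp_matched - 0.5 * Zp_matched**2

gamma = fit_feature_maps(Zp_matched, Zs_matched, degree=2)
Zp_unmatched = rng.normal(size=(5, 3))
Zs_estimated = apply_feature_maps(gamma, Zp_unmatched)
```

The mapped samples in `Zs_estimated` could then be passed to a predictive model trained on the larger secondary dataset.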
The details of these approaches are described in our 2018 paper: Application of transfer learning for cancer drug sensitivity prediction. An overview of the TL scenarios involved in this implementation is provided below.
We also implemented a Dimensionality Reduction based Transfer Learning approach using Principal Component Analysis (PCA). The details of this work can be found in this paper: Dimensionality Reduction based Transfer Learning applied to Pharmacogenomics Databases.
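The specifics of the PCA-based approach are in the paper. The sketch below only illustrates the generic building block, PCA computed via SVD, used to project two datasets that share the same features into a common low-dimensional space. It is written in Python rather than the repository's MATLAB. Fitting the components on the larger secondary set, the choice of 5 components, the function names, and the random data are all assumptions for illustration and do not reproduce the method in `DimRedTransLearn.m`.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the feature mean and top principal axes of X (samples x features)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def pca_transform(X, mean, components):
    """Center X with the fitted mean and project it onto the principal axes."""
    return (np.asarray(X, dtype=float) - mean) @ components.T


# Hypothetical data: both sets share the same 20 features.
rng = np.random.default_rng(0)
X_secondary = rng.normal(size=(100, 20))
X_primary = rng.normal(size=(25, 20))

# Assumption: learn the axes on the larger secondary set, then project both sets.
mean, components = pca_fit(X_secondary, n_components=5)
S_low = pca_transform(X_secondary, mean, components)
P_low = pca_transform(X_primary, mean, components)
```

Both `S_low` and `P_low` then live in the same 5-dimensional space, where a mapping or a predictive model could be fitted.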
This repository contains the necessary code to reproduce the results described in the paper and the corresponding source for the data used in the simulation experiments.
- Data: Contains data for modeling. The processed data used in our experiments can be found here
- MappingTransLearn.m: Main function for Domain Transfer with Nonlinear Mapping approach
- LatentPredTransLearn.m: Main function for Latent Variable based Cost Optimization approaches (includes all three approaches)
- TransLearnConcise_v2.m: Main experiment code
- DimRedTransLearn.m: Main file for Dimensionality Reduction based Transfer Learning
If you use either the Domain Transfer TL approach or Latent Variable Cost Optimization TL approach for your research/application, please cite the following paper -
Dhruba, S., Rahman, R. et al. Application of transfer learning for cancer drug sensitivity prediction. BMC Bioinformatics 19, 497 (2018). DOI: https://doi.org/10.1186/s12859-018-2465-y
If you use our work on the exploration of inconsistencies in the large pharmacogenomic studies, please cite the following paper -
Rahman, R., Dhruba, S. R. et al., Evaluating the consistency of large-scale pharmacogenomic studies, Briefings in Bioinformatics, 20 (5), 1734 – 1753 (2019). DOI: https://doi.org/10.1093/bib/bby046
If you use the Dimensionality Reduction based TL approach for your research/application, please cite the following paper -
Dhruba, S. R., Rahman, R. et al., Dimensionality reduction based transfer learning applied to pharmacogenomics databases, In 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Honolulu, HI, 1246 - 1249 (2018). DOI: https://doi.org/10.1109/EMBC.2018.8512457