Signature Informed Sampling for Transcriptomic Data

Transcriptomic data are challenging to work with in deep learning applications due to their high dimensionality and low patient numbers. Deep learning models tend to overfit these data and do not generalise well to out-of-distribution samples and new cohorts. Data augmentation strategies help alleviate this problem by introducing synthetic data points and acting as regularisers. However, the existing approaches are either computationally intensive or require parametric estimates. We introduce a new solution to an old problem: a simple, non-parametric, and novel data augmentation approach where gene signatures are crossed over between patients to generate new samples. As a case study, we apply our method to transcriptomic data of colorectal cancer. Through experiments on two different datasets, we show that our method improves patient stratification by generating samples that mirror biological variability and generalise to out-of-distribution data. Our approach requires little to no computation, and achieves performance on par with, if not better than, the existing augmentation methods.
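To make the crossover idea concrete, here is a minimal sketch of how signature blocks could be recombined across patients. It is an illustration written for this README, not the implementation shipped in signature_sampling: the function name `crossover_sample`, the representation of signatures as arrays of column indices, and the choice to fill genes outside any signature from a random base patient are all assumptions.

```python
import numpy as np


def crossover_sample(X, signatures, n_new, rng):
    """Build n_new synthetic rows by mixing signature blocks across rows of X.

    X          : (n_patients, n_genes) expression matrix
    signatures : list of integer index arrays, one per gene signature
                 (hypothetical representation, assumed non-overlapping)
    """
    n_patients = X.shape[0]
    # Start every synthetic sample from a randomly chosen base patient,
    # so genes that belong to no signature still hold real values.
    base = rng.integers(0, n_patients, size=n_new)
    new = X[base].copy()
    for genes in signatures:
        # Pick one donor patient per synthetic sample for this signature
        # and copy the donor's values for the whole signature block.
        donors = rng.integers(0, n_patients, size=n_new)
        new[:, genes] = X[donors][:, genes]
    return new


# Toy data: 6 patients, 8 genes, three made-up signatures.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))
signatures = [np.array([0, 1, 2]), np.array([3, 4]), np.array([5, 6, 7])]
augmented = crossover_sample(X, signatures, n_new=10, rng=rng)
```

Each synthetic row is assembled entirely from observed values, with every signature block copied intact from a single donor, so no distribution parameters are estimated. In a labelled setting one would presumably call such a function separately on the patients of each class, so that the synthetic samples inherit that class label; this is again an assumption of the sketch.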

Data Availability

For reproducibility purposes, we provide the standardised augmented datasets and corresponding standardised test datasets here.

Installation

Create a conda environment:

conda env create -f conda.yml

Activate the environment:

conda activate sigsample

Install:

pip install .

Development

Install in editable mode for development:

pip install --user -e .

Examples

For examples of how to use signature_sampling, see here. For experiments on MLP and VAE, see here.

PaccMann/transcriptomic_signature_sampling
