ramey edited this page Oct 10, 2011 · 18 revisions

Collection of Microarray Data Sets

This project is a collection of scripts to download, process, and load microarray data sets. Most are small-sample, high-dimensional data sets (i.e., the small n, large p problem). Additionally, most of the data sets presented here have been widely studied in the genetics/microarray, bioinformatics, statistics, and computer science literature.

For each data set, I have included a small set of scripts in the main project folder that automatically download, clean, and save the data set if necessary. Each script's filename begins with a number that indicates the order in which the files should be sourced.
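
The numeric-prefix convention can be illustrated with a minimal Python sketch that sorts file names by their leading number. This is only an illustration of the ordering idea, not part of the project itself: the helper `ordered_scripts` and the file names in the example are hypothetical.

```python
import re
from pathlib import Path


def ordered_scripts(names):
    """Return the names that start with digits, sorted by that leading number."""
    numbered = []
    for name in names:
        # Look only at the file name itself, not at any directory part.
        match = re.match(r"(\d+)", Path(name).name)
        if match:
            numbered.append((int(match.group(1)), name))
    # Sort numerically so that a prefix of 10 comes after a prefix of 2.
    return [name for _, name in sorted(numbered)]


# Hypothetical file names, for illustration only.
example = ["2-clean.R", "10-save.R", "1-download.R", "README.md"]
print(ordered_scripts(example))
```

Comparing the prefixes as integers rather than as strings avoids the usual pitfall where `10-...` would sort before `2-...`; files without a numeric prefix are skipped.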

The data sets that I have collected are:
