Home
ramey edited this page Oct 10, 2011 · 18 revisions
This project is a collection of scripts to download, process, and load microarray data sets. Most of them are small-sample, high-dimensional data sets (i.e., the small n, large p problem). Additionally, most of the data sets presented here have been widely studied in the genetics/microarray, bioinformatics, statistics, and computer science literature.
For each data set, I have included a small set of scripts in the main project folder that automatically download, clean, and save the data set as needed. Each script's filename begins with a number that indicates the order in which the files should be sourced.
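As a rough illustration of the numbered-filename convention, the sketch below sorts a list of script names by their leading number to recover the order in which they should be sourced. It is written in Python purely for illustration. The filenames and the helper names `numeric_prefix` and `sourcing_order` are hypothetical and are not taken from this project.

```python
import re


def numeric_prefix(filename):
    """Return the leading integer of a filename, or None if it has none."""
    match = re.match(r"\d+", filename)
    return int(match.group()) if match else None


def sourcing_order(filenames):
    """Sort filenames by leading number; files without a number are skipped."""
    numbered = [f for f in filenames if numeric_prefix(f) is not None]
    return sorted(numbered, key=numeric_prefix)


# Hypothetical filenames, for illustration only (not the project's real scripts).
scripts = ["2-clean.r", "10-save.r", "1-download.r", "README"]
print(sourcing_order(scripts))
```

Sorting on the parsed integer rather than on the raw string keeps a script numbered 10 after one numbered 2, which a plain alphabetical sort would get wrong.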
The data sets that I have collected are:
- Chiaretti et al. (2004) - Acute Lymphoblastic Leukemia (ALL)
- Golub et al. (1999) - Leukemia
- Khan et al. (2001) - Small Round Blue Cell Tumors (SRBCT)