greglu edited this page Jun 22, 2011 · 25 revisions

Million Song Dataset [datasets/msd]

Special thanks to Echo Nest for converting the entire dataset (200+ GB in HDF5 format) to TSV for us.

NASDAQ daily prices and dividends [datasets/nasdaq]

NYSE daily prices and dividends [datasets/nyse]

Wikipedia XML dump [datasets/wikipedia]

Google Ngram [datasets/ngrams]

Geonames [datasets/geonames]

Reddit voting data [datasets/reddit]

Bixi Montreal [datasets/bixi]

  • XML dump of all bike station information, queried every minute over a couple of months.
  • Provided by Fabrice

DNS dataset [datasets/dns]

  • Contains the root file with all the domain names and their associated nameservers for the "com" TLD.
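
As a rough illustration of working with this kind of file, the sketch below groups nameservers by domain. It assumes the file follows the usual DNS master-file layout, with one `NS` record per line such as `EXAMPLE NS NS1.EXAMPLE.NET.`; the actual layout of the file under datasets/dns is not documented here, and `parse_ns_records` and `qualify` are hypothetical helper names.

```python
from collections import defaultdict


def qualify(name, origin):
    """Return a lowercase, fully qualified name relative to `origin`."""
    name = name.lower()
    if name == "@":
        return origin
    # Names ending in "." are already absolute; others are relative to origin.
    return name if name.endswith(".") else f"{name}.{origin}"


def parse_ns_records(lines, origin="com."):
    """Group NS targets by owner name.

    Assumption: every record sits on one line and starts with its owner
    name, optionally followed by a TTL and class before the NS type.
    Continuation lines without an owner name are not handled.
    """
    nameservers = defaultdict(list)
    for raw in lines:
        line = raw.split(";", 1)[0].strip()  # drop comments
        if not line or line.startswith("$"):  # skip blanks and directives
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        # The record type sits between the owner name and the target.
        if "NS" not in (f.upper() for f in fields[1:-1]):
            continue
        owner = qualify(fields[0], origin)
        target = qualify(fields[-1], origin)
        nameservers[owner].append(target)
    return dict(nameservers)
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, so a large file does not need to be loaded into memory first.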

LDEO Surface Ocean CO2 Climatology data [datasets/ldeo]

Twitter dataset [datasets/twitter]

Flight dataset [datasets/flights]

  • Limited set of flight data containing origin, destination, departure time, return time, price and date. Only includes flights originating from SEA.
  • Provided by Hopper
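
The fields listed above could be loaded along these lines. This is only a sketch: the delimiter (tab), the column order (taken from the order of the description), and the names `Flight`, `read_flights` and `cheapest_by_destination` are all assumptions, not a description of the actual files under datasets/flights.

```python
import csv
from dataclasses import dataclass


@dataclass
class Flight:
    origin: str
    destination: str
    departure_time: str
    return_time: str
    price: float
    date: str


def read_flights(lines, delimiter="\t"):
    """Yield Flight records from delimited text lines.

    Assumption: six columns in the order origin, destination,
    departure time, return time, price, date. Rows with a different
    column count or a non-numeric price (e.g. a header) are skipped.
    """
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) != 6:
            continue
        origin, destination, departure_time, return_time, price, date = row
        try:
            price_value = float(price)
        except ValueError:
            continue
        yield Flight(origin, destination, departure_time,
                     return_time, price_value, date)


def cheapest_by_destination(flights):
    """Return a dict mapping each destination to its lowest-priced flight."""
    best = {}
    for flight in flights:
        current = best.get(flight.destination)
        if current is None or flight.price < current.price:
            best[flight.destination] = flight
    return best
```

Times and dates are kept as strings here because their format is not stated; parse them once the real format is known.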

Amazon dataset [datasets/amazon]