greglu edited this page Jun 22, 2011 · 25 revisions

Million Song Dataset

  • Special thanks to Echo Nest for converting the entire 200+ GB dataset from HDF5 to TSV format for us.

NASDAQ daily prices and dividends

NYSE daily prices and dividends

Wikipedia XML dump

Google Ngram

Geonames

Reddit voting data

Bixi Montreal

  • XML dump of all bike station information, queried every minute over a couple of months.
  • Provided by Fabrice

DNS dataset

  • Contains the root file listing all domain names in the "com" TLD and their associated nameservers.

LDEO Surface Ocean CO2 Climatology data

Twitter dataset

Flight dataset

  • Limited set of flight data containing origin, destination, departure time, return time, price, and date. Only includes flights originating from SEA (Seattle-Tacoma International Airport).
  • Provided by Hopper