Collection of datasets used in Vega and Vega-Lite examples. This data lives at https://github.com/vega/vega-datasets.
Common repository for example datasets used by Vega-related projects. Keep changes to this repository minimal, as other projects (Vega, Vega Editor, Vega-Lite, Polestar, Voyager) use this data in their tests and for examples.
The list of sources is in sources.md.
To access the data in Observable, you can import `vega-datasets`. Try our example notebook. To access these datasets from Python, you can use the Vega datasets Python package. To access them from Julia, you can use the VegaDatasets.jl Julia package.
The Vega datasets preview notebook offers a quick way to browse the content of the available datasets.
```
npm i vega-datasets
```

Now you have all the datasets in a folder in `node_modules/vega-datasets/data/`.
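For example, here is a minimal sketch (an illustration, not from the package docs) that reads one of the installed files straight from disk with Node's built-in `fs` module, assuming the default `node_modules` layout and an ES module context with top-level await:

```js
// Hypothetical sketch: read a dataset file installed by npm directly from disk.
// Assumes the default node_modules layout described above.
import { readFile } from 'node:fs/promises';

const raw = await readFile('node_modules/vega-datasets/data/cars.json', 'utf8');
const cars = JSON.parse(raw);
console.log(`loaded ${cars.length} rows`);
```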
```
npm i vega-datasets
```

Now you can import `data = require('vega-datasets')` and access the URLs of any dataset with `data[NAME].url`. `data[NAME]()` returns a promise that resolves to the actual data fetched from the URL. We use d3-dsv to parse CSV files.
Here is a full example:

```js
import data from 'vega-datasets';

const cars = await data['cars.json']();
// equivalent to
// const cars = await (await fetch(data['cars.json'].url)).json();

console.log(cars);
```
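CSV datasets work the same way. A minimal hedged sketch, assuming `seattle-weather.csv` is fetched and parsed by d3-dsv into row objects as described above:

```js
import data from 'vega-datasets';

// The promise resolves to an array of row objects, since CSV files
// are parsed with d3-dsv.
const weather = await data['seattle-weather.csv']();
console.log(weather[0]);
```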
You can also get the data directly via HTTP served by GitHub, for example:
https://vega.github.io/vega-datasets/data/cars.json
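For instance, the same cars dataset can be fetched from that URL without installing anything (a sketch assuming an environment with a global `fetch`, such as browsers or Node 18+):

```js
// Fetch a dataset straight from the hosted URL; no npm install needed.
const response = await fetch('https://vega.github.io/vega-datasets/data/cars.json');
const cars = await response.json();
console.log(`fetched ${cars.length} rows`);
```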
You can use `git subtree` to add these datasets to a project. Add the data with `git subtree add`:

```
git subtree add --prefix path-to-data git@github.com:vega/vega-datasets.git gh-pages
```
Update to the latest version of vega-datasets with:

```
git subtree pull --prefix path-to-data git@github.com:vega/vega-datasets.git gh-pages
```
- Update `weather.csv` and `seattle-weather.csv` with better-encoded weather conditions, indicating more rain.
- Update `seattle-temps` with better-sourced data.
- Update `sf-temps` with better-sourced data.
- Add `ohlc.json`. Thanks to @eitanlees!
- Add `annual-precip.json`. Thanks to @mattijn!
- Add `volcano.json`.
- Add `uniform-2d.json`.
- Add `windvectors.csv`. Thanks to @jwoLondon!
- Add `us-unemployment.csv`. Thanks to @palewire!
- Remove time in `weather.csv`.
- Fix typo in city name in `us-state-capitals.json`.
- Make the data consistent with respect to origin by generating it on a Unix platform.
- Add `co2-concentration.csv`.
- Add `earthquakes.json`.
- Add `graticule.json`, London borough boundaries, borough centroids, and tube (metro) rail lines.
- Add `disasters.csv` with disaster type, year, and deaths.
- Add zero-padding in the zipcode dataset.
- Add U District cuisine data.
- Add weather data for Seattle and New York.
- Add income, zipcodes, lookup data, and a dataset with three independent geo variables.
- Remove all tabs in `github.csv` to prevent incorrect field name parsing.
- Dates in `movies.json` are all recognized as date types by datalib.
- Dates in `crimea.json` are now in ISO format (YYYY-MM-DD).
- Fix `cars.json` date format.
- Add Gapminder Health vs. Income dataset.
- Add generated GitHub contributions data for punch card visualization.
- Add Anscombe's Quartet dataset.
- Change date format in weather data so that it can be parsed in all browsers. Apparently YYYY/MM/DD is fine. Can also omit hours now.
- Decode origins in cars dataset.
- Add Unemployment Across Industries in US.
- Fix the date parsing on the CrossFilter datasets; an older version of the data was copied over on initial import. A script is now available via `npm run flights N` to re-sample `N` records from the original `flights-3m.csv` dataset.
- Add `seattle-weather` dataset. Transformed with https://gist.github.com/domoritz/acb8c13d5dadeb19636c.
- Initial import from Vega and Vega-Lite.
- Change field names in `cars.json` to be more descriptive (`hp` to `Horsepower`).