fix: replace data/flights-3m.csv with data/flights-3m.parquet #628
Conversation
@dsmedia thanks for getting to this so quickly! One thing stood out here to me. The size for all I wouldn't have expected such a dramatic difference between the two. Do you get the same result using |
Actually, do you think we should try arrow instead of parquet? It doesn't support compression in JavaScript yet, but it can be read with common libraries in the browser. Parquet is good as a storage format, but if we want to expose the flights file, we might want to use a format that can be easily read in browsers. Thoughts?
Related vega/vega#3961 From a IIRC there is one dataset already that is exported as |
How large would it be? Is it like avro in #627 (comment)?
@domoritz I've updated the spec with a larger font size. Open the (updated) Chart in the Vega Editor It already included 3 |
Thanks. Now I think we should use parquet, so that people have another file format they can use to demo loaders.
Great catch, @dangotbanned. The generation script was including an (unneeded) index in the parquet file by default. Removing the index reduced the parquet file size to 12 MB, in line with your expectations. The index is now excluded by default in
|
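The index fix described above can be sketched as follows. This is a minimal illustration, assuming the generation script writes the file through pandas' `DataFrame.to_parquet`; the helper names `parquet_write_options` and `write_parquet` are hypothetical and are not taken from `scripts/flights.py`, and the `snappy` default is simply the pandas default, not necessarily the script's.

```python
from pathlib import Path
from typing import Any, Dict, Optional

# Compression names accepted by pandas.DataFrame.to_parquet (None disables compression).
SUPPORTED_COMPRESSION = {"snappy", "gzip", "brotli", "lz4", "zstd", None}


def parquet_write_options(compression: Optional[str] = "snappy",
                          include_index: bool = False) -> Dict[str, Any]:
    """Build keyword arguments for DataFrame.to_parquet (hypothetical helper)."""
    if compression not in SUPPORTED_COMPRESSION:
        raise ValueError(f"unsupported compression: {compression!r}")
    # index=False keeps the DataFrame index out of the file, which is the
    # change the comment above credits with shrinking the output.
    return {"compression": compression, "index": include_index}


def write_parquet(df, path, compression: Optional[str] = "snappy",
                  include_index: bool = False) -> int:
    """Write df to a parquet file and return the resulting size in bytes."""
    options = parquet_write_options(compression, include_index)
    # df is assumed to be a pandas DataFrame (or any object with a
    # compatible to_parquet method).
    df.to_parquet(path, **options)
    return Path(path).stat().st_size
```

With pandas and a parquet engine such as pyarrow installed, calling `write_parquet(df, "flights-3m.parquet", compression="zstd")` writes the frame without its index and returns the size on disk, which makes it easy to compare index and compression settings side by side.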
Hehe, as I predicted in #627 (comment). Thank you for the pull request.
https://github.com/vega/vega-datasets/releases/tag/v2.11.0
Includes support for `.parquet` following:
- vega/vega-datasets#628
- vega/vega-datasets#627
Changes
- Updated `scripts/flights.py` to handle parquet output with customizable compression
- Regenerated the `data/flights-3m` dataset using:

Note: Replace `/path/to/DOT/zip/files` with the local directory containing the Bureau of Transportation Statistics (BTS) monthly ZIP files downloaded from the BTS website. Download the prezipped files, one per month.