add parquet test data #30

Open
wants to merge 1 commit into
base: master

yaqi-zhao

No description provided.

@pitrou
Member

pitrou commented Dec 13, 2022

Hi @yaqi-zhao,

  1. Can you clarify the PR title and description to explain what this is about?
  2. Can you fill in information about the data files in https://github.com/apache/parquet-testing/blob/master/data/README.md?

@yaqi-zhao
Author

Hi @pitrou, I submitted a PR to Apache Arrow (apache/arrow#14585) that adds a benchmark test which uses these files. The test is intended to analyze Parquet reader performance with different bit-width packing.
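For readers unfamiliar with the term, here is a minimal sketch of what bit-width packing means: values that each fit in `bit_width` bits are stored back to back, least significant bit first, which is the order Parquet's RLE/bit-packing hybrid encoding uses. The helper names `pack_bits` and `unpack_bits` are hypothetical and are not taken from the Arrow PR or from any Parquet library; this is an illustration only, not the reader being benchmarked.

```python
def pack_bits(values, bit_width):
    """Pack non-negative integers into bytes, least significant bit first.

    Hypothetical helper for illustration; not part of Arrow or Parquet.
    """
    if bit_width < 1:
        raise ValueError("bit_width must be at least 1")
    mask = (1 << bit_width) - 1
    acc = 0      # bit accumulator holding bits not yet written out
    nbits = 0    # number of valid bits currently in the accumulator
    out = bytearray()
    for v in values:
        if v < 0 or v > mask:
            raise ValueError(f"{v} does not fit in {bit_width} bits")
        acc |= v << nbits
        nbits += bit_width
        # Flush every complete byte, lowest byte first.
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        # Final partial byte, padded with zero bits.
        out.append(acc & 0xFF)
    return bytes(out)


def unpack_bits(data, bit_width, count):
    """Inverse of pack_bits: read `count` values of `bit_width` bits each."""
    acc = int.from_bytes(data, "little")
    mask = (1 << bit_width) - 1
    return [(acc >> (i * bit_width)) & mask for i in range(count)]
```

With a bit width of 3, the eight values 0 through 7 occupy 3 bytes instead of 8, so the amount of unpacking work per value depends on the width, which is presumably why the benchmark varies it.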

@yaqi-zhao force-pushed the master branch 3 times, most recently from 5f4a154 to 8834fc5 on December 14, 2022 07:18
@pitrou
Member

pitrou commented Dec 14, 2022

How long does it take to generate those files on the fly from the benchmarks?

In general parquet-testing is for interoperability testing between different Parquet implementations, not for benchmarking of individual implementations.

At worst we could use arrow-testing for that, but even then we should strive to make the files much smaller. We don't want to consume hundreds of MB just for a single set of benchmarks, IMHO.
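One way to approach the question of generating the data on the fly is to produce the input values deterministically from a fixed seed and time that step. The sketch below assumes the benchmark only needs integer values bounded by a given bit width; `generate_values`, `packed_size_bytes`, and `time_generation` are hypothetical names, and the code does not write Parquet files or reflect how the Arrow benchmark is actually implemented.

```python
import random
import time


def generate_values(count, bit_width, seed=0):
    """Return `count` pseudo-random integers that each fit in `bit_width` bits.

    A fixed seed makes the data reproducible across runs, so no checked-in
    file is needed to get identical inputs. Hypothetical helper.
    """
    if bit_width < 1:
        raise ValueError("bit_width must be at least 1")
    rng = random.Random(seed)
    return [rng.getrandbits(bit_width) for _ in range(count)]


def packed_size_bytes(count, bit_width):
    """Bytes needed to store `count` values bit-packed at `bit_width` bits."""
    return (count * bit_width + 7) // 8


def time_generation(count, bit_width):
    """Return the wall-clock seconds spent generating the values."""
    start = time.perf_counter()
    generate_values(count, bit_width)
    return time.perf_counter() - start
```

For example, calling `time_generation(1_000_000, 5)` gives a rough idea of the setup cost for one bit width, and `packed_size_bytes(1_000_000, 5)` shows that the raw bit-packed payload for a million 5-bit values is 625000 bytes, far below hundreds of MB.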
