More advanced techniques to read parquet files efficiently #589

Open
jiacai2050 opened this issue Jan 28, 2023 · 0 comments
Labels
feature New feature or request


jiacai2050 (Contributor) commented Jan 28, 2023

Describe This Problem

The IO part of a query is usually the most time-consuming, so reducing the time spent on it would improve query latency considerably.

The current implementation already applies several tricks to optimize this, including:

  1. concurrent reads, even within a single file
  2. min/max pruning
  3. custom bloom filter pruning
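The min/max pruning above can be illustrated with a small, std-only Rust sketch. It assumes each row group carries min/max statistics for the filtered column. A row group is skipped without any IO when its `[min, max]` range cannot overlap the predicate's range. `ColumnStats` and `prune_row_groups` are hypothetical names used only for illustration. They are not this project's code or the `parquet` crate's API.

```rust
/// Hypothetical per-row-group statistics for a single i64 column.
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    min: i64,
    max: i64,
}

/// Returns the indices of row groups that may contain rows with `lo <= col <= hi`.
/// A row group whose [min, max] does not overlap [lo, hi] cannot match, so it is skipped.
fn prune_row_groups(stats: &[ColumnStats], lo: i64, hi: i64) -> Vec<usize> {
    stats
        .iter()
        .enumerate()
        .filter(|(_, s)| s.max >= lo && s.min <= hi)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let stats = [
        ColumnStats { min: 0, max: 99 },
        ColumnStats { min: 100, max: 199 },
        ColumnStats { min: 200, max: 299 },
    ];
    // Predicate: 150 <= col <= 250, so row group 0 is never read.
    let keep = prune_row_groups(&stats, 150, 250);
    println!("{:?}", keep); // prints [1, 2]
}
```

Statistics only prove that a row group *cannot* match. Surviving row groups still need the predicate evaluated row by row.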

@tustvold and @alamb wrote an excellent blog post that introduces more advanced techniques for further improving read speed. It is a must-read for developers in the Arrow ecosystem.

Proposal

Explore the ideas introduced in "Querying Parquet with Millisecond Latency". Some notable ones are:

  • Page pruning
  • Late materialization
  • Decode optimization, especially for dictionary encoding
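The following std-only Rust sketch shows how page pruning and late materialization can work together. Page-level min/max statistics narrow a row group down to the row ranges that might match. The predicate is then evaluated on the filter column alone, and the projected column is read only for matching rows. In-memory slices stand in for decoded column chunks. `PageStats`, `prune_pages` and `late_materialize` are hypothetical names. This is not how the `parquet` crate or DataFusion implement it, and their page-index and row-filter machinery is not used here.

```rust
use std::ops::Range;

/// Hypothetical page-level statistics for the filter column, similar in spirit
/// to a Parquet page index: where the page starts and its min/max values.
struct PageStats {
    first_row: usize,
    min: i64,
    max: i64,
}

/// Page pruning: keep only the row ranges whose page min/max could contain `target`.
fn prune_pages(pages: &[PageStats], total_rows: usize, target: i64) -> Vec<Range<usize>> {
    let mut ranges = Vec::new();
    for (i, page) in pages.iter().enumerate() {
        // A page ends where the next one starts, or at the end of the row group.
        let end = pages.get(i + 1).map_or(total_rows, |next| next.first_row);
        if page.min <= target && target <= page.max {
            ranges.push(page.first_row..end);
        }
    }
    ranges
}

/// Late materialization: evaluate the predicate on the filter column first,
/// then read the projected column only for rows that actually match.
fn late_materialize(
    filter_col: &[i64],
    projected_col: &[String],
    ranges: &[Range<usize>],
    target: i64,
) -> Vec<String> {
    let mut out = Vec::new();
    for range in ranges {
        for row in range.clone() {
            if filter_col[row] == target {
                // Only matching rows touch the (possibly wide) projected column.
                out.push(projected_col[row].clone());
            }
        }
    }
    out
}

fn main() {
    // One row group of 6 rows, split into 3 pages of 2 rows each.
    let filter_col: Vec<i64> = vec![1, 2, 3, 3, 5, 6];
    let projected_col: Vec<String> = ["a", "b", "c", "d", "e", "f"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    let pages = [
        PageStats { first_row: 0, min: 1, max: 2 },
        PageStats { first_row: 2, min: 3, max: 3 },
        PageStats { first_row: 4, min: 5, max: 6 },
    ];

    // Query: SELECT projected WHERE filter = 3
    let ranges = prune_pages(&pages, filter_col.len(), 3);
    println!("{:?}", ranges); // prints [2..4]
    let rows = late_materialize(&filter_col, &projected_col, &ranges, 3);
    println!("{:?}", rows); // prints ["c", "d"]
}
```

In a real reader, the savings come from never fetching or decoding the pruned pages. They also come from skipping projected-column values for rows the filter rejects. The sketch only models which rows each step touches.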


jiacai2050 added the feature (New feature or request) label Jan 28, 2023
jiacai2050 self-assigned this May 18, 2023
jiacai2050 added a commit that referenced this issue Jun 5, 2023
## Rationale
Part of #589

## Detailed Changes
- Introduce `PagePruningPredicate` when building `ParquetRecordBatchStream`

## Test Plan