Describe This Problem
Usually the IO part of a query is the most time consuming, so reducing the time spent on it can improve query latency quite a lot.

In the current implementation we have already applied some tricks to optimize this, to name a few:

- concurrent reads, even within a single file
- min/max pruning (see the sketch below)
- custom bloom filter pruning

There is an awesome blog written by @tustvold and @alamb introducing some more advanced techniques to further improve read speed, which is definitely a must-read for developers in the Arrow ecosystem.
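To make the existing min/max pruning concrete, here is a minimal sketch using the Rust `parquet` crate, assuming an `Int64` column at leaf index 0 and a query range `[min_wanted, max_wanted]`; the statistics accessors (`min_opt`/`max_opt`) differ between arrow-rs versions, so treat this as illustrative rather than our actual implementation:

```rust
use std::fs::File;

use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;
use parquet::file::statistics::Statistics;

fn read_pruned(path: &str, min_wanted: i64, max_wanted: i64) -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open(path)?;
    let builder = ParquetRecordBatchReaderBuilder::try_new(file)?;

    // Keep only row groups whose column-0 statistics can overlap [min_wanted, max_wanted].
    let keep: Vec<usize> = builder
        .metadata()
        .row_groups()
        .iter()
        .enumerate()
        .filter(|(_, rg)| match rg.column(0).statistics() {
            Some(Statistics::Int64(s)) => {
                // Missing bounds are treated conservatively: keep the row group.
                let min = s.min_opt().copied().unwrap_or(i64::MIN);
                let max = s.max_opt().copied().unwrap_or(i64::MAX);
                max >= min_wanted && min <= max_wanted
            }
            // No statistics (or an unexpected type): we must read the row group.
            _ => true,
        })
        .map(|(i, _)| i)
        .collect();

    // Only the selected row groups are read and decoded.
    let reader = builder.with_row_groups(keep).build()?;
    for batch in reader {
        let _batch = batch?;
        // ... process the batch ...
    }
    Ok(())
}
```

Row groups whose statistics cannot overlap the queried range are skipped before any of their pages are fetched or decoded.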
Proposal
Explore the ideas introduced in Querying Parquet with Millisecond Latency. Some notable ideas are projection and predicate pushdown (late materialization), page index pruning, and async IO pushdown.
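As a rough sketch of the late materialization idea from the blog (not our code), the following uses the `RowFilter` / `ArrowPredicateFn` API of the Rust `parquet` crate; the file name, column index, and threshold are hypothetical, and details may vary across versions:

```rust
use std::fs::File;

use arrow::array::{Array, BooleanArray, Int64Array};
use parquet::arrow::arrow_reader::{ArrowPredicateFn, ParquetRecordBatchReaderBuilder, RowFilter};
use parquet::arrow::ProjectionMask;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open("data.parquet")?;
    let builder = ParquetRecordBatchReaderBuilder::try_new(file)?;

    // The predicate only needs leaf column 0, so only that column is decoded
    // while evaluating the filter (late materialization).
    let predicate_mask = ProjectionMask::leaves(builder.parquet_schema(), [0]);
    let predicate = ArrowPredicateFn::new(predicate_mask, |batch| {
        let col = batch
            .column(0)
            .as_any()
            .downcast_ref::<Int64Array>()
            .expect("expected an Int64 column");
        // Keep rows where the value is >= 100 (illustrative threshold).
        Ok(BooleanArray::from_iter(
            col.iter().map(|v| Some(v.map_or(false, |v| v >= 100))),
        ))
    });

    // The remaining columns are decoded only for rows that passed the predicate.
    let reader = builder
        .with_row_filter(RowFilter::new(vec![Box::new(predicate)]))
        .build()?;

    for batch in reader {
        let _batch = batch?;
        // ... process only the surviving rows ...
    }
    Ok(())
}
```

The predicate decodes only the filter column; the other columns are decoded just for the rows that pass, which is where the IO and CPU savings come from on selective queries.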
Additional Context
No response