[core] Add parquet write page limit parameter #4632

Merged
JingsongLi merged 1 commit into apache:master on Dec 4, 2024

Conversation

Aiden-Dong
Contributor

@Aiden-Dong Aiden-Dong commented Dec 4, 2024

Purpose

Linked issue: close #4586

Adds pass-through of the parquet.page.row.count.limit parameter for Parquet file writing. When Parquet writes a file, it decides where to end a page using both parquet.page.size and parquet.page.row.count.limit. If only parquet.page.size is set, the setting may have no effect (the row count limit can end the page first), or pages may end up misaligned, which hurts performance.

// org.apache.parquet.column.impl.ColumnWriteStoreBase
private void sizeCheck() {
    ...
    int pageRowCountLimit = props.getPageRowCountLimit();
    ...
    for (ColumnWriterBase writer : columns.values()) {
      long usedMem = writer.getCurrentPageBufferedSize();
      // Rows buffered in this column's current page
      long rows = rowCount - writer.getRowsWrittenSoFar();
      long remainingMem = props.getPageSizeThreshold() - usedMem;
      // Flush when the size budget is nearly used up OR the row limit is reached
      if (remainingMem <= thresholdTolerance || rows >= pageRowCountLimit) {
        writer.writePage();
        remainingMem = props.getPageSizeThreshold();
      }
    }
    ...
}
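
To make the interaction concrete, here is a minimal standalone sketch of the flush condition above. It is not Parquet code: the class PageFlushSketch, its method names, the fixed bytes-per-row model, the tolerance of 0, and the per-row check are all simplifications assumed for illustration. The 20,000 row limit is what I believe the Parquet default to be, so treat that value as an assumption too.

```java
public class PageFlushSketch {

    // Same boolean condition as in the sizeCheck() excerpt:
    // flush when the size budget is nearly used up OR the row limit is reached.
    public static boolean shouldFlush(long usedMem, long rows, long pageSizeThreshold,
                                      long thresholdTolerance, int pageRowCountLimit) {
        long remainingMem = pageSizeThreshold - usedMem;
        return remainingMem <= thresholdTolerance || rows >= pageRowCountLimit;
    }

    // Counts how many rows are buffered before the first flush, assuming every
    // row adds exactly bytesPerRow bytes and the check runs after every row
    // (hypothetical simplifications, not how the real writer schedules checks).
    public static long rowsPerPage(long bytesPerRow, long pageSizeThreshold,
                                   long thresholdTolerance, int pageRowCountLimit) {
        long rows = 0;
        long usedMem = 0;
        while (!shouldFlush(usedMem, rows, pageSizeThreshold, thresholdTolerance, pageRowCountLimit)) {
            rows++;
            usedMem += bytesPerRow;
        }
        return rows;
    }

    public static void main(String[] args) {
        long pageSize = 8L * 1024 * 1024; // parquet.page.size raised to 8 MiB
        long tolerance = 0;               // hypothetical tolerance, kept at 0 for clarity
        long bytesPerRow = 8;             // e.g. one small fixed-width column

        // Row limit left at 20,000: the row limit ends the page long before 8 MiB.
        System.out.println(rowsPerPage(bytesPerRow, pageSize, tolerance, 20_000));            // 20000

        // Row limit effectively disabled: the page now fills up to the size threshold.
        System.out.println(rowsPerPage(bytesPerRow, pageSize, tolerance, Integer.MAX_VALUE)); // 1048576
    }
}
```

With small rows, raising parquet.page.size alone changes nothing in this model, because the row limit fires first at roughly 160 KB of buffered data. That is the case this PR addresses by letting parquet.page.row.count.limit be configured as well.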

Tests

API and Format

Documentation

@JingsongLi
Contributor

Hi @Aiden-Dong, please keep an English version in your description.

Contributor

@JingsongLi JingsongLi left a comment

+1

@JingsongLi JingsongLi merged commit 812ef05 into apache:master Dec 4, 2024
12 checks passed