branch-2.1: [fix](parquet) impl has_dict_page to replace old logic and fix write empty parquet row group bug #45740 #45954

Merged
morningman merged 1 commit into branch-2.1 from auto-pick-45740-branch-2.1
Dec 26, 2024


github-actions[bot]
Contributor

Cherry-picked from #45740

…empty parquet row group bug (#45740)

### What problem does this PR solve?
Problem Summary:

This PR adds a `has_dict_page` function that checks whether the given column has a dictionary page.
 
This function determines the presence of a dictionary page by checking
the `dictionary_page_offset` field in the column metadata. The
`dictionary_page_offset` must be set and greater than 0, and it must be
less than the `data_page_offset`.
 
These checks are based on the Java implementation of Parquet, which can
use a `dictionary_page_offset` of 0 to indicate the absence of a
dictionary, so the field being set does not by itself prove that a
dictionary page exists. Additionally, Parquet may write an empty row
group, in which case the dictionary page content would be empty, and
thus the dictionary page should not be read.
 
See https://github.com/apache/arrow/pull/2667/files
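
The check described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the `ColumnMetaData` struct here is a hypothetical stand-in for the Thrift-generated Parquet metadata type, and `std::optional` is assumed as the way to model the optional `dictionary_page_offset` field.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the Thrift-generated Parquet column metadata.
// Only the two fields named in the PR description are modeled.
struct ColumnMetaData {
    // Optional field: may be unset, or set to 0 by some writers.
    std::optional<int64_t> dictionary_page_offset;
    int64_t data_page_offset = 0;
};

// Returns true only if all three conditions from the description hold:
//   1. dictionary_page_offset is set,
//   2. it is greater than 0,
//   3. it is less than data_page_offset.
inline bool has_dict_page(const ColumnMetaData& column) {
    return column.dictionary_page_offset.has_value() &&
           *column.dictionary_page_offset > 0 &&
           *column.dictionary_page_offset < column.data_page_offset;
}
```

With this sketch, a column whose `dictionary_page_offset` is unset, 0, or not strictly below `data_page_offset` is treated as having no dictionary page, so a reader would skip the dictionary read in those cases.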
@Thearas
Contributor

Thearas commented Dec 25, 2024

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring reopened this Dec 25, 2024
@Thearas
Contributor

Thearas commented Dec 25, 2024

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.38% (9538/26217)
Line Coverage: 27.90% (78578/281687)
Region Coverage: 26.57% (40332/151813)
Branch Coverage: 23.33% (20431/87572)
Coverage Report: http://coverage.selectdb-in.cc/coverage/b353e5b3ef5dda8c61ee4fc37b8d805ae91df22a_b353e5b3ef5dda8c61ee4fc37b8d805ae91df22a/report/index.html

@morningman morningman merged commit df8bc8f into branch-2.1 Dec 26, 2024
17 of 19 checks passed
@github-actions github-actions bot deleted the auto-pick-45740-branch-2.1 branch December 26, 2024 07:17
5 participants