feat: Support Bucket and Truncate transforms on write #1345

Open
wants to merge 8 commits into
base: main

Collaborator

@sungwy sungwy commented Nov 20, 2024

Getting the PR ready for when pyiceberg_core is released from iceberg-rust

PR to introduce python binding release: apache/iceberg-rust#705

Fixes: #1074

Consideration: for consistency, we could also replace the existing pyarrow dependency used by the order-preserving transforms (Month, Year, Date) with pyiceberg_core
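
For readers unfamiliar with the two transforms this PR enables on write, here is a minimal pure-Python sketch of their partition semantics as described in the Iceberg table spec. It is not the implementation in this PR, which delegates to pyiceberg_core; the function names below are hypothetical, and the bucket helper takes an already-computed 32-bit hash as input rather than computing the murmur3 hash itself.

```python
def truncate_int(value: int, width: int) -> int:
    """Truncate an integer down to the nearest multiple of `width`.

    Python's % returns a non-negative remainder for a positive width,
    so negative values round toward negative infinity.
    """
    return value - (value % width)


def truncate_str(value: str, width: int) -> str:
    """Keep at most the first `width` characters (code points) of a string."""
    return value[:width]


def bucket_index(hash_value: int, num_buckets: int) -> int:
    """Map a 32-bit hash to a bucket in the range [0, num_buckets).

    Assumption: `hash_value` was produced elsewhere (the spec uses a
    32-bit murmur3 hash). Masking with 0x7FFFFFFF clears the sign bit.
    """
    return (hash_value & 0x7FFFFFFF) % num_buckets
```

With a width of 10, `truncate_int` maps both 15 and 19 to 10, so rows with those values land in the same partition. Truncate preserves ordering of the source values while bucket does not.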

@kevinjqliu kevinjqliu self-requested a review December 19, 2024 17:15
@sungwy sungwy marked this pull request as ready for review December 24, 2024 18:35
@sungwy sungwy changed the title Introduce bucket transform feat: Support bucket and Truncate transforms on write Dec 24, 2024
@sungwy sungwy changed the title feat: Support bucket and Truncate transforms on write feat: Support Bucket and Truncate transforms on write Dec 24, 2024
@sungwy sungwy requested a review from Fokko December 24, 2024 20:45
Contributor

@kevinjqliu kevinjqliu left a comment

LGTM! Great to have writes for all the different transformations!

@pytest.mark.parametrize(
    "spec, expected_rows",
    [
        # none of non-identity is supported
Contributor

Suggested change
# none of non-identity is supported

Comment on lines +1580 to +1583
    source_type: PrimitiveType,
    input_arr: Union[pa.Array, pa.ChunkedArray],
    expected: Union[pa.Array, pa.ChunkedArray],
    num_buckets: int,
Contributor

nit: wdyt about reordering these for readability? num_buckets, source_type, and input_arr are configs of the BucketTransform; expected is the output

Collaborator Author

Hmm I think I feel indifferent here - there’s something nice about having the input and expected arrays side by side

Development

Successfully merging this pull request may close these issues.

Support writes to Bucket Partitioned Tables
2 participants