feat: Support Bucket and Truncate transforms on write #1345
LGTM! Great to have writes for all the different transformations!
@pytest.mark.parametrize(
    "spec, expected_rows",
    [
        # none of non-identity is supported
# none of non-identity is supported
source_type: PrimitiveType,
input_arr: Union[pa.Array, pa.ChunkedArray],
expected: Union[pa.Array, pa.ChunkedArray],
num_buckets: int,
nit: wdyt of reordering these for readability? num_buckets, source_type, and input_arr are configs of the BucketTransform; expected is the output
Hmm I think I feel indifferent here - there’s something nice about having the input and expected arrays side by side
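For anyone following this thread, here is a rough pure-Python sketch of how num_buckets, an input value, and the expected bucket relate. It assumes the bucket definition from the Iceberg spec (32-bit Murmur3 hash with seed 0, sign bit masked off, modulo the bucket count), with integers hashed as 8-byte little-endian longs. The helper names murmur3_x86_32 and bucket_long are made up for illustration; this is not the pyiceberg_core binding or the code in this PR.

```python
MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit unsigned integer left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """Hypothetical helper: 32-bit Murmur3 (x86 variant) as an unsigned int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    n = len(data)
    body_end = n - (n % 4)

    # Mix the input four bytes at a time, little-endian.
    for i in range(0, body_end, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32

    # Mix the remaining 1 to 3 bytes, if any.
    tail = data[body_end:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k

    # Finalization: fold in the length and avalanche the bits.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def bucket_long(value: int, num_buckets: int) -> int:
    """Hypothetical helper: bucket for an int/long value (assumed spec formula)."""
    hashed = murmur3_x86_32(value.to_bytes(8, "little", signed=True))
    # Mask off the sign bit so the result of the modulo is never negative.
    return (hashed & 0x7FFFFFFF) % num_buckets
```

Read this way, num_buckets and the source type configure the transform, while each expected entry is just bucket_long(input, num_buckets) for the matching input entry.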
Getting the PR ready for when pyiceberg_core is released from iceberg-rust
PR to introduce python binding release: apache/iceberg-rust#705
Fixes: #1074
Consideration: we could replace the existing pyarrow dependency on order_preserving transforms (Month, Year, Date) with pyiceberg_core for consistency
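For context on the other transform named in the title, a minimal sketch of truncate, assuming the Iceberg spec's definitions: integers are rounded down to a multiple of the width via v - (v % W) with a non-negative remainder, and strings keep their first W code points. The function names truncate_int and truncate_str are hypothetical and are not the pyiceberg_core API.

```python
def truncate_int(value: int, width: int) -> int:
    """Hypothetical helper: round value down to a multiple of width.

    Python's % already returns a non-negative remainder for a positive
    width, which is what the assumed spec formula requires.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    return value - (value % width)


def truncate_str(value: str, width: int) -> str:
    """Hypothetical helper: keep the first `width` Unicode code points."""
    if width <= 0:
        raise ValueError("width must be positive")
    # Slicing a Python str counts code points, not bytes.
    return value[:width]


print(truncate_int(25, 10))        # 20
print(truncate_int(-1, 10))        # -10
print(truncate_str("iceberg", 3))  # ice
```

Negative inputs round toward negative infinity rather than toward zero, which keeps the transform order-preserving across zero.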