Support to read gzip compressed metadata files #42

Merged

@devendrasr commented Mar 6, 2024

The current version of the iceberg extension does not support reading Iceberg tables whose metadata files were written with gzip compression. This PR adds a metadata_compression_codec parameter to the scan functions so that the metadata files are treated as gzip-compressed when they are loaded.

scan data:

SELECT * FROM iceberg_scan('s3://my-bucket/icebergwh/someschema/t01', metadata_compression_codec = 'gzip') LIMIT 10;

scan metadata:

SELECT * FROM iceberg_metadata('s3://my-bucket/icebergwh/someschema/t01', metadata_compression_codec = 'gzip') LIMIT 10;

scan snapshots:

SELECT * FROM iceberg_snapshots('s3://my-bucket/icebergwh/someschema/t01', metadata_compression_codec = 'gzip') LIMIT 10;

For now, metadata_compression_codec accepts none or gzip; the default is none.
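To illustrate what the parameter changes, here is a minimal sketch in Python (not the extension's C++, and not the code from this PR). It loads a metadata JSON file and decompresses it first when the codec is gzip. The function name read_table_metadata, the file name, and the sample metadata are hypothetical; only the parameter name and its two accepted values come from the description above.

```python
import gzip
import json
import os
import tempfile


def read_table_metadata(path: str, metadata_compression_codec: str = "none") -> dict:
    """Load a metadata JSON file, gunzipping it first when the codec is 'gzip'.

    Hypothetical helper for illustration only; it mirrors the two codec
    values described in the PR (none, gzip) and rejects anything else.
    """
    if metadata_compression_codec not in ("none", "gzip"):
        raise ValueError(f"unsupported codec: {metadata_compression_codec!r}")
    with open(path, "rb") as f:
        raw = f.read()
    if metadata_compression_codec == "gzip":
        raw = gzip.decompress(raw)
    return json.loads(raw)


# Round trip on a throwaway file: write gzip-compressed JSON, then read it back.
# The metadata content below is a made-up stand-in, not a full Iceberg document.
metadata = {"format-version": 2, "current-snapshot-id": 1}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "v1.gz.metadata.json")
    with open(path, "wb") as f:
        f.write(gzip.compress(json.dumps(metadata).encode("utf-8")))
    loaded = read_table_metadata(path, metadata_compression_codec="gzip")
```

With the default codec of none, the same gzip-compressed file would be handed to the JSON parser as-is and fail to parse, which matches the limitation this PR addresses.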

@szarnyasg requested a review from @samansmink on March 6, 2024, 13:09

@samansmink (Collaborator) left a comment

Hey @devendrasr thanks a lot for the PR! In general it looks good.

I added one comment regarding the actual unzipping and I have a more general comment: I would like to see some tests here! Maybe you can add some small test data with a test that confirms this feature works?

Review comment on src/common/utils.cpp (resolved)

@samansmink (Collaborator) left a comment

Hey @devendrasr Thanks for the changes, looks good! I have one more comment then I think this is good to go

Review comment on src/common/iceberg.cpp (outdated, resolved)

@devendrasr (Author) commented

@samansmink Please let me know whether this is good to go. Once this is done, we can think about #43.

@samansmink merged commit f7f1a35 into duckdb:main on Mar 18, 2024 (16 checks passed)

@samansmink (Collaborator) commented

looks good now, thanks again @devendrasr!

mike-luabase pushed a commit to definite-app/duckdb_iceberg that referenced this pull request Oct 27, 2024
…ed_metadata

Support to read gzip compressed metadata files