feat: AWS glue catalog support for iceberg_scan() #51
Many comparisons were performed using `yyjson_get_tag()` rather than `yyjson_get_type()`. The tag can carry additional information in bits beyond the type itself, which caused these type comparisons to fail and the JSON to fail to parse.
fix: fix the extension to build with the current duckdb main branch. Fix a few std::move() calls and a call to fs.OpenFile().
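To illustrate why comparing a tag against a type constant can fail, here is a minimal, self-contained sketch. It does not include yyjson; the constants are local stand-ins that are assumed to mirror yyjson's layout (type in the low 3 bits, subtype in the bits above), and the names `TypeOf` and `MakeTag` are hypothetical helpers, not yyjson functions.

```cpp
#include <cstdint>

// Local stand-ins, assumed to mirror yyjson's tag layout:
// low 3 bits hold the type, the next bits hold the subtype.
constexpr uint8_t kTypeBool = 3;
constexpr uint8_t kTypeNum = 4;
constexpr uint8_t kTypeMask = 0x07;
constexpr uint8_t kSubtypeTrue = 1 << 3;
constexpr uint8_t kSubtypeReal = 2 << 3;

// Hypothetical helper: combine a type and a subtype into one tag byte.
constexpr uint8_t MakeTag(uint8_t type, uint8_t subtype) {
    return type | subtype;
}

// Hypothetical helper: recover only the type by masking off the subtype bits.
constexpr uint8_t TypeOf(uint8_t tag) {
    return tag & kTypeMask;
}

// Comparing the raw tag with a type constant fails once a subtype bit is set...
static_assert(MakeTag(kTypeBool, kSubtypeTrue) != kTypeBool,
              "raw tag differs from the bare type");
// ...while comparing the masked type succeeds.
static_assert(TypeOf(MakeTag(kTypeBool, kSubtypeTrue)) == kTypeBool,
              "masked type matches");
static_assert(TypeOf(MakeTag(kTypeNum, kSubtypeReal)) == kTypeNum,
              "masked type matches for numbers too");
```

In the real library, the masking step corresponds to calling `yyjson_get_type()` instead of comparing the result of `yyjson_get_tag()` directly.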
Hey @rustyconover! Thanks a lot for the PRs! To review this I will need to set up an AWS Glue table myself to test it out; I will try to find some time tomorrow to do this. One small comment I do have already is that I'm not sure the JSON string is the neatest way of passing the configuration to the Iceberg scan function. Maybe we can instead just add all of them as named_parameters to the iceberg table function. I think many of these will be shared among |
Hi @samansmink, I'll look at changing to named parameters and post a revised PR. Rusty |
fix: add support for iceberg_metadata function. Change a lot of static functions around so that the configuration for the catalog information can be easily passed around.
Hi @samansmink, I've changed things around to use named parameters and added the support so that the Rusty |
You can now run queries that look like this: select * from iceberg_scan('users', catalog_type="glue", region="us-east-1", database_name="test_iceberg");
select * from iceberg_metadata('users', catalog_type="glue", region="us-east-1", database_name="test_iceberg"); |
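As a rough illustration of the named-parameter approach discussed above, the sketch below collects the three parameters used in the example queries into a configuration struct. It is not the extension's actual code: `CatalogConfig` and `ParseCatalogConfig` are hypothetical names, and a plain `std::map` stands in for whatever DuckDB hands to the table function's bind step.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical configuration holder; not the extension's actual type.
struct CatalogConfig {
    std::string catalog_type;   // e.g. "glue"
    std::string region;         // e.g. "us-east-1"
    std::string database_name;  // e.g. "test_iceberg"
};

// Hypothetical helper: build a CatalogConfig from named parameters,
// rejecting any parameter name that is not recognised.
inline CatalogConfig ParseCatalogConfig(const std::map<std::string, std::string> &named) {
    CatalogConfig config;
    for (const auto &entry : named) {
        if (entry.first == "catalog_type") {
            config.catalog_type = entry.second;
        } else if (entry.first == "region") {
            config.region = entry.second;
        } else if (entry.first == "database_name") {
            config.database_name = entry.second;
        } else {
            throw std::invalid_argument("unknown named parameter: " + entry.first);
        }
    }
    return config;
}
```

Keeping the options in one struct is one way to pass the catalog settings to both `iceberg_scan` and `iceberg_metadata` without repeating the parsing logic.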
@rustyconover - Thank you for this PR and #50. |
Hi @harel-e, Thank you for your kind words. Unfortunately I can't help you build the extension or package it as a Docker container. You might want to try asking on the DuckDB Discord for help building DuckDB. I'm building it on Mac OS X. I had to make some changes to vcpkg to work around the fallout of the xz package unavailability with boost. Rusty |
vcpkg should be restored again from the xz debacle afaik! Check out https://github.com/duckdb/extension-template for some instructions on setting up vcpkg for extension builds. |
I tested this branch on AWS with several Iceberg tables. This query pattern works fine: Hoping to see it in the upcoming 0.10.3. Thank you @rustyconover for this wonderful addition. DuckDB is now one step closer to working seamlessly in AWS |
Sorry for the absence here, I've been really busy. There are still some problems remaining with CI here on Windows and Linux amd64; those would need to be fixed for this to get merged before 0.10.3 |
I'll take a look at the Linux build failures, but I don't have access to Windows, so I can't look into the failures on that platform. |
@rustyconover : Does this support Nessie catalog for iceberg? |
Any chance of getting this PR merged? |
@rustyconover are you still working on this? would it make sense for someone to pick this up? |
I'm not actively working on this PR, feel free to finish it up. |
We need catalogue support |
I think it would be better to add an That way you could just query and not worry about configuring it every time. Also, the same secret can be used for other catalog types (e.g. REST). |
Add support for accessing tables stored at AWS Glue.
Example SQL call:
Added the framework for additional external Iceberg catalogs:
This JSON object should be of this format: