Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add aws glue as an iceberg connection type #16824

Merged
merged 11 commits into from
Nov 16, 2024

Conversation

Rowlandev
Copy link
Contributor

@Rowlandev Rowlandev commented Nov 12, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

This change is Reviewable

@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Nov 12, 2024
@Rowlandev Rowlandev marked this pull request as ready for review November 12, 2024 22:28
Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A change to the proto requires a new version to be added in this following file. And corresponding tests should be added. Please refer to the following instruction and existing test case file such as https://github.com/datafuselabs/databend/blob/2b6431c3e39b25a41011172718ba471df7f77800/src/meta/proto-conv/tests/it/v110_database_meta_gc_in_progress.rs

https://github.com/datafuselabs/databend/blob/2b6431c3e39b25a41011172718ba471df7f77800/src/meta/proto-conv/src/util.rs#L142

(110, "2024-09-18: Add: database.proto: DatabaseMeta.gc\_in\_progress"),  
// Dear developer:  
//      If you're gonna add a new metadata version, you'll have to add a test for it.  
//      You could just copy an existing test file(e.g., \`../tests/it/v024\_table\_meta.rs\`)  
//      and replace two of the variable \`bytes\` and \`want\`.

@Xuanwo Could you please help review this new catalog option, about the usage and testing?

Reviewed 10 of 13 files at r1, all commit messages.
Reviewable status: 10 of 13 files reviewed, all discussions resolved

@BohuTANG BohuTANG requested a review from Xuanwo November 13, 2024 01:26
@BohuTANG
Copy link
Member

BohuTANG commented Nov 13, 2024

For the check CI failed, we should keep the Cargo.toml crate is in order, the fix is(taplo fmt -f):

diff --git a/Cargo.toml b/Cargo.toml
index 71b5ec05d2..e82e52ac7e 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -308,9 +308,9 @@ humantime = "2.1.0"
 hyper = "1"
 hyper-util = { version = "0.1.9", features = ["client", "client-legacy", "tokio", "service"] }
 iceberg = { version = "0.3.0", git = "https://github.com/Xuanwo/iceberg-rust/", rev = "fe5df3f" }
+iceberg-catalog-glue = { version = "0.3.0", git = "https://github.com/Xuanwo/iceberg-rust/", rev = "fe5df3f" }
 iceberg-catalog-hms = { version = "0.3.0", git = "https://github.com/Xuanwo/iceberg-rust/", rev = "fe5df3f" }
 iceberg-catalog-rest = { version = "0.3.0", git = "https://github.com/Xuanwo/iceberg-rust/", rev = "fe5df3f" }
-iceberg-catalog-glue = { version = "0.3.0", git = "https://github.com/Xuanwo/iceberg-rust/", rev = "fe5df3f" }
 indexmap = "2.0.0"
 indicatif = "0.17.5"
 itertools = "0.13.0"
diff --git a/src/query/storages/iceberg/Cargo.toml b/src/query/storages/iceberg/Cargo.toml
index fc336ee6a0..add1c755fa 100644
--- a/src/query/storages/iceberg/Cargo.toml
+++ b/src/query/storages/iceberg/Cargo.toml
@@ -25,9 +25,9 @@ databend-storages-common-table-meta = { workspace = true }
 fastrace = { workspace = true }
 futures = { workspace = true }
 iceberg = { workspace = true }
+iceberg-catalog-glue = { workspace = true }
 iceberg-catalog-hms = { workspace = true }
 iceberg-catalog-rest = { workspace = true }
-iceberg-catalog-glue = { workspace = true }
 match-template = { workspace = true }
 ordered-float = { workspace = true }
 serde = { workspace = true }

@Xuanwo
Copy link
Member

Xuanwo commented Nov 13, 2024

Hi, @Rowlandev, thank you so much for creating this—it's greatly appreciated! Most of the changes look good to me. The only missing part is integrating with our proto tests to ensure compatibility with our meta services.

  • Please add a new entry here to register a new proto version.

(110, "2024-09-18: Add: database.proto: DatabaseMeta.gc_in_progress"),
// Dear developer:
// If you're gonna add a new metadata version, you'll have to add a test for it.
// You could just copy an existing test file(e.g., `../tests/it/v024_table_meta.rs`)
// and replace two of the variable `bytes` and `want`.
];

  • Please add a new test case here to make sure our proto compatible

You can refer to

#[test]
fn test_decode_v98_catalog() -> anyhow::Result<()> {
let catalog_meta_v98 = vec![
18, 55, 26, 53, 18, 45, 10, 21, 104, 116, 116, 112, 58, 47, 47, 49, 50, 55, 46, 48, 46, 48,
46, 49, 58, 57, 57, 48, 48, 18, 14, 115, 51, 58, 47, 47, 109, 121, 95, 98, 117, 99, 107,
101, 116, 160, 6, 98, 168, 6, 24, 160, 6, 98, 168, 6, 24, 162, 1, 23, 50, 48, 49, 52, 45,
49, 49, 45, 50, 56, 32, 49, 50, 58, 48, 48, 58, 48, 57, 32, 85, 84, 67, 160, 6, 98, 168, 6,
24,
];
let want = || databend_common_meta_app::schema::CatalogMeta {
catalog_option: CatalogOption::Iceberg(IcebergCatalogOption::Rest(
IcebergRestCatalogOption {
uri: "http://127.0.0.1:9900".to_string(),
warehouse: "s3://my_bucket".to_string(),
props: Default::default(),
},
)),
created_on: Utc.with_ymd_and_hms(2014, 11, 28, 12, 0, 9).unwrap(),
};
common::test_pb_from_to(func_name!(), want())?;
common::test_load_old(func_name!(), catalog_meta_v98.as_slice(), 98, want())?;
Ok(())
}

But changed with glue catalog.

@Rowlandev
Copy link
Contributor Author

@Xuanwo, @BohuTANG & @drmingdrmer - thanks for the help! Let me know what else I can do to fix up this pull request.

@drmingdrmer
Copy link
Member

@Xuanwo, @BohuTANG & @drmingdrmer - thanks for the help! Let me know what else I can do to fix up this pull request.

Thank you. Most to this PR looks good enough to me. As I mentioned in a previous comment, A new version and a corresponding test case should be added:

In this comment: #16824 (review)

@Rowlandev
Copy link
Contributor Author

@drmingdrmer My apologies, using git on this repo with two different machines, thought I pushed up.

@drmingdrmer
Copy link
Member

@drmingdrmer My apologies, using git on this repo with two different machines, thought I pushed up.

Does not matter :D

I'll receive a notification once there is a new push.

@Rowlandev
Copy link
Contributor Author

Is it party time? @drmingdrmer

@drmingdrmer
Copy link
Member

Is it party time? @drmingdrmer

Not quite yet - still have some work to wrap up, though I'm flexible with the schedule:)

Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 13 files at r1, 3 of 3 files at r2, 2 of 2 files at r3, 2 of 2 files at r4, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Rowlandev and @Xuanwo)


src/meta/proto-conv/tests/it/v111_add_glue_as_iceberg_catalog_option.rs line 52 at r4 (raw file):

        drop_on: None,
        gc_in_progress: true,
    };

This test should build an instance of IcebergCatalogOption` containing the new type to ensure it can be encoded and decoded correctly.

Code quote:

    let want = || mt::DatabaseMeta {
        engine: "44".to_string(),
        engine_options: btreemap! {s("abc") => s("def")},
        options: btreemap! {s("xyz") => s("foo")},
        created_on: Utc.with_ymd_and_hms(2014, 11, 28, 12, 0, 9).unwrap(),
        updated_on: Utc.with_ymd_and_hms(2014, 11, 29, 12, 0, 9).unwrap(),
        comment: "foo bar".to_string(),
        drop_on: None,
        gc_in_progress: true,
    };

Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, with one minor issue noted below:

Reviewed all commit messages.
Reviewable status: 14 of 15 files reviewed, 2 unresolved discussions (waiting on @Rowlandev and @Xuanwo)


src/meta/proto-conv/tests/it/v111_add_glue_as_iceberg_catalog_option.rs line 49 at r5 (raw file):

                address: "http://127.0.0.1:9900".to_string(),
                warehouse: "s3://my_bucket".to_string(),
                props: Default::default(),

Add several sample values to props to ensure it is actually encoded and decoded

Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you!

Reviewed 1 of 1 files at r6, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Rowlandev and @Xuanwo)

Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Rowlandev and @Xuanwo)

Copy link
Contributor Author

@Rowlandev Rowlandev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 13 files at r1, 2 of 3 files at r2, 2 of 2 files at r3, 1 of 2 files at r4, 1 of 1 files at r6, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Xuanwo)

src/query/storages/iceberg/src/catalog.rs Outdated Show resolved Hide resolved
Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Xuanwo)

src/query/storages/iceberg/src/catalog.rs Outdated Show resolved Hide resolved
Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r7, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Xuanwo)

@drmingdrmer
Copy link
Member

It looks like the source code is not well formatted, which caused the check fails:
https://github.com/databendlabs/databend/actions/runs/11857098771/job/33044752102?pr=16824

You can run cargo fmt locally to fix this issue.

Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r8, 1 of 1 files at r9, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Xuanwo)

@drmingdrmer drmingdrmer merged commit ec9b077 into databendlabs:main Nov 16, 2024
74 of 75 checks passed
@drmingdrmer
Copy link
Member

Nice job! Thank you! Let's merge!

@BohuTANG
Copy link
Member

@soyeric128 Please update the documentation. Thanks.

@Xuanwo
Copy link
Member

Xuanwo commented Nov 26, 2024

Hi, @Rowlandev, thank you so much for creating this. Could you provide a real example of creating a Glue catalog (with the secrets redacted)?

@Rowlandev
Copy link
Contributor Author

@Xuanwo Yes sir. This relies on environment variables being set in the contained environment:

CREATE CATALOG my_aws_glue_catalog
TYPE = ICEBERG
CONNECTION = (
    TYPE = 'glue',
    ADDRESS = 'https://glue.us-east-1.amazonaws.com',
    WAREHOUSE = 's3://your-s3-data'
);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-feature this PR introduces a new feature to the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants