[feat](iceberg)Supports using `rest` type catalog to read tables in unity catalog #43525
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
clang-tidy made some suggestions
@@ -259,6 +259,10 @@ std::vector<tparquet::KeyValue> ParquetReader::get_metadata_key_values() {
    return _t_metadata->key_value_metadata;
}

const FieldDescriptor ParquetReader::get_file_metadata_schema() {
warning: return type 'const std::doris::vectorized::FieldDescriptor' is 'const'-qualified at the top level, which may reduce code readability without improving const correctness [readability-const-return-type]
- const FieldDescriptor ParquetReader::get_file_metadata_schema() {
+ FieldDescriptor ParquetReader::get_file_metadata_schema() {
be/src/vec/exec/format/parquet/vparquet_reader.h:151:
- const FieldDescriptor get_file_metadata_schema();
+ FieldDescriptor get_file_metadata_schema();
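The warning above can be illustrated with a minimal sketch. `Tracked`, `g_copies`, and the helper functions are hypothetical names, not Doris code. The point is that a top-level `const` on a by-value return does not protect anything in the callee, but it can stop the caller from moving the result into an existing object, so the assignment falls back to a copy.

```cpp
#include <cassert>

// Hypothetical copy counter, for illustration only.
int g_copies = 0;

// Hypothetical stand-in for FieldDescriptor that counts copy operations.
struct Tracked {
    Tracked() = default;
    Tracked(const Tracked&) { ++g_copies; }
    Tracked(Tracked&&) noexcept {}
    Tracked& operator=(const Tracked&) { ++g_copies; return *this; }
    Tracked& operator=(Tracked&&) noexcept { return *this; }
};

const Tracked make_const() { return Tracked{}; }  // flagged style
Tracked make_plain() { return Tracked{}; }        // suggested style

// Assign the returned value to an existing object and report the copies made.
int copies_assigning_const_result() {
    g_copies = 0;
    Tracked t;
    t = make_const();  // a const rvalue cannot bind to Tracked&&, so this copy-assigns
    return g_copies;
}

int copies_assigning_plain_result() {
    g_copies = 0;
    Tracked t;
    t = make_plain();  // binds to Tracked&&, so this move-assigns
    return g_copies;
}

// Usage: the const-qualified return costs one copy, the plain return none.
void check_copy_counts() {
    assert(copies_assigning_const_result() == 1);
    assert(copies_assigning_plain_result() == 0);
}
```

For a type as large as a schema descriptor, that extra copy is the practical cost of the `const` qualifier.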
run buildall
TeamCity be ut coverage result:
icebergCatalog = ((HMSExternalCatalog) key.catalog).getIcebergHiveCatalog();
Catalog icebergCatalog = ((HMSExternalCatalog) key.catalog).getIcebergHiveCatalog();
icebergTable = HiveMetaStoreClientHelper.ugiDoAs(((ExternalCatalog) key.catalog).getConfiguration(),
        () -> icebergCatalog.loadTable(TableIdentifier.of(key.dbName, key.tableName)));
We should unify the interface: both code paths should use either `catalog.loadTable()` or `metadataOps.loadTable()`, not a mix of the two.
@@ -39,6 +39,7 @@ public abstract class IcebergExternalCatalog extends ExternalCatalog {
    public static final String ICEBERG_HADOOP = "hadoop";
    public static final String ICEBERG_GLUE = "glue";
    public static final String ICEBERG_DLF = "dlf";
    public static final String EXTERNAL_SERVER_CATALOG_NAME = "external_server_catalog_name";
- public static final String EXTERNAL_SERVER_CATALOG_NAME = "external_server_catalog_name";
+ public static final String EXTERNAL_SERVER_CATALOG_NAME = "external_catalog.name";
        _has_schema_change = true;
    }
}
Status IcebergParquetReader::_gen_col_name_maps(FieldDescriptor field_desc) {
- Status IcebergParquetReader::_gen_col_name_maps(FieldDescriptor field_desc) {
+ Status IcebergParquetReader::_gen_col_name_maps(const FieldDescriptor& field_desc) {
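The motivation for the suggested signature can be sketched as follows. `Schema`, `g_schema_copies`, and both helper functions are hypothetical and only stand in for a schema descriptor that owns many field names. Passing by value copies the whole object at every call from an lvalue, while a `const` reference reads it in place.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical copy counter, for illustration only.
int g_schema_copies = 0;

// Hypothetical schema type; copying it duplicates every field name.
struct Schema {
    std::vector<std::string> names;
    explicit Schema(std::vector<std::string> n) : names(std::move(n)) {}
    Schema(const Schema& other) : names(other.names) { ++g_schema_copies; }
};

// By value: the argument is copied before the body runs.
std::string first_name_by_value(Schema s) {
    assert(!s.names.empty());
    return s.names.front();
}

// By const reference: the caller's object is read without copying.
std::string first_name_by_ref(const Schema& s) {
    assert(!s.names.empty());
    return s.names.front();
}
```

Calling `first_name_by_value` with a named `Schema` increments the counter once per call; `first_name_by_ref` leaves it unchanged.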
@@ -149,6 +149,7 @@ class ParquetReader : public GenericReader {
            const std::unordered_map<std::string, VExprContextSPtr>& missing_columns) override;

    std::vector<tparquet::KeyValue> get_metadata_key_values();
Can this method be removed?
@@ -218,7 +218,7 @@ class IcebergParquetReader final : public IcebergTableReader {
        parquet_reader->set_delete_rows(&_iceberg_delete_rows);
    }

-   Status _gen_col_name_maps(std::vector<tparquet::KeyValue> parquet_meta_kv);
+   Status _gen_col_name_maps(FieldDescriptor field_desc);
This method also needs to be modified in IcebergOrcReader.
The ORC reader already matches columns by field ID, so it does not need to be modified.
clang-tidy made some suggestions
@@ -147,6 +150,14 @@ Status FieldDescriptor::parse_from_thrift(const std::vector<tparquet::SchemaElem
    return Status::OK();
}

const doris::Slice FieldDescriptor::get_column_name_from_field_id(int32_t id) const {
warning: return type 'const doris::Slice' is 'const'-qualified at the top level, which may reduce code readability without improving const correctness [readability-const-return-type]
- const doris::Slice FieldDescriptor::get_column_name_from_field_id(int32_t id) const {
+ doris::Slice FieldDescriptor::get_column_name_from_field_id(int32_t id) const {
be/src/vec/exec/format/parquet/schema_desc.h:137:
- const doris::Slice get_column_name_from_field_id(int32_t id) const;
+ doris::Slice get_column_name_from_field_id(int32_t id) const;
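To make the purpose of a field-ID lookup concrete, here is a minimal sketch of resolving columns by ID rather than by name. It is not the Doris implementation: `Field` and `map_table_to_file_names` are hypothetical, and the sketch assumes every column in the data file carries a unique field ID. Because the match is on the ID, a column renamed in the table still resolves to the name it had when the file was written.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified column description: a field ID plus a name.
struct Field {
    int32_t field_id;
    std::string name;
};

// Returns a map from the current table column name to the column name stored
// in the data file. Table columns whose ID is absent from the file are skipped.
std::unordered_map<std::string, std::string> map_table_to_file_names(
        const std::vector<Field>& table_cols, const std::vector<Field>& file_cols) {
    // Index the file schema by field ID.
    std::unordered_map<int32_t, std::string> file_name_by_id;
    for (const auto& f : file_cols) {
        bool inserted = file_name_by_id.emplace(f.field_id, f.name).second;
        assert(inserted && "field IDs in a file schema are assumed to be unique");
        (void)inserted;
    }
    // Resolve each table column through its ID.
    std::unordered_map<std::string, std::string> table_to_file;
    for (const auto& t : table_cols) {
        auto it = file_name_by_id.find(t.field_id);
        if (it != file_name_by_id.end()) {
            table_to_file.emplace(t.name, it->second);
        }
    }
    return table_to_file;
}
```

For example, if the table column with ID 2 is now called `full_name` but the file was written when it was called `name`, the map contains `full_name -> name`, and a column added after the file was written has no entry.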
// This is for iceberg schema evolution.
std::vector<tparquet::KeyValue> ParquetReader::get_metadata_key_values() {
    return _t_metadata->key_value_metadata;
const FieldDescriptor ParquetReader::get_file_metadata_schema() {
warning: return type 'const std::doris::vectorized::FieldDescriptor' is 'const'-qualified at the top level, which may reduce code readability without improving const correctness [readability-const-return-type]
- const FieldDescriptor ParquetReader::get_file_metadata_schema() {
+ FieldDescriptor ParquetReader::get_file_metadata_schema() {
be/src/vec/exec/format/parquet/vparquet_reader.h:150:
- const FieldDescriptor get_file_metadata_schema();
+ FieldDescriptor get_file_metadata_schema();
e9f2152 to 705354f
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
be/src/io/file_factory.cpp
Outdated
        final_file = final_file.substr(5);
    }
    RETURN_IF_ERROR_RESULT(io::global_local_filesystem()->open_file(final_file, &file_reader,
                                                                    &reader_options));
Doesn't the `file:` prefix duplicate the information already carried by the file type? Maybe we should parse the prefix when determining the type and check that the two agree.
    }

    // Parse URI authority, if any
    if (path_str.compare(start, 2, "//") == 0 && path_str.length() - start > 2) {
Please add an example here.
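One possible shape for such an example is sketched below, assuming the common `scheme://authority/path` layout. `UriParts` and `split_uri` are hypothetical names and this is not the code under review; it only reuses the same `compare(start, 2, "//")` check to show which inputs take the authority branch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type for illustration.
struct UriParts {
    std::string scheme;
    std::string authority;
    std::string path;
};

// Splits "scheme://authority/path". Scheme and authority are both optional.
UriParts split_uri(const std::string& s) {
    UriParts parts;
    std::size_t start = 0;

    // Scheme: the text before the first ':' when that ':' precedes any '/'.
    const std::size_t colon = s.find(':');
    const std::size_t slash = s.find('/');
    if (colon != std::string::npos && (slash == std::string::npos || colon < slash)) {
        parts.scheme = s.substr(0, colon);
        start = colon + 1;
    }

    // Authority: present only when the rest starts with "//" and more follows.
    assert(start <= s.size());
    if (s.compare(start, 2, "//") == 0 && s.length() - start > 2) {
        const std::size_t next_slash = s.find('/', start + 2);
        const std::size_t auth_end = next_slash == std::string::npos ? s.length() : next_slash;
        parts.authority = s.substr(start + 2, auth_end - start - 2);
        start = auth_end;
    }

    parts.path = s.substr(start);
    return parts;
}
```

With this sketch, `hdfs://namenode:8020/user/hive/t.parquet` yields scheme `hdfs`, authority `namenode:8020`, and path `/user/hive/t.parquet`; `file:///tmp/a.parquet` yields scheme `file`, an empty authority, and path `/tmp/a.parquet`; a bare `/tmp/a.parquet` has neither scheme nor authority.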
TPC-H: Total hot run time: 40240 ms
TPC-DS: Total hot run time: 197270 ms
TeamCity be ut coverage result:
ClickBench: Total hot run time: 32.87 s
run feut
PR approved by at least one committer and no changes requested.
…nity catalog (apache#43525)

### What problem does this PR solve?

1. We now support using the `rest` type catalog to read tables in the unity catalog (https://github.com/unitycatalog/unitycatalog).
2. When reading the parquet file on the be side, we find the corresponding column name based on the column id, which naturally supports the column rename function.

example:

```
CREATE CATALOG `uc3` PROPERTIES (
    "type" = "iceberg",
    "iceberg.catalog.type" = "rest",
    "uri" = "http://127.0.0.1:8080/api/2.1/unity-catalog/iceberg",
    "external_catalog.name" = "unity" --- catalog name in unity catalog
);
```
What problem does this PR solve?
We now support using the `rest` type catalog to read tables in the unity catalog (https://github.com/unitycatalog/unitycatalog).
example:
Release Note
feat support iceberg unity catalog
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)