Feature/table iceberg #13

Closed
wants to merge 14 commits into from

nicor88
Contributor

@nicor88 nicor88 commented Nov 9, 2022

What

Port of Tomme/dbt-athena#135.

For reference, the description from the original PR is reproduced below.

As Iceberg doesn't support CTAS, the implementation does the following:

  • create a tmp table as Parquet
  • drop the old table, if it exists
  • create the Iceberg table based on the tmp table definition (metadata only)
  • insert into the Iceberg table from the tmp table
  • drop the tmp table
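
The steps above can be sketched as an ordered list of Athena statements. This is only an illustration and not the adapter's actual macro code: the function name, the `__tmp` suffix, and the way columns are passed in are all assumptions made for the example.

```python
from typing import Dict, List, Optional


def iceberg_table_statements(
    table: str,
    select_sql: str,
    columns: Dict[str, str],
    location: str,
    partitioned_by: Optional[List[str]] = None,
) -> List[str]:
    """Return the SQL statements for the workaround, in execution order.

    Hypothetical helper: `columns` maps column name to Athena type and stands
    in for the definition that would be read back from the tmp table.
    """
    tmp_table = f"{table}__tmp"  # assumed naming scheme for the tmp table
    column_ddl = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    partition_clause = (
        f" PARTITIONED BY ({', '.join(partitioned_by)})" if partitioned_by else ""
    )
    return [
        # 1. Materialize the model into a temporary Parquet table via CTAS.
        f"CREATE TABLE {tmp_table} WITH (format = 'PARQUET') AS {select_sql}",
        # 2. Drop the previous version of the target table, if it exists.
        f"DROP TABLE IF EXISTS {table}",
        # 3. Create the empty Iceberg table (metadata only).
        f"CREATE TABLE {table} ({column_ddl}){partition_clause} "
        f"LOCATION '{location}' TBLPROPERTIES ('table_type'='ICEBERG')",
        # 4. Copy the rows from the tmp table into the Iceberg table.
        f"INSERT INTO {table} SELECT * FROM {tmp_table}",
        # 5. Clean up the tmp table.
        f"DROP TABLE {tmp_table}",
    ]
```

Note that step 4 relies on the tmp table and the Iceberg table having the same column order, which holds here because the Iceberg table is created from the tmp table definition.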

Notes

  • adapter.drop_relation doesn't work with Iceberg; a DROP statement on an Iceberg table automatically deletes the data in S3 at the specified location
  • Table properties were not supported in the original PR. They are added in this PR, see the test examples.
  • It's possible to give your table a unique location by adding this config:
{{ config(
    external_location='s3://bucket/example_iceberg/',
    strict_location=False
) }}

With this config, the adapter appends a unique UUID to the final table location. This is helpful for rename statements (e.g. when you want to promote the table to the one used by analysts/reporting after running some dbt tests), because it avoids collisions when the table is recreated. This behaviour can be disabled with strict_location=True, which is the default.
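
A minimal sketch of how the location could be resolved, assuming the UUID is simply appended as a trailing path segment. The function name and the exact path layout are hypothetical; the PR description only states that a unique UUID is added to the final location.

```python
import uuid


def resolve_table_location(external_location: str, strict_location: bool = True) -> str:
    """Return the S3 location to use for the table.

    With strict_location=True (the default) the configured location is used
    as-is. Otherwise a random UUID segment is appended, so a recreated table
    never reuses the path of a previous (possibly renamed) table.
    """
    base = external_location.rstrip("/")
    if strict_location:
        return f"{base}/"
    # Assumed layout: <external_location>/<uuid>/
    return f"{base}/{uuid.uuid4()}/"
```

For example, `resolve_table_location('s3://bucket/example_iceberg/', strict_location=False)` yields a different path under `s3://bucket/example_iceberg/` on every call, while the strict variant always returns the configured location.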

Models used to test

Without partitions

{{ config(
    materialized='table',
    format='iceberg'
) }}


SELECT 'A' AS user_id, 'pi' AS name, 'active' AS status, 17.89 AS amount, 1 as quantity
UNION ALL
SELECT 'B' AS user_id, 'sh' AS name, 'active' AS status, 1 AS amount, 10000 as quantity
UNION ALL
SELECT 'C' AS user_id, 'zh' AS name, 'not_active' AS status, 20.54 AS amount, 340000 as quantity

With partitions

{{ config(
    materialized='table',
    format='iceberg',
    partitioned_by=['status']
) }}


SELECT 'A' AS user_id, 'pi' AS name, 'active' AS status, 17.89 AS amount, 1 as quantity
UNION ALL
SELECT 'B' AS user_id, 'sh' AS name, 'active' AS status, 1 AS amount, 10000 as quantity
UNION ALL
SELECT 'C' AS user_id, 'zh' AS name, 'not_active' AS status, 20.54 AS amount, 340000 as quantity

With external location

{{ config(
    materialized='table',
    format='iceberg',
    external_location='s3://my_bucket/my_table/'
) }}


SELECT 'A' AS user_id, 'pi' AS name, 'active' AS status, 17.89 AS amount, 1 as quantity
UNION ALL
SELECT 'B' AS user_id, 'sh' AS name, 'active' AS status, 1 AS amount, 10000 as quantity
UNION ALL
SELECT 'C' AS user_id, 'zh' AS name, 'not_active' AS status, 20.54 AS amount, 340000 as quantity

With different data types

{{ config(
    materialized='table',
    format='iceberg',
    partitioned_by=['status']
) }}


SELECT
	'A' AS user_id,
	'pi' AS name,
	'active' AS status,
	17.89 AS cost,
	1 AS quantity,
	100000000 AS quantity_big,
	current_date AS my_date,
	cast(current_timestamp as timestamp) AS my_timestamp

Table properties

{{ config(
    materialized='table',
    format='iceberg',
    partitioned_by=['status'],
    table_properties={
        'write_target_data_file_size_bytes': '134217728',
        'optimize_rewrite_delete_file_threshold': '2'
    }
) }}


SELECT
	'A' AS user_id,
	'pi' AS name,
	'active' AS status,
	17.89 AS cost,
	1 AS quantity,
	100000000 AS quantity_big,
	current_date AS my_date,
	cast(current_timestamp as timestamp) AS my_timestamp
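
As a rough illustration of what the table_properties config above might translate to, the sketch below renders a dictionary into an Athena TBLPROPERTIES clause. The helper name is hypothetical, and how the adapter actually merges user properties with 'table_type'='ICEBERG' is an assumption.

```python
from typing import Dict, Optional


def render_tblproperties(table_properties: Optional[Dict[str, str]] = None) -> str:
    """Render a TBLPROPERTIES clause for an Iceberg table.

    'table_type'='ICEBERG' is always emitted first; user-supplied properties
    follow in insertion order (assumed behaviour, not taken from the adapter).
    """
    properties = {"table_type": "ICEBERG"}
    properties.update(table_properties or {})
    rendered = ", ".join(f"'{key}'='{value}'" for key, value in properties.items())
    return f"TBLPROPERTIES ({rendered})"
```

Called with the two properties from the model above, this produces a single clause containing 'table_type', 'write_target_data_file_size_bytes' and 'optimize_rewrite_delete_file_threshold'.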

Not strict location

{{ config(
    materialized='table',
    format='iceberg',
    partitioned_by=['status'],
    external_location='s3://my_bucket/silver_athena/example_iceberg/',
    strict_location=False,
    table_properties={
        'optimize_rewrite_delete_file_threshold': '2'
    }
) }}

SELECT
	'A' AS user_id,
	'pi' AS name,
	'active' AS status,
	17.89 AS cost,
	1 AS quantity,
	100000000 AS quantity_big,
	current_date AS my_date,
	cast(current_timestamp as timestamp) AS my_timestamp

@nicor88 nicor88 added the enhancement New feature or request label Nov 9, 2022
@nicor88 nicor88 deleted the branch dbt-labs:release/1.0.4 November 15, 2022 07:50
@nicor88 nicor88 closed this Nov 15, 2022
ignacioreyna commented Nov 15, 2022

Was this PR closed by mistake, as mentioned in PR #4?

Contributor Author

nicor88 commented Nov 15, 2022

@ignacioreyna I will open another one without the fork, adding some tweaks.

@nicor88 nicor88 deleted the feature/table_iceberg branch November 18, 2022 08:13