-
Notifications
You must be signed in to change notification settings - Fork 198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Open Source ClickHouse Deployments #1496
Conversation
Signed-off-by: Marcel Coetzee <[email protected]>
✅ Deploy Preview for dlt-hub-docs ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
Signed-off-by: Marcel Coetzee <[email protected]>
Signed-off-by: Marcel Coetzee <[email protected]>
Signed-off-by: Marcel Coetzee <[email protected]>
…onfiguration setting~ Signed-off-by: Marcel Coetzee <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code itself looks good to me. Just two comments regarding test coverage and the use of sentinel tables.
Signed-off-by: Marcel Coetzee <[email protected]>
Signed-off-by: Marcel Coetzee <[email protected]>
Signed-off-by: Marcel Coetzee <[email protected]>
Signed-off-by: Marcel Coetzee <[email protected]>
Signed-off-by: Marcel Coetzee <[email protected]>
Signed-off-by: Marcel Coetzee <[email protected]>
Signed-off-by: Marcel Coetzee <[email protected]>
# Conflicts: # dlt/destinations/impl/clickhouse/clickhouse.py # dlt/destinations/impl/clickhouse/sql_client.py # tests/load/clickhouse/test_clickhouse_adapter.py
Signed-off-by: Marcel Coetzee <[email protected]>
Signed-off-by: Marcel Coetzee <[email protected]>
Signed-off-by: Marcel Coetzee <[email protected]>
Signed-off-by: Marcel Coetzee <[email protected]>
Signed-off-by: Marcel Coetzee <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing the comments, LGTM!
Just this comment regarding state interaction is still open: #1496 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Pipboyguy @jorritsandbrink this is really good. thanks for working on it and all reviews. from my side
- pls keep sentinel table. I elaborated quite a bit why
- consider adding the table engine in the clickhouse configuration to have easy global swithc
…ides Signed-off-by: Marcel Coetzee <[email protected]>
Signed-off-by: Marcel Coetzee <[email protected]>
Signed-off-by: Marcel Coetzee <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I have just one small suggestion which may not make sense. let's wait for @jorritsandbrink review and we can merge it
self.execute_sql(f""" | ||
CREATE TABLE {sentinel_table_name} | ||
(_dlt_id String NOT NULL PRIMARY KEY) | ||
ENGINE=MergeTree |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm probably nitpicking but should it use config.table_engine_type here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rudolfix Thanks for the spot!
Changed to config table engine:
dlt/dlt/destinations/impl/clickhouse/sql_client.py
Lines 116 to 121 in 9c67c65
sentinel_table_type = cast(TTableEngineType, self.credentials.table_engine_type) | |
self.execute_sql(f""" | |
CREATE TABLE {sentinel_table_name} | |
(_dlt_id String NOT NULL PRIMARY KEY) | |
ENGINE={TABLE_ENGINE_TYPE_TO_CLICKHOUSE_ATTR.get(sentinel_table_type)} | |
COMMENT 'internal dlt sentinel table'""") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall it looks great, just a comment regarding organization of config attributes.
@@ -36,6 +40,8 @@ class ClickHouseCredentials(ConnectionStringCredentials): | |||
"""Timeout for sending and receiving data. Defaults to 300 seconds.""" | |||
dataset_table_separator: str = "___" | |||
"""Separator for dataset table names, defaults to '___', i.e. 'database.dataset___table'.""" | |||
table_engine_type: Optional[TTableEngineType] = "merge_tree" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you define this attribute under ClickHouseCredentials
?
I think it makes more sense to put it under ClickHouseClientConfiguration
. Same for dataset_table_separator
and dataset_sentinel_table_name
. They aren't credentials.
As a reference: synapse
has a default_table_index_type
that's similar to table_engine_type
, which is defined in the destination client class:
default_table_index_type: Optional[TTableIndexType] = "heap" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Pipboyguy yes I agree with @jorritsandbrink
this is probably due to sql_client not seeing config but credentials - but you can pass additional arguments to it
so my take is that we move it and if it is hard then we'll need to change how sql_client
is instantiated. I can help in that case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally agree. These don't belong in credentials.
Please find latest changes in both code base and docs.
Signed-off-by: Marcel Coetzee <[email protected]>
Signed-off-by: Marcel Coetzee <[email protected]>
…iguration Signed-off-by: Marcel Coetzee <[email protected]>
Signed-off-by: Marcel Coetzee <[email protected]>
Signed-off-by: Marcel Coetzee <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I like the typing
module.
Description
This will improves support for self-managed ClickHouse open source deployments, while maintaining compatibility with ClickHouse Cloud deployments.
table_engine_type
inclickhouse_adapter
. Valid types aremerge_tree
,replicated_merge_tree
,shared_merge_tree
,stripe_log
,tiny_log
MergeTree
if no engine is specified, which now works for both Cloud and self-managed deploymentsclickhouse_connect
sessions.Related Issues
Additional Context
Note that this change does not include support for specifying replication, ZooKeeper or shard details for the
ReplicatedMergeTree
engine. Users requiring those customizations can continue to specify the full engine definition in their configuration.