
ClickHouse MergeTree Support #1387

Closed
Pipboyguy opened this issue May 21, 2024 · 2 comments · Fixed by #1496
Labels: bug, community, destination

Comments

Collaborator

Pipboyguy commented May 21, 2024

Support for MergeTree Engine in ClickHouse Destination

Problem

The ClickHouse destination in dlt 0.4.11 currently only supports the ReplicatedMergeTree table engine. However, many users have requested support for the standard non-replicated MergeTree engine to enable local development and testing deployments.

Desired Behavior

The ClickHouse destination should correctly handle ENGINE=MergeTree and create the appropriate table structure in ClickHouse.

Proposed Solution

Enhance the ClickHouse destination adapter with the following:

  1. Automatically detect whether the connected ClickHouse instance is a Cloud instance (replicated) or a local self-managed instance.
    • For ClickHouse Cloud, default to ReplicatedMergeTree engine
    • For non-cloud instances, default to standard MergeTree

     SELECT value
     FROM system.settings
     WHERE name = 'cloud_mode';

  2. Allow users to explicitly specify the desired engine (MergeTree or ReplicatedMergeTree) in the destination configuration. This will override the automatic detection.

  3. Properly interpolate the provided engine into the CREATE TABLE statement executed against ClickHouse.
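
A minimal Python sketch of how the three steps above could fit together. It is not the dlt implementation: the function names, the `SUPPORTED_ENGINES` tuple, the `ORDER BY tuple()` default, and the assumption that a `cloud_mode` value of `1` or `true` indicates ClickHouse Cloud are all hypothetical and only illustrate the precedence (explicit configuration first, detection second).

```python
from typing import Optional

# Engines named in this issue; anything else is rejected in this sketch.
SUPPORTED_ENGINES = ("MergeTree", "ReplicatedMergeTree")


def detect_default_engine(cloud_mode_value: Optional[str]) -> str:
    """Choose a default engine from the value returned by the cloud_mode query.

    `cloud_mode_value` is None when the setting is absent on the server.
    Assumption: "1" or "true" means the instance is ClickHouse Cloud.
    """
    is_cloud = (cloud_mode_value or "").strip().lower() in ("1", "true")
    return "ReplicatedMergeTree" if is_cloud else "MergeTree"


def resolve_engine(configured: Optional[str], cloud_mode_value: Optional[str]) -> str:
    """An explicitly configured engine overrides automatic detection."""
    if configured is not None:
        if configured not in SUPPORTED_ENGINES:
            raise ValueError(f"Unsupported engine: {configured!r}")
        return configured
    return detect_default_engine(cloud_mode_value)


def build_create_table(
    table: str, columns_sql: str, engine: str, order_by: str = "tuple()"
) -> str:
    """Interpolate the resolved engine into a CREATE TABLE statement."""
    return (
        f"CREATE TABLE {table} ({columns_sql}) "
        f"ENGINE = {engine} ORDER BY {order_by}"
    )


if __name__ == "__main__":
    engine = resolve_engine(configured=None, cloud_mode_value=None)
    print(build_create_table("events", "id UInt64", engine))
```

Keeping detection and override in separate functions means the override path can be tested without a live ClickHouse connection.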

Note: To limit scope, this enhancement will not include support for specifying replication, ZooKeeper, or shard details for the ReplicatedMergeTree engine. Users requiring those customizations can continue to specify the full engine definition in their configuration.

Considerations

  • Thoroughly test the solution against both ClickHouse Cloud and various self-managed ClickHouse versions
  • Update documentation to explain the new behavior and configuration options
  • Consider any impacts to downstream systems or processes relying on the current hard-coded ReplicatedMergeTree behavior

Pipboyguy self-assigned this May 21, 2024
Pipboyguy added the bug, destination, and community labels May 21, 2024
Pipboyguy moved this from Todo to In Progress in dlt core library Jun 19, 2024

Collaborator Author

Pipboyguy commented Jun 19, 2024

The engine should be configurable via clickhouse_adapter:

clickhouse_adapter(data, table_engine_type="merge_tree")

Collaborator Author

Pipboyguy commented Jun 19, 2024

It seems that on ClickHouse Cloud, the ReplicatedMergeTree table engine family is automatically replaced by the SharedMergeTree engine family:

https://clickhouse.com/docs/en/cloud/reference/shared-merge-tree

This happened recently, by the way. It makes our life easier, since the plain MergeTree engine will now work across both Cloud and on-prem.

Self-managed deployments will still have the option of using ReplicatedMergeTree.
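
With that simplification, engine selection reduces to a lookup with MergeTree as the default. A rough sketch, assuming a hint-to-engine mapping: only the `"merge_tree"` value appears in this thread, while `"replicated_merge_tree"`, `ENGINE_BY_HINT`, and `engine_clause` are hypothetical names used for illustration.

```python
from typing import Optional

# Maps a table_engine_type hint to the ClickHouse engine name.
# "replicated_merge_tree" is an assumed value, not confirmed in this issue.
ENGINE_BY_HINT = {
    "merge_tree": "MergeTree",
    "replicated_merge_tree": "ReplicatedMergeTree",
}


def engine_clause(table_engine_type: Optional[str] = None) -> str:
    """Return the ENGINE clause, defaulting to MergeTree when no hint is set."""
    hint = table_engine_type or "merge_tree"
    try:
        engine = ENGINE_BY_HINT[hint]
    except KeyError:
        raise ValueError(f"Unsupported table_engine_type: {hint!r}") from None
    return f"ENGINE = {engine}"


if __name__ == "__main__":
    print(engine_clause())                         # default for Cloud and on-prem
    print(engine_clause("replicated_merge_tree"))  # opt-in for self-managed
```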
