Skip to content

quackscience/duckdb-extension-clickhouse-native

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

54 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

DuckDB Clickhouse Native Extension for chsql

Experimental ClickHouse Native Client and Native file reader for DuckDB chsql

๐Ÿ“ฆ Installation

INSTALL chsql_native FROM community;
LOAD chsql_native;

๐Ÿค– Native Client

The extension provides an experimental clickhouse native client: clickhouse_scan

๐Ÿ Settings

# Local Setup, Insecure
export CLICKHOUSE_URL="tcp://localhost:9000"
# Remote Setup, Secure
export CLICKHOUSE_URL="tcp://user:pass@remote:9440/?secure=true&skip_verify=true"

โœ๏ธ Usage

D SELECT * FROM clickhouse_scan("SELECT version(), 'hello', 123");
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ version()  โ”‚ 'hello' โ”‚  123   โ”‚
โ”‚  varchar   โ”‚ varchar โ”‚ uint32 โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ 24.10.2.80 โ”‚ hello   โ”‚    123 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

๐Ÿค– Native Reader

The extension provides an experimental clickhouse native file reader: clickhouse_native

๐Ÿ Input

Generate some native files with clickhouse-local or clickhouse-server

--- simple w/ one row, two columns
SELECT version(), number FROM numbers(1) INTO OUTFILE '/tmp/numbers.clickhouse' FORMAT Native;
--- simple w/ one column, 100000 rows
SELECT number FROM numbers(100000) INTO OUTFILE '/tmp/100000.clickhouse' FORMAT Native;
--- complex w/ multiple types
SELECT * FROM system.functions LIMIT 10 INTO OUTFILE '/tmp/functions.clickhouse' FORMAT Native;

โœ๏ธ Usage

Read ClickHouse Native files with DuckDB. Reads are full-scans at this time.

D SELECT * FROM clickhouse_native('/tmp/numbers.clickhouse');
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  version()   โ”‚ number  โ”‚
โ”‚   varchar    โ”‚  int32  โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ 24.12.1.1273 โ”‚ 0       โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
D SELECT count(*), max(number) FROM clickhouse_native('/tmp/100000.clickhouse');
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ count_star() โ”‚ max(number) โ”‚
โ”‚    int64     โ”‚    int32    โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚       100000 โ”‚       99999 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
D SELECT * FROM clickhouse_native('/tmp/functions.clickhouse') WHERE alias_to != '' LIMIT 10;
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚        name        โ”‚ is_aggregate โ”‚ case_insensitive โ”‚       alias_to       โ”‚ create_query โ”‚ origin  โ”‚ โ€ฆ โ”‚ syntax  โ”‚ arguments โ”‚ returned_value โ”‚ examples โ”‚ categories โ”‚
โ”‚      varchar       โ”‚    int32     โ”‚      int32       โ”‚       varchar        โ”‚   varchar    โ”‚ varchar โ”‚   โ”‚ varchar โ”‚  varchar  โ”‚    varchar     โ”‚ varchar  โ”‚  varchar   โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ connection_id      โ”‚            0 โ”‚                1 โ”‚ connectionID         โ”‚              โ”‚ System  โ”‚ โ€ฆ โ”‚         โ”‚           โ”‚                โ”‚          โ”‚            โ”‚
โ”‚ rand32             โ”‚            0 โ”‚                0 โ”‚ rand                 โ”‚              โ”‚ System  โ”‚ โ€ฆ โ”‚         โ”‚           โ”‚                โ”‚          โ”‚            โ”‚
โ”‚ INET6_ATON         โ”‚            0 โ”‚                1 โ”‚ IPv6StringToNum      โ”‚              โ”‚ System  โ”‚ โ€ฆ โ”‚         โ”‚           โ”‚                โ”‚          โ”‚            โ”‚
โ”‚ INET_ATON          โ”‚            0 โ”‚                1 โ”‚ IPv4StringToNum      โ”‚              โ”‚ System  โ”‚ โ€ฆ โ”‚         โ”‚           โ”‚                โ”‚          โ”‚            โ”‚
โ”‚ truncate           โ”‚            0 โ”‚                1 โ”‚ trunc                โ”‚              โ”‚ System  โ”‚ โ€ฆ โ”‚         โ”‚           โ”‚                โ”‚          โ”‚            โ”‚
โ”‚ ceiling            โ”‚            0 โ”‚                1 โ”‚ ceil                 โ”‚              โ”‚ System  โ”‚ โ€ฆ โ”‚         โ”‚           โ”‚                โ”‚          โ”‚            โ”‚
โ”‚ replace            โ”‚            0 โ”‚                1 โ”‚ replaceAll           โ”‚              โ”‚ System  โ”‚ โ€ฆ โ”‚         โ”‚           โ”‚                โ”‚          โ”‚            โ”‚
โ”‚ from_utc_timestamp โ”‚            0 โ”‚                1 โ”‚ fromUTCTimestamp     โ”‚              โ”‚ System  โ”‚ โ€ฆ โ”‚         โ”‚           โ”‚                โ”‚          โ”‚            โ”‚
โ”‚ mapFromString      โ”‚            0 โ”‚                0 โ”‚ extractKeyValuePairs โ”‚              โ”‚ System  โ”‚ โ€ฆ โ”‚         โ”‚           โ”‚                โ”‚          โ”‚            โ”‚
โ”‚ str_to_map         โ”‚            0 โ”‚                1 โ”‚ extractKeyValuePairs โ”‚              โ”‚ System  โ”‚ โ€ฆ โ”‚         โ”‚           โ”‚                โ”‚          โ”‚            โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ 10 rows                                                                                                                                           12 columns (11 shown) โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Notes

The reader is a clear room implementation of the ClickHouse Native file format using no code or libraries from ClickHouse Inc. As such it is potentially incomplete, imperfect and might not be compatible with all files. USE AT YOUR OWN RISK!

๐ŸŽ Performance

Simple CLI cold start count() test using duckdb vs. clickhouse-local and 1M rows

DuckDB

# time duckdb -c "LOAD chsql_native; SELECT count(*) FROM clickhouse_native('/tmp/1M.clickhouse');"
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ count_star() โ”‚
โ”‚    int64     โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚      1000000 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

real	0m0.095s
user	0m0.077s
sys	0m0.029s

clickhouse-local

# time clickhouse local "SELECT count(*) FROM '/tmp/1M.clickhouse'";
1000000

real	0m0.141s
user	0m0.086s
sys	0m0.043s

โ›‘๏ธ Extension Status

  • Basic Fomat Reading
    • Column Extraction
    • Blocks Parser & Iterator
    • Type Mapping WIP
      • Strings
      • Integers
      • Enums
      • ??? as String
    • Compression support
  • Basic Native Client
    • clickhouse-rs binding
    • TLS Support
    • Type Mapping WIP
      • Strings
      • Integers
      • Everything Else

โš™๏ธ Dev Build

You can easily modify the code and build a local extension for testing and development.

Requirements

  • Rust
  1. Clone and Compile the extension on your system
cd /usr/src
git clone --recurse-submodules https://github.com/quackscience/duckdb-extension-clickhouse-native
cd duckdb-extension-clickhouse-native
make configure && make
  1. Download and Run DuckDB with -unsigned
wget https://github.com/duckdb/duckdb/releases/download/v1.1.3/duckdb_cli-linux-amd64.zip && unzip duckdb_cli-linux-amd64.zip
./duckdb -unsigned
  1. Load your local extension build
D LOAD '/usr/src/duckdb-extension-clickhouse-native/build/debug/clickhouse_native.duckdb_extension';

Disclaimer

DuckDB ยฎ is a trademark of DuckDB Foundation. ClickHouseยฎ is a trademark of ClickHouse Inc. All trademarks, service marks, and logos mentioned or depicted are the property of their respective owners. The use of any third-party trademarks, brand names, product names, and company names is purely informative or intended as parody and does not imply endorsement, affiliation, or association with the respective owners.