Commit
Merge branch 'duckdb:main' into wvlet
Showing 12 changed files with 205 additions and 36 deletions.
@@ -0,0 +1,62 @@
extension:
  name: chsql_native
  description: ClickHouse Native File reader for chsql
  version: 0.0.1
  language: Rust
  build: cmake
  license: MIT
  excluded_platforms: "windows_amd64_rtools;windows_amd64;wasm_threads;wasm_eh;wasm_mvp"
  requires_toolchains: "rust;python3"
  maintainers:
    - lmangani
    - adubovikov

repo:
  github: quackscience/duckdb-extension-clickhouse-native
  ref: 0116eb462ec85fa000f1cb15a3b0ee6165711b78

docs:
  hello_world: |
    --- This experimental Rust extension allows reading ClickHouse Native files with DuckDB.
    --- Test files can be generated with clickhouse-local. See the README for full examples.
    --- Simple Example
    D SELECT * FROM clickhouse_native('/tmp/numbers.clickhouse');
    ┌──────────────┬─────────┐
    │ version() │ number │
    │ varchar │ int32 │
    ├──────────────┼─────────┤
    │ 24.12.1.1273 │ 0 │
    └──────────────┴─────────┘
    --- Long Example
    D SELECT count(*), max(number) FROM clickhouse_native('/tmp/100000.clickhouse');
    ┌──────────────┬─────────────┐
    │ count_star() │ max(number) │
    │ int64 │ int32 │
    ├──────────────┼─────────────┤
    │ 100000 │ 99999 │
    └──────────────┴─────────────┘
    --- Wide Example
    D SELECT * FROM clickhouse_native('/tmp/functions.clickhouse') WHERE alias_to != '' LIMIT 10;
    ┌────────────────────┬──────────────┬──────────────────┬──────────────────────┬──────────────┬─────────┬───┬─────────┬───────────┬────────────────┬──────────┬────────────┐
    │ name │ is_aggregate │ case_insensitive │ alias_to │ create_query │ origin │ … │ syntax │ arguments │ returned_value │ examples │ categories │
    │ varchar │ int32 │ int32 │ varchar │ varchar │ varchar │ │ varchar │ varchar │ varchar │ varchar │ varchar │
    ├────────────────────┼──────────────┼──────────────────┼──────────────────────┼──────────────┼─────────┼───┼─────────┼───────────┼────────────────┼──────────┼────────────┤
    │ connection_id │ 0 │ 1 │ connectionID │ │ System │ … │ │ │ │ │ │
    │ rand32 │ 0 │ 0 │ rand │ │ System │ … │ │ │ │ │ │
    │ INET6_ATON │ 0 │ 1 │ IPv6StringToNum │ │ System │ … │ │ │ │ │ │
    │ INET_ATON │ 0 │ 1 │ IPv4StringToNum │ │ System │ … │ │ │ │ │ │
    │ truncate │ 0 │ 1 │ trunc │ │ System │ … │ │ │ │ │ │
    │ ceiling │ 0 │ 1 │ ceil │ │ System │ … │ │ │ │ │ │
    │ replace │ 0 │ 1 │ replaceAll │ │ System │ … │ │ │ │ │ │
    │ from_utc_timestamp │ 0 │ 1 │ fromUTCTimestamp │ │ System │ … │ │ │ │ │ │
    │ mapFromString │ 0 │ 0 │ extractKeyValuePairs │ │ System │ … │ │ │ │ │ │
    │ str_to_map │ 0 │ 1 │ extractKeyValuePairs │ │ System │ … │ │ │ │ │ │
    ├────────────────────┴──────────────┴──────────────────┴──────────────────────┴──────────────┴─────────┴───┴─────────┴───────────┴────────────────┴──────────┴────────────┤
    │ 10 rows 12 columns (11 shown) │
    └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
  extended_description: |
    This extension is highly experimental and potentially unstable. All reads are full scans. Do not use in production.
@@ -0,0 +1,77 @@
docs:
  extended_description: |
    This extension provides an interface to the [Apache DataSketches](https://datasketches.apache.org/) library. It enables users to efficiently compute approximate results for large datasets directly within DuckDB, using state-of-the-art streaming algorithms for distinct counting, quantile estimation, and more.

    ## Why use this extension?

    DuckDB already has great implementations of [HyperLogLog](https://en.wikipedia.org/wiki/HyperLogLog) via `approx_count_distinct(x)` and [TDigest](https://arxiv.org/abs/1902.04023) via `approx_quantile(x, pos)`, but it does not expose the internal state of these aggregates or let the user tune all of the sketch parameters. This extension allows data sketches to be serialized as `BLOB`s, which can be stored and shared across different systems, processes, and environments without loss of fidelity. This makes data sketches highly useful in distributed data processing pipelines.

    This extension implements the following sketches from Apache DataSketches:

    - Quantile Estimation
      - [TDigest](https://datasketches.apache.org/docs/tdigest/tdigest.html)
      - [Classic Quantile](https://datasketches.apache.org/docs/Quantiles/ClassicQuantilesSketch.html)
      - [Relative Error Quantile (REQ)](https://datasketches.apache.org/docs/REQ/ReqSketch.html)
      - [KLL](https://datasketches.apache.org/docs/KLL/KLLSketch.html)
    - Approximate Distinct Count
      - [Compressed Probability Counting (CPC)](https://datasketches.apache.org/docs/CPC/CpcSketches.html)
      - [HyperLogLog (HLL)](https://datasketches.apache.org/docs/HLL/HllSketches.html)

    For more information, including usage, see the [README](https://github.com/rustyconover/duckdb-datasketches).
  hello_world: |
    -- This is just a demonstration of a single sketch type;
    -- see the README for more sketches.
    --
    -- Let's simulate a temperature sensor.
    CREATE TABLE readings(temp integer);
    INSERT INTO readings(temp) SELECT unnest(generate_series(1, 10));
    -- Create a sketch by aggregating temp over the readings table,
    -- then ask the sketch for the rank of 5.
    SELECT datasketch_tdigest_rank(datasketch_tdigest(10, temp), 5) FROM readings;
    ┌────────────────────────────────────────────────────────────┐
    │ datasketch_tdigest_rank(datasketch_tdigest(10, "temp"), 5) │
    │                           double                           │
    ├────────────────────────────────────────────────────────────┤
    │                                                       0.45 │
    └────────────────────────────────────────────────────────────┘
    -- Put some more readings in at the high end.
    INSERT INTO readings(temp) VALUES (10), (10), (10), (10);
    -- The rank of 5 now moves down.
    SELECT datasketch_tdigest_rank(datasketch_tdigest(10, temp), 5) FROM readings;
    ┌────────────────────────────────────────────────────────────┐
    │ datasketch_tdigest_rank(datasketch_tdigest(10, "temp"), 5) │
    │                           double                           │
    ├────────────────────────────────────────────────────────────┤
    │                                        0.32142857142857145 │
    └────────────────────────────────────────────────────────────┘
    -- Let's get the cumulative distribution function from the sketch.
    SELECT datasketch_tdigest_cdf(datasketch_tdigest(10, temp), [1,5,9]) FROM readings;
    ┌──────────────────────────────────────────────────────────────────────────────────┐
    │ datasketch_tdigest_cdf(datasketch_tdigest(10, "temp"), main.list_value(1, 5, 9)) │
    │                                     double[]                                     │
    ├──────────────────────────────────────────────────────────────────────────────────┤
    │       [0.03571428571428571, 0.32142857142857145, 0.6071428571428571, 1.0]        │
    └──────────────────────────────────────────────────────────────────────────────────┘
    -- The sketch can be persisted and updated later when more data
    -- arrives, without having to rescan the previously aggregated data.
    SELECT datasketch_tdigest(10, temp) FROM readings;
    datasketch_tdigest(10, "temp") = \x02\x01\x14\x0A\x00\x04\x00...
extension:
  build: cmake
  description: By utilizing the Apache DataSketches library, this extension can efficiently compute approximate distinct-item counts and quantile estimates, while allowing the sketches to be serialized.
  language: C++
  license: MIT
  maintainers:
    - rustyconover
  name: datasketches
  version: 0.0.1
repo:
  github: rustyconover/duckdb-datasketches
  ref: 4568aa6b47fc8a2339f96287d1f165ae41fed982
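
For the small inputs in the datasketches `hello_world` example, the reported ranks can be checked by hand. Each value equals the exact mid-rank of the split point: the number of readings below it, plus half the number equal to it, divided by the total count (4.5/10 = 0.45 for the first query, 4.5/14 after the four extra readings of 10). The Python sketch below computes that exact quantity. The helper names `mid_rank` and `mid_rank_cdf` are hypothetical, and this is not a claim about how DataSketches computes ranks internally; on large inputs the sketch returns an approximation rather than these exact values.

```python
from typing import List, Sequence


def mid_rank(values: Sequence[float], x: float) -> float:
    """Hypothetical helper: exact mid-rank of x, i.e. (count < x + 0.5 * count == x) / n."""
    below = sum(1 for v in values if v < x)
    equal = sum(1 for v in values if v == x)
    return (below + 0.5 * equal) / len(values)


def mid_rank_cdf(values: Sequence[float], split_points: Sequence[float]) -> List[float]:
    """Hypothetical helper: mid-rank at each split point, followed by a final 1.0,
    mirroring the shape of the CDF output shown in the example."""
    return [mid_rank(values, x) for x in split_points] + [1.0]


readings = list(range(1, 11))            # temperatures 1..10
print(mid_rank(readings, 5))             # 4.5 / 10 = 0.45

readings += [10, 10, 10, 10]             # four more readings at the high end
print(mid_rank(readings, 5))             # 4.5 / 14, about 0.3214
print(mid_rank_cdf(readings, [1, 5, 9]))  # 0.5/14, 4.5/14, 8.5/14, then 1.0
```

Adding readings above 5 leaves the numerator unchanged but grows the denominator, which is why the rank of 5 moves down.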
@@ -0,0 +1,3 @@
function,description,comment,example
"crypto_hash","Apply a cryptographic hash function specified as the first argument to the data supplied as the second argument.","","SELECT crypto_hash('md5', 'test');"
"crypto_hmac","Calculate an HMAC value using the hash function, key, and message supplied as the first, second, and third arguments.","","SELECT crypto_hmac('sha2-256', 'secret key', 'secret message');"
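
The two example calls in the CSV can be cross-checked outside DuckDB with Python's standard `hashlib` and `hmac` modules. This is a sketch under assumptions: it assumes `'sha2-256'` names plain SHA-256 and that the inputs are hashed as their UTF-8 bytes. It prints lowercase hex digests; whether the extension returns hex text or a `BLOB` is not stated here.

```python
import hashlib
import hmac

# MD5 of the UTF-8 bytes of 'test', rendered as lowercase hex.
md5_hex = hashlib.md5("test".encode("utf-8")).hexdigest()
print(md5_hex)  # 098f6bcd4621d373cade4e832627b4f6

# HMAC using SHA-256 (assumed equivalent of 'sha2-256'):
# the key is the second SQL argument, the message the third.
hmac_hex = hmac.new(b"secret key", b"secret message", hashlib.sha256).hexdigest()
print(hmac_hex)  # 64 hex characters (32-byte SHA-256 output)
```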