Commit

Merge branch 'duckdb:main' into wvlet
lmangani authored Jan 1, 2025
2 parents 29f9f28 + 7f82a6c commit c9cbde8
Showing 12 changed files with 205 additions and 36 deletions.
8 changes: 4 additions & 4 deletions extensions/avro/description.yml
@@ -1,7 +1,7 @@
extension:
name: avro
description: Read Apache Avro (TM) files form DuckDB
version: 1.0.0
description: Read Apache Avro (TM) files from DuckDB
version: 1.1.0
language: C++
build: cmake
license: MIT
@@ -12,11 +12,11 @@ extension:

repo:
github: hannes/duckdb_avro
ref: 7facc0badf31c7ec0a249cf47fb97d190550d3f2
ref: e5ed59b6ccf915c65e17eb6286b9a64f3ab09f59

docs:
hello_world: |
FROM read_avro('some_file.avro');
extended_description: |
This extension provides a scan function for Apache Avro files.
For more information and information regarding usage, limitations and performance, see the [README](https://github.com/hannes/duckdb_avro).
For more information and information regarding usage, limitations and performance, see the [README](https://github.com/hannes/duckdb_avro) and the [announcement blog post](https://duckdb.org/2024/12/09/duckdb-avro-extension).
62 changes: 62 additions & 0 deletions extensions/chsql_native/description.yml
@@ -0,0 +1,62 @@
extension:
name: chsql_native
description: ClickHouse Native File reader for chsql
version: 0.0.1
language: Rust
build: cmake
license: MIT
excluded_platforms: "windows_amd64_rtools;windows_amd64;wasm_threads;wasm_eh;wasm_mvp"
requires_toolchains: "rust;python3"
maintainers:
- lmangani
- adubovikov

repo:
github: quackscience/duckdb-extension-clickhouse-native
ref: 0116eb462ec85fa000f1cb15a3b0ee6165711b78

docs:
hello_world: |
--- This experimental rust extension allows reading ClickHouse Native files with DuckDB
--- Test files can be generated with clickhouse-local. See README for full examples.
--- Simple Example
D SELECT * FROM clickhouse_native('/tmp/numbers.clickhouse');
┌──────────────┬─────────┐
│ version() │ number │
│ varchar │ int32 │
├──────────────┼─────────┤
│ 24.12.1.1273 │ 0 │
└──────────────┴─────────┘
--- Long Example
D SELECT count(*), max(number) FROM clickhouse_native('/tmp/100000.clickhouse');
┌──────────────┬─────────────┐
│ count_star() │ max(number) │
│ int64 │ int32 │
├──────────────┼─────────────┤
│ 100000 │ 99999 │
└──────────────┴─────────────┘
--- Wide Example
D SELECT * FROM clickhouse_native('/tmp/functions.clickhouse') WHERE alias_to != '' LIMIT 10;
┌────────────────────┬──────────────┬──────────────────┬──────────────────────┬──────────────┬─────────┬───┬─────────┬───────────┬────────────────┬──────────┬────────────┐
│ name │ is_aggregate │ case_insensitive │ alias_to │ create_query │ origin │ … │ syntax │ arguments │ returned_value │ examples │ categories │
│ varchar │ int32 │ int32 │ varchar │ varchar │ varchar │ │ varchar │ varchar │ varchar │ varchar │ varchar │
├────────────────────┼──────────────┼──────────────────┼──────────────────────┼──────────────┼─────────┼───┼─────────┼───────────┼────────────────┼──────────┼────────────┤
│ connection_id │ 0 │ 1 │ connectionID │ │ System │ … │ │ │ │ │ │
│ rand32 │ 0 │ 0 │ rand │ │ System │ … │ │ │ │ │ │
│ INET6_ATON │ 0 │ 1 │ IPv6StringToNum │ │ System │ … │ │ │ │ │ │
│ INET_ATON │ 0 │ 1 │ IPv4StringToNum │ │ System │ … │ │ │ │ │ │
│ truncate │ 0 │ 1 │ trunc │ │ System │ … │ │ │ │ │ │
│ ceiling │ 0 │ 1 │ ceil │ │ System │ … │ │ │ │ │ │
│ replace │ 0 │ 1 │ replaceAll │ │ System │ … │ │ │ │ │ │
│ from_utc_timestamp │ 0 │ 1 │ fromUTCTimestamp │ │ System │ … │ │ │ │ │ │
│ mapFromString │ 0 │ 0 │ extractKeyValuePairs │ │ System │ … │ │ │ │ │ │
│ str_to_map │ 0 │ 1 │ extractKeyValuePairs │ │ System │ … │ │ │ │ │ │
├────────────────────┴──────────────┴──────────────────┴──────────────────────┴──────────────┴─────────┴───┴─────────┴───────────┴────────────────┴──────────┴────────────┤
│ 10 rows 12 columns (11 shown) │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
extended_description: |
This extension is highly experimental and potentially unstable. All reads are full-scans. Do not use in production.
77 changes: 77 additions & 0 deletions extensions/datasketches/description.yml
@@ -0,0 +1,77 @@
docs:
extended_description: |
This extension provides an interface to the [Apache DataSketches](https://datasketches.apache.org/) library. This extension enables users to efficiently compute approximate results for large datasets directly within DuckDB, using state-of-the-art streaming algorithms for distinct counting, quantile estimation, and more.
## Why use this extension?
DuckDB already has great implementations of [HyperLogLog](https://en.wikipedia.org/wiki/HyperLogLog) via `approx_count_distinct(x)` and [TDigest](https://arxiv.org/abs/1902.04023) via `approx_quantile(x, pos)`, but it doesn't expose the internal state of the aggregates nor allow the the user to tune all of the parameters of the sketches. This extension allows data sketches to be serialized as `BLOB`s which can be stored and shared across different systems, processes, and environments without loss of fidelity. This makes data sketches highly useful in distributed data processing pipelines.
This extension has implemented these sketches from Apache DataSketches.
- Quantile Estimation
- [TDigest](https://datasketches.apache.org/docs/tdigest/tdigest.html)
- [Classic Quantile](https://datasketches.apache.org/docs/Quantiles/ClassicQuantilesSketch.html)
- [Relative Error Quantile (REQ)](https://datasketches.apache.org/docs/REQ/ReqSketch.html)
- [KLL](https://datasketches.apache.org/docs/KLL/KLLSketch.html)
- Approximate Distinct Count
- [Compressed Probability Counting (CPC)](https://datasketches.apache.org/docs/CPC/CpcSketches.html)
- [HyperLogLog (HLL)](https://datasketches.apache.org/docs/HLL/HllSketches.html)
For more information and information regarding usage, see the [README](https://github.com/rustyconover/duckdb-datasketches).
hello_world: |
-- This is just a demonstration of a single sketch type,
-- see the README for more sketches.
--
-- Lets simulate a temperature sensor
CREATE TABLE readings(temp integer);
INSERT INTO readings(temp) select unnest(generate_series(1, 10));
-- Create a sketch by aggregating id over the readings table.
SELECT datasketch_tdigest_rank(datasketch_tdigest(10, temp), 5) from readings;
┌────────────────────────────────────────────────────────────┐
│ datasketch_tdigest_rank(datasketch_tdigest(10, "temp"), 5) │
│ double │
├────────────────────────────────────────────────────────────┤
│ 0.45 │
└────────────────────────────────────────────────────────────┘
-- Put some more readings in at the high end.
INSERT INTO readings(temp) values (10), (10), (10), (10);
-- Now the rank of 5 is moved down.
SELECT datasketch_tdigest_rank(datasketch_tdigest(10, temp), 5) from readings;
┌────────────────────────────────────────────────────────────┐
│ datasketch_tdigest_rank(datasketch_tdigest(10, "temp"), 5) │
│ double │
├────────────────────────────────────────────────────────────┤
│ 0.32142857142857145 │
└────────────────────────────────────────────────────────────┘
-- Lets get the cumulative distribution function from the sketch.
SELECT datasketch_tdigest_cdf(datasketch_tdigest(10, temp), [1,5,9]) from readings;
┌──────────────────────────────────────────────────────────────────────────────────┐
│ datasketch_tdigest_cdf(datasketch_tdigest(10, "temp"), main.list_value(1, 5, 9)) │
│ double[] │
├──────────────────────────────────────────────────────────────────────────────────┤
│ [0.03571428571428571, 0.32142857142857145, 0.6071428571428571, 1.0] │
└──────────────────────────────────────────────────────────────────────────────────┘
-- The sketch can be persisted and updated later when more data
-- arrives without having to rescan the previously aggregated data.
SELECT datasketch_tdigest(10, temp) from readings;
datasketch_tdigest(10, "temp") = \x02\x01\x14\x0A\x00\x04\x00...
extension:
build: cmake
description: By utilizing the Apache DataSketches library this extension can efficiently compute approximate distinct item counts and estimations of quantiles, while allowing the sketches to be serialized.
language: C++
license: MIT
maintainers:
- rustyconover
name: datasketches
version: 0.0.1
repo:
github: rustyconover/duckdb-datasketches
ref: 4568aa6b47fc8a2339f96287d1f165ae41fed982
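
The rank and CDF values in the `hello_world` output above are consistent with a midpoint-rank definition: the count of values below `x`, plus half the count equal to `x`, divided by the total. The following is a minimal Python sketch of that arithmetic, not the extension's code. `midpoint_rank` is a hypothetical helper, and the agreement with the t-digest output is an inference from the 10-row and 14-row examples, where the input is small enough for the sketch to be effectively exact. The trailing `1.0` in the CDF is assumed to be the total mass appended after the last split point.

```python
def midpoint_rank(values, x):
    """Fraction of values below x, counting ties with x as half.

    Hypothetical helper for illustration; not part of the extension.
    """
    below = sum(1 for v in values if v < x)
    equal = sum(1 for v in values if v == x)
    return (below + 0.5 * equal) / len(values)


# Mirrors: INSERT INTO readings(temp) select unnest(generate_series(1, 10));
readings = list(range(1, 11))
rank_before = midpoint_rank(readings, 5)  # 4.5 / 10

# Mirrors: INSERT INTO readings(temp) values (10), (10), (10), (10);
readings += [10, 10, 10, 10]
rank_after = midpoint_rank(readings, 5)  # 4.5 / 14, roughly 0.3214

# CDF at the split points 1, 5, 9; the final 1.0 is an assumed convention.
cdf = [midpoint_rank(readings, p) for p in (1, 5, 9)] + [1.0]
```

Adding readings at the high end leaves the count below 5 unchanged while the total grows, which is why the rank of 5 drops in the second query.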
3 changes: 3 additions & 0 deletions extensions/datasketches/docs/function_descriptions.csv
@@ -0,0 +1,3 @@
function,description,comment,example
"crypto_hash","Apply a cryptographic hash function specified as the first argument to the data supplied as the second argument.","","SELECT crypto_hash('md5', 'test');"
"crypto_hmac","Calculate a HMAC value","","SELECT crypto_hmac('sha2-256', 'secret key', 'secret message');"
4 changes: 2 additions & 2 deletions extensions/duckpgq/description.yml
@@ -1,7 +1,7 @@
extension:
name: duckpgq
description: Extension that adds support for SQL/PGQ and graph algorithms
version: 0.1.0
version: 0.2.1
language: C++
build: cmake
license: MIT
@@ -10,7 +10,7 @@ extension:

repo:
github: cwida/duckpgq-extension
ref: 5dcec704050b15b12a7894172089dd4a79ca1435
ref: bac137217a27d519fbcb4678f024393dc7027f37

docs:
hello_world: |
19 changes: 11 additions & 8 deletions extensions/flockmtl/description.yml
@@ -1,7 +1,7 @@
extension:
name: flockmtl
description: DuckDB LLM Extension
version: 0.1.0
description: LLM & RAG extension to combine analytics and semantic analysis
version: 0.2.2
language: SQL & C++
build: cmake
license: MIT
@@ -11,15 +11,18 @@ extension:
- queryproc

repo:
github: dsg-polymtl/flockmtl-duckdb
ref: 1bd8ac0f54f8bf4c7da1c3793b88e73daa127653
github: dsg-polymtl/flockmtl
ref: c8cad4a66a0a62164b845258b6a00e72c8470b3f

docs:
hello_world: |
-- After loading, any function call will throw an error if an OPENAI_API_KEY environment variable is not set
-- After loading, any function call will throw an error if the provider's secret doesn't exist
-- Create your provider secret by following the [documentation](https://dsg-polymtl.github.io/flockmtl/docs/supported-providers). For example, you can create a default OpenAI API key as follows:
D CREATE SECRET (TYPE OPENAI, API_KEY 'your-api-key');
-- Call an OpenAI model with a predefined prompt ('Tell me hello world') and default model ('gpt-4o-mini')
D SELECT llm_complete('hello-world', 'default');
D SELECT llm_complete({'model_name': 'default'}, {'prompt_name': 'hello-world'});
┌──────────────────────────────────────────┐
│ llm_complete(hello_world, default_model) │
│ varchar │
@@ -35,10 +38,10 @@ docs:
D CREATE PROMPT('summarize', 'summarize the text into 1 word: {{text}}');
-- Create a variable name for the model to do the summarizing
D CREATE MODEL('summarizer-model', 'gpt-4o', 128000);
D CREATE MODEL('summarizer-model', 'gpt-4o', {'context_window': 128000, 'max_output_tokens': 16400);
-- Summarize text and pass it as parameter
D SELECT llm_complete('summarize', 'summarizer-model', {'text': 'We support more functions and approaches to combine relational analytics and semantic analysis. Check our repo for documentation and examples.'});
D SELECT llm_complete({'model_name': 'summarizer-model'}, {'prompt_name': 'summarize'}, {'text': 'We support more functions and approaches to combine relational analytics and semantic analysis. Check our repo for documentation and examples.'});
extended_description: |
This extension is experimental and potentially unstable. Do not use it in production.
4 changes: 2 additions & 2 deletions extensions/gsheets/description.yml
@@ -1,7 +1,7 @@
extension:
name: gsheets
description: Read and write Google Sheets using SQL
version: 0.0.3
version: 0.0.4
language: C++
build: cmake
license: MIT
@@ -11,7 +11,7 @@ extension:

repo:
github: evidence-dev/duckdb_gsheets
ref: 5352ea30499a7f7f2dbfa45faf622906e0130cfb
ref: c4a8413fd1d1ca63cbe37db66a7676fe677da456

docs:
hello_world: |
4 changes: 2 additions & 2 deletions extensions/open_prompt/description.yml
@@ -1,7 +1,7 @@
extension:
name: open_prompt
description: Interact with LLMs with a simple DuckDB Extension
version: 0.0.3
version: 0.0.4
language: C++
build: cmake
license: MIT
@@ -11,7 +11,7 @@ extension:

repo:
github: quackscience/duckdb-extension-openprompt
ref: 616bdfc4e7b01c4095a2dda8d4104c179922efd6
ref: e7e54de7cfc0bc61599c7ab018399508077202a5

docs:
hello_world: |