Hi there -- first of all, this extension is fantastic and I am very excited about using DuckDB as a vector search solution. Thanks for your hard work!
One currently missing feature I'd love to see added is support for "sparse" vectors (e.g. SPLADE, BGE-M3, BM25), which would support hybrid search use cases.
This is a feature that is supported by a lot of the major providers (pgvector, Pinecone, Qdrant, Milvus, etc.).
Is it on the roadmap for you all? I think even providing a standardized sparse similarity function would be great, even if building fast indices, etc. proves more challenging.
For anyone else considering DuckDB for sparse vectors, it is possible to do some things with just the built-in list and map functions.
The macro below, while quite ugly, is actually reasonably performant. It assumes you are storing your sparse vectors in a MAP column as {index:value} pairs, and it only multiplies values whose index appears in both maps.
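CREATE OR REPLACE MACRO map_dot_product(map1, map2) AS (
    -- Dot product over the indices present in both maps;
    -- an index missing from either map contributes nothing.
    list_dot_product(
        -- map_extract returns a list per key, so flatten into a plain list of values
        flatten(list_transform(
            list_intersect(map_keys(map1), map_keys(map2)),
            x -> map_extract(map1, x)
        ))::DOUBLE[],
        flatten(list_transform(
            list_intersect(map_keys(map1), map_keys(map2)),
            x -> map_extract(map2, x)
        ))::DOUBLE[]
    )
);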
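For illustration, here is a rough sketch of how the macro above might be used to rank rows against a query vector. The table name `docs`, the column name `sparse_vec`, and the sample values are all made up for this example, and it assumes `map_dot_product` has already been created as shown above. This is a brute-force scan with no index, so treat it as a starting point rather than a tuned solution.

```sql
-- Hypothetical table: one sparse vector per row, stored as {index: value} pairs.
CREATE TABLE docs (id INTEGER, sparse_vec MAP(INTEGER, DOUBLE));

INSERT INTO docs VALUES
    (1, MAP {1: 0.5, 3: 2.0}),
    (2, MAP {3: 1.0, 7: 4.0}),
    (3, MAP {1: 1.0, 7: 1.0});

-- Score every row against a query vector and keep the best matches.
-- For row 1 the shared indices are 1 and 3, so the dot product is
-- 0.5 * 2.0 + 2.0 * 1.0 = 3.0.
SELECT
    id,
    map_dot_product(sparse_vec, MAP {1: 2.0, 3: 1.0}) AS score
FROM docs
ORDER BY score DESC
LIMIT 10;
```

Rows that share no indices with the query vector are worth checking separately on your DuckDB version, since the macro then passes two empty lists to `list_dot_product`.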