Add support for sparse vectors #40

Open
falling-springs opened this issue Dec 2, 2024 · 1 comment

Comments

@falling-springs

Hi there -- first of all, this extension is fantastic and I am very excited about using DuckDB as a vector search solution. Thanks for your hard work!

One currently missing feature I'd love to see added is support for "sparse" vectors (e.g. SPLADE, BGE-M3, BM25), which would support hybrid search use cases.

Many of the major providers already support this (pgvector, Pinecone, Qdrant, Milvus, etc.).

Is this on the roadmap? I think a standardized sparse similarity function on its own would be great, even if building fast indexes proves more challenging.

@falling-springs
Author

For anyone else considering DuckDB for sparse vectors, it is possible to get part of the way with just the built-in list and map functions.

The macro below, while quite ugly, is actually reasonably performant. It assumes you are storing your sparse vectors in a MAP column as {index:value} pairs.

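-- Dot product of two sparse vectors stored as MAP {index: value}.
CREATE OR REPLACE MACRO map_dot_product(map1, map2) AS (
    list_dot_product(
        -- Values from map1 at the keys present in both maps.
        -- map_extract returns a list per key, hence the flatten.
        flatten(list_transform(
            list_intersect(map_keys(map1), map_keys(map2)),
            x -> map_extract(map1, x)
        ))::DOUBLE[],
        -- Values from map2 at the same shared keys.
        flatten(list_transform(
            list_intersect(map_keys(map1), map_keys(map2)),
            x -> map_extract(map2, x)
        ))::DOUBLE[]
    )
);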
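
If you want to sanity-check the macro's output outside DuckDB, here is a minimal Python sketch of the same calculation. It assumes the sparse vectors are plain dicts of {index: value}. The name `sparse_dot` and the sample vectors are made up for illustration and are not part of DuckDB or this extension.

```python
def sparse_dot(vec1: dict[int, float], vec2: dict[int, float]) -> float:
    """Dot product of two sparse vectors stored as {index: value} dicts."""
    # Only indices present in both vectors contribute to the sum,
    # mirroring the list_intersect over the map keys in the macro.
    shared = vec1.keys() & vec2.keys()
    return sum((vec1[i] * vec2[i] for i in shared), 0.0)


# Hypothetical sample vectors: indices 7 and 42 overlap.
doc = {1: 0.5, 7: 1.0, 42: 2.0}
query = {7: 2.0, 42: 0.5, 99: 3.0}
print(sparse_dot(doc, query))
```

Vectors with no shared indices score 0.0 here. I have not checked what the macro itself returns in that case.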

Suggestions/improvements welcome :)
