Adding support for binary embeddings #37

Open
bhugueney opened this issue Nov 11, 2024 · 2 comments

Comments

bhugueney commented Nov 11, 2024

Thank you for this most useful extension!
It seems that binary embeddings allow a dramatic increase in performance for a small accuracy cost (https://huggingface.co/blog/embedding-quantization#quantization-experiments).
Various other vector databases support them:

It would be great if DuckDB vss could also support them efficiently.

EDIT: pgvector also has it: https://github.com/pgvector/pgvector?tab=readme-ov-file#binary-vectors
Best Regards
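
For readers unfamiliar with the idea, here is a minimal sketch of binary quantization in plain Python. It assumes the common scheme of thresholding each component at zero and packing the resulting bits into bytes; the function name `binarize` and the most-significant-bit-first layout are illustrative choices, not anything DuckDB vss defines.

```python
from typing import Sequence


def binarize(embedding: Sequence[float]) -> bytes:
    """Pack the signs of an embedding into bytes (hypothetical helper).

    A bit is 1 where the component is strictly positive, 0 otherwise.
    """
    packed = bytearray((len(embedding) + 7) // 8)
    for i, value in enumerate(embedding):
        if value > 0:
            # Most significant bit first within each byte (an arbitrary choice).
            packed[i // 8] |= 0x80 >> (i % 8)
    return bytes(packed)


embedding = [0.3, -1.2, 0.0, 2.5, -0.1, 0.7, -0.4, 0.9]
packed = binarize(embedding)
print(packed.hex())  # eight components now occupy a single byte
```

Stored as 32-bit floats, those eight components would take 32 bytes, so the packed form is 32 times smaller.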

Member

Maxxen commented Nov 21, 2024

Hello!

Yes, support for arrays of other types is planned, although binary vectors in particular might be slightly more complex since DuckDB itself doesn't really have a "bit" type, but it should be doable.

@bhugueney
Author

> Hello!
>
> Yes, support for arrays of other types is planned, although binary vectors in particular might be slightly more complex since DuckDB itself doesn't really have a "bit" type, but it should be doable.

Thank you for your interest.
I have high hopes for binary vectors, as they provide considerable savings in both memory and speed. The speed gain comes from higher data throughput (cf. memory) and from a simpler distance implementation:
https://github.com/CountOnes/hamming_weight
Best Regards.
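
To illustrate why the distance is simpler: the Hamming distance between two packed binary vectors is an XOR followed by a count of the set bits. The sketch below is plain Python and only shows the principle; `hamming_distance` is a hypothetical name, and a real implementation would use hardware popcount or SIMD rather than Python integers.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length packed bit vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # XOR leaves a 1 exactly where the two vectors disagree.
    diff = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    # Count the set bits (the Hamming weight of the XOR).
    return bin(diff).count("1")


query = bytes([0b10010101, 0b11110000])
candidate = bytes([0b10010100, 0b00001111])
print(hamming_distance(query, candidate))
```

No floating-point multiplications are involved, which is where the speed advantage over cosine or Euclidean distance on float vectors comes from.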

Sorry, closed by mistake and couldn't see how to undo the closing.

bhugueney reopened this Nov 30, 2024