v0.1.1

nsthorat released this 26 Sep 00:08

4d20423

Overview

Embedding computation can now be larger-than-RAM! When computing a large number of embeddings, results are written to the vector store iteratively.
The JSON and CSV sources are heavily optimized and now go through DuckDB for parsing.
Clustering now supports semantic clustering over embeddings, using DBSCAN.
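
To illustrate the first point above, here is a minimal sketch of batched, iterative embedding writes. It is not Lilac's implementation: `batched`, `ListVectorStore`, `embed_to_store`, and `toy_embed` are hypothetical names invented for this example, and the store keeps vectors in a Python list only to stay self-contained (a real store would persist each batch to disk).

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence

Vector = List[float]


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield lists of at most `size` items without materializing the input."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


class ListVectorStore:
    """Hypothetical stand-in for a vector store (a real one would write to disk)."""

    def __init__(self) -> None:
        self.vectors: List[Vector] = []
        self.num_writes = 0

    def add(self, vectors: Sequence[Vector]) -> None:
        self.vectors.extend(vectors)
        self.num_writes += 1


def embed_to_store(
    texts: Iterable[str],
    embed_fn: Callable[[List[str]], List[Vector]],
    store: ListVectorStore,
    batch_size: int = 2,
) -> int:
    """Embed `texts` batch by batch, writing each batch before reading the next."""
    total = 0
    for batch in batched(texts, batch_size):
        store.add(embed_fn(batch))  # only one batch of embeddings is alive at a time
        total += len(batch)
    return total


def toy_embed(batch: List[str]) -> List[Vector]:
    """Placeholder embedding: [text length, vowel count]. Not a real model."""
    return [[float(len(t)), float(sum(c in "aeiou" for c in t))] for t in batch]
```

Because `texts` can be a generator, neither the inputs nor the full set of embeddings has to fit in memory at once; peak usage is bounded by `batch_size`.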

New features

Add SQLite source and optimize the JSON and CSV sources by @dsmilkov in #710
Add a dict source and convert LangSmith source to use it by @dsmilkov in #716
Add clustering signal by @dsmilkov in #711
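
As a rough illustration of semantic clustering with DBSCAN, the sketch below groups a handful of toy embedding vectors using scikit-learn. This is an assumption-laden example, not the clustering signal's actual code: the helper `cluster_embeddings`, the choice of cosine distance, and the `eps` and `min_samples` values are all made up for the demonstration.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def cluster_embeddings(embeddings: np.ndarray, eps: float = 0.1, min_samples: int = 2) -> np.ndarray:
    """Return one cluster id per row; DBSCAN labels noise points as -1."""
    # Cosine distance is an assumption here; it is a common choice for text embeddings.
    model = DBSCAN(eps=eps, min_samples=min_samples, metric="cosine")
    return model.fit_predict(embeddings)


# Toy 2-D "embeddings": two tight directional groups plus one outlier.
embeddings = np.array([
    [1.0, 0.0], [0.99, 0.05], [0.98, 0.1],  # group pointing along the x axis
    [0.0, 1.0], [0.05, 0.99], [0.1, 0.98],  # group pointing along the y axis
    [-1.0, -1.0],                           # far from both groups
])
labels = cluster_embeddings(embeddings)
```

DBSCAN does not need the number of clusters up front, which suits exploratory work on a dataset whose topics are not known in advance.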

Performance

Use iterables for compute_signal and compute_embedding. by @nsthorat in #706
Write embeddings to the vector store iteratively by @nsthorat in #709
Add SQLite source and optimize the JSON and CSV sources by @dsmilkov in #710
Speed up the docker image build step by installing lilac from pip before installing the local wheel. by @nsthorat in #714
Improve perf of server by removing UUID sort by @dsmilkov in #715

Bug fixes

Fix semantic search on repeated fields by @dsmilkov in #704
Fix syntax error with keyword search by @dsmilkov in #705
Fix bug with span highlighting a repeated field by @nsthorat in #713
Move the bootup load into the new FastAPI lifespan API. by @nsthorat in #717

Full Changelog: v0.1.0...v0.1.1

Contributors

nsthorat and dsmilkov