v0.1.1
Overview
- Embedding computation can now handle datasets larger than RAM: embeddings are written to the vector store iteratively as they are computed.
- JSON and CSV sources are heavily optimized and now go through DuckDB for parsing.
- Clustering now supports semantic clustering with embeddings, using DBSCAN.
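The larger-than-RAM behavior in the first item comes down to streaming: consume the input as an iterable, embed one batch at a time, and write each batch to the store before reading the next. The following is a minimal sketch of that pattern only, not Lilac's actual implementation. `InMemoryVectorStore`, `toy_embed`, and `embed_to_store` are hypothetical names invented for illustration, and the store here keeps vectors in a list where a real one would write to disk.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator

Vector = list[float]


def batched(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield lists of at most `size` items without materializing the input."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector store (a real one would persist to disk)."""

    def __init__(self) -> None:
        self.vectors: list[Vector] = []
        self.num_writes = 0

    def add(self, vectors: list[Vector]) -> None:
        self.vectors.extend(vectors)
        self.num_writes += 1


def toy_embed(texts: list[str]) -> list[Vector]:
    """Placeholder embedding: [text length, vowel count]. Not a real model."""
    return [[float(len(t)), float(sum(c in "aeiou" for c in t))] for t in texts]


def embed_to_store(
    texts: Iterable[str],
    embed: Callable[[list[str]], list[Vector]],
    store: InMemoryVectorStore,
    batch_size: int = 2,
) -> int:
    """Embed `texts` batch by batch, writing each batch before reading the next."""
    total = 0
    for batch in batched(texts, batch_size):
        store.add(embed(batch))  # only one batch of vectors is held at a time
        total += len(batch)
    return total


store = InMemoryVectorStore()
docs = (f"document {i}" for i in range(5))  # a generator, never fully in memory
count = embed_to_store(docs, toy_embed, store, batch_size=2)
```

With five inputs and a batch size of 2, the store receives three writes. Peak memory in the embedding step depends on the batch size, not on the size of the dataset.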
New features
- Add SQLite source and optimize the JSON and CSV sources by @dsmilkov in #710
- Add a dict source and convert LangSmith source to use it by @dsmilkov in #716
- Add clustering signal by @dsmilkov in #711
Performance
- Use iterables for compute_signal and compute_embedding by @nsthorat in #706
- Write embeddings to the vector store iteratively by @nsthorat in #709
- Add SQLite source and optimize the JSON and CSV sources by @dsmilkov in #710
- Speed up the Docker image build step by installing lilac from pip before installing the local wheel by @nsthorat in #714
- Improve perf of server by removing UUID sort by @dsmilkov in #715
Bug fixes
- Fix semantic search on repeated by @dsmilkov in #704
- Fix syntax error with keyword search by @dsmilkov in #705
- Fix bug with span highlighting a repeated field by @nsthorat in #713
- Change the bootup load to run during the new FastAPI lifecycle API by @nsthorat in #717
Full Changelog: v0.1.0...v0.1.1