Releases: databricks/lilac
v0.3.9
v0.3.8
v0.3.7
v0.3.6
v0.3.5
This release adds the Nomic 1.5 and bge-m3 embeddings as built-ins.
We have also made it easier to add selections to concepts (see the add-to-concept.mp4 demo video).
Features
- Support the FastAPI app being mounted. by @nsthorat in #1174
- Add bge-m3 and Nomic 1.5 embeddings. by @nsthorat in #1182
- Change link to selection => copy link to selection. by @nsthorat in #1175
UI Changes
Clustering
Demo
- Add GAIR-lima to the HF demo. by @nsthorat in #1171
- Add ultrachat embeddings to public demo. by @nsthorat in #1173
Bug fixes
- Format N/A values separately in histograms by @brilee in #1169
- Fix bug with editing filters & keyword search of '' by @nsthorat in #1180
- Fix issue with FastAPI mounts. by @nsthorat in #1183
- Fix a few UI issues with concepts & UI bugs by @nsthorat in #1179
Other Changes
- Variables for OpenAI API and Model by @drikster80 in #1172
New Contributors
- @drikster80 made their first contribution in #1172
Full Changelog: v0.3.4...v0.3.5
v0.3.4
This release adds task cancellation to the UI and fixes a set of bugs around exporting, and some UI weirdness.
Features
- Implement signal cancellation by @brilee in #1154
- Create Cancelled task status by @brilee in #1163
- Make the cancel button in the UI work. by @nsthorat in #1162
Bug fixes
- Fix unusual inputs to auto binning histogram by @brilee in #1151
- Fix errors when the concept is empty by @dsmilkov in #1158
- Fix issue where by default a long media doesn't take up the full screen by @nsthorat in #1159
- Fix some issues with exporting. by @nsthorat in #1160
Public demo
Docs
- Add the garden blog post by @dsmilkov in #1144
- Update Quickstart (UI and python) by @brilee in #1145
- Add links to Lilac Garden page by @brilee in #1132
- Update docs - address PR comments from #1145 by @brilee in #1147
- Tiny blog update (remove yet another logo) by @dsmilkov in #1149
- Fix garden links by @dsmilkov in #1157
Other Changes
Full Changelog: v0.3.3...v0.3.4
v0.3.3
This release is mostly bug fixes and one API change for exporting.
All export methods now accept an "include_signals" flag. By default, signals computed in Lilac are not exported as extra metadata, so the exported data matches your source data.
For example:
hf_ds = ds.to_huggingface(include_signals=True)
What's Changed
Clustering
- Cache dataset.pivot() and make cluster search box more visible by @dsmilkov in #1126
- Add some polish to the clusters page. Fix some other UI bugs. by @nsthorat in #1128
- Add progress bars for JINA embedding for local clustering by @brilee in #1138
- Speedup rendering of cluster view by @dsmilkov in #1137
Bug fixes
- Fix a few small bugs by testing prod mode by @dsmilkov in #1129
- Add --deploy_at_head to deploy_project. Fix bug with percentages. by @nsthorat in #1131
- Rename USE_TABLE_INDEX => LILAC_USE_TABLE_INDEX. Add LILAC_PROD_MODE. by @nsthorat in #1134
- Fix signal info error by @dsmilkov in #1136
- Improve clusters and several bug fixes by @dsmilkov in #1141
- Fix issue with urls that end with a slash. by @nsthorat in #1143
Docs
Demo
Performance
API
Full Changelog: v0.3.2...v0.3.3
v0.3.2
This release is mostly bug fixes.
Bug fixes
- Fix pandas deprecation warning by @brilee in #1123
- Fix "open dataset and apply concept" by @dsmilkov in #1124
- Fix concept labeler when we index a repeated string (capybara) by @dsmilkov in #1122
- Fix a few bugs related to concepts and clustering by @dsmilkov in #1121
Lilac Garden & Clustering
- Add mosaic-instruct-v3 by @brilee in #1116
- Add eval datasets to the huggingface demo. by @nsthorat in #1119
- Add more demo datasets. by @nsthorat in #1120
Full Changelog: v0.3.1...v0.3.2
v0.3.1
v0.3.0
This release extends our exporting capabilities and adds support for loading custom embeddings.
Because the shape of exported data has changed, this is a breaking change, so we released it as 0.3.0.
Loading custom embeddings
Loading pre-computed embeddings from an external source is now possible. See our Custom embeddings guide for more details.
# Return the pre-computed embedding for a single item.
# `vector_store` is your own lookup from item id to embedding.
def _load_embedding(item):
  return vector_store[item['id']]
# Load the embeddings into Lilac.
ds.load_embedding(
  load_fn=_load_embedding, index_path='text', embedding='my_embedding', overwrite=True
)
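The snippet above leaves vector_store undefined. As a rough sketch only, it could be any lookup keyed by the item's id; the dict, the sample ids, and the use of plain float lists below are assumptions made for illustration (the Custom embeddings guide is the authority on the return type Lilac expects from load_fn).

```python
# Hypothetical in-memory store mapping an item id to its pre-computed vector.
# In practice the vectors would come from your own model or an external index.
vector_store: dict[str, list[float]] = {
    'a': [0.1, 0.2, 0.3],
    'b': [0.4, 0.5, 0.6],
}


def load_embedding(item: dict) -> list[float]:
    """Look up the pre-computed vector for one dataset row via its 'id' field."""
    return vector_store[item['id']]


# Stand-in rows; a real Lilac dataset would supply these items to load_fn.
rows = [{'id': 'a', 'text': 'hello'}, {'id': 'b', 'text': 'world'}]
vectors = [load_embedding(row) for row in rows]
```

The only contract shown here is "item in, vector out"; everything else about the store is up to you.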
Export to HuggingFace
You can now export to a HuggingFace dataset.
# Export a Lilac dataset to a huggingface dataset.
hf_ds = ds.to_huggingface()
# Optionally: use the HuggingFace API to push the dataset to the hub.
hf_ds.push_to_hub('lilacai/glaive-function-calling-v2-sharegpt')
Exporting no longer flattens data
Before this release, exporting would flatten source data. For instance, data that looks like:
{
  'conversations': [{
    'from': 'user',
    'value': 'Hello there'
  }]
}
Would get exported incorrectly as:
{'conversations.*.from': ['user'], 'conversations.*.value': ['Hello there']}
Now it is exported exactly the way it was shaped when importing.
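To make the difference between the two shapes concrete, here is a small standalone sketch that maps a nested record to the flattened path form shown above. The function flatten_paths is hypothetical and is not Lilac's implementation; it only reproduces the old output shape for illustration.

```python
from typing import Any


def flatten_paths(value: Any, prefix: str = '') -> dict[str, list[Any]]:
    """Collect leaf values under dotted paths, using '*' for list elements."""
    out: dict[str, list[Any]] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f'{prefix}.{key}' if prefix else key
            for p, leaves in flatten_paths(child, path).items():
                out.setdefault(p, []).extend(leaves)
    elif isinstance(value, list):
        path = f'{prefix}.*' if prefix else '*'
        for child in value:
            for p, leaves in flatten_paths(child, path).items():
                out.setdefault(p, []).extend(leaves)
    else:
        # Leaf value: record it under the path accumulated so far.
        out.setdefault(prefix, []).append(value)
    return out


record = {'conversations': [{'from': 'user', 'value': 'Hello there'}]}
flat = flatten_paths(record)
print(flat)
```

Since 0.3.0, an export keeps the nested form of record rather than the path-keyed form of flat.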
What's Changed
Features
- Add support for loading custom embeddings. by @nsthorat in #1090
- Fix dataset export to avoid flattening the user data by @dsmilkov in #1091
- Export to HuggingFace. Support glaive-function-calling-v2 in the demo, clusters, and via sharegpt. by @nsthorat in #1113
Performance
Bug fixes
- Bug fixes: overwrite, task errors, embedding keys. by @nsthorat in #1098
- Fixed cache busting behavior by @brilee in #1099
- Small fixes for the demo. by @nsthorat in #1106
- Fix a bug where we drop source fields that have embeddings computed on them by @nsthorat in #1093
- Couple of small bug fixes by @dsmilkov in #1109
- Fix edge case where table doesn't exist and doesn't get created by @brilee in #1110
- Fix the cluster sort by membership score bug by @dsmilkov in #1112
Lilac Garden
- Rename remote => use_garden. by @nsthorat in #1092
- Fix chunking bug for remote embedding computation by @dsmilkov in #1096
- Add accelerated PII execution on Lilac Garden by @dsmilkov in #1103
- Move use_garden outside of a Signal. by @nsthorat in #1102
UI
Full Changelog: v0.2.5...v0.3.0