v0.3.0
This release extends our exporting capabilities and adds support for loading custom embeddings.
Because the shape of exported data has changed, this is a breaking change so we released 0.3.0.
Loading custom embeddings
Loading pre-computed embeddings from an external source is now possible. See our Custom embeddings guide for more details.
# Load the embeddings into Lilac.
def _load_embedding(item):
return vector_store[item['id']]
# Load the embeddings into Lilac.
ds.load_embedding(
load_fn=_load_embedding, index_path='text', embedding='my_embedding', overwrite=True
)
Export to HuggingFace
You can now export to a HuggingFace dataset.
# Export a Lilac dataset to a huggingface dataset.
hf_ds = ds.to_huggingface()
# Optionally: use the HuggingFace API to push the dataset to the hub.
hf_ds.push_to_hub('lilacai/glaive-function-calling-v2-sharegpt')
Exporting no longer flattens data
Before this release, exporting would flatten source data. For instance, data that looks like:
{
'conversations': [{
'from': 'user',
'value': 'Hello there'
}]
Would get exported incorrectly as:
{'conversations.*.from': ['user'], 'conversations.*.value': ['Hello there']}
Now it is exported exactly the way it was shaped when importing.
What's Changed
Features
- Add support for loading custom embeddings. by @nsthorat in #1090
- Fix dataset export to avoid flattening the user data by @dsmilkov in #1091
- Export to HuggingFace. Support glaive-function-calling-v2 in the demo, clusters, and via sharegpt. by @nsthorat in #1113
Performance
Bug fixes
- Bug fixes: overwrite, task errors, embedding keys. by @nsthorat in #1098
- Fixed cache busting behavior by @brilee in #1099
- Small fixes for the demo. by @nsthorat in #1106
- Fix a bug where we drop source fields that have embeddings computed on them by @nsthorat in https://gith* Couple of small bug fixes by @dsmilkov in #1109
ub.com//pull/1093 - Fix edge case where table doesn't exist and doesn't get created by @brilee in #1110
- Fix the cluster sort by membership score bug by @dsmilkov in #1112
Lilac Garden
- Rename remote => use_garden. by @nsthorat in #1092
- Fix chunking bug for remote embedding computation by @dsmilkov in #1096
- Add accelerated PII execution on Lilac Garden by @dsmilkov in #1103
- Move use_garden outside of a Signal. by @nsthorat in #1102
UI
Full Changelog: v0.2.5...v0.3.0