
v0.3.0

@nsthorat released this 23 Jan 16:53

This release extends our exporting capabilities and adds support for loading custom embeddings.

The shape of exported data has changed, which is a breaking change, so this release is versioned 0.3.0.

Loading custom embeddings

Loading pre-computed embeddings from an external source is now possible. See our Custom embeddings guide for more details.

# Look up the pre-computed embedding for an item in the external vector store.
def _load_embedding(item):
  return vector_store[item['id']]

# Load the embeddings into Lilac.
ds.load_embedding(
  load_fn=_load_embedding, index_path='text', embedding='my_embedding', overwrite=True
)

Export to HuggingFace

You can now export to a HuggingFace dataset.

# Export a Lilac dataset to a huggingface dataset.
hf_ds = ds.to_huggingface()
# Optionally: use the HuggingFace API to push the dataset to the hub.
hf_ds.push_to_hub('lilacai/glaive-function-calling-v2-sharegpt')

Exporting no longer flattens data

Before this release, exporting would flatten source data. For instance, data that looks like:

{
  'conversations': [{
    'from': 'user',
    'value': 'Hello there'
  }]
}

Would get exported incorrectly as:

{'conversations.*.from': ['user'], 'conversations.*.value': ['Hello there']}

Now data is exported with the same nested shape it had when it was imported.
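To make the difference concrete, here is a small sketch of the kind of flattening described above. It is not Lilac's export code: the flatten helper is hypothetical and only reproduces the dotted-path shape from the example, with '*' standing in for any list index.

```python
def flatten(value, path=''):
  """Return {dotted_path: [leaf values]}; '*' stands for any list index."""
  flat = {}
  if isinstance(value, dict):
    children = [(f'{path}.{k}' if path else k, v) for k, v in value.items()]
  elif isinstance(value, list):
    children = [(f'{path}.*' if path else '*', v) for v in value]
  else:
    # Leaf value: wrap it in a list so repeated paths can be merged.
    return {path: [value]}
  for child_path, child in children:
    for leaf_path, leaves in flatten(child, child_path).items():
      flat.setdefault(leaf_path, []).extend(leaves)
  return flat

row = {'conversations': [{'from': 'user', 'value': 'Hello there'}]}
flattened = flatten(row)
```

Here flattened has the old flattened layout, while row keeps the nested layout that exports now preserve.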

What's Changed

Features

  • Add support for loading custom embeddings. by @nsthorat in #1090
  • Fix dataset export to avoid flattening the user data by @dsmilkov in #1091
  • Export to HuggingFace. Support glaive-function-calling-v2 in the demo, clusters, and via sharegpt. by @nsthorat in #1113

Performance

  • Speed up PII and lang detection by making them multiprocess by @dsmilkov in #1097

Bug fixes

Lilac Garden

UI

  • Refactor buttons so we have a single cluster button. by @nsthorat in #1111

Full Changelog: v0.2.5...v0.3.0