Releases: databricks/lilac

v0.3.9

29 Feb 20:01

What's Changed

Other Changes

  • Add skip_noisy_assignment to dataset.cluster by @dsmilkov in #1194
  • Fix a bug with excess RAM usage during vector computes. by @nsthorat in #1195

Full Changelog: v0.3.8...v0.3.9

v0.3.8

28 Feb 14:00

What's Changed

Other Changes

  • Fix llama-index test after upgrading deps by @dsmilkov in #1192
  • Respect the self._split param when computing embeddings for a text by @dsmilkov in #1193

Full Changelog: v0.3.7...v0.3.8

v0.3.7

23 Feb 15:46

What's Changed

Other Changes

Full Changelog: v0.3.6...v0.3.7

v0.3.6

23 Feb 14:40

What's Changed

Other Changes

  • Add format selectors to the compute clusters UI. by @nsthorat in #1185
  • Fix bug when there's no dataset format and we fail to cluster. by @nsthorat in #1189

Full Changelog: v0.3.5...v0.3.6

v0.3.5

14 Feb 18:37

This release adds the Nomic 1.5 and bge-m3 embeddings as built-ins.

We have also made it easier to add selections to concepts (demo video: add-to-concept.mp4).

Features

UI Changes

Clustering

  • Add support for calling mistral for titling (no public API yet) by @dsmilkov in #1168

Demo

Bug fixes

Other Changes

New Contributors

Full Changelog: v0.3.4...v0.3.5

v0.3.4

02 Feb 19:44

This release adds task cancellation to the UI and fixes several exporting bugs and some UI quirks.

Features

Bug fixes

  • Fix unusual inputs to auto binning histogram by @brilee in #1151
  • Fix errors when the concept is empty by @dsmilkov in #1158
  • Fix issue where, by default, long media doesn't take up the full screen by @nsthorat in #1159
  • Fix some issues with exporting. by @nsthorat in #1160

Public demo

Docs

Other Changes

Full Changelog: v0.3.3...v0.3.4

v0.3.3

29 Jan 19:35

This release is mostly bug fixes and one API change for exporting.

All export methods now accept an "include_signals" flag. By default, signals computed in Lilac are not exported as extra metadata, so that your source data is preserved.

For example:

hf_ds = ds.to_huggingface(include_signals=True)

What's Changed

Clustering

  • Cache dataset.pivot() and make cluster search box more visible by @dsmilkov in #1126
  • Add some polish to the clusters page. Fix some other UI bugs. by @nsthorat in #1128
  • Add progress bars for JINA embedding for local clustering by @brilee in #1138
  • Speedup rendering of cluster view by @dsmilkov in #1137

Bug fixes

Docs

Demo

Performance

  • Handle sqlite files separately during table/index creation by @brilee in #1140

API

  • Add exclude_signals to select_rows, and include_signals to export methods. by @nsthorat in #1139

Full Changelog: v0.3.2...v0.3.3

v0.3.2

24 Jan 20:59

This release is mostly bug fixes.

Bug fixes

Lilac Garden & Clustering

Full Changelog: v0.3.1...v0.3.2

v0.3.1

23 Jan 21:26

Bugs

Docs

Full Changelog: v0.3.0...v0.3.1

v0.3.0

23 Jan 16:53

This release extends our exporting capabilities and adds support for loading custom embeddings.

Because the shape of exported data has changed, this is a breaking change, so we are releasing it as 0.3.0.

Loading custom embeddings

Loading pre-computed embeddings from an external source is now possible. See our Custom embeddings guide for more details.

# Look up the pre-computed embedding for an item (vector_store is defined elsewhere).
def _load_embedding(item):
  return vector_store[item['id']]

# Load the embeddings into Lilac.
ds.load_embedding(
  load_fn=_load_embedding, index_path='text', embedding='my_embedding', overwrite=True
)

Export to HuggingFace

You can now export to a HuggingFace dataset.

# Export a Lilac dataset to a huggingface dataset.
hf_ds = ds.to_huggingface()
# Optionally: use the HuggingFace API to push the dataset to the hub.
hf_ds.push_to_hub('lilacai/glaive-function-calling-v2-sharegpt')

Exporting no longer flattens data

Before this release, exporting would flatten source data. For instance, data that looks like:

{
  'conversations': [{
    'from': 'user',
    'value': 'Hello there'
  }]
}

would get exported incorrectly as:

{'conversations.*.from': ['user'], 'conversations.*.value': ['Hello there']}

Data is now exported in exactly the shape it had when it was imported.
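
To make the difference between the two shapes concrete, here is a small standalone sketch in plain Python. It is not Lilac's export code: the helper name flatten_paths is hypothetical and only mimics the old flattened layout for the record shown above.

```python
from typing import Any


def flatten_paths(record: dict[str, Any]) -> dict[str, Any]:
    """Collapse a nested record into dotted paths, with '*' marking list positions.

    Hypothetical helper for illustration only; it mimics the old export shape.
    """
    flat: dict[str, Any] = {}

    def walk(node: Any, path: tuple[str, ...], in_list: bool) -> None:
        if isinstance(node, dict):
            for key, child in node.items():
                walk(child, path + (key,), in_list)
        elif isinstance(node, list):
            for child in node:
                walk(child, path + ('*',), True)
        else:
            dotted = '.'.join(path)
            if in_list:
                # Leaves under a list are gathered into one list per path.
                flat.setdefault(dotted, []).append(node)
            else:
                flat[dotted] = node

    walk(record, (), False)
    return flat


record = {'conversations': [{'from': 'user', 'value': 'Hello there'}]}

old_export = flatten_paths(record)  # shape produced before 0.3.0
new_export = record                 # shape from 0.3.0 on: same as the source

print(old_export)
print(new_export)
```

Code that read flattened keys such as 'conversations.*.value' from an export would presumably need to walk the nested structure instead after upgrading.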

What's Changed

Features

  • Add support for loading custom embeddings. by @nsthorat in #1090
  • Fix dataset export to avoid flattening the user data by @dsmilkov in #1091
  • Export to HuggingFace. Support glaive-function-calling-v2 in the demo, clusters, and via sharegpt. by @nsthorat in #1113

Performance

  • Speed up PII and lang detection by making them multiprocess by @dsmilkov in #1097

Bug fixes

Lilac Garden

UI

  • Refactor buttons so we have a single cluster button. by @nsthorat in #1111

Full Changelog: v0.2.5...v0.3.0