Releases: databricks/lilac

v0.1.1

26 Sep 00:08

Overview

  • Embedding computation now works on larger-than-RAM datasets! When computing many embeddings, Lilac writes them to the vector store iteratively rather than all at once.
  • JSON and CSV sources are heavily optimized and now use DuckDB for parsing.
  • Clustering now supports semantic clustering with embeddings, using DBSCAN.
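
The larger-than-RAM behavior can be pictured with a small sketch. This is not Lilac's implementation: `fake_embed`, `embed_batches`, `write_embeddings`, and the JSON-lines file standing in for the vector store are all hypothetical names chosen for illustration. The only point is that embeddings are produced lazily and flushed batch by batch, so peak memory is bounded by the batch size rather than by the dataset size.

```python
import itertools
import json
import os
import tempfile
from typing import Iterable, Iterator, List


def fake_embed(text: str) -> List[float]:
  """Stand-in for a real embedding model (hypothetical): returns a 2-d vector."""
  return [float(len(text)), float(sum(map(ord, text)) % 97)]


def embed_batches(texts: Iterable[str], batch_size: int) -> Iterator[List[List[float]]]:
  """Lazily yield embeddings one batch at a time."""
  it = iter(texts)
  while True:
    batch = list(itertools.islice(it, batch_size))
    if not batch:
      return
    yield [fake_embed(t) for t in batch]


def write_embeddings(texts: Iterable[str], path: str, batch_size: int = 2) -> int:
  """Write each batch to a file that stands in for the vector store.

  Only one batch is held in memory at a time. Returns the number of vectors written.
  """
  written = 0
  with open(path, 'w') as f:
    for batch in embed_batches(texts, batch_size):
      for vector in batch:
        f.write(json.dumps(vector) + '\n')
      written += len(batch)
  return written


# The input is a generator, so the full dataset is never materialized.
texts = (f'document {i}' for i in range(5))
store_path = os.path.join(tempfile.mkdtemp(), 'vectors.jsonl')
num_written = write_embeddings(texts, store_path, batch_size=2)
print(num_written)
```

A real vector store would also maintain an index over the vectors; the sketch leaves that out and shows only the streaming write pattern.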

New features

  • Add SQLite source and optimize the JSON and CSV sources by @dsmilkov in #710
  • Add a dict source and convert LangSmith source to use it by @dsmilkov in #716
  • Add clustering signal by @dsmilkov in #711
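
For readers unfamiliar with DBSCAN, the idea behind clustering embeddings can be sketched in plain Python. This is an illustrative toy, not the clustering signal added in #711: the `dbscan` function, the use of cosine distance, and the `eps` and `min_samples` values are assumptions, and a production implementation would more likely rely on a library and an efficient neighbor search.

```python
import math
from typing import List, Sequence

NOISE = -1
UNVISITED = -2


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
  """1 - cosine similarity. Assumes both vectors are non-zero."""
  dot = sum(x * y for x, y in zip(a, b))
  norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
  return 1.0 - dot / norm


def dbscan(vectors: List[Sequence[float]], eps: float, min_samples: int) -> List[int]:
  """Return one cluster id per vector; -1 marks noise."""
  labels = [UNVISITED] * len(vectors)

  def neighbors(i: int) -> List[int]:
    # Brute-force O(n) scan; the point itself counts as a neighbor.
    return [j for j in range(len(vectors))
            if cosine_distance(vectors[i], vectors[j]) <= eps]

  cluster_id = 0
  for i in range(len(vectors)):
    if labels[i] != UNVISITED:
      continue
    seeds = neighbors(i)
    if len(seeds) < min_samples:
      labels[i] = NOISE  # May later be claimed by a cluster as a border point.
      continue
    labels[i] = cluster_id
    queue = [j for j in seeds if j != i]
    while queue:
      j = queue.pop()
      if labels[j] == NOISE:
        labels[j] = cluster_id  # Border point: reachable but not a core point.
      if labels[j] != UNVISITED:
        continue
      labels[j] = cluster_id
      j_neighbors = neighbors(j)
      if len(j_neighbors) >= min_samples:
        queue.extend(j_neighbors)  # j is a core point, so keep expanding.
    cluster_id += 1
  return labels


# Toy 2-d "embeddings": two tight groups and one outlier.
embeddings = [[1.0, 0.0], [0.99, 0.1], [0.98, 0.15],
              [0.0, 1.0], [0.1, 0.99],
              [-1.0, -1.0]]
labels = dbscan(embeddings, eps=0.05, min_samples=2)
print(labels)
```

With these toy inputs the first three vectors land in one cluster, the next two in another, and the last is marked as noise. Unlike k-means, DBSCAN does not need the number of clusters up front, which suits exploratory use on a dataset whose structure is unknown.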

Performance

  • Use iterables for compute_signal and compute_embedding. by @nsthorat in #706
  • Write embeddings to the vector store iteratively by @nsthorat in #709
  • Add SQLite source and optimize the JSON and CSV sources by @dsmilkov in #710
  • Speed up the docker image build step by installing lilac from pip before installing the local wheel. by @nsthorat in #714
  • Improve perf of server by removing UUID sort by @dsmilkov in #715

Bug fixes

  • Fix semantic search on repeated by @dsmilkov in #704
  • Fix syntax error with keyword search by @dsmilkov in #705
  • Fix bug with span highlighting a repeated field by @nsthorat in #713
  • Change the bootup load to be during the new FastAPI lifecycle API. by @nsthorat in #717

Full Changelog: v0.1.0...v0.1.1

v0.1.0

21 Sep 12:23

New Features

Lilac now supports labeling! For a detailed guide, see "Labeling a dataset".

Labels can be added for individual rows:

dataset.add_labels(
  'good',
  row_ids=['0003076800f1471f8f4c8a1b2deda742'])

Or for slices of the data:

dataset.add_labels(
  'short',
  filters=[
    (('text', 'text_statistics', 'num_characters'), 'less', 1000)
  ]
)

They can then be exported:

short_rows = list(
  dataset.select_rows(
    ['*', 'short'],
    filters=[
      (('short', 'label'), 'exists')
    ]
  )
)
# Print the first row.
print(short_rows[0])

Output:

{
  '__rowid__': '0003076800f1471f8f4c8a1b2deda742',
  'text': 'If you want to truly experience the magic (?) of Don Dohler, then check out "Alien Factor" or maybe "Fiend", but not this. Alien Factor is actually rather imaginative considering the low budget and it\'s fairly creepy, but "Nightbeast", which I guess is sort of an updating of Alien Factor, is just plain dumb. Actors sleepwalk through their roles, especially Mr. Monotone sheriff, and the monster is some dumb Halloween-mask kind of thing instead of the wildly imaginative (but kind of stupid) looking critters from Alien Factor. A spaceship crashes on Earth and there\'s a critter inside, of course, who runs around vaporizing people. And ripping off arms, etc. And he has a cool ray gun that he uses to vaporize people too, until it gets shot out of his hand. And that\'s really about it. "Alien Factor" beats this mess hands down, if you really want to see a good Don Dohler movie, check that out instead. And RIP Don Dohler, 12/2/06.',
  'label': 'neg',
  '__hfsplit__': 'test',
  'good': {
    'label': 'true',
    'created': datetime.datetime(2023, 9, 20, 10, 16, 15, 545277)
  }
}

Labels can also be added via the UI.

What's changed

Bug fixes

  • Allow add_labels and remove_labels without selection by @dsmilkov in #698
  • Fix UI regression and empty lilac.yml (no datasets) by @dsmilkov in #700

Full Changelog: v0.0.20...v0.1.0

v0.0.20

20 Sep 20:05

Features

  • Add "More like this" button in the item viewer by @dsmilkov in #676
  • Add simple labeling functionality in the item viewer by @dsmilkov in #679
  • Add removing labels, and add row_ids to add labels. by @nsthorat in #680
  • Improving the label download by @dsmilkov in #682
  • Expose LangSmithSource to the public API and docs by @dsmilkov in #684
  • Add UI to clear labels. by @nsthorat in #686
  • Add a 'label all' button to label all results in view by @nsthorat in #687
  • Add docs for labeling. Fix some labeling issues. by @nsthorat in #692

Bug fixes

  • Tiny CSS fixes to make mobile not terrible by @nsthorat in #677
  • Fix REST API with new labels API. by @nsthorat in #681
  • Fix issue with overflow on text by @nsthorat in #683
  • Fix upload scripts so we can push to a staging directory without uploading data. by @nsthorat in #689
  • Add better error messaging when inferring schema by @dsmilkov in #691
  • Fix the huggingface deploy script. by @nsthorat in #695
  • Fix bug with UDFs after metadata separation by @nsthorat in #696

Other

Full Changelog: v0.0.19...v0.0.20

v0.0.19

14 Sep 19:18

What's Changed

New Features 🎉

Other Changes

New Contributors

Full Changelog: v0.0.18...v0.0.19

v0.0.18

06 Sep 14:19

New Features

Other Changes

  • Fix the huggingface deploy script. by @nsthorat in #638
  • Fix bug with concept labeler not returning refreshed results. by @nsthorat in #639
  • Improve documentation around GCS paths. by @nsthorat in #647
  • When merging floats, check for closeness to avoid precision issues. Pin pandas version. by @nsthorat in #655
  • Fix RuntimeError in HNSW index by @dsmilkov in #656
  • Fix negative-sentiment and legal-terminal concepts due to missing top-level version field by @dsmilkov in #658
  • Fix italics for N/A by @nsthorat in #662
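
The float-merging fix in #655 guards against a common pitfall: values that should be equal can differ in their last bits after arithmetic or serialization. A minimal sketch of the idea, not the code from the PR: `merge_floats` is a hypothetical helper, and the tolerance values are assumptions.

```python
import math
from typing import Optional


def merge_floats(a: Optional[float], b: Optional[float],
                 rel_tol: float = 1e-9) -> Optional[float]:
  """Merge two values that should agree, tolerating rounding error.

  Hypothetical helper: returns the shared value, or raises if they truly differ.
  """
  if a is None:
    return b
  if b is None:
    return a
  if math.isclose(a, b, rel_tol=rel_tol, abs_tol=1e-12):
    return a
  raise ValueError(f'Conflicting values: {a} vs {b}')


# Exact equality fails on values that differ only by rounding error.
exact_equal = (0.1 + 0.2 == 0.3)
merged = merge_floats(0.1 + 0.2, 0.3)
print(exact_equal, merged)
```

Here the exact comparison is false, while the tolerance-based merge accepts the two values as the same number.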

Full Changelog: v0.0.17...v0.0.18

v0.0.17

02 Sep 12:48

What's Changed

  • Fix bug in load script where we try to use the task manager when none is passed. by @nsthorat in #627
  • Various bug fixes by @dsmilkov in #629
  • Fix the async bug when starting the server by @dsmilkov in #636
  • Fix bug with non-serializable schema in the concept labeler. by @nsthorat in #632
  • Update the global project config during changes. by @nsthorat in #631
  • Remove the explicit cache directory for sentence transformers. by @nsthorat in #637

Full Changelog: v0.0.16...v0.0.17

v0.0.16

31 Aug 16:30

New Features

Other Changes

  • Improve memory usage of lilac load to unblock mosaic datasets by @dsmilkov in #620
  • Add a project_path to lilac_start. by @nsthorat in #621
  • Allow tanstack query result to contain non-serializable data by @dsmilkov in #625
  • Fix auth bugs with concepts. Pip install lilac[all] in the dockerfile. by @nsthorat in #622
  • Add ability to make concepts public. by @nsthorat in #624

Full Changelog: v0.0.15...v0.0.16

v0.0.15

29 Aug 18:26

What's Changed

Full Changelog: v0.0.14...v0.0.15

v0.0.14

29 Aug 13:54
cfeb0e8

What's Changed

This release fixes a bug where the JavaScript was not built for the pip package. It includes the change to the searchbox: #603

New Contributors

Full Changelog: v0.0.13...v0.0.14

v0.0.13

29 Aug 00:47
59f5f44

What's Changed

Full Changelog: v0.0.12...v0.0.13