19 Jan 18:34

nsthorat

cf4b354

v0.2.5

What's Changed

This release is mostly UI bug fixes.

We also added support for remote computation of GTE embeddings via Lilac Garden. If you are interested, please reach out to us.

Bug fixes

Always sort by rowid to make db results stable by @dsmilkov in #1086
Fix a couple of UI bugs by @dsmilkov in #1088
Fix some small UI bugs. by @nsthorat in #1084
Ignore folders that don't have a manifest.json and make project config source of truth for dataset listing by @brilee in #1083
Fix bug with keyword search highlighting every field. by @nsthorat in #1081

Garden

Add remote computation for GTE by @dsmilkov in #1082

Other Changes

Remove the media path x preferred-embedding logic. by @nsthorat in #1079
Deploy dataset by @brilee in #1085
Enable deploying at HEAD for demo as well as staging. by @brilee in #1089

Full Changelog: v0.2.4...v0.2.5

Contributors

nsthorat, brilee, and dsmilkov

17 Jan 16:16

nsthorat

efbf7c3

v0.2.4

What's Changed

This release is mostly bug fixes and small changes to the upcoming clustering UI.

Clustering

Optimize the cluster view page to reduce number of requests by @dsmilkov in #1062
Improve the cluster titling and fix a few client-side bugs by @dsmilkov in #1058
Add share-gpt specific format selectors. by @nsthorat in #1060
Cluster spec deployer by @brilee in #1063
Improve the cluster/pivot UI by @dsmilkov in #1068
Tiny UI fix for "Clusters of" and shave off 1 call to dataset.stats in clustering by @dsmilkov in #1074
Support input selectors from config files. by @nsthorat in #1076

Bug fixes

fix npm deps? by @brilee in #1070
Fix a couple issues with the export menu. by @nsthorat in #1069
Create tasks api to ensure exceptions caught by @brilee in #1071
Override the OpenAPI base url when lilac is not being served from / by @nsthorat in #1073
Make the entire svelte app use relative links by @dsmilkov in #1075

Docs

Add documentation for sharing datasets. by @nsthorat in #1061
Fix broken link in README by @albertvillanova in #1064
Fix broken links in HuggingFaceSpaceWelcome web component by @albertvillanova in #1065

Other Changes

Update lock files from ./scripts/setup.sh by @nsthorat in #1077

New Contributors

@albertvillanova made their first contribution in #1064

Full Changelog: v0.2.3...v0.2.4

Contributors

nsthorat, brilee, and 2 other contributors

12 Jan 16:56

nsthorat

6b315be

v0.2.3

What's Changed

There are now two CLI commands for sharing Lilac datasets via HuggingFace. To upload a local dataset:

lilac upload local/Capybara --url_or_repo=lilacai/Capybara

To download the dataset to a local project directory:

lilac download lilacai/Capybara

For more details on sharing datasets, see the Sharing Guide.

This release also adds a new environment variable, USE_TABLE_INDEX, which is useful for frozen demos. When it is set, queries run against a cached DuckDB table, which dramatically improves query performance. The trade-off is that labeling, or any other edit, becomes slower, because the table is recomputed after each change.
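
As a rough sketch of how a frozen demo might opt in, the snippet below sets the variable from Python before the server is started. The value 'true' and the helper `table_index_enabled` are assumptions made for illustration and are not taken from the Lilac source, so check the Lilac documentation for the exact value it expects.

```python
import os


def table_index_enabled(env=None):
    """Hypothetical helper: report whether USE_TABLE_INDEX looks switched on."""
    env = os.environ if env is None else env
    # Assumption: the usual truthy spellings count as "on".
    return env.get('USE_TABLE_INDEX', '').strip().lower() in ('1', 'true', 'yes')


# Set the flag for this process before starting the Lilac server.
os.environ['USE_TABLE_INDEX'] = 'true'
print(table_index_enabled())
```

Because edits trigger a recompute of the cached table, a flag like this is best left unset on instances where people are actively labeling.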

Upload / Download

Add an upload dataset script. Some other cleanups. by @nsthorat in #1059

Bug fixes

Fix a bug with CSV source reader for TSV files, and named columns. by @nsthorat in #1040
Progress bar by @brilee in #1043
Fix bug with ItemMedia not rendering media fields that are deeply nested siblings. by @nsthorat in #1044
Fix clustering an enriched field by @dsmilkov in #1048
Propagate filters in the group by panel by @dsmilkov in #1041

Performance

Add indexing on database startup, flag-guarded by @brilee in #1052

UI

Add clustering in the UI by @dsmilkov in #1045
Add search to the cluster UI. Add some polish. by @nsthorat in #1054
Add clusters to the schema menu. Migrate to a custom carousel component so the page doesn't freeze. by @nsthorat in #1050

Clustering

Add dataset.cluster(input) where input can be any lambda func by @dsmilkov in #1042
dataset.cluster() flattens any repeated before clustering by @dsmilkov in #1051

Lilac Garden

Add remote computation for the jina embedding by @dsmilkov in #1049

Other Changes

Move the import of .env.local in publish_pip to the top of the file. by @nsthorat in #1039
fix: migrate embeddings by azure openai to openai > 1.0.0 by @dechantoine in #1053
Streamline lilac deployment by @brilee in #1057
Add a notebook for working with concepts from python. by @nsthorat in #1055

Full Changelog: v0.2.2...v0.2.3

Contributors

nsthorat, brilee, and 2 other contributors

08 Jan 14:27

nsthorat

86d1b10

v0.2.2

Bug fixes

Fix a bug with OpenAI embeddings after upgrading. by @nsthorat in #1038
Remove an extra temporary column at the end of clustering by @dsmilkov in #1035

Other Changes

Convert the pivot viewer to a bunch of carousels. by @nsthorat in #1034

Full Changelog: v0.2.1...v0.2.2

Contributors

nsthorat and dsmilkov

05 Jan 22:17

nsthorat

3de4e59

v0.2.1

Keyboard shortcuts are now available for deleting and labeling!

To delete a row: use backspace or delete.
To label: go to dataset settings and configure key bindings for each label.

[Video: keyboard_shortcuts.mp4]

What's Changed

Features

Add keyboard shortcuts for fast labeling. by @nsthorat in #1028

Bug fixes

Allows non folder exports by @hynky1999 in #1026
Fixes incorrect destructuring by @hynky1999 in #1025
Improve auto-binning, and sorting of histograms. by @nsthorat in #1033
Fix lilac deployer for slashed datasets. by @nsthorat in #1021

Docs

Update documentation for labels, keyboard shortcuts, deleting rows. by @nsthorat in #1030
Add documentation that points to the lilac deployer UI. by @nsthorat in #1020

UI

Improve the UI around deleting. by @nsthorat in #1024
Add a 2-feature pivot view, allowing you to view a hierarchy of 2 features by @nsthorat in #1023

Other Changes

Improve the title generation in clustering by @dsmilkov in #1022
Fix some map(overwrite=True) bugs by @dsmilkov in #1031
Add superclusters (categories) by @dsmilkov in #1032

New Contributors

@hynky1999 made their first contribution in #1026

Full Changelog: v0.2.0...v0.2.1

Contributors

nsthorat, dsmilkov, and hynky1999

03 Jan 22:12

nsthorat

68e1fe7

v0.2.0

What's Changed

The UI now supports deleting rows, viewing the trash, and restoring deleted rows. Exports now automatically drop deleted rows.

Breaking changes

Merge output_column and nest_under --> dataset.map(output_path=...) by @dsmilkov in #1001

UI

Add the ability to delete and restore rows from the UI. by @nsthorat in #1011
Fix signal configs to use ClassVar by @dsmilkov in #1016

Performance

Fix jina to also run on CUDA if available by @dsmilkov in #996
Use CUDA when available for sentence transformers. by @nsthorat in #991
Use the yaml CLoader loader if it's available. by @nsthorat in #995
Use cuml for clustering when possible by @dsmilkov in #997
Fix map by @brilee in #994
Add Jina (Small) on Garden signal by @dsmilkov in #1009

Bug fixes

Fix some small UI bugs. by @nsthorat in #987
Fix issue with repeated of string rendering. by @nsthorat in #1015
Load datasets in a separate thread from the UI. by @nsthorat in #1014
Fix issue where we don't block on the server thread from the CLI. by @nsthorat in #1013

Clustering (coming soon)

Make ds.cluster() have resumable title generation by @dsmilkov in #1000
dataset.cluster() now uses transform() which uses map() by @dsmilkov in #1002
Add topic clustering in dataset.cluster() by @dsmilkov in #993
Allow clustering of a nested path by @dsmilkov in #1007
Add dataset.cluster(remote=True) bit by @dsmilkov in #1010

Map & signal changes

Add signal.map customization by @brilee in #1004
Allow map to be called for arbitrary depth by @dsmilkov in #998
remove VectorCompute path in dispatch_workers by @brilee in #1008
Implement signals on top of the map infrastructure by @brilee in #1006
dataset.map can now nest_under any repeated by @dsmilkov in #999
Remove TaskShardId by @brilee in #1003

Other Changes

Update the Dockerfile to use port 80 so we can use it on GCE. by @nsthorat in #992
Make OpenAI calls threaded with exponential backoff by @dsmilkov in #1005

Full Changelog: v0.1.26...v0.2.0

Contributors

nsthorat, brilee, and dsmilkov

19 Dec 03:32

nsthorat

8ed7b0d

v0.1.26

This release adds a markdown code block extractor signal, highlighting markdown code blocks and their languages.

What's Changed

Add markdown extractor signal. by @nsthorat in #983
Add dataset.cluster() by @dsmilkov in #981

Bug fixes

Emit membership prob in HDBScan, and fix "group by" UI bugs by @dsmilkov in #976
Fix ll.start_server() and add a test for full end-to-end server startup by @dsmilkov in #984
Add CLI integration tests. by @nsthorat in #985
Make ll.start_server() blocking outside an event loop by @dsmilkov in #986

Full Changelog: v0.1.25...v0.1.26

Contributors

nsthorat and dsmilkov

18 Dec 17:55

nsthorat

d799e35

v0.1.25

This release replaces dask with a thin multiprocessing client and comes with many performance improvements, most notably a faster import time for lilac.

We have also added a simple API for loading from HuggingFace:

import lilac as ll
from datasets import load_dataset
hf_ds = load_dataset('Open-Orca/SlimOrca-Dedup')
ds = ll.from_huggingface(hf_ds)

And a simple API for getting embeddings:

answer_emb = ds.get_embeddings('jina-v2-small', rowid, 'answer')[0]['vector']
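
One common use for a vector like answer_emb is comparing it with another one. The following is a minimal sketch in plain Python, assuming each 'vector' is a flat sequence of floats; the `cosine_similarity` helper and the toy vectors are illustrative and not part of the Lilac API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError('vectors must have the same length')
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError('cosine similarity is undefined for a zero vector')
    return dot / (norm_a * norm_b)


# Toy stand-ins for vectors returned by an embedding lookup.
answer_emb = [0.1, 0.3, 0.5]
other_emb = [0.2, 0.6, 1.0]  # same direction, twice the length
print(cosine_similarity(answer_emb, other_emb))
```

Values close to 1 mean the two texts point in nearly the same direction in embedding space, and values near 0 mean they are unrelated.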

We've also added some color to the UI and organized components a little better.

Features

Add Jina V2 embeddings by @dsmilkov in #966
Add sugar for ll.from_huggingface() by @dsmilkov in #962
Improve the row header to give us space for deleting. by @nsthorat in #965

Performance

Reduce import times by @brilee in #961
Using loky (thin wrapper around multiprocessing) instead of dask by @dsmilkov in #947
fix iterable robustness by @brilee in #977

Bug fixes

Fix memory leak caused by Iterable/Iterator mixups by @brilee in #974
Fix broken doc links. by @nsthorat in #964
Add color scales for semantic / concept search. Add openchat format. by @nsthorat in #975

Other Changes

Remove legal-termination concept. by @nsthorat in #980

Full Changelog: v0.1.24...v0.1.25

Contributors

nsthorat, brilee, and dsmilkov

12 Dec 19:29

nsthorat

08353dd

v0.1.24

This release changes the text media visualizer to Monaco (the engine that powers VSCode).

Monaco allows us to:

Deep-link to any line within a document.
Add right-click menus to text.
Add "thumbs up" and "thumbs down" to concepts from the menu, for any text.
Search any text from the right-click menu, with semantic similarity or keyword search.

Here is a video explaining the changes: https://www.youtube.com/watch?v=83Rj006tVIk

This release also adds custom support for the ShareGPT format in the UI.

Features

Add special support for a DELETED label by @brilee in #951
Switch to monaco for the main viewer. by @nsthorat in #952
Simplify monaco viewer. Add support for deep linking. by @nsthorat in #956
Infer dataset formats. Start with just ShareGPT. by @nsthorat in #948
Add UI for title slots for ShareGPT. by @nsthorat in #950

Bug fixes

Make the signal "try it" page work for signals w/o schema by @dsmilkov in #944
Fix UI bugs: monaco scroll, hash state forgotten, compare non-media fields by @nsthorat in #946
Eliminate setup count call from parquet_source. by @brilee in #959
Fix a bug where we highlighted all concept spans regardless of their score. by @nsthorat in #958
Fix bug with loading dataset and settings. by @nsthorat in #957

Docs

Add limit/filter docs by @brilee in #943

Other Changes

Add youtube video for the blog post by @dsmilkov in #942
Drive-by cleanup of schema.py code by @brilee in #955

Full Changelog: v0.1.23...v0.1.24

Contributors

nsthorat, brilee, and dsmilkov

07 Dec 16:25

nsthorat

95d9081

v0.1.23

High-level

Lilac is now moving towards editing data directly in the tool. The first vehicle for this is Dataset.map.

New blog post on curating data with the new Dataset.map feature:
https://docs.lilacml.com/blog/curate-coding-dataset.html

Documentation on Dataset.map:
https://docs.lilacml.com/datasets/dataset_edit.html

Features

Add dataset.map support for limit/filter by @brilee in #933
Add support for arbitrary value type v in map<k, v> in parquet by @dsmilkov in #935
Add batch size support and collapse transform impl by @brilee in #934

Improvements

Improve the UI for repeated values. by @nsthorat in #904
Small ergonomic fixes while writing the "code formatting" blog post by @dsmilkov in #909
Merge multiple shards of the same task into the same progress bar. by @nsthorat in #910
Add threaded task execution. by @nsthorat in #920
Fix css style for markdown tables by @dsmilkov in #931
Fix tqdm progress bars by separating report_progress from show_progress. by @nsthorat in #929
Make parquet the default source by @dsmilkov in #941

Bug fixes

Fix keyword search to work with apostrophe ' by @dsmilkov in #907
Make sure the results of dataset.map() always returns an iterable. by @nsthorat in #925
Remove position= in tqdm. by @nsthorat in #913

Docs

Add a guide for iterating on dataset by @dsmilkov in #923
Add blog post for diffing and dataset.map by @dsmilkov in #912
Redo the docs.lilacml.com landing page by @dsmilkov in #932
Small tweaks to improve the glaive dataset blog post. by @nsthorat in #938
Rename the guide to edit a dataset by @dsmilkov in #930
Revamp welcome/intro pages by @brilee in #908

Other

Refactor dataset/signal endpoints into separate module by @brilee in #900
Add memray dep and instructions by @brilee in #917
Add spec for select options by @brilee in #918
Simplify helper methods to closer align to API for select options by @brilee in #919
Start writing the query options compiler by @brilee in #924

Coming soon

Add server-side RAG python code. by @nsthorat in #911
Migrate the UI to the server-side python RAG. by @nsthorat in #914
Improve the RAG UI by @nsthorat in #916

Full Changelog: v0.1.22...v0.1.23

Contributors

nsthorat, brilee, and dsmilkov

Releases: databricks/lilac
