Releases · databricks/lilac

29 Nov 21:22

nsthorat

v0.1.22

524c840

v0.1.22

High-level

- Excluding a tag is now an option in the UI searchbox. This enables a workflow of keeping that filter on while progressively tagging new data to be removed.

- Signals can now be written without defining a schema.

Features

Add dataset.transform() where we pass the entire input as iterable by @dsmilkov in #897
Add support for input paths to dataset.map. by @nsthorat in #882
Improve ergonomics of map, relaxing the exact requirement of kwargs={row, job_id} by @nsthorat in #883
Add a second option in searchbox dropdown to exclude a tag by @brilee in #889
Add rendering of string spans that were derived from a map with input path by @dsmilkov in #888
Make schema in signals optional by @dsmilkov in #895
Add string filters by @brilee in #892

Bug fixes

Fix a few issues with batching, prefetching, and searches. by @nsthorat in #881
Upgrade duckdb to 0.9.2, fixing a crash in a dask process with fetch_df_chunk. by @nsthorat in #884
Fix UI bugs with span rendering of maps. by @nsthorat in #894
Fix span resolving for map outputs by @dsmilkov in #886
Prefer existing embedding in embedding retrieval function by @brilee in #890
Allow lilac to run tasks outside a running event loop. by @nsthorat in #899

Other Changes

Pass explicit schema during jsonl -> parquet conversion by @dsmilkov in #885
Rename lilac.lilac_span to lilac.span by @dsmilkov in #887
Make the tags & namespaces in the dataset panel expandable. by @nsthorat in #893
Fix trailing error with tests. by @nsthorat in #901
Make the tag expandables serializable in the URL for sharing. by @nsthorat in #898
Add the navigation store to the URL hash. by @nsthorat in #896

Full Changelog: v0.1.21...v0.1.22

Contributors

nsthorat, brilee, and dsmilkov

23 Nov 03:04

nsthorat

v0.1.21

b8628cf

v0.1.21

Features

Signal computations are now cached. If a signal computation fails halfway through, it resumes from the cached results instead of starting over.
Source loading is much faster, up to 40x for some sources (e.g. HuggingFace).
Map dtype is now supported for parquet sources.
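
As an illustration of the resume behavior described above, here is a minimal Python sketch of a JSONL-backed intermediate cache. It is not Lilac's implementation: `cached_compute`, `compute_fn`, and `cache_path` are hypothetical names, and the sketch assumes the input items arrive in the same order on every run.

```python
import json
import os
from typing import Callable, Dict, Iterable, Iterator, List


def cached_compute(
    items: Iterable[str],
    compute_fn: Callable[[str], Dict],
    cache_path: str,
) -> Iterator[Dict]:
    """Yield compute_fn(item) for each item, resuming from a JSONL cache.

    Hypothetical helper for illustration only; assumes a stable item order.
    """
    # Load results written by a previous, possibly interrupted, run.
    cached: List[Dict] = []
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            for line in f:
                line = line.strip()
                if line:
                    cached.append(json.loads(line))

    # Append new results one line at a time so progress survives a failure.
    with open(cache_path, "a") as f:
        for i, item in enumerate(items):
            if i < len(cached):
                # Already computed in an earlier run: reuse the cached row.
                yield cached[i]
                continue
            result = compute_fn(item)
            f.write(json.dumps(result) + "\n")
            f.flush()
            yield result
```

If `compute_fn` raises partway through, rerunning `cached_compute` with the same `cache_path` replays the rows already on disk and only calls `compute_fn` for the remaining items.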

Details

Add jsonl intermediate caching to signals. Introduce a central spot for this cache abstraction. by @nsthorat in #858
Rename fast_process to load_to_parquet by @brilee in #862
Implement fast_process for parquet sources by @brilee in #860
Implement CSV direct to parquet by @brilee in #863
Implement fast json source by @brilee in #865
Add map<key, value> dtype. No support in the UI yet. by @dsmilkov in #870
Implement fast processing for huggingface datasets by @brilee in #869

Bug Fixes & Other Changes

Add development docs on profiling by @brilee in #861
Add docs for settings and compare mode by @dsmilkov in #859
Add a nest_under field to dataset.map(). by @nsthorat in #866
Avoid computing stats for every single field on page load by @dsmilkov in #873
Fix a sample_size yaml bug by @dsmilkov in #874
UI fixes for expanding long rows. by @nsthorat in #875
Fix small bug with compute signal / concepts and filtering by valid dtypes. by @nsthorat in #877
Add support for map field in the schema and UI by @dsmilkov in #878
Fix a bug with previewing and comparing on repeated values. by @nsthorat in #879
Allow custom signals to work with dask processes. by @nsthorat in #880

Full Changelog: v0.1.20...v0.1.21

Contributors

nsthorat, brilee, and dsmilkov

16 Nov 13:13

nsthorat

v0.1.20

3f9572c

v0.1.20

Bug fixes

Small fix with rendering MetadataSearch in the schema view by @dsmilkov in #855
Fast dataset load by @brilee in #854
Fix a bug with single item mode and monaco diff not updating by @dsmilkov in #856

Full Changelog: v0.1.19...v0.1.20

Contributors

brilee and dsmilkov

15 Nov 16:06

nsthorat

v0.1.19

caa8bec

v0.1.19

Bug fixes

Fix thread bug in hnswlib, which should fix CI python tests by @dsmilkov in #852
Fix bugs with the media fields selector where no fields showed up. by @nsthorat in #853

Full Changelog: v0.1.18...v0.1.19

Contributors

nsthorat and dsmilkov

14 Nov 19:46

nsthorat

v0.1.18

1bfd5c5

v0.1.18

What's Changed

Add Single Item as a view type with pagination by @dsmilkov in #846
Add monaco and enable column-level diffing. by @nsthorat in #845
Add parallelism to dataset.map with dask. by @nsthorat in #847
Upgrade Cohere embeddings to v3-light by @brilee in #833
Integrate Presidio into PII detection by @brilee in #839
Simplify the UI for choosing media fields by @nsthorat in #844

Other Changes

Fix the build_docs.sh and watch_docs.sh scripts to use the latest version of Lilac by @dsmilkov in #829
Add backend support for sampling jsonl files by @brilee in #826
Fix the HF deploy script for windows. by @nsthorat in #831
Fix the flaky hdbscan test by setting a UMAP random_state from the unit test. by @nsthorat in #832
Remove redundant dataset_cache call by @brilee in #835
Invalidate the query after the redirect to avoid 500 errors from deleted dataset. by @nsthorat in #836
Fix dataset uploading on windows. by @nsthorat in #837
OpenAI Azure connector by @dechantoine in #838
Expose hdbscan in the docs by @dsmilkov in #840
Add a query type to SemanticSimilaritySignal and SemanticSearch: 'question' | 'document' by @nsthorat in #841
Fix missing token in hf upload by @brilee in #842
Add debouncing to file watcher recompilation by @brilee in #843
Fix bug where missing keys in the filter constraint would raise KeyError by @brilee in #849
Pass the job_id to the dataset.map map_fn. by @nsthorat in #848
Add unit tests for num_jobs=-1 by @nsthorat in #850

Full Changelog: v0.1.17...v0.1.18

Contributors

nsthorat, brilee, dsmilkov, and dechantoine

07 Nov 15:21

nsthorat

v0.1.17

c89ad89

v0.1.17

What's Changed

Other Changes

Simplify the lilac_deployer, add some links to make it easier. by @nsthorat in #817
Add UI for dataset settings to edit tags of a dataset. by @nsthorat in #824
Parquet source: When pseudo_shuffle=True, limit the number of shards we read from by @dsmilkov in #827

Full Changelog: v0.1.16...v0.1.17

Contributors

nsthorat and dsmilkov

03 Nov 13:07

nsthorat

v0.1.16

39ce4ce

v0.1.16

What's Changed

Other Changes

Update lilac version in deployer UI. Add tokens to HF API calls. by @nsthorat in #813
Update deployer lilac version to 0.1.15. by @nsthorat in #814
Pass token to deploy_project_operations. by @nsthorat in #816

Full Changelog: v0.1.14...v0.1.16

Contributors

nsthorat

02 Nov 13:37

nsthorat

v0.1.14

297642b

v0.1.14

Features

Add "not exist" filter when somebody clicks on "N/A" in the histogram by @dsmilkov in #809
Add a Lilac deployer UI that lets you deploy a dataset + Lilac from a streamlit UI. by @nsthorat in #812

Other Changes

Switch to ruff formatting. by @nsthorat in #810
Switch to Python 3.11. by @nsthorat in #811

Full Changelog: v0.1.13...v0.1.14

Contributors

nsthorat and dsmilkov

31 Oct 20:49

nsthorat

v0.1.13

87ba143

v0.1.13

What's Changed

PaLM gcp connector by @dechantoine in #793
Add JSONL caching for dataset.map(). by @nsthorat in #808

Bug fixes

Fix topk on an indexed repeated field + metadata filter by @dsmilkov in #807

New Contributors

@dechantoine made their first contribution in #793

Full Changelog: v0.1.12...v0.1.13

Contributors

nsthorat, dsmilkov, and dechantoine

27 Oct 20:59

nsthorat

v0.1.12

f99a5d1

v0.1.12

What's Changed

Other Changes

Sleep for 2 seconds after publishing tags in the publish pip script. by @nsthorat in #799
Add the video to readme and website. by @nsthorat in #800
Fix the deploy_website script. by @nsthorat in #803
Hardcode sentence_transformer batch size to 1024 for optimal length-sorting/padding. by @brilee in #804
Fix NaT bug by @brilee in #806

Full Changelog: v0.1.11...v0.1.12

Contributors

nsthorat and brilee
