Releases: truera/trulens
TruLens 1.3.0
Optimizing Feedback Functions
In this release, we add important changes for improving the alignment of LLM-judge evals with human evaluations.
Global Improvement of Groundedness Feedback
The first is the global improvement of the groundedness feedback function (benchmarks and methods forthcoming). We invite any users to submit feedback (positive or negative) on the effectiveness of the new groundedness function using GitHub Issues or Discussions.
You can view the addition of new groundedness criteria in the GitHub diff below.
New levers for aligning feedback functions
The second change adds easy-to-use levers for changing the behavior of feedback functions: few-shot examples and custom criteria. Early customers have found these levers useful for aligning their feedback functions with the expert evaluations they have collected.
Adding custom criteria to a feedback function
custom_criteria = """
A positive sentiment should be expressed with an extremely encouraging and enthusiastic tone.
"""
provider.sentiment(
    "When you're ready to start your business, you'll be amazed at how much you can achieve!",
    criteria=custom_criteria,
)
Adding few-shot examples to guide feedback functions
from trulens.feedback.v2 import feedback
fewshot_relevance_examples_list = [
    (
        {
            "query": "What are the key considerations when starting a small business?",
            "response": "You should focus on building relationships with mentors and industry leaders. Networking can provide insights, open doors to opportunities, and help you avoid common pitfalls.",
        },
        3,
    ),
]
provider.relevance(
    "What are the key considerations when starting a small business?",
    "Find a mentor who can guide you through the early stages and help you navigate common challenges.",
    examples=fewshot_relevance_examples_list,
)
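To make the two levers concrete, here is a minimal sketch of how custom criteria and few-shot examples could be folded into the prompt sent to an LLM judge. It is not TruLens's implementation: `build_judge_prompt`, the `FewShotExample` alias, and the prompt layout are hypothetical names and choices made for illustration. Only the shape of the examples (a dict of fields paired with a score) mirrors the snippet above.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical type and helper for illustration only; not part of TruLens.
# Each example pairs a dict of named fields with the score an expert gave it.
FewShotExample = Tuple[Dict[str, str], int]


def build_judge_prompt(
    base_instructions: str,
    criteria: Optional[str] = None,
    examples: Optional[List[FewShotExample]] = None,
) -> str:
    """Assemble a judge prompt from base instructions, optional custom
    criteria, and optional few-shot examples."""
    parts = [base_instructions.strip()]
    if criteria:
        # Custom criteria are appended as their own block.
        parts.append("Criteria:\n" + criteria.strip())
    for fields, score in examples or []:
        # Each few-shot example lists its fields followed by the expected score.
        lines = [f"{name}: {value}" for name, value in fields.items()]
        lines.append(f"score: {score}")
        parts.append("Example:\n" + "\n".join(lines))
    return "\n\n".join(parts)


prompt = build_judge_prompt(
    "Rate the relevance of the response to the query from 0 to 3.",
    criteria="Prefer responses that give concrete, actionable advice.",
    examples=[
        ({"query": "How do I start a business?", "response": "Find a mentor."}, 3)
    ],
)
print(prompt)
```

The point of the sketch is that both levers change only what the judge is told, so they can be tuned against a set of expert-labeled examples without touching the application under evaluation.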
What's Changed
- Feedback customization (including few-shot examples) by @sfc-gh-jreini in #1674
- Custom criteria for feedback by @sfc-gh-jreini in #1705
- Update groundedness criteria (with more optimized prompt) by @sfc-gh-dhuang in #1710
- Allow existing tables to be used in ground truth datasets by @sfc-gh-dhuang in #1698
Bug Fixes
- Allow passthrough of feedback parameters including temperature, groundedness configs in the `Feedback` class by @sfc-gh-jreini in #1674
- Remove / retire sql instrumentation in Cortex Endpoint by @sfc-gh-dhuang in #1715
- Poetry < 2.0.0 by @sfc-gh-jreini in #1709
- Update docs to use postgres + psycopg in order to avoid known issues with psycopg2 by @sfc-gh-gtokernliang in #1701
- Update prpr example notebook to reflect latest Cortex provider API by @sfc-gh-dhuang in #1712
Preparations for Open Telemetry compatibility
- Introduce Event table for ORM to prepare for OTEL traces by @sfc-gh-gtokernliang in #1692
- Prototype OTEL exporter by @sfc-gh-gtokernliang in #1694
- Prototype @Instrument with OTEL by @sfc-gh-gtokernliang in #1693
- Move `main_input`, `main_output`, and `_extract_content` out of app.py by @sfc-gh-gtokernliang in #1706
- Move span-related validation + setting logic out of instrument.py by @sfc-gh-gtokernliang in #1707
Full Changelog: trulens-1.2.11...trulens-1.3.0
TruLens 1.2.11
What's Changed
- Add snowflake PrPr notebook by @sfc-gh-dhuang in #1683
- Support types for Python `< 3.9`. by @sfc-gh-dkurokawa in #1675
- Change issue assignee to Prudhvi to triage by @sfc-gh-jreini in #1686
- TREC DL and LLM AggreFact experiments for relevance benchmark + prompts comparisons and groundedness vs Bespoke Minicheck 7B by @sfc-gh-dhuang in #1660
- Agents: Example of observability for CrewAI by @sfc-gh-jreini in #1621
- Set langchain version for crewai example by @sfc-gh-chu in #1695
Full Changelog: trulens-1.2.10...trulens-1.2.11
TruLens 1.2.10
What's Changed
- decode app and record json in get_df_and_cols by @sfc-gh-chu in #1672
- reset_database only drops trulens tables by @sfc-gh-chu in #1676
- trulens-dashboard: handle selected_rows is None and fix fallback to st.dataframe in SiS by @sfc-gh-chu in #1677
- Fix Cortex complete SDK response parsing. by @sfc-gh-dkurokawa in #1679
- Have `Pace` create an event loop if it doesn't exist. by @sfc-gh-dkurokawa in #1680
- Rename `trulens-semconv` to `trulens-otel-semconv`. by @sfc-gh-dkurokawa in #1681
- trulens-semvar conda build files by @sfc-gh-chu in #1678
Full Changelog: trulens-1.2.9...trulens-1.2.10
TruLens 1.2.9
What's Changed
- adding jenkins file for e2e tests by @sfc-gh-srudenko in #1661
- Memoize base endpoint creation for cost tracking by @sfc-gh-chu in #1659
- add importlib resources to conda build by @sfc-gh-chu in #1662
- Switch from SQL function to REST API backend for Cortex Complete - cost tracking of both feedback computations and app generation by @sfc-gh-dhuang in #1650
- fix meta.yaml spacing by @sfc-gh-chu in #1663
- Fix Cortex provider for tests. by @sfc-gh-dkurokawa in #1666
- relax snowflake-ml-python version by @sfc-gh-chu in #1664
- Fix encoding issues in dashboard by @sfc-gh-chu in #1668
- fix async pace by @sfc-gh-pmardziel in #1654
- Fix `poetry` environment issues. by @sfc-gh-dkurokawa in #1670
- Create trulens-semconv package. by @sfc-gh-dkurokawa in #1669
- strip quotes from connection params by @sfc-gh-chu in #1673
Full Changelog: trulens-1.2.6...trulens-1.2.9
TruLens v1.2.6
What's Changed
- Allow Cortex provider to only take a connection object. by @sfc-gh-pdharmana in #1617
- Fix code example formatting in docs by @sfc-gh-jreini in #1610
- Fix old references to mae in GroundTruthAgreement feedback function by @sfc-gh-dhuang in #1622
- Add try on tag creation since it's an enterprise feature by @sfc-gh-pdharmana in #1623
- Stop using `snowflake.snowpark.session.Session::sql` and use `snowflake.connector.cursor.SnowflakeCursor::execute` instead as it's thread-safe. by @sfc-gh-dkurokawa in #1620
- logos section on homepage by @sfc-gh-jreini in #1602
- Don't check if Cortex providers can be deserialized for deferred feedback functions. by @sfc-gh-dkurokawa in #1626
- Ensure `make clean` does indeed clean or it will fail out. by @sfc-gh-dkurokawa in #1627
- Create smoke test for Snowflake notebooks. by @sfc-gh-dkurokawa in #1619
- For snowflake dialects, when inserting a feedback result with a NULL result, first insert a -1, then update it. by @sfc-gh-dkurokawa in #1628
- update link for migration page by @sfc-gh-jreini in #1630
- human feedback with metadata by @sfc-gh-pmardziel in #1629
- Move print message up to not invoke error when we can't create a tag. by @sfc-gh-dkurokawa in #1634
Full Changelog: trulens-1.2.4...trulens-1.2.6
TruLens v1.2.4
What's Changed
- try catch on tag creation (#1623) by @sfc-gh-pdharmana in #1624
Full Changelog: trulens-1.2.2...trulens-1.2.4
TruLens v1.2.2
What's Changed
- Use snowflake connector over snowpark session in trulens Snowflake DB connector as snowpark session isn't thread-safe. by @sfc-gh-dkurokawa in #1604
- Don't open extra Snowflake connections and don't recycle connections as quickly. by @sfc-gh-dkurokawa in #1609
- Remove unnecessary deps from `trulens-connectors-snowflake`. by @sfc-gh-dkurokawa in #1611
Full Changelog: trulens-1.2.1...trulens-1.2.2
TruLens v1.2.1
Bug Fixes
- Don't check for user and account in snowpark sessions because Streamlit apps might hide them. by @sfc-gh-dkurokawa in #1600
- catch source code not available in `code_line` by @sfc-gh-pmardziel in #1592
- use float nan in place of numpy for skipped evals by @sfc-gh-chu in #1595
- Fix the misspelled `trulens-providers-openai` package in examples by @SSK-14 in #1601
- fix assertion to nan by @sfc-gh-jreini in #1605
Full Changelog: trulens-1.2.0...trulens-1.2.1
TruLens v1.2.0
What's Changed
- Blocking guardrails by @sfc-gh-jreini in #1584
- and add dataset preprocessing utils used in benchmarking by @sfc-gh-dhuang in #1559
- Use ggshield for local secret scanning by @sfc-gh-jreini in #1585
- Clean before uploading docs. by @sfc-gh-dkurokawa in #1594
- Update dev guide with git lfs instructions by @sfc-gh-chu in #1597
Bug Fixes
- (some) release pipeline fixes by @sfc-gh-pmardziel in #1537
- bumping conda package build to 1.1.0 by @sfc-gh-srudenko in #1557
- Fix ground truth dataset persistence notebook after the ground truth search metrics update by @sfc-gh-dhuang in #1558
- import style by @sfc-gh-pmardziel in #1543
- warning and docpage for bad context by @sfc-gh-pmardziel in #1565
- test dummy endpoints by @sfc-gh-pmardziel in #1566
- fix docs for snowflake connection by @sfc-gh-srudenko in #1576
- Use conda channel trulens packages by default. by @sfc-gh-dkurokawa in #1570
- Fix 'reason not generated' by @dom7kim in #1561
- import rename listings by @sfc-gh-pmardziel in #1568
- Use `SnowflakeConnector` in stored proc. by @sfc-gh-dkurokawa in #1580
- Reuse Snowpark session during most tests. by @sfc-gh-dkurokawa in #1536
- Have `run_leaderboard` fail more clearly if it's unable to authenticate at Snowflake due to being created by a `snowpark_session`. by @sfc-gh-dkurokawa in #1581
- Add tags to schema during snowflake app creation by @sfc-gh-pdharmana in #1577
- Use proper golden set format. by @sfc-gh-dkurokawa in #1587
- Fix bad merge for snowflake connector. by @sfc-gh-dkurokawa in #1588
- Fix `poetry.lock` `boto3` dependency hashes. by @sfc-gh-dkurokawa in #1590
- Always ensure endpoint context variable is cleaned up. by @sfc-gh-dkurokawa in #1589
- defaults for each contextvar by @sfc-gh-pmardziel in #1586
Examples
- Comparison notebook: TruLens groundedness vs RAGAS faithfulness by @sfc-gh-dhuang in #1559
- Add quickstarts to docs by @sfc-gh-jreini in #1583
Full Changelog: trulens-1.1.0...trulens-1.2.0
trulens-1.1.0
What's Changed
TruLens 1.1 has a ton of exciting changes - we've grouped the updates into the new features they support so you can jump straight to the updates you're most excited about:
- TruLens Dashboard
- Feedback Provider Support
- Search Metric Support
- Adding dataframes to TruLens
- OpenTelemetry Support
- Async and Streaming Support
- More Reliable Feedback Functions
- New Examples
- Docs Updates
- Bug Fixes
TruLens Dashboard
In TruLens 1.1, we re-imagined the dashboard with a focus on making it easy to track large numbers of experiments, make comparisons, and improve your apps for production. We also made several improvements to performance and usability, including dark mode.
Read more about the new look dashboard.
See the changes:
- Dark mode for Trace viewer by @sfc-gh-gtokernliang in #1437
- Make styling more compatible with dark mode for feedback functions by @sfc-gh-gtokernliang in #1439
- Add missing UX components to streamlit feedback component by @sfc-gh-jreini in #1440
- Dashboard Enhancements by @sfc-gh-chu in #1443
- leaderboard list view fix by @sfc-gh-chu in #1491
- small perf improvements by @sfc-gh-chu in #1490
- fix to sql query bug in dashboard by @sfc-gh-pmardziel in #1531
- Fix leaderboard showing inconsistent latency readings by @sfc-gh-chu in #1522
Expanded Search Metric Support
TruLens now supports common information retrieval (search) metrics including IR Hit Rate, NDCG, Precision, Recall, Mean Reciprocal Rank and more. These new metrics are accessible as ground truth feedback functions and simply require the addition of `expected_chunks` to your ground truth data. Try the example.
See the change:
- Information retrieval (search) metrics computation with ground truth datasets - notebook + metrics implementation by @sfc-gh-dhuang in #1545
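For readers unfamiliar with the metrics named above, the following sketch computes them for a single query using the standard textbook definitions with binary relevance. It is not the TruLens implementation, which may differ in details such as graded relevance or how `k` is handled. The function names and the sample chunk IDs are assumptions made for illustration; `expected` plays the role of the `expected_chunks` in the ground truth data.

```python
import math
from typing import List, Set


def hit_rate(retrieved: List[str], expected: Set[str]) -> float:
    """1.0 if any retrieved chunk is an expected chunk, else 0.0."""
    return 1.0 if any(c in expected for c in retrieved) else 0.0


def precision_at_k(retrieved: List[str], expected: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved chunks that are expected."""
    top = retrieved[:k]
    return sum(c in expected for c in top) / len(top) if top else 0.0


def recall_at_k(retrieved: List[str], expected: Set[str], k: int) -> float:
    """Fraction of expected chunks that appear in the top-k results."""
    if not expected:
        return 0.0
    return sum(c in expected for c in retrieved[:k]) / len(expected)


def reciprocal_rank(retrieved: List[str], expected: Set[str]) -> float:
    """1 / rank of the first expected chunk, or 0.0 if none is retrieved."""
    for rank, chunk in enumerate(retrieved, start=1):
        if chunk in expected:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(retrieved: List[str], expected: Set[str], k: int) -> float:
    """Binary-relevance NDCG: DCG of the top-k divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, chunk in enumerate(retrieved[:k], start=1)
        if chunk in expected
    )
    ideal_hits = min(len(expected), k)
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / ideal if ideal else 0.0


# Hypothetical chunk IDs for one query.
retrieved = ["c3", "c1", "c7", "c2"]
expected = {"c1", "c2"}

print(hit_rate(retrieved, expected))
print(precision_at_k(retrieved, expected, k=3))
print(recall_at_k(retrieved, expected, k=3))
print(reciprocal_rank(retrieved, expected))
print(ndcg_at_k(retrieved, expected, k=3))
```

Mean Reciprocal Rank is the average of `reciprocal_rank` over all queries in the dataset; the other metrics are usually averaged across queries in the same way.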
Getting started with existing data
It's now easier than ever to get started with TruLens. Starting with a dataframe with `query`, `response`, and `contexts` columns, you can load it into TruLens using `add_dataframe` and easily run feedback functions against your data. Try it yourself.
See the change:
- `add_dataframe` method + quickstart by @sfc-gh-jreini in #1474
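As a rough illustration of the data shape described above, this sketch checks that a set of rows carries the `query`, `response`, and `contexts` columns before it is turned into a dataframe. The `missing_columns` helper and the sample row are hypothetical, and storing `contexts` as a list of strings is an assumption; the call to `add_dataframe` itself is not shown because its exact signature is not given here.

```python
from typing import Any, Dict, List

# Column names taken from the release notes above.
REQUIRED_COLUMNS = ("query", "response", "contexts")


def missing_columns(rows: List[Dict[str, Any]]) -> List[str]:
    """Return the required columns absent from at least one row, sorted."""
    return sorted(
        {col for row in rows for col in REQUIRED_COLUMNS if col not in row}
    )


# Hypothetical sample data; `contexts` as a list of strings is an assumption.
rows = [
    {
        "query": "What are the key considerations when starting a small business?",
        "response": "Find a mentor who can guide you through the early stages.",
        "contexts": ["Mentors help founders avoid common pitfalls."],
    },
]

problems = missing_columns(rows)
print(problems or "all required columns present")
```

Rows that pass this check can then be converted into a dataframe and loaded as the release notes describe.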
Experimental support for Open Telemetry
We've added experimental preview support for Open Telemetry, enabled with `session.experimental_enable_feature("otel_tracing")`. We are collecting feedback and will be continuing to improve the user experience for writing and reading OpenTelemetry traces. If you want to try it out, check it out with custom python or Llama-Index.
See the changes:
- OTEL import/export by @sfc-gh-pmardziel in #1485
- experimental flags by @sfc-gh-pmardziel in #1427
Restored Async and Streaming Support
- memory, threads, and async leakage testing by @sfc-gh-pmardziel in #1470
- fix async handling and other release pipeline failures by @sfc-gh-pmardziel in #1441
More reliable feedback functions
- Simplify system prompt generation conditions with output space and criteria by @sfc-gh-dhuang in #1554
- handle partial functions for feedback functions by @sfc-gh-chu in #1551
- More error handling for groundedness internal steps by @sfc-gh-jreini in #1549
- RAG triads llm as judges benchmark - adding meta-eval metrics for correlation measurement and experiment notebooks by @sfc-gh-dhuang in #1462
- Add option to filter trivial statements for groundedness measure by @sfc-gh-pdharmana in #1556
- Fix splitting key_points issue: generalize the solution for splitting key points in _assess_key_point_inclusion() by @dom7kim in #1519
Feedback Provider Support
- Add mistral-large2 to the list of supported models in Cortex feedback provider by @sfc-gh-dhuang in #1496
- Claude 3 support for AWS Bedrock by @sfc-gh-chu in #1481
- Switch to llama 3.1 8b as default model in cortex by @sfc-gh-dhuang in #1500
- Support having a `Langchain` provider with a `BaseLLM` and not just `BaseChatModel`. by @sfc-gh-dkurokawa in #1459
New Examples
- Cortex Fine-tuning experiments notebook by @sfc-gh-jreini in #1453
- Cortex Chat Quickstart by @sfc-gh-jreini in #1446 and #1460
- Server side feedback computation + batch ingestion by @sfc-gh-jreini in #1464
- New Custom Streaming example by @sfc-gh-pmardziel in #1441
Docs Updates
- az badge update by @sfc-gh-chu in #1436
- docs nits by @sfc-gh-jreini in #1434
- Docs Changes by @sfc-gh-chu in #1473
- website analytics and dark mode fixes by @sfc-gh-chu in #1497
- Add blog site and docs grouping by @sfc-gh-chu in #1499
- Fix colab links by @sfc-gh-jreini in #1508
- Josh/center homepage image text + change app versions compared by @sfc-gh-jreini in #1442
- Fix homepage blog link by @sfc-gh-chu in #1535
Bug Fixes
- endpoint kwargs by @sfc-gh-chu in #1489
- Update threading.py, fix context loss in multi-threading by @glennfeys in #1478
- Fix Selector AttributeError by @sfc-gh-chu in #1553
- SQLAlchemy joinedload on record.app relationship by @sfc-gh-chu in #1524
- release pipeline related fixes by @sfc-gh-pmardziel in #1435
- fix trulens_eval migration link by @sfc-gh-pmardziel in #1448
- fix typo in Makefile by @sfc-gh-pmardziel in #1463
- Add progress bars to data migration scripts by @sfc-gh-chu in #1458
- cortex instrumentation fixes by @sfc-gh-chu in #1447
- Conda Meta Hash fix by @sfc-gh-srudenko in #1468
- fix optionals in core by @sfc-gh-pmardziel in #1471
- fix optional import message by @sfc-gh-chu in #1457
- Allow for DBs that already have tables (at least if they're not sqlite databases). by @sfc-gh-dkurokawa in #1449
- Bumping conda meta to build a conda package 1.0.2 by @sfc-gh-srudenko in #1479
- Move requests/Endpoint.post to huggingface provider by @sfc-gh-chu in #1476
- relax minor package version constraints by @sfc-gh-chu in #1482
- init_server_side=False by default by @sfc-gh-chu in #1483
- slight downgrade of minimum dep requirements by @sfc-gh-chu in #1504
- Cleanup main pyproject and relax minor versions by @sfc-gh-chu in #1494
- Updates to the query planning notebook by @sfc-gh-dhuang in #1512 and @sfc-gh-jreini in #1514
- Conda build meta changes by @sfc-gh-srudenko in #1503
- Dashboard fixes by @sfc-gh-jreini in #1518
- Bump the pip group across 1 directory with 2 updates by @dependabot in #1507
- small record ingest formatting fix by @sfc-gh-chu in #1515
- Set criteria for feedbacks correctly. by @sfc-gh-dkurokawa in #1526
- Allow using Snowflake Connector fo...