hybrid search example code (for docs) #30

morgangallant · 2024-06-18T15:51:26Z

No description provided.

Signed-off-by: Morgan Gallant <[email protected]>

sirupsen · 2024-06-19T12:38:24Z

examples/hybrid_search.py

+        if doc_id in bm25_ranks:
+            score += 1.0 / (k + bm25_ranks[doc_id])
+        if doc_id in vector_ranks:
+            score += 1.0 / (k + vector_ranks[doc_id])


Hm, is classic RRF only rank-based? Nothing about scores? Probably fine to begin with
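For reference, a minimal sketch of rank-only RRF, to make the question above concrete. It assumes each input is a plain list of document IDs already ordered best-first; the function name `reciprocal_rank_fusion` and the sample IDs are illustrative, not part of this PR. Only the position in each list contributes to the fused score, so the underlying BM25 scores and vector distances are never read.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of IDs (each ordered best-first) using ranks only."""
    scores = defaultdict(float)
    for ranked_ids in ranked_lists:
        # Ranks are 1-based; k dampens the advantage of the very top positions.
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists, standing in for the two queries.
bm25_ids = ["a", "b", "c"]
vector_ids = ["c", "a", "d"]

fused = reciprocal_rank_fusion([bm25_ids, vector_ids])
print([doc_id for doc_id, _ in fused])  # ['a', 'c', 'b', 'd']
```

"a" comes first because it is ranked 1st and 2nd (1/61 + 1/62), narrowly ahead of "c", which is ranked 3rd and 1st (1/63 + 1/61). How strongly either query matched has no effect on the order.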

sirupsen · 2024-06-19T12:39:02Z

examples/hybrid_search.py

@@ -0,0 +1,122 @@
+import turbopuffer as tpuf


Does this run as part of the test suite?

sirupsen

No major qualms here; just make sure it runs as part of the test suite (I left some comments about the code that references this)

sirupsen · 2024-06-19T12:42:37Z

examples/hybrid_search.py

+
+# Fuses two search result sets together into one
+# Uses reciprocal rank fusion
+def rank_fusion(bm25_results, vector_results, k=60):


I think we may want to inline/copy-paste this in a few places, so I think we need one top-level comment and then the function to be as small as possible, e.g. something along the lines of this (not tested):

def results_to_ranks(results, reverse=False):
    return {item.id: rank for rank, item in enumerate(sorted(results, key=lambda item: item.dist, reverse=reverse), start=1)}

def rank_fusion(bm25_results, vector_results, k=60):
    bm25_ranks, vector_ranks = results_to_ranks(bm25_results), results_to_ranks(vector_results, reverse=True)
    scores = {doc_id: (1 / (k + bm25_ranks[doc_id]) if doc_id in bm25_ranks else 0) + (1 / (k + vector_ranks[doc_id]) if doc_id in vector_ranks else 0) for doc_id in set(bm25_ranks) | set(vector_ranks)}
    return [{"id": doc_id, "score": score} for doc_id, score in sorted(scores.items(), key=lambda item: item[1], reverse=True)]

The more code you have to paste, the more you'll question why we don't build this into this library (good reasons). The shorter it is, the more you'll want to tweak it (good).

morgangallant · 2024-06-19T14:55:35Z

just make sure it runs as part of the test suite

I don't think any of the other examples run as part of the tests. Should we make an exception here? I'm not convinced; this is more documentation than anything else.


hybrid search example code (for docs)

4004628

Signed-off-by: Morgan Gallant <[email protected]>

morgangallant requested a review from sirupsen June 18, 2024 15:51

sirupsen reviewed Jun 19, 2024


sirupsen approved these changes Jun 19, 2024



bit cleaner

f01f1b7

Signed-off-by: Morgan Gallant <[email protected]>
