Hybrid Search #653

Merged
yuhongsun96 merged 3 commits into main from hybrid-search
Oct 30, 2023

Conversation

yuhongsun96
Contributor

No description provided.

@vercel

vercel bot commented Oct 30, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| internal-search | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Oct 30, 2023 4:30am |

@@ -615,7 +612,7 @@ def keyword_retrieval(
"query": final_query,
"input.query(decay_factor)": str(DOC_TIME_DECAY * decay_multiplier),
"hits": num_to_retrieve,
"num_to_rerank": 10 * num_to_retrieve,
Contributor Author

This `num_to_rerank` parameter was never doing anything :/

@@ -640,7 +637,6 @@ def semantic_retrieval(
# needed for highlighting while the N-gram highlighting is broken /
# not working as desired
+ f'or ({{defaultIndex: "{CONTENT_SUMMARY}"}}userInput(@query)))'
+ _build_vespa_limit(num_to_retrieve)
Contributor Author

As far as testing shows, setting the limit in the YQL and setting it in the request parameters behave the same; it feels easier/cleaner to just have it in the params.

)

params: dict[str, str | int] = {
"yql": yql,
"query": query,
"hits": num_to_retrieve,
"num_to_rerank": 10 * num_to_retrieve,
"offset": 0,
Contributor Author

Set all of these to 0 just so we remember later how to do it if we decide to introduce pagination.
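
A minimal sketch of the two options discussed in the comments above, assuming Vespa's `limit` clause in YQL and its `hits` / `offset` query parameters. The helper names (`limit_in_yql`, `limit_in_params`, `page_params`) are hypothetical and not part of this PR, and the code only builds request parameters without calling Vespa.

```python
from __future__ import annotations


def limit_in_yql(base_yql: str, query: str, num_to_retrieve: int) -> dict[str, str | int]:
    """Cap the result count inside the YQL string itself (hypothetical helper)."""
    return {"yql": f"{base_yql} limit {num_to_retrieve}", "query": query}


def limit_in_params(
    base_yql: str, query: str, num_to_retrieve: int, offset: int = 0
) -> dict[str, str | int]:
    """Cap the result count through request parameters instead of the YQL."""
    return {
        "yql": base_yql,
        "query": query,
        "hits": num_to_retrieve,
        # Kept at 0 for now; a non-zero value skips that many leading results.
        "offset": offset,
    }


def page_params(base_yql: str, query: str, page: int, page_size: int) -> dict[str, str | int]:
    """Build parameters for a zero-based page, should pagination be introduced."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return limit_in_params(base_yql, query, page_size, offset=page * page_size)
```

With `offset` left at 0, `limit_in_params` produces the same shape of dictionary as the `params` in the diff; moving to pagination would only mean passing a computed offset, as `page_params` does.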

@@ -21,7 +21,9 @@
DOC_EMBEDDING_DIM = 384
# Model should be chosen with 512 context size, ideally don't change this
DOC_EMBEDDING_CONTEXT_SIZE = 512
NORMALIZE_EMBEDDINGS = (os.environ.get("SKIP_RERANKING") or "False").lower() == "true"
Contributor Author

Oops...
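
The "Oops..." presumably refers to `NORMALIZE_EMBEDDINGS` being read from the `SKIP_RERANKING` environment variable, which looks like a copy-paste slip. The following is a sketch of one way to make that kind of mistake harder, assuming the intended variable name is `NORMALIZE_EMBEDDINGS` (an assumption, since the corrected line is not shown here). The helper `env_flag` is hypothetical and not part of this PR.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean setting from the environment variable called `name`.

    Mirrors the `(os.environ.get(...) or "False").lower() == "true"` pattern:
    an unset or empty variable yields the default, and only "true"
    (case-insensitive) counts as enabled.
    """
    raw = os.environ.get(name)
    if not raw:
        return default
    return raw.lower() == "true"


# Each setting reads the variable that shares its name (assumed intent).
NORMALIZE_EMBEDDINGS = env_flag("NORMALIZE_EMBEDDINGS")
SKIP_RERANKING = env_flag("SKIP_RERANKING")
```

Keeping the setting name and the environment variable name side by side on one short line makes a mismatch easier to spot in review.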

@yuhongsun96 yuhongsun96 merged commit 52c0d6e into main Oct 30, 2023
2 checks passed
@yuhongsun96 yuhongsun96 deleted the hybrid-search branch October 30, 2023 05:18
sidravi1 pushed a commit to IDinsight/danswer that referenced this pull request Nov 20, 2023

2 participants