Elastic Rerank model landing page #2884
Conversation
A documentation preview will be available soon. Request a new doc build by commenting
If your PR continues to fail for an unknown reason, the doc build pipeline may be broken. Elastic employees can check the pipeline status here.
Thank you for writing this up! I left a few suggestions, mostly nits to use attributes and to decrease word count. Please take or leave them.
Co-authored-by: István Zoltán Szabó <[email protected]>
It's important to note that if you rerank to depth `n`, you will need to run `n` inferences per query. Each inference includes the document text, so it is significantly more expensive than inference for query embeddings. Hardware can be scaled to run these inferences in parallel, but for CPU inference we recommend shallow reranking: no more than the top 30 results. You may find the preview version cost-prohibitive for high query rates and low query latency requirements. We plan to address performance for GA.
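For a concrete sense of how rerank depth is capped, here is a sketch of a `text_similarity_reranker` retriever limited to the top 30 first-stage hits via `rank_window_size` (the index name `my-index`, the `text` field, the query text, and the endpoint name `my-elastic-rerank` are placeholders, not part of this page):

```
GET my-index/_search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "standard": {
          "query": { "match": { "text": "what is elastic rerank" } }
        }
      },
      "field": "text",
      "inference_id": "my-elastic-rerank",
      "inference_text": "what is elastic rerank",
      "rank_window_size": 30
    }
  }
}
```

With `rank_window_size: 30`, only 30 inferences run per query, however many documents the first-stage retriever matches.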
// Is air-gapped deployment supported?
Yes, air-gapped download is supported. The same instructions as ELSER apply; just change the model id:
https://www.elastic.co/guide/en/machine-learning/master/ml-nlp-elser.html#air-gapped-install
ah ok cool thanks
@davidkyle are those model artifact files already available?
found 'em :)
Looks like we just have the cross-platform version initially?
Yes, there is no platform-specific model for rerank.
Co-authored-by: David Kyle <[email protected]>
@davidkyle I updated with air-gapped instructions, copying the ELSER instructions but removing the trained model UI instructions and just replacing them with "create inference endpoint".
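For reference, a minimal sketch of that "create inference endpoint" step, assuming the cross-platform model id `.rerank-v1` discussed above (the endpoint name `my-elastic-rerank` and the allocation/thread counts are placeholders):

```
PUT _inference/rerank/my-elastic-rerank
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".rerank-v1",
    "num_allocations": 1,
    "num_threads": 1
  }
}
```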
LGTM
```
xpack.ml.model_repository: file://${path.home}/config/models/`
```
there's an extra backtick at the end of this line
nice catch, fixing!
When using the {ref}/semantic-text.html[`semantic_text` field type], text is divided into chunks. By default, each chunk contains 250 words (approximately 400 tokens). Be cautious when increasing the chunk size: if the combined length of your query and chunk text exceeds 512 tokens, the model won't have access to the full content.
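As an illustration, chunk size is controlled on the inference endpoint rather than on the field itself. This sketch pins the default word-based chunking explicitly via `chunking_settings` (the endpoint name `my-embeddings`, the index name `my-index`, and the choice of `.multilingual-e5-small` are placeholders, not part of this page):

```
PUT _inference/text_embedding/my-embeddings
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".multilingual-e5-small",
    "num_allocations": 1,
    "num_threads": 1
  },
  "chunking_settings": {
    "strategy": "word",
    "max_chunk_size": 250,
    "overlap": 100
  }
}

PUT my-index
{
  "mappings": {
    "properties": {
      "text": { "type": "semantic_text", "inference_id": "my-embeddings" }
    }
  }
}
```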
When the combined inputs exceed the 512 token limit, a balanced truncation strategy is used. If both the query and the input text are longer than 255 tokens each, then both are truncated; otherwise, the longer one is truncated.
This is very clear phrasing!
That's because it's 98% @tveasey and @davidkyle 😉
(cherry picked from commit 65aa83a) Co-authored-by: Liam Thompson <[email protected]>