From 047c69b4318598473c44626df542735e30ee445a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?=
Date: Wed, 18 Oct 2023 09:48:29 +0200
Subject: [PATCH 1/2] [DOCS] Adds section about tokens to ELSER conceptual (#2568)

* [DOCS] Adds section about tokens to ELSER conceptual.

* [DOCS] Adds 'discrete' flag to section.

(cherry picked from commit f9c8a202863271910075fa77b3ae424d57d1df89)

# Conflicts:
# docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
---
 docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc | 23 ++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
index faa5aabbe..0f1e43c4e 100644
--- a/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
+++ b/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
@@ -20,13 +20,36 @@ meaning and user intent, rather than exact keyword matches.
 ELSER is an out-of-domain model which means it does not require fine-tuning on
 your own data, making it adaptable for various use cases out of the box.
 
+
+[discrete]
+[[elser-tokens]]
+== Tokens - not synonyms
+
 ELSER expands the indexed and searched passages into collections of terms that
 are learned to co-occur frequently within a diverse set of training data. The
 terms that the text is expanded into by the model _are not_ synonyms for the
+<<<<<<< HEAD
 search terms; they are learned associations. These expanded terms are weighted
 as some of them are more significant than others. Then the {es}
 {ref}/rank-features.html[rank features field type] is used to store the terms
 and weights at index time, and to search against later.
+=======
+search terms; they are learned associations capturing relevance. These expanded
+terms are weighted as some of them are more significant than others. Then the
+{es} {ref}/sparse-vector.html[sparse vector]
+(or {ref}/rank-features.html[rank features]) field type is used to store the
+terms and weights at index time, and to search against later.
+>>>>>>> f9c8a202 ([DOCS] Adds section about tokens to ELSER conceptual (#2568))
+
+This approach provides a more understandable search experience compared to
+vector embeddings. However, attempting to directly interpret the tokens and
+weights can be misleading, as the expansion essentially results in a vector in a
+very high-dimensional space. Consequently, certain tokens, especially those with
+low weight, contain information that is intertwined with other low-weight tokens
+in the representation. In this regard, they function similarly to a dense vector
+representation, making it challenging to separate their individual
+contributions. This complexity can potentially lead to misinterpretations if not
+carefully considered during analysis.
 
 
 [discrete]

From 0e2a6f02fcabe6460b9104ca94fd4f360eaa6917 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?=
Date: Wed, 18 Oct 2023 09:59:21 +0200
Subject: [PATCH 2/2] Update docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc

---
 docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
index 0f1e43c4e..c3e24bff6 100644
--- a/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
+++ b/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
@@ -28,18 +28,10 @@ your own data, making it adaptable for various use cases out of the box.
 ELSER expands the indexed and searched passages into collections of terms that
 are learned to co-occur frequently within a diverse set of training data. The
 terms that the text is expanded into by the model _are not_ synonyms for the
-<<<<<<< HEAD
-search terms; they are learned associations. These expanded terms are weighted
-as some of them are more significant than others. Then the {es}
-{ref}/rank-features.html[rank features field type] is used to store the terms
-and weights at index time, and to search against later.
-=======
 search terms; they are learned associations capturing relevance. These expanded
 terms are weighted as some of them are more significant than others. Then the
-{es} {ref}/sparse-vector.html[sparse vector]
-(or {ref}/rank-features.html[rank features]) field type is used to store the
+{es} {ref}/rank-features.html[rank features] field type is used to store the
 terms and weights at index time, and to search against later.
->>>>>>> f9c8a202 ([DOCS] Adds section about tokens to ELSER conceptual (#2568))
 
 This approach provides a more understandable search experience compared to
 vector embeddings. However, attempting to directly interpret the tokens and