From 7b38f8d169a163499b8c4b91be159a28d1878dfb Mon Sep 17 00:00:00 2001
From: "mergify[bot]" <37929162+mergify[bot]@users.noreply.github.com>
Date: Wed, 18 Oct 2023 10:33:23 +0200
Subject: [PATCH] [8.8] [DOCS] Adds section about tokens to ELSER conceptual
 (backport #2568) (#2572)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
---
 docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc | 23 ++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
index faa5aabbe..c3e24bff6 100644
--- a/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
+++ b/docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
@@ -20,13 +20,28 @@ meaning and user intent, rather than exact keyword matches.
 ELSER is an out-of-domain model which means it does not require fine-tuning on 
 your own data, making it adaptable for various use cases out of the box.
 
+
+[discrete]
+[[elser-tokens]]
+== Tokens - not synonyms
+
 ELSER expands the indexed and searched passages into collections of terms that 
 are learned to co-occur frequently within a diverse set of training data. The 
 terms that the text is expanded into by the model _are not_ synonyms for the 
-search terms; they are learned associations. These expanded terms are weighted 
-as some of them are more significant than others. Then the {es} 
-{ref}/rank-features.html[rank features field type] is used to store the terms 
-and weights at index time, and to search against later. 
+search terms; they are learned associations that capture relevance. These
+expanded terms are weighted because some are more significant than others. The
+{es} {ref}/rank-features.html[rank features] field type then stores the terms
+and weights at index time so that they can be searched against later.
+
+This approach provides a more interpretable search experience than dense
+vector embeddings. However, interpreting the tokens and weights directly can
+be misleading, because the expansion is essentially a vector in a very
+high-dimensional space. Certain tokens, especially those with low weight,
+carry information that is intertwined with other low-weight tokens in the
+representation. In this regard, they behave like a dense vector
+representation, which makes it difficult to separate their individual
+contributions. Keep this in mind when analyzing the expanded tokens,
+otherwise the results are easy to misinterpret.
 
 
 [discrete]