[ML] Add ML limitation for ingesting large documents (#2877) (#2882)
maxhniebergall authored Nov 28, 2024
1 parent 5740148 commit 664405d
Showing 1 changed file with 7 additions and 1 deletion.
8 changes: 7 additions & 1 deletion docs/en/stack/ml/nlp/ml-nlp-limitations.asciidoc
@@ -9,6 +9,12 @@
The following limitations and known problems apply to the {version} release of
the Elastic {nlp} trained models feature.

[discrete]
[[ml-nlp-large-documents-limit-10k-10mb]]
== Document size limitations when using `semantic_text` fields

When you use `semantic_text` fields to ingest documents, chunking takes place automatically. The number of chunks is limited by the {ref}/mapping-settings-limit.html[`index.mapping.nested_objects.limit`] index setting, which defaults to 10,000. Documents that produce more chunks than this limit cause errors during ingestion. To avoid this issue, split large documents into parts of roughly 1MB before ingestion.
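
For illustration only, the following minimal Python sketch shows one way to perform that pre-ingestion split on the client side. The helper name `split_document` and the 1MB threshold are assumptions made for the example, not part of any Elastic API.

[source,python]
----
# Hypothetical sketch: split an oversized document into roughly 1MB parts
# before ingestion, so that automatic `semantic_text` chunking stays well
# below the `index.mapping.nested_objects.limit` of 10,000 chunks.
def split_document(text: str, max_bytes: int = 1_000_000) -> list[str]:
    """Split text into parts of at most max_bytes UTF-8 bytes,
    breaking on paragraph boundaries where possible."""
    parts: list[str] = []
    current: list[str] = []
    current_size = 0
    for paragraph in text.split("\n\n"):
        size = len(paragraph.encode("utf-8")) + 2  # +2 for the separator
        if current and current_size + size > max_bytes:
            parts.append("\n\n".join(current))
            current, current_size = [], 0
        current.append(paragraph)
        current_size += size
    if current:
        parts.append("\n\n".join(current))
    return parts

# Each returned part is then indexed as its own document. A single
# paragraph larger than max_bytes would still need further splitting.
----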

[discrete]
[[ml-nlp-elser-v1-limit-512]]
== ELSER semantic search is limited to 512 tokens per field that inference is applied to
@@ -17,4 +23,4 @@ When you use ELSER for semantic search, only the first 512 extracted tokens from
each field of the ingested documents that ELSER is applied to are taken into
account for the search process. If your data set contains long documents, divide
them into smaller segments before ingestion if you need the full text to be
searchable.
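
As a rough illustration of that segmentation step, the sketch below splits text into fixed-size word windows before ingestion. The helper name `split_into_segments` and the 400-word default are assumptions; ELSER's own tokenizer generally produces more tokens than a whitespace split, so the window is deliberately kept below 512.

[source,python]
----
# Hypothetical sketch: pre-split long text into segments small enough that
# ELSER inference covers each segment in full (at most 512 tokens per field).
# Whitespace-delimited words only approximate ELSER tokens, so a conservative
# window size (or a proper tokenizer) should be used in practice.
def split_into_segments(text: str, max_words: int = 400) -> list[str]:
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
----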
