From f67a154a9c6282c8ac151374087cb7830439b1f5 Mon Sep 17 00:00:00 2001
From: "mergify[bot]" <37929162+mergify[bot]@users.noreply.github.com>
Date: Wed, 18 Oct 2023 12:00:00 +0200
Subject: [PATCH] [DOCS] Documents that models with one allocation might have
 downtime (#2567) (#2573)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: István Zoltán Szabó
---
 docs/en/stack/ml/nlp/ml-nlp-deploy-models.asciidoc | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/docs/en/stack/ml/nlp/ml-nlp-deploy-models.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-deploy-models.asciidoc
index 67c57cf72..0ce4e40a1 100644
--- a/docs/en/stack/ml/nlp/ml-nlp-deploy-models.asciidoc
+++ b/docs/en/stack/ml/nlp/ml-nlp-deploy-models.asciidoc
@@ -198,6 +198,11 @@ nodes.
 Model allocations are independent units of work for NLP tasks. To influence
 model performance, you can configure the number of allocations and the number
 of threads used by each allocation of your deployment.
+IMPORTANT: If your deployed trained model has only one allocation, it's likely
+that you will experience downtime in the service your trained model performs.
+You can reduce or eliminate downtime by adding more allocations to your trained
+models.
+
 Throughput can be scaled by adding more allocations to the deployment; it
 increases the number of {infer} requests that can be performed in parallel. All
 allocations assigned to a node share the same copy of the model in memory. The
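
The admonition added above recommends running more than one allocation. As a
rough, illustrative sketch only (the model ID `my_model`, the allocation count,
and the availability of the update trained model deployment API in your {es}
version are assumptions, not part of this patch), an existing deployment could
be scaled out like this:

[source,console]
----
POST _ml/trained_models/my_model/deployment/_update
{
  "number_of_allocations": 2
}
----

With two or more allocations, the remaining allocations can continue to serve
{infer} requests if one of them (or the node hosting it) becomes unavailable,
which is how additional allocations reduce the downtime described above.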