diff --git a/docs/en/stack/ml/nlp/ml-nlp-deploy-models.asciidoc b/docs/en/stack/ml/nlp/ml-nlp-deploy-models.asciidoc
index 67c57cf72..516efad7c 100644
--- a/docs/en/stack/ml/nlp/ml-nlp-deploy-models.asciidoc
+++ b/docs/en/stack/ml/nlp/ml-nlp-deploy-models.asciidoc
@@ -198,6 +198,12 @@ nodes.
 Model allocations are independent units of work for NLP tasks. To influence
 model performance, you can configure the number of allocations and the number
 of threads used by each allocation of your deployment.
+
+IMPORTANT: If your deployed trained model has only one allocation, it's likely
+that you will experience downtime in the service your trained model performs.
+You can reduce or eliminate downtime by adding more allocations to your
+trained models.
+
 Throughput can be scaled by adding more allocations to the deployment; it
 increases the number of {infer} requests that can be performed in parallel. All
 allocations assigned to a node share the same copy of the model in memory. The
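
As context for the added note, a minimal console sketch of how allocations are
configured in practice. The model ID `my_model_id` and the allocation and
thread counts are placeholder values, not part of this change:

[source,console]
----
// Start a deployment with two allocations, one thread per allocation
POST _ml/trained_models/my_model_id/deployment/_start?number_of_allocations=2&threads_per_allocation=1

// Later, add allocations to a running deployment to increase throughput
// and reduce the risk of downtime
POST _ml/trained_models/my_model_id/deployment/_update
{
  "number_of_allocations": 4
}
----

The `_update` call is the operation the note implies: raising
`number_of_allocations` on a live deployment adds parallel capacity without
restarting the deployment.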