diff --git a/api/core/model_runtime/docs/en_US/customizable_model_scale_out.md b/api/core/model_runtime/docs/en_US/customizable_model_scale_out.md
new file mode 100644
index 00000000000000..f5b806ade6f499
--- /dev/null
+++ b/api/core/model_runtime/docs/en_US/customizable_model_scale_out.md
@@ -0,0 +1,310 @@
## Custom Model Integration

### Introduction

After completing the vendor integration, the next step is to connect the vendor's models. To illustrate the entire connection process, we will use Xinference as an example and walk through a complete custom model integration.

It is important to note that for custom models, each model connection requires a complete vendor credential.

Unlike pre-defined models, a custom vendor integration always includes the following two parameters, which do not need to be defined in the vendor YAML file.

![](images/index/image-3.png)

As mentioned earlier, vendors do not need to implement `validate_provider_credential`. The Runtime automatically calls the corresponding model layer's `validate_credentials` to validate the credentials, based on the model type and model name selected by the user.

### Writing the Vendor YAML

First, we need to identify the types of models supported by the vendor we are integrating.

Currently supported model types are as follows:

- `llm` Text Generation Models
- `text_embedding` Text Embedding Models
- `rerank` Rerank Models
- `speech2text` Speech-to-Text
- `tts` Text-to-Speech
- `moderation` Moderation

Xinference supports LLM, Text Embedding, and Rerank, so we will start by writing `xinference.yaml`.

```yaml
provider: xinference # Vendor identifier
label: # Vendor display name, supports both en_US (English) and zh_Hans (Simplified Chinese). If zh_Hans is not set, it will use en_US by default.
  en_US: Xorbits Inference
icon_small: # Small icon, refer to other vendors' icons stored in the _assets directory within the vendor implementation directory; follows the same language policy as the label
  en_US: icon_s_en.svg
icon_large: # Large icon
  en_US: icon_l_en.svg
help: # Help information
  title:
    en_US: How to deploy Xinference
    zh_Hans: 如何部署 Xinference
  url:
    en_US: https://github.com/xorbitsai/inference
supported_model_types: # Supported model types. Xinference supports LLM, Text Embedding, and Rerank
- llm
- text-embedding
- rerank
configurate_methods: # Xinference is a locally deployed vendor with no predefined models; users deploy whatever models they need according to the Xinference documentation, so it only supports custom models.
- customizable-model
provider_credential_schema:
  credential_form_schemas:
```
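
Alongside the YAML file, the vendor module still needs a provider class, but because Xinference only exposes custom models there is nothing to check at the provider level, so the class can stay minimal. The sketch below is an assumption of what `xinference.py` might look like; the base-class import path and the `validate_provider_credentials` method name follow the pre-defined vendor docs, so verify them against the existing vendors under `model_providers`.

```python
import logging

from core.model_runtime.model_providers.__base.model_provider import ModelProvider

logger = logging.getLogger(__name__)


class XinferenceAIProvider(ModelProvider):
    def validate_provider_credentials(self, credentials: dict) -> None:
        # Nothing to validate at the provider level for a customizable-model vendor:
        # the Runtime validates credentials per model via the model class instead.
        pass
```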

Then, we need to determine what credentials are required to define a model in Xinference.

- Since it supports three different types of models, we need to specify the `model_type` to denote the model type. Here is how we can define it:

```yaml
provider_credential_schema:
  credential_form_schemas:
  - variable: model_type
    type: select
    label:
      en_US: Model type
      zh_Hans: 模型类型
    required: true
    options:
    - value: text-generation
      label:
        en_US: Language Model
        zh_Hans: 语言模型
    - value: embeddings
      label:
        en_US: Text Embedding
    - value: reranking
      label:
        en_US: Rerank
```

- Next, each model has its own `model_name`, so we need to define that here:

```yaml
  - variable: model_name
    type: text-input
    label:
      en_US: Model name
      zh_Hans: 模型名称
    required: true
    placeholder:
      zh_Hans: 填写模型名称
      en_US: Input model name
```

- Specify the Xinference local deployment address:

```yaml
  - variable: server_url
    label:
      zh_Hans: 服务器URL
      en_US: Server url
    type: text-input
    required: true
    placeholder:
      zh_Hans: 在此输入Xinference的服务器地址,如 https://example.com/xxx
      en_US: Enter the url of your Xinference, for example https://example.com/xxx
```

- Each model has a unique `model_uid`, so we also need to define that here:

```yaml
  - variable: model_uid
    label:
      zh_Hans: 模型UID
      en_US: Model uid
    type: text-input
    required: true
    placeholder:
      zh_Hans: 在此输入您的Model UID
      en_US: Enter the model uid
```

Now, we have completed the basic definition of the vendor.

### Writing the Model Code

Next, let's take the `llm` type as an example and write the model code in `llm.py` under the `llm` module (i.e. `xinference.llm.llm.py`).

In `llm.py`, create a Xinference LLM class, for example `XinferenceAILargeLanguageModel` (the name is arbitrary), that inherits from the `__base.large_language_model.LargeLanguageModel` base class and implements the following methods:

- LLM Invocation

Implement the core method for LLM invocation, supporting both streaming and synchronous responses.

```python
def _invoke(self, model: str, credentials: dict,
            prompt_messages: list[PromptMessage], model_parameters: dict,
            tools: Optional[list[PromptMessageTool]] = None, stop: Optional[list[str]] = None,
            stream: bool = True, user: Optional[str] = None) \
        -> Union[LLMResult, Generator]:
    """
    Invoke large language model

    :param model: model name
    :param credentials: model credentials
    :param prompt_messages: prompt messages
    :param model_parameters: model parameters
    :param tools: tools for tool usage
    :param stop: stop words
    :param stream: is the response a stream
    :param user: unique user id
    :return: full response or stream response chunk generator result
    """
```

When implementing, use two separate functions to return data for synchronous and streaming responses. This is necessary because Python treats any function containing the `yield` keyword as a generator function, fixing its return type to `Generator`, so synchronous and streaming returns have to be implemented separately. Here is an example (it uses simplified parameters; in a real implementation, use the parameter list defined above):

```python
def _invoke(self, stream: bool, **kwargs) \
        -> Union[LLMResult, Generator]:
    if stream:
        return self._handle_stream_response(**kwargs)
    return self._handle_sync_response(**kwargs)

def _handle_stream_response(self, **kwargs) -> Generator:
    for chunk in response:
        yield chunk

def _handle_sync_response(self, **kwargs) -> LLMResult:
    return LLMResult(**response)
```

- Pre-compute Input Tokens

If the model does not provide an interface for pre-computing tokens, you can return 0 directly.

```python
def get_num_tokens(self, model: str, credentials: dict, prompt_messages: list[PromptMessage],
                   tools: Optional[list[PromptMessageTool]] = None) -> int:
    """
    Get number of tokens for given prompt messages

    :param model: model name
    :param credentials: model credentials
    :param prompt_messages: prompt messages
    :param tools: tools for tool usage
    :return: token count
    """
```

Sometimes you might not want to return 0 directly. In such cases, you can use `self._get_num_tokens_by_gpt2(text: str)` to get an estimated token count. This method is provided by the `AIModel` base class and uses the GPT-2 tokenizer for the calculation. Note that it is only a substitute and may not be fully accurate.
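
For example, a rough fallback based on `_get_num_tokens_by_gpt2` might look like the sketch below; it assumes all prompt contents are plain strings and ignores tool definitions and non-text content.

```python
def get_num_tokens(self, model: str, credentials: dict, prompt_messages: list[PromptMessage],
                   tools: Optional[list[PromptMessageTool]] = None) -> int:
    # Rough estimate: sum the GPT-2 token counts of all plain-text message contents.
    # Tool definitions and non-text contents (e.g. images) are ignored in this sketch.
    return sum(
        self._get_num_tokens_by_gpt2(message.content)
        for message in prompt_messages
        if isinstance(message.content, str)
    )
```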

- Model Credentials Validation

Similar to vendor credentials validation, this method validates the credentials of an individual model.

```python
def validate_credentials(self, model: str, credentials: dict) -> None:
    """
    Validate model credentials

    :param model: model name
    :param credentials: model credentials
    :return: None
    """
```

- Model Parameter Schema

Unlike pre-defined models, no YAML file declares which parameters a custom model supports, so we need to generate the model parameter schema dynamically.

For instance, Xinference supports the `max_tokens`, `temperature`, and `top_p` parameters.

However, some vendors support different parameters for different models. For example, the `OpenLLM` vendor supports `top_k`, but not every model served by this vendor supports `top_k`. Let's say model A supports `top_k` but model B does not. In such cases, we need to generate the model parameter schema dynamically, as illustrated below:

```python
def get_customizable_model_schema(self, model: str, credentials: dict) -> AIModelEntity | None:
    """
    used to define customizable model schema
    """
    rules = [
        ParameterRule(
            name='temperature', type=ParameterType.FLOAT,
            use_template='temperature',
            label=I18nObject(
                zh_Hans='温度', en_US='Temperature'
            )
        ),
        ParameterRule(
            name='top_p', type=ParameterType.FLOAT,
            use_template='top_p',
            label=I18nObject(
                zh_Hans='Top P', en_US='Top P'
            )
        ),
        ParameterRule(
            name='max_tokens', type=ParameterType.INT,
            use_template='max_tokens',
            min=1,
            default=512,
            label=I18nObject(
                zh_Hans='最大生成长度', en_US='Max Tokens'
            )
        )
    ]

    # if model is A, add top_k to rules
    if model == 'A':
        rules.append(
            ParameterRule(
                name='top_k', type=ParameterType.INT,
                use_template='top_k',
                min=1,
                default=50,
                label=I18nObject(
                    zh_Hans='Top K', en_US='Top K'
                )
            )
        )

    """
    some NOT IMPORTANT code here
    """

    entity = AIModelEntity(
        model=model,
        label=I18nObject(
            en_US=model
        ),
        fetch_from=FetchFrom.CUSTOMIZABLE_MODEL,
        model_type=ModelType.LLM,
        model_properties={
            # MODE expects the LLM mode (chat or completion), not the model type
            ModelPropertyKey.MODE: LLMMode.CHAT.value,
        },
        parameter_rules=rules
    )

    return entity
```

- Exception Error Mapping

When a model invocation error occurs, it should be mapped to the Runtime's specified `InvokeError` type, enabling Dify to handle different errors appropriately.

Runtime Errors:

- `InvokeConnectionError` Connection error during invocation
- `InvokeServerUnavailableError` Service provider unavailable
- `InvokeRateLimitError` Rate limit reached
- `InvokeAuthorizationError` Authorization failure
- `InvokeBadRequestError` Invalid request parameters

```python
@property
def _invoke_error_mapping(self) -> dict[type[InvokeError], list[type[Exception]]]:
    """
    Map model invoke error to unified error
    The key is the error type thrown to the caller
    The value is the error type thrown by the model,
    which needs to be converted into a unified error type for the caller.

    :return: Invoke error mapping
    """
```
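
As an illustration, a concrete mapping might look like the sketch below. The Python built-in exceptions on the right-hand side are placeholders; in a real integration you would list the exception types actually raised by your HTTP client or the Xinference SDK.

```python
@property
def _invoke_error_mapping(self) -> dict[type[InvokeError], list[type[Exception]]]:
    # Placeholder mapping: replace the built-in exceptions with the ones your client actually raises.
    return {
        InvokeConnectionError: [ConnectionError, TimeoutError],
        InvokeServerUnavailableError: [ConnectionResetError],
        InvokeRateLimitError: [],
        InvokeAuthorizationError: [PermissionError],
        InvokeBadRequestError: [ValueError, KeyError],
    }
```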

For interface method details, see: [Interfaces](./interfaces.md). For specific implementations, refer to: [llm.py](https://github.com/langgenius/dify-runtime/blob/main/lib/model_providers/anthropic/llm/llm.py).
\ No newline at end of file
diff --git a/api/core/model_runtime/docs/en_US/images/index/image-1.png b/api/core/model_runtime/docs/en_US/images/index/image-1.png
new file mode 100644
index 00000000000000..b158d44b29dcc2
Binary files /dev/null and b/api/core/model_runtime/docs/en_US/images/index/image-1.png differ
diff --git a/api/core/model_runtime/docs/en_US/images/index/image-2.png b/api/core/model_runtime/docs/en_US/images/index/image-2.png
new file mode 100644
index 00000000000000..c70cd3da5eea19
Binary files /dev/null and b/api/core/model_runtime/docs/en_US/images/index/image-2.png differ
diff --git a/api/core/model_runtime/docs/en_US/images/index/image-3.png b/api/core/model_runtime/docs/en_US/images/index/image-3.png
new file mode 100644
index 00000000000000..bf0b9a7f47fddf
Binary files /dev/null and b/api/core/model_runtime/docs/en_US/images/index/image-3.png differ
diff --git a/api/core/model_runtime/docs/en_US/images/index/image.png b/api/core/model_runtime/docs/en_US/images/index/image.png
new file mode 100644
index 00000000000000..eb63d107e1c385
Binary files /dev/null and b/api/core/model_runtime/docs/en_US/images/index/image.png differ
diff --git a/api/core/model_runtime/docs/en_US/predefined_model_scale_out.md b/api/core/model_runtime/docs/en_US/predefined_model_scale_out.md
new file mode 100644
index 00000000000000..3e16257452c7a0
--- /dev/null
+++ b/api/core/model_runtime/docs/en_US/predefined_model_scale_out.md
@@ -0,0 +1,173 @@
## Predefined Model Integration

After completing the vendor integration, the next step is to integrate the models from that vendor.

First, we need to determine the type of model to be integrated and create the corresponding model type `module` under the respective vendor's directory.

Currently supported model types are:

- `llm` Text Generation Model
- `text_embedding` Text Embedding Model
- `rerank` Rerank Model
- `speech2text` Speech-to-Text
- `tts` Text-to-Speech
- `moderation` Moderation

Continuing with `Anthropic` as an example: since `Anthropic` only supports LLM, create a `module` named `llm` under `model_providers.anthropic`.

For predefined models, we first need to create a YAML file named after the model under the `llm` `module`, such as `claude-2.1.yaml`.

### Prepare Model YAML

```yaml
model: claude-2.1 # Model identifier
# Display name of the model, which can be set to en_US English or zh_Hans Chinese. If zh_Hans is not set, it will default to en_US.
# This can also be omitted, in which case the model identifier will be used as the label
label:
  en_US: claude-2.1
model_type: llm # Model type, claude-2.1 is an LLM
features: # Supported features, agent-thought supports Agent reasoning, vision supports image understanding
- agent-thought
model_properties: # Model properties
  mode: chat # LLM mode, completion for text completion models, chat for chat models
  context_size: 200000 # Maximum context size
parameter_rules: # Parameter rules for the model call; only LLM requires this
- name: temperature # Parameter variable name
  # Five default configuration templates are provided: temperature/top_p/max_tokens/presence_penalty/frequency_penalty
  # The template variable name can be set directly in use_template, which will use the default configuration in entities.defaults.PARAMETER_RULE_TEMPLATE
  # Additional configuration parameters will override the default configuration if set
  use_template: temperature
- name: top_p
  use_template: top_p
- name: top_k
  label: # Display name of the parameter
    zh_Hans: 取样数量
    en_US: Top k
  type: int # Parameter type, supports float/int/string/boolean
  help: # Help information, describing the parameter's function
    zh_Hans: 仅从每个后续标记的前 K 个选项中采样。
    en_US: Only sample from the top K options for each subsequent token.
  required: false # Whether the parameter is mandatory; can be omitted
- name: max_tokens_to_sample
  use_template: max_tokens
  default: 4096 # Default value of the parameter
  min: 1 # Minimum value of the parameter, applicable to float/int only
  max: 4096 # Maximum value of the parameter, applicable to float/int only
pricing: # Pricing information
  input: '8.00' # Input unit price, i.e., prompt price
  output: '24.00' # Output unit price, i.e., response content price
  unit: '0.000001' # Price unit; with a unit of 0.000001, the prices above are effectively per 1M tokens
  currency: USD # Price currency
```
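
To make the `pricing` fields concrete, the effective cost of a call is `tokens * unit * price`. The snippet below is not the Runtime's billing code, just arithmetic showing that with `unit: '0.000001'` the listed prices amount to a price per million tokens; the function name and defaults are illustrative.

```python
from decimal import Decimal


def prompt_cost(tokens: int, unit: str = "0.000001", input_price: str = "8.00") -> Decimal:
    # With unit = 0.000001, an input price of 8.00 means $8.00 per 1M prompt tokens.
    return Decimal(tokens) * Decimal(unit) * Decimal(input_price)


print(prompt_cost(10_000))  # 0.0800000 -> 10K prompt tokens at claude-2.1's input price cost $0.08
```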

It is recommended to prepare all model configurations before starting the implementation of the model code.

You can also refer to the YAML configuration information under the corresponding model type directories of other vendors in the `model_providers` directory. For the complete YAML rules, refer to: [Schema](schema.md#aimodelentity).

### Implement the Model Call Code

Next, create a Python file named `llm.py` under the `llm` `module` to write the implementation code.

Create an Anthropic LLM class, for example `AnthropicLargeLanguageModel` (the name is arbitrary), that inherits from the `__base.large_language_model.LargeLanguageModel` base class and implements the following methods:

- LLM Call

Implement the core method for calling the LLM, supporting both streaming and synchronous responses.

```python
def _invoke(self, model: str, credentials: dict,
            prompt_messages: list[PromptMessage], model_parameters: dict,
            tools: Optional[list[PromptMessageTool]] = None, stop: Optional[list[str]] = None,
            stream: bool = True, user: Optional[str] = None) \
        -> Union[LLMResult, Generator]:
    """
    Invoke large language model

    :param model: model name
    :param credentials: model credentials
    :param prompt_messages: prompt messages
    :param model_parameters: model parameters
    :param tools: tools for tool calling
    :param stop: stop words
    :param stream: is stream response
    :param user: unique user id
    :return: full response or stream response chunk generator result
    """
```

Use two separate functions to return data, one for synchronous responses and one for streaming responses, because Python treats any function containing the `yield` keyword as a generator function, fixing its return type to `Generator`. Synchronous and streaming returns therefore need to be implemented separately, as shown below (the example uses simplified parameters; in a real implementation, follow the parameter list defined above):

```python
def _invoke(self, stream: bool, **kwargs) \
        -> Union[LLMResult, Generator]:
    if stream:
        return self._handle_stream_response(**kwargs)
    return self._handle_sync_response(**kwargs)

def _handle_stream_response(self, **kwargs) -> Generator:
    for chunk in response:
        yield chunk

def _handle_sync_response(self, **kwargs) -> LLMResult:
    return LLMResult(**response)
```
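
To show what the streaming branch typically yields, here is a rough sketch of a stream handler that wraps text deltas in `LLMResultChunk` objects. The entity classes (`LLMResultChunk`, `LLMResultChunkDelta`, `AssistantPromptMessage`) are assumed to come from the model runtime entities, and `text_deltas` stands in for whatever your SDK's streaming response provides, so verify the import paths and field names against the entity definitions.

```python
from collections.abc import Generator, Iterable

from core.model_runtime.entities.llm_entities import LLMResultChunk, LLMResultChunkDelta
from core.model_runtime.entities.message_entities import AssistantPromptMessage, PromptMessage


def _handle_stream_response(self, model: str, prompt_messages: list[PromptMessage],
                            text_deltas: Iterable[str]) -> Generator:
    # Wrap each text delta coming from the upstream SDK into an LLMResultChunk
    # so the Runtime can consume the stream incrementally.
    for index, text in enumerate(text_deltas):
        yield LLMResultChunk(
            model=model,
            prompt_messages=prompt_messages,
            delta=LLMResultChunkDelta(
                index=index,
                message=AssistantPromptMessage(content=text),
            ),
        )
```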

- Pre-compute Input Tokens

If the model does not provide an interface to pre-compute tokens, return 0 directly.

```python
def get_num_tokens(self, model: str, credentials: dict, prompt_messages: list[PromptMessage],
                   tools: Optional[list[PromptMessageTool]] = None) -> int:
    """
    Get number of tokens for given prompt messages

    :param model: model name
    :param credentials: model credentials
    :param prompt_messages: prompt messages
    :param tools: tools for tool calling
    :return: token count
    """
```

- Validate Model Credentials

Similar to vendor credential validation, but specific to a single model.

```python
def validate_credentials(self, model: str, credentials: dict) -> None:
    """
    Validate model credentials

    :param model: model name
    :param credentials: model credentials
    :return: None
    """
```

- Map Invoke Errors

When a model call fails, map it to the specific `InvokeError` type required by the Runtime, allowing Dify to handle different errors accordingly.

Runtime Errors:

- `InvokeConnectionError` Connection error
- `InvokeServerUnavailableError` Service provider unavailable
- `InvokeRateLimitError` Rate limit reached
- `InvokeAuthorizationError` Authorization failed
- `InvokeBadRequestError` Invalid request parameters

```python
@property
def _invoke_error_mapping(self) -> dict[type[InvokeError], list[type[Exception]]]:
    """
    Map model invoke error to unified error
    The key is the error type thrown to the caller
    The value is the error type thrown by the model,
    which needs to be converted into a unified error type for the caller.

    :return: Invoke error mapping
    """
```

For interface method explanations, see: [Interfaces](./interfaces.md). For a detailed implementation, refer to: [llm.py](https://github.com/langgenius/dify-runtime/blob/main/lib/model_providers/anthropic/llm/llm.py).
\ No newline at end of file
diff --git a/api/core/model_runtime/docs/en_US/provider_scale_out.md b/api/core/model_runtime/docs/en_US/provider_scale_out.md
index ba356c5cab63d0..07be5811d30137 100644
--- a/api/core/model_runtime/docs/en_US/provider_scale_out.md
+++ b/api/core/model_runtime/docs/en_US/provider_scale_out.md
@@ -58,7 +58,7 @@ provider_credential_schema: # Provider credential rules, as Anthropic only supp
       en_US: Enter your API URL
 ```
 
-You can also refer to the YAML configuration information under other provider directories in `model_providers`. The complete YAML rules are available at: [Schema](schema.md#Provider).
+You can also refer to the YAML configuration information under other provider directories in `model_providers`. The complete YAML rules are available at: [Schema](schema.md#provider).
 
 ### Implementing Provider Code
 
diff --git a/api/core/model_runtime/docs/zh_Hans/provider_scale_out.md b/api/core/model_runtime/docs/zh_Hans/provider_scale_out.md
index b34544c789fa76..78aad8876f4b84 100644
--- a/api/core/model_runtime/docs/zh_Hans/provider_scale_out.md
+++ b/api/core/model_runtime/docs/zh_Hans/provider_scale_out.md
@@ -117,7 +117,7 @@ model_credential_schema:
       en_US: Enter your API Base
 ```
 
-也可以参考 `model_providers` 目录下其他供应商目录下的 YAML 配置信息,完整的 YAML 规则见:[Schema](schema.md#Provider)。
+也可以参考 `model_providers` 目录下其他供应商目录下的 YAML 配置信息,完整的 YAML 规则见:[Schema](schema.md#provider)。
 
 #### 实现供应商代码