diff --git a/api/core/model_runtime/docs/en_US/customizable_model_scale_out.md b/api/core/model_runtime/docs/en_US/customizable_model_scale_out.md
new file mode 100644
index 00000000000000..f5b806ade6f499
--- /dev/null
+++ b/api/core/model_runtime/docs/en_US/customizable_model_scale_out.md
@@ -0,0 +1,310 @@
## Custom Model Integration

### Introduction

After completing the vendor integration, the next step is to connect the vendor's models. To illustrate the entire connection process, we will use Xinference as an example and walk through a complete custom model integration.

It is important to note that for custom models, each model connection requires a complete vendor credential.

Unlike pre-defined models, a custom vendor integration always includes the following two parameters, which do not need to be defined in the vendor YAML file.

![](images/index/image-3.png)

As mentioned earlier, vendors do not need to implement `validate_provider_credential`. The Runtime automatically calls the corresponding model layer's `validate_credentials` to validate the credentials, based on the model type and model name selected by the user.

### Writing the Vendor YAML

First, we need to identify the types of models supported by the vendor we are integrating.

Currently supported model types are as follows:

- `llm` Text Generation Models
- `text_embedding` Text Embedding Models
- `rerank` Rerank Models
- `speech2text` Speech-to-Text
- `tts` Text-to-Speech
- `moderation` Moderation

Xinference supports LLM, Text Embedding, and Rerank, so we will start by writing `xinference.yaml`.

```yaml
provider: xinference # Vendor identifier
label: # Vendor display name, supports both en_US (English) and zh_Hans (Simplified Chinese). If zh_Hans is not set, it will use en_US by default.
  en_US: Xorbits Inference
icon_small: # Small icon, refer to other vendors' icons stored in the _assets directory within the vendor implementation directory; follows the same language policy as the label
  en_US: icon_s_en.svg
icon_large: # Large icon
  en_US: icon_l_en.svg
help: # Help information
  title:
    en_US: How to deploy Xinference
    zh_Hans: 如何部署 Xinference
  url:
    en_US: https://github.com/xorbitsai/inference
supported_model_types: # Supported model types. Xinference supports LLM, Text Embedding, and Rerank
- llm
- text-embedding
- rerank
configurate_methods: # Xinference is a locally deployed vendor with no predefined models; users deploy whatever models they need according to the Xinference documentation, so it only supports custom models.
- customizable-model
provider_credential_schema:
  credential_form_schemas:
```
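
Alongside the YAML file, the vendor module still needs a provider class, but because Xinference only exposes custom models there is nothing to check at the provider level, so the class can stay minimal. The sketch below is an assumption of what `xinference.py` might look like; the base-class import path and the `validate_provider_credentials` method name follow the pre-defined vendor docs, so verify them against the existing vendors under `model_providers`.

```python
import logging

from core.model_runtime.model_providers.__base.model_provider import ModelProvider

logger = logging.getLogger(__name__)


class XinferenceAIProvider(ModelProvider):
    def validate_provider_credentials(self, credentials: dict) -> None:
        # Nothing to validate at the provider level for a customizable-model vendor:
        # the Runtime validates credentials per model via the model class instead.
        pass
```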

Then, we need to determine what credentials are required to define a model in Xinference.

- Since it supports three different types of models, we need to specify the `model_type` to denote the model type. Here is how we can define it:

```yaml
provider_credential_schema:
  credential_form_schemas:
  - variable: model_type
    type: select
    label:
      en_US: Model type
      zh_Hans: 模型类型
    required: true
    options:
    - value: text-generation
      label:
        en_US: Language Model
        zh_Hans: 语言模型
    - value: embeddings
      label:
        en_US: Text Embedding
    - value: reranking
      label:
        en_US: Rerank
```

- Next, each model has its own `model_name`, so we need to define that here:

```yaml
  - variable: model_name
    type: text-input
    label:
      en_US: Model name
      zh_Hans: 模型名称
    required: true
    placeholder:
      zh_Hans: 填写模型名称
      en_US: Input model name
```

- Specify the Xinference local deployment address:

```yaml
  - variable: server_url
    label:
      zh_Hans: 服务器URL
      en_US: Server url
    type: text-input
    required: true
    placeholder:
      zh_Hans: 在此输入Xinference的服务器地址,如 https://example.com/xxx
      en_US: Enter the url of your Xinference, for example https://example.com/xxx
```

- Each model has a unique `model_uid`, so we also need to define that here:

```yaml
  - variable: model_uid
    label:
      zh_Hans: 模型UID
      en_US: Model uid
    type: text-input
    required: true
    placeholder:
      zh_Hans: 在此输入您的Model UID
      en_US: Enter the model uid
```

Now, we have completed the basic definition of the vendor.

### Writing the Model Code

Next, let's take the `llm` type as an example and write the model code in `llm.py` under the `llm` module (i.e. `xinference.llm.llm.py`).

In `llm.py`, create a Xinference LLM class, for example `XinferenceAILargeLanguageModel` (the name is arbitrary), that inherits from the `__base.large_language_model.LargeLanguageModel` base class and implements the following methods:

- LLM Invocation

Implement the core method for LLM invocation, supporting both streaming and synchronous responses.

```python
def _invoke(self, model: str, credentials: dict,
            prompt_messages: list[PromptMessage], model_parameters: dict,
            tools: Optional[list[PromptMessageTool]] = None, stop: Optional[list[str]] = None,
            stream: bool = True, user: Optional[str] = None) \
        -> Union[LLMResult, Generator]:
    """
    Invoke large language model

    :param model: model name
    :param credentials: model credentials
    :param prompt_messages: prompt messages
    :param model_parameters: model parameters
    :param tools: tools for tool usage
    :param stop: stop words
    :param stream: is the response a stream
    :param user: unique user id
    :return: full response or stream response chunk generator result
    """
```

When implementing, use two separate functions to return data for synchronous and streaming responses. This is necessary because Python treats any function containing the `yield` keyword as a generator function, fixing its return type to `Generator`, so synchronous and streaming returns have to be implemented separately. Here is an example (it uses simplified parameters; in a real implementation, use the parameter list defined above):

```python
def _invoke(self, stream: bool, **kwargs) \
        -> Union[LLMResult, Generator]:
    if stream:
        return self._handle_stream_response(**kwargs)
    return self._handle_sync_response(**kwargs)

def _handle_stream_response(self, **kwargs) -> Generator:
    for chunk in response:
        yield chunk

def _handle_sync_response(self, **kwargs) -> LLMResult:
    return LLMResult(**response)
```

- Pre-compute Input Tokens

If the model does not provide an interface for pre-computing tokens, you can return 0 directly.

```python
def get_num_tokens(self, model: str, credentials: dict, prompt_messages: list[PromptMessage],
                   tools: Optional[list[PromptMessageTool]] = None) -> int:
    """
    Get number of tokens for given prompt messages

    :param model: model name
    :param credentials: model credentials
    :param prompt_messages: prompt messages
    :param tools: tools for tool usage
    :return: token count
    """
```

Sometimes you might not want to return 0 directly. In such cases, you can use `self._get_num_tokens_by_gpt2(text: str)` to get an estimated token count. This method is provided by the `AIModel` base class and uses the GPT-2 tokenizer for the calculation. Note that it is only a substitute and may not be fully accurate.
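
For example, a rough fallback based on `_get_num_tokens_by_gpt2` might look like the sketch below; it assumes all prompt contents are plain strings and ignores tool definitions and non-text content.

```python
def get_num_tokens(self, model: str, credentials: dict, prompt_messages: list[PromptMessage],
                   tools: Optional[list[PromptMessageTool]] = None) -> int:
    # Rough estimate: sum the GPT-2 token counts of all plain-text message contents.
    # Tool definitions and non-text contents (e.g. images) are ignored in this sketch.
    return sum(
        self._get_num_tokens_by_gpt2(message.content)
        for message in prompt_messages
        if isinstance(message.content, str)
    )
```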

- Model Credentials Validation

Similar to vendor credentials validation, this method validates the credentials of an individual model.

```python
def validate_credentials(self, model: str, credentials: dict) -> None:
    """
    Validate model credentials

    :param model: model name
    :param credentials: model credentials
    :return: None
    """
```

- Model Parameter Schema

Unlike pre-defined models, no YAML file declares which parameters a custom model supports, so we need to generate the model parameter schema dynamically.

For instance, Xinference supports the `max_tokens`, `temperature`, and `top_p` parameters.

However, some vendors support different parameters for different models. For example, the `OpenLLM` vendor supports `top_k`, but not every model served by this vendor supports `top_k`. Let's say model A supports `top_k` but model B does not. In such cases, we need to generate the model parameter schema dynamically, as illustrated below:

```python
def get_customizable_model_schema(self, model: str, credentials: dict) -> AIModelEntity | None:
    """
    used to define customizable model schema
    """
    rules = [
        ParameterRule(
            name='temperature', type=ParameterType.FLOAT,
            use_template='temperature',
            label=I18nObject(
                zh_Hans='温度', en_US='Temperature'
            )
        ),
        ParameterRule(
            name='top_p', type=ParameterType.FLOAT,
            use_template='top_p',
            label=I18nObject(
                zh_Hans='Top P', en_US='Top P'
            )
        ),
        ParameterRule(
            name='max_tokens', type=ParameterType.INT,
            use_template='max_tokens',
            min=1,
            default=512,
            label=I18nObject(
                zh_Hans='最大生成长度', en_US='Max Tokens'
            )
        )
    ]

    # if model is A, add top_k to rules
    if model == 'A':
        rules.append(
            ParameterRule(
                name='top_k', type=ParameterType.INT,
                use_template='top_k',
                min=1,
                default=50,
                label=I18nObject(
                    zh_Hans='Top K', en_US='Top K'
                )
            )
        )

    """
    some NOT IMPORTANT code here
    """

    entity = AIModelEntity(
        model=model,
        label=I18nObject(
            en_US=model
        ),
        fetch_from=FetchFrom.CUSTOMIZABLE_MODEL,
        model_type=ModelType.LLM,
        model_properties={
            # MODE expects the LLM mode (chat or completion), not the model type
            ModelPropertyKey.MODE: LLMMode.CHAT.value,
        },
        parameter_rules=rules
    )

    return entity
```

- Exception Error Mapping

When a model invocation error occurs, it should be mapped to the Runtime's specified `InvokeError` type, enabling Dify to handle different errors appropriately.

Runtime Errors:

- `InvokeConnectionError` Connection error during invocation
- `InvokeServerUnavailableError` Service provider unavailable
- `InvokeRateLimitError` Rate limit reached
- `InvokeAuthorizationError` Authorization failure
- `InvokeBadRequestError` Invalid request parameters

```python
@property
def _invoke_error_mapping(self) -> dict[type[InvokeError], list[type[Exception]]]:
    """
    Map model invoke error to unified error
    The key is the error type thrown to the caller
    The value is the error type thrown by the model,
    which needs to be converted into a unified error type for the caller.

    :return: Invoke error mapping
    """
```
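
As an illustration, a concrete mapping might look like the sketch below. The Python built-in exceptions on the right-hand side are placeholders; in a real integration you would list the exception types actually raised by your HTTP client or the Xinference SDK.

```python
@property
def _invoke_error_mapping(self) -> dict[type[InvokeError], list[type[Exception]]]:
    # Placeholder mapping: replace the built-in exceptions with the ones your client actually raises.
    return {
        InvokeConnectionError: [ConnectionError, TimeoutError],
        InvokeServerUnavailableError: [ConnectionResetError],
        InvokeRateLimitError: [],
        InvokeAuthorizationError: [PermissionError],
        InvokeBadRequestError: [ValueError, KeyError],
    }
```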

For interface method details, see: [Interfaces](./interfaces.md). For specific implementations, refer to: [llm.py](https://github.com/langgenius/dify-runtime/blob/main/lib/model_providers/anthropic/llm/llm.py).
\ No newline at end of file
diff --git a/api/core/model_runtime/docs/en_US/images/index/image-1.png b/api/core/model_runtime/docs/en_US/images/index/image-1.png
new file mode 100644
index 00000000000000..b158d44b29dcc2
Binary files /dev/null and b/api/core/model_runtime/docs/en_US/images/index/image-1.png differ
diff --git a/api/core/model_runtime/docs/en_US/images/index/image-2.png b/api/core/model_runtime/docs/en_US/images/index/image-2.png
new file mode 100644
index 00000000000000..c70cd3da5eea19
Binary files /dev/null and b/api/core/model_runtime/docs/en_US/images/index/image-2.png differ
diff --git a/api/core/model_runtime/docs/en_US/images/index/image-3.png b/api/core/model_runtime/docs/en_US/images/index/image-3.png
new file mode 100644
index 00000000000000..bf0b9a7f47fddf
Binary files /dev/null and b/api/core/model_runtime/docs/en_US/images/index/image-3.png differ
diff --git a/api/core/model_runtime/docs/en_US/images/index/image.png b/api/core/model_runtime/docs/en_US/images/index/image.png
new file mode 100644
index 00000000000000..eb63d107e1c385
Binary files /dev/null and b/api/core/model_runtime/docs/en_US/images/index/image.png differ
diff --git a/api/core/model_runtime/docs/en_US/predefined_model_scale_out.md b/api/core/model_runtime/docs/en_US/predefined_model_scale_out.md
new file mode 100644
index 00000000000000..3e16257452c7a0
--- /dev/null
+++ b/api/core/model_runtime/docs/en_US/predefined_model_scale_out.md
@@ -0,0 +1,173 @@
## Predefined Model Integration

After completing the vendor integration, the next step is to integrate the models from that vendor.

First, we need to determine the type of model to be integrated and create the corresponding model type `module` under the respective vendor's directory.

Currently supported model types are:

- `llm` Text Generation Model
- `text_embedding` Text Embedding Model
- `rerank` Rerank Model
- `speech2text` Speech-to-Text
- `tts` Text-to-Speech
- `moderation` Moderation

Continuing with `Anthropic` as an example: since `Anthropic` only supports LLM, create a `module` named `llm` under `model_providers.anthropic`.

For predefined models, we first need to create a YAML file named after the model under the `llm` `module`, such as `claude-2.1.yaml`.

### Prepare Model YAML

```yaml
model: claude-2.1 # Model identifier
# Display name of the model, which can be set to en_US English or zh_Hans Chinese. If zh_Hans is not set, it will default to en_US.
# This can also be omitted, in which case the model identifier will be used as the label
label:
  en_US: claude-2.1
model_type: llm # Model type, claude-2.1 is an LLM
features: # Supported features, agent-thought supports Agent reasoning, vision supports image understanding
- agent-thought
model_properties: # Model properties
  mode: chat # LLM mode, completion for text completion models, chat for chat models
  context_size: 200000 # Maximum context size
parameter_rules: # Parameter rules for the model call; only LLM requires this
- name: temperature # Parameter variable name
  # Five default configuration templates are provided: temperature/top_p/max_tokens/presence_penalty/frequency_penalty
  # The template variable name can be set directly in use_template, which will use the default configuration in entities.defaults.PARAMETER_RULE_TEMPLATE
  # Additional configuration parameters will override the default configuration if set
  use_template: temperature
- name: top_p
  use_template: top_p
- name: top_k
  label: # Display name of the parameter
    zh_Hans: 取样数量
    en_US: Top k
  type: int # Parameter type, supports float/int/string/boolean
  help: # Help information, describing the parameter's function
    zh_Hans: 仅从每个后续标记的前 K 个选项中采样。
    en_US: Only sample from the top K options for each subsequent token.
  required: false # Whether the parameter is mandatory; can be omitted
- name: max_tokens_to_sample
  use_template: max_tokens
  default: 4096 # Default value of the parameter
  min: 1 # Minimum value of the parameter, applicable to float/int only
  max: 4096 # Maximum value of the parameter, applicable to float/int only
pricing: # Pricing information
  input: '8.00' # Input unit price, i.e., prompt price
  output: '24.00' # Output unit price, i.e., response content price
  unit: '0.000001' # Price unit; with a unit of 0.000001, the prices above are effectively per 1M tokens
  currency: USD # Price currency
```
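
To make the `pricing` fields concrete, the effective cost of a call is `tokens * unit * price`. The snippet below is not the Runtime's billing code, just arithmetic showing that with `unit: '0.000001'` the listed prices amount to a price per million tokens; the function name and defaults are illustrative.

```python
from decimal import Decimal


def prompt_cost(tokens: int, unit: str = "0.000001", input_price: str = "8.00") -> Decimal:
    # With unit = 0.000001, an input price of 8.00 means $8.00 per 1M prompt tokens.
    return Decimal(tokens) * Decimal(unit) * Decimal(input_price)


print(prompt_cost(10_000))  # 0.0800000 -> 10K prompt tokens at claude-2.1's input price cost $0.08
```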

It is recommended to prepare all model configurations before starting the implementation of the model code.

You can also refer to the YAML configuration information under the corresponding model type directories of other vendors in the `model_providers` directory. For the complete YAML rules, refer to: [Schema](schema.md#aimodelentity).

### Implement the Model Call Code

Next, create a Python file named `llm.py` under the `llm` `module` to write the implementation code.

Create an Anthropic LLM class, for example `AnthropicLargeLanguageModel` (the name is arbitrary), that inherits from the `__base.large_language_model.LargeLanguageModel` base class and implements the following methods:

- LLM Call

Implement the core method for calling the LLM, supporting both streaming and synchronous responses.

```python
def _invoke(self, model: str, credentials: dict,
            prompt_messages: list[PromptMessage], model_parameters: dict,
            tools: Optional[list[PromptMessageTool]] = None, stop: Optional[list[str]] = None,
            stream: bool = True, user: Optional[str] = None) \
        -> Union[LLMResult, Generator]:
    """
    Invoke large language model

    :param model: model name
    :param credentials: model credentials
    :param prompt_messages: prompt messages
    :param model_parameters: model parameters
    :param tools: tools for tool calling
    :param stop: stop words
    :param stream: is stream response
    :param user: unique user id
    :return: full response or stream response chunk generator result
    """
```

Use two separate functions to return data, one for synchronous responses and one for streaming responses, because Python treats any function containing the `yield` keyword as a generator function, fixing its return type to `Generator`. Synchronous and streaming returns therefore need to be implemented separately, as shown below (the example uses simplified parameters; in a real implementation, follow the parameter list defined above):

```python
def _invoke(self, stream: bool, **kwargs) \
        -> Union[LLMResult, Generator]:
    if stream:
        return self._handle_stream_response(**kwargs)
    return self._handle_sync_response(**kwargs)

def _handle_stream_response(self, **kwargs) -> Generator:
    for chunk in response:
        yield chunk

def _handle_sync_response(self, **kwargs) -> LLMResult:
    return LLMResult(**response)
```
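
To show what the streaming branch typically yields, here is a rough sketch of a stream handler that wraps text deltas in `LLMResultChunk` objects. The entity classes (`LLMResultChunk`, `LLMResultChunkDelta`, `AssistantPromptMessage`) are assumed to come from the model runtime entities, and `text_deltas` stands in for whatever your SDK's streaming response provides, so verify the import paths and field names against the entity definitions.

```python
from collections.abc import Generator, Iterable

from core.model_runtime.entities.llm_entities import LLMResultChunk, LLMResultChunkDelta
from core.model_runtime.entities.message_entities import AssistantPromptMessage, PromptMessage


def _handle_stream_response(self, model: str, prompt_messages: list[PromptMessage],
                            text_deltas: Iterable[str]) -> Generator:
    # Wrap each text delta coming from the upstream SDK into an LLMResultChunk
    # so the Runtime can consume the stream incrementally.
    for index, text in enumerate(text_deltas):
        yield LLMResultChunk(
            model=model,
            prompt_messages=prompt_messages,
            delta=LLMResultChunkDelta(
                index=index,
                message=AssistantPromptMessage(content=text),
            ),
        )
```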

- Pre-compute Input Tokens

If the model does not provide an interface to pre-compute tokens, return 0 directly.

```python
def get_num_tokens(self, model: str, credentials: dict, prompt_messages: list[PromptMessage],
                   tools: Optional[list[PromptMessageTool]] = None) -> int:
    """
    Get number of tokens for given prompt messages

    :param model: model name
    :param credentials: model credentials
    :param prompt_messages: prompt messages
    :param tools: tools for tool calling
    :return: token count
    """
```

- Validate Model Credentials

Similar to vendor credential validation, but specific to a single model.

```python
def validate_credentials(self, model: str, credentials: dict) -> None:
    """
    Validate model credentials

    :param model: model name
    :param credentials: model credentials
    :return: None
    """
```

- Map Invoke Errors

When a model call fails, map it to the specific `InvokeError` type required by the Runtime, allowing Dify to handle different errors accordingly.

Runtime Errors:

- `InvokeConnectionError` Connection error
- `InvokeServerUnavailableError` Service provider unavailable
- `InvokeRateLimitError` Rate limit reached
- `InvokeAuthorizationError` Authorization failed
- `InvokeBadRequestError` Invalid request parameters

```python
@property
def _invoke_error_mapping(self) -> dict[type[InvokeError], list[type[Exception]]]:
    """
    Map model invoke error to unified error
    The key is the error type thrown to the caller
    The value is the error type thrown by the model,
    which needs to be converted into a unified error type for the caller.

    :return: Invoke error mapping
    """
```

For interface method explanations, see: [Interfaces](./interfaces.md). For a detailed implementation, refer to: [llm.py](https://github.com/langgenius/dify-runtime/blob/main/lib/model_providers/anthropic/llm/llm.py).
\ No newline at end of file
diff --git a/api/core/model_runtime/docs/en_US/provider_scale_out.md b/api/core/model_runtime/docs/en_US/provider_scale_out.md
index ba356c5cab63d0..07be5811d30137 100644
--- a/api/core/model_runtime/docs/en_US/provider_scale_out.md
+++ b/api/core/model_runtime/docs/en_US/provider_scale_out.md
@@ -58,7 +58,7 @@ provider_credential_schema: # Provider credential rules, as Anthropic only supp
       en_US: Enter your API URL
 ```
 
-You can also refer to the YAML configuration information under other provider directories in `model_providers`. The complete YAML rules are available at: [Schema](schema.md#Provider).
+You can also refer to the YAML configuration information under other provider directories in `model_providers`. The complete YAML rules are available at: [Schema](schema.md#provider).
 
 ### Implementing Provider Code
 
diff --git a/api/core/model_runtime/docs/zh_Hans/provider_scale_out.md b/api/core/model_runtime/docs/zh_Hans/provider_scale_out.md
index b34544c789fa76..78aad8876f4b84 100644
--- a/api/core/model_runtime/docs/zh_Hans/provider_scale_out.md
+++ b/api/core/model_runtime/docs/zh_Hans/provider_scale_out.md
@@ -117,7 +117,7 @@ model_credential_schema:
       en_US: Enter your API Base
 ```
 
-也可以参考 `model_providers` 目录下其他供应商目录下的 YAML 配置信息,完整的 YAML 规则见:[Schema](schema.md#Provider)。
+也可以参考 `model_providers` 目录下其他供应商目录下的 YAML 配置信息,完整的 YAML 规则见:[Schema](schema.md#provider)。
 
 #### 实现供应商代码