From 238067039f3dd8d63b709b900d366585daaf128d Mon Sep 17 00:00:00 2001
From: Rafael Vasquez
Date: Wed, 18 Dec 2024 13:05:21 -0500
Subject: [PATCH] Fix refs

Signed-off-by: Rafael Vasquez
---
 docs/source/models/pooling_models.md          | 10 +++++-----
 .../serving/openai_compatible_server.md       | 20 +++++++++----------
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/docs/source/models/pooling_models.md b/docs/source/models/pooling_models.md
index 41a042da5510f..3e103b99e9b2a 100644
--- a/docs/source/models/pooling_models.md
+++ b/docs/source/models/pooling_models.md
@@ -10,14 +10,14 @@ before returning them.
 
 ```{note}
 We currently support pooling models primarily as a matter of convenience.
-As shown in the {ref}`Compatibility Matrix `, most vLLM features are not applicable to
+As shown in the [Compatibility Matrix](#compatibility-matrix), most vLLM features are not applicable to
 pooling models as they only work on the generation or decode stage, so performance may not improve as much.
 ```
 
 ## Offline Inference
 
 The {class}`~vllm.LLM` class provides various methods for offline inference.
-See {ref}`Engine Arguments ` for a list of options when initializing the model.
+See [Engine Arguments](#engine-args) for a list of options when initializing the model.
 
 For pooling models, we support the following {code}`task` options:
 
@@ -106,12 +106,12 @@ A code example can be found in [examples/offline_inference_scoring.py](https://g
 
 ## Online Inference
 
-Our [OpenAI Compatible Server](../serving/openai_compatible_server) can be used for online inference.
+Our [OpenAI Compatible Server](../serving/openai_compatible_server.md) can be used for online inference.
 Please click on the above link for more details on how to launch the server.
 
 ### Embeddings API
 
-Our Embeddings API is similar to `LLM.embed`, accepting both text and {ref}`multi-modal inputs `.
+Our Embeddings API is similar to `LLM.embed`, accepting both text and [multi-modal inputs](#multimodal-inputs).
 
 The text-only API is compatible with [OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings)
 so that you can use OpenAI client to interact with it.
@@ -119,7 +119,7 @@ A code example can be found in [examples/openai_embedding_client.py](https://git
 
 The multi-modal API is an extension of the [OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings)
 that incorporates [OpenAI Chat Completions API](https://platform.openai.com/docs/api-reference/chat),
-so it is not part of the OpenAI standard. Please see {ref}`this page ` for more details on how to use it.
+so it is not part of the OpenAI standard. Please see [](#multimodal-inputs) for more details on how to use it.
 
 ### Score API
 
diff --git a/docs/source/serving/openai_compatible_server.md b/docs/source/serving/openai_compatible_server.md
index 8712d82ca075e..e1c3a6cc6cce9 100644
--- a/docs/source/serving/openai_compatible_server.md
+++ b/docs/source/serving/openai_compatible_server.md
@@ -30,20 +30,20 @@ print(completion.choices[0].message)
 
 We currently support the following OpenAI APIs:
 - [Completions API](#completions-api) (`/v1/completions`)
-  - Only applicable to [text generation models](../models/generative_models.rst) (`--task generate`).
+  - Only applicable to [text generation models](../models/generative_models.md) (`--task generate`).
   - *Note: `suffix` parameter is not supported.*
 - [Chat Completions API](#chat-api) (`/v1/chat/completions`)
-  - Only applicable to [text generation models](../models/generative_models.rst) (`--task generate`) with a [chat template](#chat-template).
+  - Only applicable to [text generation models](../models/generative_models.md) (`--task generate`) with a [chat template](#chat-template).
   - *Note: `parallel_tool_calls` and `user` parameters are ignored.*
 - [Embeddings API](#embeddings-api) (`/v1/embeddings`)
-  - Only applicable to [embedding models](../models/pooling_models.rst) (`--task embed`).
+  - Only applicable to [embedding models](../models/pooling_models.md) (`--task embed`).
 
 In addition, we have the following custom APIs:
 - [Tokenizer API](#tokenizer-api) (`/tokenize`, `/detokenize`)
   - Applicable to any model with a tokenizer.
 - [Score API](#score-api) (`/score`)
-  - Only applicable to [cross-encoder models](../models/pooling_models.rst) (`--task score`).
+  - Only applicable to [cross-encoder models](../models/pooling_models.md) (`--task score`).
 
 (chat-template)=
 ## Chat Template
 
@@ -183,7 +183,7 @@ Refer to [OpenAI's API reference](https://platform.openai.com/docs/api-reference
 
 #### Extra parameters
 
-The following [sampling parameters (click through to see documentation)](../dev/sampling_params.rst) are supported.
+The following [sampling parameters (click through to see documentation)](../dev/sampling_params.md) are supported.
 
 ```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
 :language: python
@@ -206,12 +206,12 @@ Refer to [OpenAI's API reference](https://platform.openai.com/docs/api-reference
 
 We support both [Vision](https://platform.openai.com/docs/guides/vision)- and
 [Audio](https://platform.openai.com/docs/guides/audio?audio-generation-quickstart-example=audio-in)-related parameters;
-see our [Multimodal Inputs](../usage/multimodal_inputs.rst) guide for more information.
+see our [Multimodal Inputs](../usage/multimodal_inputs.md) guide for more information.
 - *Note: `image_url.detail` parameter is not supported.*
 
 #### Extra parameters
 
-The following [sampling parameters (click through to see documentation)](../dev/sampling_params.rst) are supported.
+The following [sampling parameters (click through to see documentation)](../dev/sampling_params.md) are supported.
 
 ```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
 :language: python
@@ -236,12 +236,12 @@ If the model has a [chat template](#chat-template), you can replace `inputs` wit
 which will be treated as a single prompt to the model.
 
 ```{tip}
-This enables multi-modal inputs to be passed to embedding models, see [this page](../usage/multimodal_inputs.rst) for details.
+This enables multi-modal inputs to be passed to embedding models, see [this page](../usage/multimodal_inputs.md) for details.
 ```
 
 #### Extra parameters
 
-The following [pooling parameters (click through to see documentation)](../dev/pooling_params.rst) are supported.
+The following [pooling parameters (click through to see documentation)](../dev/pooling_params.md) are supported.
 
 ```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
 :language: python
@@ -418,7 +418,7 @@ Response:
 
 #### Extra parameters
 
-The following [pooling parameters (click through to see documentation)](../dev/pooling_params.rst) are supported.
+The following [pooling parameters (click through to see documentation)](../dev/pooling_params.md) are supported.
 
 ```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
 :language: python