From 238067039f3dd8d63b709b900d366585daaf128d Mon Sep 17 00:00:00 2001
From: Rafael Vasquez
Date: Wed, 18 Dec 2024 13:05:21 -0500
Subject: [PATCH] Fix refs

Signed-off-by: Rafael Vasquez
---
 docs/source/models/pooling_models.md          | 10 +++++-----
 .../serving/openai_compatible_server.md       | 20 +++++++++----------
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/docs/source/models/pooling_models.md b/docs/source/models/pooling_models.md
index 41a042da5510f..3e103b99e9b2a 100644
--- a/docs/source/models/pooling_models.md
+++ b/docs/source/models/pooling_models.md
@@ -10,14 +10,14 @@ before returning them.
 
 ```{note}
 We currently support pooling models primarily as a matter of convenience.
-As shown in the {ref}`Compatibility Matrix `, most vLLM features are not applicable to
+As shown in the [Compatibility Matrix](#compatibility-matrix), most vLLM features are not applicable to
 pooling models as they only work on the generation or decode stage, so performance may not improve as much.
 ```
 
 ## Offline Inference
 
 The {class}`~vllm.LLM` class provides various methods for offline inference.
-See {ref}`Engine Arguments ` for a list of options when initializing the model.
+See [Engine Arguments](#engine-args) for a list of options when initializing the model.
 
 For pooling models, we support the following {code}`task` options:
 
@@ -106,12 +106,12 @@ A code example can be found in [examples/offline_inference_scoring.py](https://g
 
 ## Online Inference
 
-Our [OpenAI Compatible Server](../serving/openai_compatible_server) can be used for online inference.
+Our [OpenAI Compatible Server](../serving/openai_compatible_server.md) can be used for online inference.
 Please click on the above link for more details on how to launch the server.
 
 ### Embeddings API
 
-Our Embeddings API is similar to `LLM.embed`, accepting both text and {ref}`multi-modal inputs `.
+Our Embeddings API is similar to `LLM.embed`, accepting both text and [multi-modal inputs](#multimodal-inputs).
 
 The text-only API is compatible with [OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings)
 so that you can use OpenAI client to interact with it.
@@ -119,7 +119,7 @@ A code example can be found in [examples/openai_embedding_client.py](https://git
 
 The multi-modal API is an extension of the [OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings)
 that incorporates [OpenAI Chat Completions API](https://platform.openai.com/docs/api-reference/chat),
-so it is not part of the OpenAI standard. Please see {ref}`this page ` for more details on how to use it.
+so it is not part of the OpenAI standard. Please see [](#multimodal-inputs) for more details on how to use it.
 
 ### Score API
 
diff --git a/docs/source/serving/openai_compatible_server.md b/docs/source/serving/openai_compatible_server.md
index 8712d82ca075e..e1c3a6cc6cce9 100644
--- a/docs/source/serving/openai_compatible_server.md
+++ b/docs/source/serving/openai_compatible_server.md
@@ -30,20 +30,20 @@ print(completion.choices[0].message)
 
 We currently support the following OpenAI APIs:
 - [Completions API](#completions-api) (`/v1/completions`)
-  - Only applicable to [text generation models](../models/generative_models.rst) (`--task generate`).
+  - Only applicable to [text generation models](../models/generative_models.md) (`--task generate`).
   - *Note: `suffix` parameter is not supported.*
 - [Chat Completions API](#chat-api) (`/v1/chat/completions`)
-  - Only applicable to [text generation models](../models/generative_models.rst) (`--task generate`) with a [chat template](#chat-template).
+  - Only applicable to [text generation models](../models/generative_models.md) (`--task generate`) with a [chat template](#chat-template).
   - *Note: `parallel_tool_calls` and `user` parameters are ignored.*
 - [Embeddings API](#embeddings-api) (`/v1/embeddings`)
-  - Only applicable to [embedding models](../models/pooling_models.rst) (`--task embed`).
+  - Only applicable to [embedding models](../models/pooling_models.md) (`--task embed`).
 
 In addition, we have the following custom APIs:
 - [Tokenizer API](#tokenizer-api) (`/tokenize`, `/detokenize`)
   - Applicable to any model with a tokenizer.
 - [Score API](#score-api) (`/score`)
-  - Only applicable to [cross-encoder models](../models/pooling_models.rst) (`--task score`).
+  - Only applicable to [cross-encoder models](../models/pooling_models.md) (`--task score`).
 
 (chat-template)=
 ## Chat Template
 
@@ -183,7 +183,7 @@ Refer to [OpenAI's API reference](https://platform.openai.com/docs/api-reference
 
 #### Extra parameters
 
-The following [sampling parameters (click through to see documentation)](../dev/sampling_params.rst) are supported.
+The following [sampling parameters (click through to see documentation)](../dev/sampling_params.md) are supported.
 
 ```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
 :language: python
@@ -206,12 +206,12 @@ Refer to [OpenAI's API reference](https://platform.openai.com/docs/api-reference
 
 We support both [Vision](https://platform.openai.com/docs/guides/vision)- and
 [Audio](https://platform.openai.com/docs/guides/audio?audio-generation-quickstart-example=audio-in)-related parameters;
-see our [Multimodal Inputs](../usage/multimodal_inputs.rst) guide for more information.
+see our [Multimodal Inputs](../usage/multimodal_inputs.md) guide for more information.
 - *Note: `image_url.detail` parameter is not supported.*
 
 #### Extra parameters
 
-The following [sampling parameters (click through to see documentation)](../dev/sampling_params.rst) are supported.
+The following [sampling parameters (click through to see documentation)](../dev/sampling_params.md) are supported.
 
 ```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
 :language: python
@@ -236,12 +236,12 @@ If the model has a [chat template](#chat-template), you can replace `inputs` wit
 which will be treated as a single prompt to the model.
 
 ```{tip}
-This enables multi-modal inputs to be passed to embedding models, see [this page](../usage/multimodal_inputs.rst) for details.
+This enables multi-modal inputs to be passed to embedding models, see [this page](../usage/multimodal_inputs.md) for details.
 ```
 
 #### Extra parameters
 
-The following [pooling parameters (click through to see documentation)](../dev/pooling_params.rst) are supported.
+The following [pooling parameters (click through to see documentation)](../dev/pooling_params.md) are supported.
 
 ```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
 :language: python
@@ -418,7 +418,7 @@ Response:
 
 #### Extra parameters
 
-The following [pooling parameters (click through to see documentation)](../dev/pooling_params.rst) are supported.
+The following [pooling parameters (click through to see documentation)](../dev/pooling_params.md) are supported.
 
 ```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
 :language: python