
Commit 2380670: Fix refs

rafvasq committed Dec 18, 2024 (parent: 44c5a5b)
Signed-off-by: Rafael Vasquez <[email protected]>
Showing 2 changed files with 15 additions and 15 deletions.
10 changes: 5 additions & 5 deletions docs/source/models/pooling_models.md
@@ -10,14 +10,14 @@ before returning them.

```{note}
We currently support pooling models primarily as a matter of convenience.
- As shown in the {ref}`Compatibility Matrix <compatibility_matrix>`, most vLLM features are not applicable to
+ As shown in the [Compatibility Matrix](#compatibility-matrix), most vLLM features are not applicable to
pooling models as they only work on the generation or decode stage, so performance may not improve as much.
```

## Offline Inference

The {class}`~vllm.LLM` class provides various methods for offline inference.
- See {ref}`Engine Arguments <engine_args>` for a list of options when initializing the model.
+ See [Engine Arguments](#engine-args) for a list of options when initializing the model.

For pooling models, we support the following {code}`task` options:

@@ -106,20 +106,20 @@ A code example can be found in [examples/offline_inference_scoring.py](https://g

## Online Inference

- Our [OpenAI Compatible Server](../serving/openai_compatible_server) can be used for online inference.
+ Our [OpenAI Compatible Server](../serving/openai_compatible_server.md) can be used for online inference.
Please click on the above link for more details on how to launch the server.

### Embeddings API

- Our Embeddings API is similar to `LLM.embed`, accepting both text and {ref}`multi-modal inputs <multimodal_inputs>`.
+ Our Embeddings API is similar to `LLM.embed`, accepting both text and [multi-modal inputs](#multimodal-inputs).

The text-only API is compatible with [OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings)
so that you can use OpenAI client to interact with it.
A code example can be found in [examples/openai_embedding_client.py](https://github.com/vllm-project/vllm/blob/main/examples/openai_embedding_client.py).

The multi-modal API is an extension of the [OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings)
that incorporates [OpenAI Chat Completions API](https://platform.openai.com/docs/api-reference/chat),
- so it is not part of the OpenAI standard. Please see {ref}`this page <multimodal_inputs>` for more details on how to use it.
+ so it is not part of the OpenAI standard. Please see [](#multimodal-inputs) for more details on how to use it.

### Score API

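Since the text-only Embeddings API documented above is OpenAI-compatible, a request body can be built exactly as for OpenAI's `/v1/embeddings` endpoint. A minimal sketch follows; the model name is an illustrative assumption, and a running vLLM server started with `--task embed` would be needed to actually send it:

```python
import json

# Request body for POST /v1/embeddings on a vLLM OpenAI-compatible server.
# The model name below is an illustrative assumption, not a requirement.
payload = {
    "model": "intfloat/e5-mistral-7b-instruct",
    "input": ["Hello, world!", "vLLM supports pooling models."],
}

# The server accepts this as a JSON body, same shape as OpenAI's API.
body = json.dumps(payload)
print(body)
```

The same body works through the official OpenAI Python client by pointing its `base_url` at the vLLM server.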
20 changes: 10 additions & 10 deletions docs/source/serving/openai_compatible_server.md
@@ -30,20 +30,20 @@ print(completion.choices[0].message)
We currently support the following OpenAI APIs:

- [Completions API](#completions-api) (`/v1/completions`)
- - Only applicable to [text generation models](../models/generative_models.rst) (`--task generate`).
+ - Only applicable to [text generation models](../models/generative_models.md) (`--task generate`).
- *Note: `suffix` parameter is not supported.*
- [Chat Completions API](#chat-api) (`/v1/chat/completions`)
- - Only applicable to [text generation models](../models/generative_models.rst) (`--task generate`) with a [chat template](#chat-template).
+ - Only applicable to [text generation models](../models/generative_models.md) (`--task generate`) with a [chat template](#chat-template).
- *Note: `parallel_tool_calls` and `user` parameters are ignored.*
- [Embeddings API](#embeddings-api) (`/v1/embeddings`)
- - Only applicable to [embedding models](../models/pooling_models.rst) (`--task embed`).
+ - Only applicable to [embedding models](../models/pooling_models.md) (`--task embed`).

In addition, we have the following custom APIs:

- [Tokenizer API](#tokenizer-api) (`/tokenize`, `/detokenize`)
- Applicable to any model with a tokenizer.
- [Score API](#score-api) (`/score`)
- - Only applicable to [cross-encoder models](../models/pooling_models.rst) (`--task score`).
+ - Only applicable to [cross-encoder models](../models/pooling_models.md) (`--task score`).

(chat-template)=
## Chat Template
Expand Down Expand Up @@ -183,7 +183,7 @@ Refer to [OpenAI's API reference](https://platform.openai.com/docs/api-reference

#### Extra parameters

- The following [sampling parameters (click through to see documentation)](../dev/sampling_params.rst) are supported.
+ The following [sampling parameters (click through to see documentation)](../dev/sampling_params.md) are supported.

```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
:language: python
@@ -206,12 +206,12 @@ Refer to [OpenAI's API reference](https://platform.openai.com/docs/api-reference

We support both [Vision](https://platform.openai.com/docs/guides/vision)- and
[Audio](https://platform.openai.com/docs/guides/audio?audio-generation-quickstart-example=audio-in)-related parameters;
- see our [Multimodal Inputs](../usage/multimodal_inputs.rst) guide for more information.
+ see our [Multimodal Inputs](../usage/multimodal_inputs.md) guide for more information.
- *Note: `image_url.detail` parameter is not supported.*

#### Extra parameters

- The following [sampling parameters (click through to see documentation)](../dev/sampling_params.rst) are supported.
+ The following [sampling parameters (click through to see documentation)](../dev/sampling_params.md) are supported.

```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
:language: python
@@ -236,12 +236,12 @@ If the model has a [chat template](#chat-template), you can replace `inputs` wit
which will be treated as a single prompt to the model.

```{tip}
- This enables multi-modal inputs to be passed to embedding models, see [this page](../usage/multimodal_inputs.rst) for details.
+ This enables multi-modal inputs to be passed to embedding models, see [this page](../usage/multimodal_inputs.md) for details.
```

#### Extra parameters

- The following [pooling parameters (click through to see documentation)](../dev/pooling_params.rst) are supported.
+ The following [pooling parameters (click through to see documentation)](../dev/pooling_params.md) are supported.

```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
:language: python
@@ -418,7 +418,7 @@ Response:

#### Extra parameters

- The following [pooling parameters (click through to see documentation)](../dev/pooling_params.rst) are supported.
+ The following [pooling parameters (click through to see documentation)](../dev/pooling_params.md) are supported.

```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
:language: python
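The change throughout this commit is mechanical: Sphinx-style `{ref}` roles become MyST markdown links (with underscores in targets mapped to hyphens, e.g. `engine_args` to `#engine-args`), and relative links are pointed at `.md` sources instead of `.rst`. As an illustration only, a hypothetical Python sketch of that rewrite pattern (not the tooling actually used for this commit):

```python
import re


def fix_refs(text: str) -> str:
    # Convert {ref}`Text <target>` roles to markdown links,
    # mapping underscores in the anchor to hyphens.
    text = re.sub(
        r"\{ref\}`([^<`]+?)\s*<([^>`]+)>`",
        lambda m: f"[{m.group(1)}](#{m.group(2).replace('_', '-')})",
        text,
    )
    # Point relative document links at .md sources instead of .rst.
    text = re.sub(r"\((\.\./[^)]+)\.rst\)", r"(\1.md)", text)
    return text
```

Applied to a line like `See {ref}`Engine Arguments <engine_args>``, this yields `See [Engine Arguments](#engine-args)`, matching the hunks above.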
