docs: refactor models http api into their own page under references (#…

Showing 10 changed files with 131 additions and 108 deletions.
@@ -1,122 +1,24 @@
# Model Configuration

You can configure how Tabby connects with LLM models by editing the `~/.tabby/config.toml` file. Tabby incorporates three types of models: **Completion**, **Chat**, and **Embedding**. Each of them can be configured individually.

- **Completion Model**: The Completion model is designed to provide suggestions for code completion, focusing mainly on the Fill-in-the-Middle (FIM) prompting style.
- **Chat Model**: The Chat model is adept at producing conversational replies and is broadly compatible with OpenAI's standards.
- **Embedding Model**: The Embedding model is used to generate embeddings for text data; by default, Tabby uses the `Nomic-Embed-Text` model.

Each of the model types can be configured with either a local model or a remote model provider. For local models, Tabby will initiate a subprocess (powered by [llama.cpp](https://github.com/ggerganov/llama.cpp)) and connect to the model via an HTTP API. For remote models, Tabby will connect directly to the model provider's API.

Below is an example of how to configure the model settings in the `~/.tabby/config.toml` file:

```toml
[model.completion.local]
model_id = "StarCoder2-3B"

[model.chat.local]
model_id = "Mistral-7B"

[model.embedding.local]
model_id = "Nomic-Embed-Text"
```

More supported models can be found in the [Model Registry](../../models). To configure a model through an HTTP API, see [References / Models HTTP API](../../references/models-http-api/llama.cpp).
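Local and remote backends can also be mixed freely. Below is a sketch of a hybrid setup, reusing the Mistral chat settings documented in the references further down; the API key is a placeholder:

```toml
# Hybrid sketch: local completion and embedding, remote chat via Mistral.
[model.completion.local]
model_id = "StarCoder2-3B"

[model.chat.http]
kind = "mistral/chat"
api_endpoint = "https://api.mistral.ai"
api_key = "secret-api-key"  # placeholder

[model.embedding.local]
model_id = "Nomic-Embed-Text"
```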
@@ -0,0 +1,2 @@
label: 📚 References
position: 100
@@ -0,0 +1 @@
label: Models HTTP API
@@ -0,0 +1,23 @@
# llama.cpp

[llama.cpp](https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md#api-endpoints) is a popular C++ library for serving GGUF-based models.

Tabby supports the llama.cpp HTTP API for completion, chat, and embedding models.

```toml title="~/.tabby/config.toml"
# Completion model
[model.completion.http]
kind = "llama.cpp/completion"
api_endpoint = "http://localhost:8888"
prompt_template = "<PRE> {prefix} <SUF>{suffix} <MID>"  # Example prompt template for the CodeLlama model series.

# Chat model
[model.chat.http]
kind = "openai/chat"
api_endpoint = "http://localhost:8888"

# Embedding model
[model.embedding.http]
kind = "llama.cpp/embedding"
api_endpoint = "http://localhost:8888"
```
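The `prompt_template` must match the FIM format of whichever model the llama.cpp server is hosting. As a sketch, a StarCoder-family server would use StarCoder's published FIM tokens; this template is an assumption based on the model family, not taken from this page:

```toml
# Assumed FIM template for StarCoder-family models; verify against the model card.
[model.completion.http]
kind = "llama.cpp/completion"
api_endpoint = "http://localhost:8888"
prompt_template = "<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"
```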
@@ -0,0 +1,19 @@
# Mistral AI

[Mistral](https://mistral.ai/) is a platform that provides a suite of AI models. Tabby supports Mistral's models for code completion and chat.

To connect Tabby with Mistral's models, apply the following configuration in the `~/.tabby/config.toml` file:

```toml title="~/.tabby/config.toml"
# Completion Model
[model.completion.http]
kind = "mistral/completion"
api_endpoint = "https://api.mistral.ai"
api_key = "secret-api-key"

# Chat Model
[model.chat.http]
kind = "mistral/chat"
api_endpoint = "https://api.mistral.ai"
api_key = "secret-api-key"
```
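Codestral is also served from a dedicated endpoint separate from the general platform. Below is a sketch assuming `https://codestral.mistral.ai` as that endpoint, which is an assumption drawn from Mistral's own documentation rather than this page:

```toml
# Assumed Codestral-specific endpoint; verify against Mistral's documentation.
[model.completion.http]
kind = "mistral/completion"
api_endpoint = "https://codestral.mistral.ai"
api_key = "secret-api-key"
```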
@@ -0,0 +1,26 @@
# Ollama

[ollama](https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-completion) is a popular model provider that offers a local-first experience, powered by llama.cpp.

Tabby supports the ollama HTTP API for completion, chat, and embedding models.

```toml title="~/.tabby/config.toml"
# Completion model
[model.completion.http]
kind = "ollama/completion"
model_name = "codellama:7b"
api_endpoint = "http://localhost:8888"
prompt_template = "<PRE> {prefix} <SUF>{suffix} <MID>"  # Example prompt template for the CodeLlama model series.

# Chat model
[model.chat.http]
kind = "openai/chat"
model_name = "mistral:7b"
api_endpoint = "http://localhost:8888"

# Embedding model
[model.embedding.http]
kind = "ollama/embedding"
model_name = "nomic-embed-text"
api_endpoint = "http://localhost:8888"
```
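The `8888` port above follows this page's placeholder convention; a stock ollama install serves its API on port `11434`. Below is a sketch using ollama's documented default, assuming an unmodified install:

```toml
# ollama's default API port is 11434 (assumes no OLLAMA_HOST override).
[model.completion.http]
kind = "ollama/completion"
model_name = "codellama:7b"
api_endpoint = "http://localhost:11434"
prompt_template = "<PRE> {prefix} <SUF>{suffix} <MID>"
```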
@@ -0,0 +1,30 @@
# OpenAI

OpenAI is a leading AI company that has developed a range of language models. Tabby supports OpenAI's models for chat and embedding tasks.

Tabby also supports the legacy `/v1/completions` API for code completion. **OpenAI itself no longer offers this API**, but it is still exposed by a number of other vendors, such as vLLM, Nvidia NIM, and LocalAI.

Below is an example configuration:

```toml title="~/.tabby/config.toml"
# Completion model
[model.completion.http]
kind = "openai/completion"
model_name = "your_model"
api_endpoint = "https://url_to_your_backend_or_service"
api_key = "secret-api-key"

# Chat model
[model.chat.http]
kind = "openai/chat"
model_name = "gpt-3.5-turbo"
api_endpoint = "https://api.openai.com"
api_key = "secret-api-key"

# Embedding model
[model.embedding.http]
kind = "openai/embedding"
model_name = "text-embedding-3-small"
api_endpoint = "https://api.openai.com"
api_key = "secret-api-key"
```
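The same `openai/chat` kind covers OpenAI-compatible vendors. For example, the previous version of the Model Configuration page used DeepSeek's platform; note the `/v1` suffix on the endpoint:

```toml
# Example carried over from the previous Model Configuration page.
[model.chat.http]
kind = "openai/chat"
model_name = "deepseek-chat"
api_endpoint = "https://api.deepseek.com/v1"
api_key = "secret-api-key"
```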
@@ -0,0 +1,12 @@
# Voyage AI

[Voyage AI](https://voyage.ai/) is a company that provides a range of embedding models. Tabby supports Voyage AI's models for embedding tasks.

Below is an example configuration:

```toml title="~/.tabby/config.toml"
[model.embedding.http]
kind = "voyage/embedding"
api_key = "..."
model_name = "voyage-code-2"
```
@@ -1,3 +1,7 @@
---
sidebar_position: 7
---

import Collapse from '@site/src/components/Collapse';

# 🗺️ Roadmap