docs: refactor models http api into their own page under references (#…
wsxiaoys authored Aug 12, 2024
1 parent 11486b8 commit 34ec578
Showing 10 changed files with 131 additions and 108 deletions.
118 changes: 10 additions & 108 deletions website/docs/administration/model.md
@@ -1,122 +1,24 @@
# Model Configuration

You can configure how Tabby connects with LLM models by editing the `~/.tabby/config.toml` file. Tabby incorporates two distinct model types: `Completion` and `Chat`. The `Completion` model is designed to provide suggestions for code completion, focusing mainly on the Fill-in-the-Middle (FIM) prompting style. On the other hand, the `Chat` model is adept at producing conversational replies and is broadly compatible with OpenAI's standards.
You can configure how Tabby connects with LLM models by editing the `~/.tabby/config.toml` file. Tabby incorporates three types of models: **Completion**, **Chat**, and **Embedding**. Each of them can be configured individually.

With the release of version 0.12, Tabby has rolled out an innovative model configuration system that facilitates linking Tabby to an HTTP API of a model. Furthermore, models listed in the [Model Registry](/docs/models) may be set up as a `local` backend. In this arrangement, Tabby initiates the `llama-server` as a subprocess and seamlessly establishes a connection to the model via the subprocess's HTTP API.
- **Completion Model**: The Completion model is designed to provide suggestions for code completion, focusing mainly on the Fill-in-the-Middle (FIM) prompting style.
- **Chat Model**: The Chat model is adept at producing conversational replies and is broadly compatible with OpenAI's standards.
- **Embedding Model**: The Embedding model is used to generate embeddings for text data; by default, Tabby uses the `Nomic-Embed-Text` model.

### Completion Model
Each of the model types can be configured with either a local model or a remote model provider. For local models, Tabby will initiate a subprocess (powered by [llama.cpp](https://github.com/ggerganov/llama.cpp)) and connect to the model via an HTTP API. For remote models, Tabby will connect directly to the model provider's API.

#### [local](/docs/models)

To configure the `local` model, use the following settings:
Below is an example of how to configure the model settings in the `~/.tabby/config.toml` file:

```toml
[model.completion.local]
model_id = "StarCoder2-3B"

[model.chat.local]
model_id = "Mistral-7B"

[model.embedding.local]
model_id = "Nomic-Embed-Text"
```

#### [llama.cpp](https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md#api-endpoints)

The `llama.cpp` model can be configured with the following parameters:

```toml
[model.completion.http]
kind = "llama.cpp/completion"
api_endpoint = "http://localhost:8888"
prompt_template = "<PRE> {prefix} <SUF>{suffix} <MID>" # Example prompt template for CodeLlama model series.
```

#### [ollama](https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-completion)

For setting up the `ollama` model, apply the configuration below:

```toml
[model.completion.http]
kind = "ollama/completion"
model_name = "codellama:7b"
api_endpoint = "http://localhost:8888"
prompt_template = "<PRE> {prefix} <SUF>{suffix} <MID>" # Example prompt template for CodeLlama model series.
```

#### [mistral / codestral](https://docs.mistral.ai/api/#operation/createFIMCompletion)

Configure the `mistral/codestral` model as follows:

```toml
[model.completion.http]
kind = "mistral/completion"
api_endpoint = "https://api.mistral.ai"
api_key = "secret-api-key"
```

#### [openai completion](https://platform.openai.com/docs/api-reference/completions)

Configure Tabby with an OpenAI-compatible completion model (`/v1/completions`) using an online service or a self-hosted backend (vLLM, Nvidia NIM, LocalAI, ...) as follows:

```toml
[model.completion.http]
kind = "openai/completion"
model_name = "your_model"
api_endpoint = "https://url_to_your_backend_or_service"
api_key = "secret-api-key"
```

### Chat Model

Chat models adhere to the standard interface specified by OpenAI's `/chat/completions` API.


#### local

For `local` configuration, use:

```toml
[model.chat.local]
model_id = "StarCoder2-3B"
```

#### openai/chat

To configure Tabby's chat functionality with an OpenAI-compatible chat model (`/v1/chat/completions`), apply the settings below. This example uses the API platform of DeepSeek. Similar configurations can be applied for other LLM vendors such as Mistral, OpenAI, etc.
model_id = "Mistral-7B"

```toml
[model.chat.http]
kind = "openai/chat"
model_name = "deepseek-chat"
api_endpoint = "https://api.deepseek.com/v1"
api_key = "secret-api-key"
```

#### [mistral / codestral](https://docs.mistral.ai/api/#operation/createFIMCompletion)

Configure the `mistral/codestral` model as follows:

```toml
[model.chat.http]
kind = "mistral/chat"
api_endpoint = "https://api.mistral.ai"
api_key = "secret-api-key"
```

### Embedding Model

Tabby utilizes embedding models to convert documents and queries into vectors for efficient context retrieval. The default embedding model is `Nomic-Embed-Text`, a high-performing open embedding model with a large token context window. Currently, `Nomic-Embed-Text` is the only supported local embedding model.
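
For reference, the default local embedding setup corresponds to the entry below (a minimal sketch; the same block appears in the combined example near the top of this page):

```toml
[model.embedding.local]
model_id = "Nomic-Embed-Text"
```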

### Using a remote embedding model provider

You can also add a remote embedding model provider by adding a new section to the `~/.tabby/config.toml` file.

```toml
[model.embedding.http]
kind = "openai/embedding"
api_endpoint = "https://api.openai.com"
api_key = "sk-..."
model_name = "text-embedding-3-small"
```

The following embedding model providers are supported (a sample `ollama/embedding` configuration is sketched after this list):

* `openai/embedding`
* `voyage/embedding`
* `llama.cpp/embedding`
* `ollama/embedding`
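
As an illustration, here is a minimal sketch of a remote embedding configuration using `ollama/embedding`; the endpoint and model name are examples, so adjust them to match your own ollama deployment:

```toml
[model.embedding.http]
kind = "ollama/embedding"
model_name = "nomic-embed-text"
api_endpoint = "http://localhost:8888"
```
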
More supported models can be found in the [Model Registry](../../models). For configuring models through the HTTP API, see [References / Models HTTP API](../../references/models-http-api/llama.cpp).
4 changes: 4 additions & 0 deletions website/docs/faq.mdx
@@ -1,3 +1,7 @@
---
sidebar_position: 6
---

import Collapse from '@site/src/components/Collapse';

# ⁉️ Frequently Asked Questions
2 changes: 2 additions & 0 deletions website/docs/references/_category_.yaml
@@ -0,0 +1,2 @@
label: 📚 References
position: 100
1 change: 1 addition & 0 deletions website/docs/references/models-http-api/_category_.yml
@@ -0,0 +1 @@
label: Models HTTP API
23 changes: 23 additions & 0 deletions website/docs/references/models-http-api/llama.cpp.md
@@ -0,0 +1,23 @@
# llama.cpp

[llama.cpp](https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md#api-endpoints) is a popular C++ library for serving GGUF-based models.

Tabby supports the llama.cpp HTTP API for completion, chat, and embedding models.

```toml title="~/.tabby/config.toml"
# Completion model
[model.completion.http]
kind = "llama.cpp/completion"
api_endpoint = "http://localhost:8888"
prompt_template = "<PRE> {prefix} <SUF>{suffix} <MID>" # Example prompt template for the CodeLlama model series.

# Chat model
[model.chat.http]
kind = "openai/chat"
api_endpoint = "http://localhost:8888"

# Embedding model
[model.embedding.http]
kind = "llama.cpp/embedding"
api_endpoint = "http://localhost:8888"
```
19 changes: 19 additions & 0 deletions website/docs/references/models-http-api/mistral-ai.md
@@ -0,0 +1,19 @@
# Mistral AI

[Mistral](https://mistral.ai/) is a platform that provides a suite of AI models. Tabby supports Mistral's models for code completion and chat.

To connect Tabby with Mistral's models, you need to apply the following configurations in the `~/.tabby/config.toml` file:

```toml title="~/.tabby/config.toml"
# Completion Model
[model.completion.http]
kind = "mistral/completion"
api_endpoint = "https://api.mistral.ai"
api_key = "secret-api-key"

# Chat Model
[model.chat.http]
kind = "mistral/chat"
api_endpoint = "https://api.mistral.ai"
api_key = "secret-api-key"
```
26 changes: 26 additions & 0 deletions website/docs/references/models-http-api/ollama.md
@@ -0,0 +1,26 @@
# Ollama

[ollama](https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-completion) is a popular model provider that offers a local-first experience, powered by llama.cpp.

Tabby supports the ollama HTTP API for completion, chat, and embedding models.

```toml title="~/.tabby/config.toml"
# Completion model
[model.completion.http]
kind = "ollama/completion"
model_name = "codellama:7b"
api_endpoint = "http://localhost:8888"
prompt_template = "<PRE> {prefix} <SUF>{suffix} <MID>" # Example prompt template for the CodeLlama model series.

# Chat model
[model.chat.http]
kind = "openai/chat"
model_name = "mistral:7b"
api_endpoint = "http://localhost:8888"

# Embedding model
[model.embedding.http]
kind = "ollama/embedding"
model_name = "nomic-embed-text"
api_endpoint = "http://localhost:8888"
```
30 changes: 30 additions & 0 deletions website/docs/references/models-http-api/openai.md
@@ -0,0 +1,30 @@
# OpenAI

OpenAI is a leading AI company that has developed a range of language models. Tabby supports OpenAI's models for chat and embedding tasks.

Tabby also supports the legacy `/v1/completions` API for code completion. Although **OpenAI itself no longer supports this API**, it is still offered by several other vendors, such as vLLM, Nvidia NIM, and LocalAI.

Below is an example configuration:

```toml title="~/.tabby/config.toml"
# Completion model
[model.completion.http]
kind = "openai/completion"
model_name = "your_model"
api_endpoint = "https://url_to_your_backend_or_service"
api_key = "secret-api-key"

# Chat model
[model.chat.http]
kind = "openai/chat"
model_name = "gpt-3.5-turbo"
api_endpoint = "https://api.openai.com"
api_key = "secret-api-key"

# Embedding model
[model.embedding.http]
kind = "openai/embedding"
model_name = "text-embedding-3-small"
api_endpoint = "https://api.openai.com"
api_key = "secret-api-key"
```
12 changes: 12 additions & 0 deletions website/docs/references/models-http-api/voyage-ai.md
@@ -0,0 +1,12 @@
# Voyage AI

[Voyage AI](https://voyage.ai/) is a company that provides a range of embedding models. Tabby supports Voyage AI's models for embedding tasks.

Below is an example configuration:

```toml title="~/.tabby/config.toml"
[model.embedding.http]
kind = "voyage/embedding"
api_key = "..."
model_name = "voyage-code-2"
```
4 changes: 4 additions & 0 deletions website/docs/roadmap.mdx
@@ -1,3 +1,7 @@
---
sidebar_position: 7
---

import Collapse from '@site/src/components/Collapse';

# 🗺️ Roadmap
