docs: refactor models http api into their own page under references (#…
wsxiaoys authored Aug 12, 2024
1 parent 11486b8 commit 34ec578
Showing 10 changed files with 131 additions and 108 deletions.
118 changes: 10 additions & 108 deletions website/docs/administration/model.md
@@ -1,122 +1,24 @@
# Model Configuration

You can configure how Tabby connects with LLM models by editing the `~/.tabby/config.toml` file. Tabby incorporates two distinct model types: `Completion` and `Chat`. The `Completion` model is designed to provide suggestions for code completion, focusing mainly on the Fill-in-the-Middle (FIM) prompting style. On the other hand, the `Chat` model is adept at producing conversational replies and is broadly compatible with OpenAI's standards.
You can configure how Tabby connects with LLM models by editing the `~/.tabby/config.toml` file. Tabby incorporates three types of models: **Completion**, **Chat**, and **Embedding**. Each of them can be configured individually.

With the release of version 0.12, Tabby has rolled out an innovative model configuration system that facilitates linking Tabby to an HTTP API of a model. Furthermore, models listed in the [Model Registry](/docs/models) may be set up as a `local` backend. In this arrangement, Tabby initiates the `llama-server` as a subprocess and seamlessly establishes a connection to the model via the subprocess's HTTP API.
- **Completion Model**: The Completion model is designed to provide suggestions for code completion, focusing mainly on the Fill-in-the-Middle (FIM) prompting style.
- **Chat Model**: The Chat model is adept at producing conversational replies and is broadly compatible with OpenAI's standards.
- **Embedding Model**: The Embedding model is used to generate embeddings for text data; by default, Tabby uses the `Nomic-Embed-Text` model.

### Completion Model
Each of the model types can be configured with either a local model or a remote model provider. For local models, Tabby will initiate a subprocess (powered by [llama.cpp](https://github.com/ggerganov/llama.cpp)) and connect to the model via an HTTP API. For remote models, Tabby will connect directly to the model provider's API.

#### [local](/docs/models)

To configure the `local` model, use the following settings:
Below is an example of how to configure the model settings in the `~/.tabby/config.toml` file:

```toml
[model.completion.local]
model_id = "StarCoder2-3B"

[model.chat.local]
model_id = "Mistral-7B"

[model.embedding.local]
model_id = "Nomic-Embed-Text"
```

#### [llama.cpp](https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md#api-endpoints)

The `llama.cpp` model can be configured with the following parameters:

```toml
[model.completion.http]
kind = "llama.cpp/completion"
api_endpoint = "http://localhost:8888"
prompt_template = "<PRE> {prefix} <SUF>{suffix} <MID>" # Example prompt template for CodeLlama model series.
```

#### [ollama](https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-completion)

For setting up the `ollama` model, apply the configuration below:

```toml
[model.completion.http]
kind = "ollama/completion"
model_name = "codellama:7b"
api_endpoint = "http://localhost:8888"
prompt_template = "<PRE> {prefix} <SUF>{suffix} <MID>" # Example prompt template for CodeLlama model series.
```

#### [mistral / codestral](https://docs.mistral.ai/api/#operation/createFIMCompletion)

Configure the `mistral/codestral` model as follows:

```toml
[model.completion.http]
kind = "mistral/completion"
api_endpoint = "https://api.mistral.ai"
api_key = "secret-api-key"
```

#### [openai completion](https://platform.openai.com/docs/api-reference/completions)

Configure Tabby with an OpenAI-compatible completion model (`/v1/completions`) using an online service or a self-hosted backend (vLLM, Nvidia NIM, LocalAI, ...) as follows:

```toml
[model.completion.http]
kind = "openai/completion"
model_name = "your_model"
api_endpoint = "https://url_to_your_backend_or_service"
api_key = "secret-api-key"
```

### Chat Model

Chat models adhere to the standard interface specified by OpenAI's `/chat/completions` API.


#### local

For `local` configuration, use:

```toml
[model.chat.local]
model_id = "StarCoder2-3B"
```

#### openai/chat

To configure Tabby's chat functionality with an OpenAI-compatible chat model (`/v1/chat/completions`), apply the settings below. This example uses the API platform of DeepSeek. Similar configurations can be applied for other LLM vendors such as Mistral, OpenAI, etc.
model_id = "Mistral-7B"

```toml
[model.chat.http]
kind = "openai/chat"
model_name = "deepseek-chat"
api_endpoint = "https://api.deepseek.com/v1"
api_key = "secret-api-key"
```

#### [mistral / codestral](https://docs.mistral.ai/api/#operation/createFIMCompletion)

Configure the `mistral/codestral` model as follows:

```toml
[model.chat.http]
kind = "mistral/chat"
api_endpoint = "https://api.mistral.ai"
api_key = "secret-api-key"
```

### Embedding Model

Tabby utilizes embedding models to convert documents and queries into vectors for efficient context retrieval. The default embedding model is `Nomic-Embed-Text`, a high-performing open embedding model with a large token context window. Currently, `Nomic-Embed-Text` is the only supported local embedding model.
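
For reference, the default local embedding setup corresponds to the entry below (a minimal sketch; the same block appears in the combined example near the top of this page):

```toml
[model.embedding.local]
model_id = "Nomic-Embed-Text"
```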

### Using a remote embedding model provider

You can also add a remote embedding model provider by adding a new section to the `~/.tabby/config.toml` file.

```toml
[model.embedding.http]
kind = "openai/embedding"
api_endpoint = "https://api.openai.com"
api_key = "sk-..."
model_name = "text-embedding-3-small"
```

The following embedding model providers are supported (a sample `ollama/embedding` configuration is sketched after this list):

* `openai/embedding`
* `voyage/embedding`
* `llama.cpp/embedding`
* `ollama/embedding`
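
As an illustration, here is a minimal sketch of a remote embedding configuration using `ollama/embedding`; the endpoint and model name are examples, so adjust them to match your own ollama deployment:

```toml
[model.embedding.http]
kind = "ollama/embedding"
model_name = "nomic-embed-text"
api_endpoint = "http://localhost:8888"
```
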
More supported models can be found in the [Model Registry](../../models). For configuring models through the HTTP API, see [References / Models HTTP API](../../references/models-http-api/llama.cpp).
4 changes: 4 additions & 0 deletions website/docs/faq.mdx
@@ -1,3 +1,7 @@
---
sidebar_position: 6
---

import Collapse from '@site/src/components/Collapse';

# ⁉️ Frequently Asked Questions
2 changes: 2 additions & 0 deletions website/docs/references/_category_.yaml
@@ -0,0 +1,2 @@
label: 📚 References
position: 100
1 change: 1 addition & 0 deletions website/docs/references/models-http-api/_category_.yml
@@ -0,0 +1 @@
label: Models HTTP API
23 changes: 23 additions & 0 deletions website/docs/references/models-http-api/llama.cpp.md
@@ -0,0 +1,23 @@
# llama.cpp

[llama.cpp](https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md#api-endpoints) is a popular C++ library for serving GGUF-based models.

Tabby supports the llama.cpp HTTP API for completion, chat, and embedding models.

```toml title="~/.tabby/config.toml"
# Completion model
[model.completion.http]
kind = "llama.cpp/completion"
api_endpoint = "http://localhost:8888"
prompt_template = "<PRE> {prefix} <SUF>{suffix} <MID>" # Example prompt template for the CodeLlama model series.

# Chat model
[model.chat.http]
kind = "openai/chat"
api_endpoint = "http://localhost:8888"

# Embedding model
[model.embedding.http]
kind = "llama.cpp/embedding"
api_endpoint = "http://localhost:8888"
```
19 changes: 19 additions & 0 deletions website/docs/references/models-http-api/mistral-ai.md
@@ -0,0 +1,19 @@
# Mistral AI

[Mistral](https://mistral.ai/) is a platform that provides a suite of AI models. Tabby supports Mistral's models for code completion and chat.

To connect Tabby with Mistral's models, you need to apply the following configurations in the `~/.tabby/config.toml` file:

```toml title="~/.tabby/config.toml"
# Completion Model
[model.completion.http]
kind = "mistral/completion"
api_endpoint = "https://api.mistral.ai"
api_key = "secret-api-key"

# Chat Model
[model.chat.http]
kind = "mistral/chat"
api_endpoint = "https://api.mistral.ai"
api_key = "secret-api-key"
```
26 changes: 26 additions & 0 deletions website/docs/references/models-http-api/ollama.md
@@ -0,0 +1,26 @@
# Ollama

[ollama](https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-completion) is a popular model provider that offers a local-first experience, powered by llama.cpp.

Tabby supports the ollama HTTP API for completion, chat, and embedding models.

```toml title="~/.tabby/config.toml"
# Completion model
[model.completion.http]
kind = "ollama/completion"
model_name = "codellama:7b"
api_endpoint = "http://localhost:8888"
prompt_template = "<PRE> {prefix} <SUF>{suffix} <MID>" # Example prompt template for the CodeLlama model series.

# Chat model
[model.chat.http]
kind = "openai/chat"
model_name = "mistral:7b"
api_endpoint = "http://localhost:8888"

# Embedding model
[model.embedding.http]
kind = "ollama/embedding"
model_name = "nomic-embed-text"
api_endpoint = "http://localhost:8888"
```
30 changes: 30 additions & 0 deletions website/docs/references/models-http-api/openai.md
@@ -0,0 +1,30 @@
# OpenAI

OpenAI is a leading AI company that has developed a range of language models. Tabby supports OpenAI's models for chat and embedding tasks.

Tabby also supports the legacy `/v1/completions` API for code completion. Although **OpenAI itself no longer supports this API**, it is still offered by several other vendors, such as vLLM, Nvidia NIM, and LocalAI.

Below is an example configuration:

```toml title="~/.tabby/config.toml"
# Completion model
[model.completion.http]
kind = "openai/completion"
model_name = "your_model"
api_endpoint = "https://url_to_your_backend_or_service"
api_key = "secret-api-key"

# Chat model
[model.chat.http]
kind = "openai/chat"
model_name = "gpt-3.5-turbo"
api_endpoint = "https://api.openai.com"
api_key = "secret-api-key"

# Embedding model
[model.embedding.http]
kind = "openai/embedding"
model_name = "text-embedding-3-small"
api_endpoint = "https://api.openai.com"
api_key = "secret-api-key"
```
12 changes: 12 additions & 0 deletions website/docs/references/models-http-api/voyage-ai.md
@@ -0,0 +1,12 @@
# Voyage AI

[Voyage AI](https://voyage.ai/) is a company that provides a range of embedding models. Tabby supports Voyage AI's models for embedding tasks.

Below is an example configuration:

```toml title="~/.tabby/config.toml"
[model.embedding.http]
kind = "voyage/embedding"
api_key = "..."
model_name = "voyage-code-2"
```
4 changes: 4 additions & 0 deletions website/docs/roadmap.mdx
@@ -1,3 +1,7 @@
---
sidebar_position: 7
---

import Collapse from '@site/src/components/Collapse';

# 🗺️ Roadmap
