docs(models-http-api): add llamafile example (#3403)
* docs(models-http-api): add llamafile example

* docs(models-http-api): add completion support for llamafile
zwpaper authored Nov 28, 2024
1 parent fbe523a commit b486bf5
Showing 1 changed file with 39 additions and 0 deletions.
39 changes: 39 additions & 0 deletions website/docs/references/models-http-api/llamafile.md
@@ -0,0 +1,39 @@
# llamafile

[llamafile](https://github.com/Mozilla-Ocho/llamafile)
is a Mozilla Builders project that allows you to distribute and run LLMs with a single file.

llamafile embeds a llama.cpp server and exposes an OpenAI API-compatible chat-completions endpoint,
so Tabby can use the `openai/chat` kind for chat, and the `llama.cpp/completion` and `llama.cpp/embedding` kinds for completion and embedding.

By default, llamafile uses port `8080`, which is also used by Tabby.
Therefore, it is recommended to run llamafile with the `--port` option to serve on a different port, such as `8081`.
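For example, you might start the chat/completion instance like this. This is a minimal sketch: the file name is a placeholder, and the exact invocation may vary with your llamafile version.

```bash
# Hypothetical file name; substitute your own llamafile.
# Serve on port 8081 so it does not conflict with Tabby's default port 8080.
./your-model.llamafile --port 8081
```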

For embeddings, the standard llamafile server no longer enables the embedding endpoint,
so you need to run a separate llamafile instance with both the `--embedding` and `--port` options.
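For instance, an embedding instance could be started on port `8082` to match the configuration below. Again, the file name is a placeholder.

```bash
# Hypothetical file name; substitute your own embedding llamafile.
# Enable the embedding endpoint and use a port separate from the chat/completion instance.
./your-embedding-model.llamafile --embedding --port 8082
```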

Below is an example configuration:

```toml title="~/.tabby/config.toml"
# Chat model
[model.chat.http]
kind = "openai/chat"
model_name = "your_model"
api_endpoint = "http://localhost:8081/v1"
api_key = ""

# Completion model
[model.completion.http]
kind = "llama.cpp/completion"
model_name = "your_model"
api_endpoint = "http://localhost:8081"
api_key = ""
prompt_template = "<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>" # Example prompt template for the Qwen2.5 Coder model series.

# Embedding model
[model.embedding.http]
kind = "llama.cpp/embedding"
model_name = "your_model"
api_endpoint = "http://localhost:8082"
api_key = ""
```
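Once llamafile is running, you can optionally sanity-check the OpenAI-compatible chat endpoint before starting Tabby. This is a minimal sketch with a placeholder model name:

```bash
# Send a test request to the llamafile chat-completions endpoint on port 8081.
curl http://localhost:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "your_model", "messages": [{"role": "user", "content": "Hello"}]}'
```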
