diff --git a/CLI.md b/CLI.md
new file mode 100644
index 00000000..5bc1a484
--- /dev/null
+++ b/CLI.md
@@ -0,0 +1,260 @@
+## CLI Reference
+
+### Overview
+```
+usage: nexa [-h] [-V] {run,onnx,server,pull,remove,clean,list,login,whoami,logout} ...
+
+Nexa CLI tool for handling various model operations.
+
+positional arguments:
+ {run,onnx,server,pull,remove,clean,list,login,whoami,logout}
+ sub-command help
+ run Run inference for various tasks using GGUF models.
+ onnx Run inference for various tasks using ONNX models.
+ server Run the Nexa AI Text Generation Service
+ pull Pull a model from official or hub.
+ remove Remove a model from local machine.
+ clean Clean up all model files.
+ list List all models in the local machine.
+ login Login to Nexa API.
+ whoami Show current user information.
+ logout Logout from Nexa API.
+
+options:
+ -h, --help show this help message and exit
+ -V, --version Show the version of the Nexa SDK.
+```
+
+### List Local Models
+
+List all models on your local computer.
+
+```
+nexa list
+```
+
+### Download a Model
+
+Download a model from the Nexa Model Hub to your local computer.
+
+```
+nexa pull MODEL_PATH
+usage: nexa pull [-h] model_path
+
+positional arguments:
+ model_path Path or identifier for the model in Nexa Model Hub
+
+options:
+ -h, --help show this help message and exit
+```
+
+#### Example
+
+```
+nexa pull llama2
+```
+
+### Remove a Model
+
+Remove a model from your local computer.
+
+```
+nexa remove MODEL_PATH
+usage: nexa remove [-h] model_path
+
+positional arguments:
+ model_path Path or identifier for the model in Nexa Model Hub
+
+options:
+ -h, --help show this help message and exit
+```
+
+#### Example
+
+```
+nexa remove llama2
+```
+
+### Remove All Downloaded Models
+
+Remove all downloaded models on your local computer.
+
+```
+nexa clean
+```
+
+### Run a Model
+
+Run a model on your local computer. If the model file is not yet downloaded, it will be automatically fetched first.
+
+By default, `nexa` runs GGUF models. To run ONNX models, use `nexa onnx MODEL_PATH` instead.
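+
+For instance, to run an ONNX build of a model (the identifier below is taken from the Model Path Format examples at the end of this document):
+
+```
+nexa onnx Meta-Llama-3-8B-Instruct:onnx-cpu-int8
+```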
+
+#### Run Text-Generation Model
+
+```
+nexa run MODEL_PATH
+usage: nexa run [-h] [-t TEMPERATURE] [-m MAX_NEW_TOKENS] [-k TOP_K] [-p TOP_P] [-sw [STOP_WORDS ...]] [-pf] [-st] model_path
+
+positional arguments:
+ model_path Path or identifier for the model in Nexa Model Hub
+
+options:
+ -h, --help show this help message and exit
+ -pf, --profiling Enable profiling logs for the inference process
+ -st, --streamlit Run the inference in Streamlit UI
+
+Text generation options:
+ -t, --temperature TEMPERATURE
+ Temperature for sampling
+ -m, --max_new_tokens MAX_NEW_TOKENS
+ Maximum number of new tokens to generate
+ -k, --top_k TOP_K Top-k sampling parameter
+ -p, --top_p TOP_P Top-p sampling parameter
+ -sw, --stop_words [STOP_WORDS ...]
+ List of stop words for early stopping
+```
+
+##### Example
+
+```
+nexa run llama2
+```
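+
+Sampling options can be combined on one command line; the values below are illustrative, not recommendations:
+
+```
+nexa run llama2 -t 0.7 -m 256 -k 50 -p 0.9
+```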
+
+#### Run Image-Generation Model
+
+```
+nexa run MODEL_PATH
+usage: nexa run [-h] [-i2i] [-ns NUM_INFERENCE_STEPS] [-np NUM_IMAGES_PER_PROMPT] [-H HEIGHT] [-W WIDTH] [-g GUIDANCE_SCALE] [-o OUTPUT] [-s RANDOM_SEED] [-st] model_path
+
+positional arguments:
+ model_path Path or identifier for the model in Nexa Model Hub
+
+options:
+ -h, --help show this help message and exit
+ -st, --streamlit Run the inference in Streamlit UI
+
+Image generation options:
+ -i2i, --img2img Whether to run image-to-image generation
+ -ns, --num_inference_steps NUM_INFERENCE_STEPS
+ Number of inference steps
+ -np, --num_images_per_prompt NUM_IMAGES_PER_PROMPT
+ Number of images to generate per prompt
+ -H, --height HEIGHT Height of the output image
+ -W, --width WIDTH Width of the output image
+ -g, --guidance_scale GUIDANCE_SCALE
+ Guidance scale for diffusion
+ -o, --output OUTPUT Output path for the generated image
+ -s, --random_seed RANDOM_SEED
+ Random seed for image generation
+ --lora_dir LORA_DIR Path to directory containing LoRA files
+ --wtype WTYPE Weight type (f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0)
+ --control_net_path CONTROL_NET_PATH
+ Path to control net model
+ --control_image_path CONTROL_IMAGE_PATH
+ Path to image condition for Control Net
+ --control_strength CONTROL_STRENGTH
+ Strength to apply Control Net
+```
+
+##### Example
+
+```
+nexa run sd1-4
+```
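+
+Generation options can be supplied the same way; the resolution, step count, and output path below are illustrative:
+
+```
+nexa run sd1-4 -H 512 -W 512 -ns 20 -o output.png
+```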
+
+#### Run Vision-Language Model
+
+```
+nexa run MODEL_PATH
+usage: nexa run [-h] [-t TEMPERATURE] [-m MAX_NEW_TOKENS] [-k TOP_K] [-p TOP_P] [-sw [STOP_WORDS ...]] [-pf] [-st] model_path
+
+positional arguments:
+ model_path Path or identifier for the model in Nexa Model Hub
+
+options:
+ -h, --help show this help message and exit
+ -pf, --profiling Enable profiling logs for the inference process
+ -st, --streamlit Run the inference in Streamlit UI
+
+VLM generation options:
+ -t, --temperature TEMPERATURE
+ Temperature for sampling
+ -m, --max_new_tokens MAX_NEW_TOKENS
+ Maximum number of new tokens to generate
+ -k, --top_k TOP_K Top-k sampling parameter
+ -p, --top_p TOP_P Top-p sampling parameter
+ -sw, --stop_words [STOP_WORDS ...]
+ List of stop words for early stopping
+```
+
+##### Example
+
+```
+nexa run nanollava
+```
+
+#### Run Audio Model
+
+```
+nexa run MODEL_PATH
+usage: nexa run [-h] [-o OUTPUT_DIR] [-b BEAM_SIZE] [-l LANGUAGE] [--task TASK] [-t TEMPERATURE] [-c COMPUTE_TYPE] [-st] model_path
+
+positional arguments:
+ model_path Path or identifier for the model in Nexa Model Hub
+
+options:
+ -h, --help show this help message and exit
+ -st, --streamlit Run the inference in Streamlit UI
+
+Automatic Speech Recognition options:
+ -b, --beam_size BEAM_SIZE
+ Beam size to use for transcription
+ -l, --language LANGUAGE
+ The language spoken in the audio. It should be a language code such as 'en' or 'fr'.
+ --task TASK Task to execute (transcribe or translate)
+ -c, --compute_type COMPUTE_TYPE
+ Type to use for computation (e.g., float16, int8, int8_float16)
+```
+
+##### Example
+
+```
+nexa run faster-whisper-tiny
+```
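+
+ASR options follow the same pattern; the flag values below are illustrative:
+
+```
+nexa run faster-whisper-tiny -l en --task transcribe -b 5
+```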
+
+### Start Local Server
+
+Start a local server using models on your local computer.
+
+```
+nexa server MODEL_PATH
+usage: nexa server [-h] [--host HOST] [--port PORT] [--reload] model_path
+
+positional arguments:
+  model_path   Path or identifier for the model in Nexa Model Hub
+
+options:
+ -h, --help show this help message and exit
+ --host HOST Host to bind the server to
+ --port PORT Port to bind the server to
+ --reload Enable automatic reloading on code changes
+```
+
+#### Example
+
+```
+nexa server llama2
+```
+
+### Model Path Format
+
+For the `model_path` argument in `nexa` commands, follow the standard format to ensure correct model loading and execution:
+
+- `[user_name]/[repo_name]:[tag_name]` (user's model)
+- `[repo_name]:[tag_name]` (official model)
+
+#### Examples:
+
+- `gemma-2b:q4_0`
+- `Meta-Llama-3-8B-Instruct:onnx-cpu-int8`
+- `alanzhuly/Qwen2-1B-Instruct:q4_0`
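+
+Putting it together with the commands above (identifiers taken from the examples):
+
+```
+nexa run gemma-2b:q4_0
+nexa pull alanzhuly/Qwen2-1B-Instruct:q4_0
+```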
\ No newline at end of file
diff --git a/README.md b/README.md
index ebaec430..1ca86b7d 100644
--- a/README.md
+++ b/README.md
@@ -99,21 +99,6 @@ If pre-built wheels cannot meet your requirements, you can install Nexa SDK from
pip install nexaai
```
-
-FAQ: Building Issues for llava
-
-If you encounter the following issue while building:
-
-![](docs/.media/error.jpeg)
-
-try the following command:
-
-```bash
-CMAKE_ARGS="-DCMAKE_CXX_FLAGS=-fopenmp" pip install nexaai
-```
-
-
-
#### GPU (Metal)
For the GPU version supporting Metal (macOS):
@@ -146,6 +131,23 @@ CMAKE_ARGS="-DGGML_CUDA=ON -DSD_CUBLAS=ON" pip install nexaai
> [!TIP]
> For Windows users, we recommend running the installation command in Git Bash to avoid unexpected behavior.
+
+
+FAQ: Building Issues for llava
+
+If you encounter the following issue while building:
+
+![](docs/.media/error.jpeg)
+
+try the following command:
+
+```bash
+CMAKE_ARGS="-DCMAKE_CXX_FLAGS=-fopenmp" pip install nexaai
+```
+
### Docker Usage
Note: Docker doesn't support GPU acceleration
@@ -211,583 +213,27 @@ will create an interactive session with text generation
## CLI Reference
-```
-usage: nexa [-h] [-V] {run,onnx,server,pull,remove,clean,list,login,whoami,logout} ...
-
-Nexa CLI tool for handling various model operations.
-
-positional arguments:
- {run,onnx,server,pull,remove,clean,list,login,whoami,logout}
- sub-command help
- run Run inference for various tasks using GGUF models.
- onnx Run inference for various tasks using ONNX models.
- server Run the Nexa AI Text Generation Service
- pull Pull a model from official or hub.
- remove Remove a model from local machine.
- clean Clean up all model files.
- list List all models in the local machine.
- login Login to Nexa API.
- whoami Show current user information.
- logout Logout from Nexa API.
-
-options:
- -h, --help show this help message and exit
- -V, --version Show the version of the Nexa SDK.
-```
-
-### List Local Models
-
-List all models on your local computer.
-
-```
-nexa list
-```
-
-### Download a Model
-
-Download a model file to your local computer from Nexa Model Hub.
-
-```
-nexa pull MODEL_PATH
-usage: nexa pull [-h] model_path
-
-positional arguments:
- model_path Path or identifier for the model in Nexa Model Hub
-
-options:
- -h, --help show this help message and exit
-```
-
-#### Example
-
-```
-nexa pull llama2
-```
-
-### Remove a Model
-
-Remove a model from your local computer.
-
-```
-nexa remove MODEL_PATH
-usage: nexa remove [-h] model_path
-
-positional arguments:
- model_path Path or identifier for the model in Nexa Model Hub
-
-options:
- -h, --help show this help message and exit
-```
-
-#### Example
-
-```
-nexa remove llama2
-```
-
-### Remove All Downloaded Models
-
-Remove all downloaded models on your local computer.
-
-```
-nexa clean
-```
-
-### Run a Model
-
-Run a model on your local computer. If the model file is not yet downloaded, it will be automatically fetched first.
+Here's a brief overview of the main CLI commands:
-By default, `nexa` will run gguf models. To run onnx models, use `nexa onnx MODEL_PATH`
+- `nexa run`: Run inference for various tasks using GGUF models.
+- `nexa onnx`: Run inference for various tasks using ONNX models.
+- `nexa server`: Run the Nexa AI Text Generation Service.
+- `nexa pull`: Pull a model from the Nexa Model Hub.
+- `nexa remove`: Remove a model from the local machine.
+- `nexa clean`: Clean up all model files.
+- `nexa list`: List all models on the local machine.
+- `nexa login`: Log in to the Nexa API.
+- `nexa whoami`: Show current user information.
+- `nexa logout`: Log out of the Nexa API.
-#### Run Text-Generation Model
+For detailed information on CLI commands and usage, please refer to the [CLI Reference](CLI.md) document.
-```
-nexa run MODEL_PATH
-usage: nexa run [-h] [-t TEMPERATURE] [-m MAX_NEW_TOKENS] [-k TOP_K] [-p TOP_P] [-sw [STOP_WORDS ...]] [-pf] [-st] model_path
-
-positional arguments:
- model_path Path or identifier for the model in Nexa Model Hub
-
-options:
- -h, --help show this help message and exit
- -pf, --profiling Enable profiling logs for the inference process
- -st, --streamlit Run the inference in Streamlit UI
-
-Text generation options:
- -t, --temperature TEMPERATURE
- Temperature for sampling
- -m, --max_new_tokens MAX_NEW_TOKENS
- Maximum number of new tokens to generate
- -k, --top_k TOP_K Top-k sampling parameter
- -p, --top_p TOP_P Top-p sampling parameter
- -sw, --stop_words [STOP_WORDS ...]
- List of stop words for early stopping
-```
-
-##### Example
-
-```
-nexa run llama2
-```
-
-#### Run Image-Generation Model
-
-```
-nexa run MODEL_PATH
-usage: nexa run [-h] [-i2i] [-ns NUM_INFERENCE_STEPS] [-np NUM_IMAGES_PER_PROMPT] [-H HEIGHT] [-W WIDTH] [-g GUIDANCE_SCALE] [-o OUTPUT] [-s RANDOM_SEED] [-st] model_path
-
-positional arguments:
- model_path Path or identifier for the model in Nexa Model Hub
-
-options:
- -h, --help show this help message and exit
- -st, --streamlit Run the inference in Streamlit UI
-
-Image generation options:
- -i2i, --img2img Whether to run image-to-image generation
- -ns, --num_inference_steps NUM_INFERENCE_STEPS
- Number of inference steps
- -np, --num_images_per_prompt NUM_IMAGES_PER_PROMPT
- Number of images to generate per prompt
- -H, --height HEIGHT Height of the output image
- -W, --width WIDTH Width of the output image
- -g, --guidance_scale GUIDANCE_SCALE
- Guidance scale for diffusion
- -o, --output OUTPUT Output path for the generated image
- -s, --random_seed RANDOM_SEED
- Random seed for image generation
- --lora_dir LORA_DIR Path to directory containing LoRA files
- --wtype WTYPE Weight type (f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0)
- --control_net_path CONTROL_NET_PATH
- Path to control net model
- --control_image_path CONTROL_IMAGE_PATH
- Path to image condition for Control Net
- --control_strength CONTROL_STRENGTH
- Strength to apply Control Net
-```
-
-##### Example
-
-```
-nexa run sd1-4
-```
-
-#### Run Vision-Language Model
-
-```
-nexa run MODEL_PATH
-usage: nexa run [-h] [-t TEMPERATURE] [-m MAX_NEW_TOKENS] [-k TOP_K] [-p TOP_P] [-sw [STOP_WORDS ...]] [-pf] [-st] model_path
-
-positional arguments:
- model_path Path or identifier for the model in Nexa Model Hub
-
-options:
- -h, --help show this help message and exit
- -pf, --profiling Enable profiling logs for the inference process
- -st, --streamlit Run the inference in Streamlit UI
-
-VLM generation options:
- -t, --temperature TEMPERATURE
- Temperature for sampling
- -m, --max_new_tokens MAX_NEW_TOKENS
- Maximum number of new tokens to generate
- -k, --top_k TOP_K Top-k sampling parameter
- -p, --top_p TOP_P Top-p sampling parameter
- -sw, --stop_words [STOP_WORDS ...]
- List of stop words for early stopping
-```
-
-##### Example
-
-```
-nexa run nanollava
-```
-
-#### Run Audio Model
-
-```
-nexa run MODEL_PATH
-usage: nexa run [-h] [-o OUTPUT_DIR] [-b BEAM_SIZE] [-l LANGUAGE] [--task TASK] [-t TEMPERATURE] [-c COMPUTE_TYPE] [-st] model_path
-
-positional arguments:
- model_path Path or identifier for the model in Nexa Model Hub
-
-options:
- -h, --help show this help message and exit
- -st, --streamlit Run the inference in Streamlit UI
-
-Automatic Speech Recognition options:
- -b, --beam_size BEAM_SIZE
- Beam size to use for transcription
- -l, --language LANGUAGE
- The language spoken in the audio. It should be a language code such as 'en' or 'fr'.
- --task TASK Task to execute (transcribe or translate)
- -c, --compute_type COMPUTE_TYPE
- Type to use for computation (e.g., float16, int8, int8_float16)
-```
-
-##### Example
-
-```
-nexa run faster-whisper-tiny
-```
-
-### Start Local Server
-
-Start a local server using models on your local computer.
-
-```
-nexa server MODEL_PATH
-usage: nexa server [-h] [--host HOST] [--port PORT] [--reload] model_path
-
-positional arguments:
- model_path Path or identifier for the model in S3
-
-options:
- -h, --help show this help message and exit
- --host HOST Host to bind the server to
- --port PORT Port to bind the server to
- --reload Enable automatic reloading on code changes
-```
-
-#### Example
-
-```
-nexa server llama2
-```
-
-### Model Path Format
-
-For `model_path` in nexa commands, it's better to follow the standard format to ensure correct model loading and execution. The standard format for `model_path` is:
-
-- `[user_name]/[repo_name]:[tag_name]` (user's model)
-- `[repo_name]:[tag_name]` (official model)
-
-#### Examples:
-
-- `gemma-2b:q4_0`
-- `Meta-Llama-3-8B-Instruct:onnx-cpu-int8`
-- `alanzhuly/Qwen2-1B-Instruct:q4_0`
## Start Local Server
-You can start a local server using models on your local computer with the `nexa server` command. Here's the usage syntax:
-
-```
-usage: nexa server [-h] [--host HOST] [--port PORT] [--reload] model_path
-```
-
-### Options:
-
-- `--host`: Host to bind the server to
-- `--port`: Port to bind the server to
-- `--reload`: Enable automatic reloading on code changes
-
-### Example Commands:
-
-```
-nexa server gemma
-nexa server llama2-function-calling
-nexa server sd1-5
-nexa server faster-whipser-large
-```
-
-By default, `nexa server` will run gguf models. To run onnx models, simply add `onnx` after `nexa server`.
-
-## API Endpoints
-
-
-1. Text Generation: /v1/completions
-Generates text based on a single prompt.
-
-#### Request body:
-
-```json
-{
- "prompt": "Tell me a story",
- "temperature": 1,
- "max_new_tokens": 128,
- "top_k": 50,
- "top_p": 1,
- "stop_words": ["string"]
-}
-```
-
-#### Example Response:
-
-```json
-{
- "result": "Once upon a time, in a small village nestled among rolling hills..."
-}
-```
-
-
-
-2. Chat Completions: /v1/chat/completions
-
-Handles chat completions with support for conversation history.
-
-#### Request body:
-
-```json
-{
- "messages": [
- {
- "role": "user",
- "content": "Tell me a story"
- }
- ],
- "max_tokens": 128,
- "temperature": 0.1,
- "stream": false,
- "stop_words": []
-}
-```
+To start a local server using models on your local computer, use the `nexa server` command.
+For detailed information on server setup, API endpoints, and usage examples, please refer to the [Server Reference](SERVER.md) document.
-#### Example Response:
-
-```json
-{
- "id": "f83502df-7f5a-4825-a922-f5cece4081de",
- "object": "chat.completion",
- "created": 1723441724.914671,
- "choices": [
- {
- "message": {
- "role": "assistant",
- "content": "In the heart of a mystical forest..."
- }
- }
- ]
-}
-```
-
-
-3. Function Calling: /v1/function-calling
-
-Call the most appropriate function based on user's prompt.
-
-#### Request body:
-
-```json
-{
- "messages": [
- {
- "role": "user",
- "content": "Extract Jason is 25 years old"
- }
- ],
- "tools": [
- {
- "type": "function",
- "function": {
- "name": "UserDetail",
- "parameters": {
- "properties": {
- "name": {
- "description": "The user's name",
- "type": "string"
- },
- "age": {
- "description": "The user's age",
- "type": "integer"
- }
- },
- "required": ["name", "age"],
- "type": "object"
- }
- }
- }
- ],
- "tool_choice": "auto"
-}
-```
-
-#### Function format:
-
-```json
-{
- "type": "function",
- "function": {
- "name": "function_name",
- "description": "function_description",
- "parameters": {
- "type": "object",
- "properties": {
- "property_name": {
- "type": "string | number | boolean | object | array",
- "description": "string"
- }
- },
- "required": ["array_of_required_property_names"]
- }
- }
-}
-```
-
-#### Example Response:
-
-```json
-{
- "id": "chatcmpl-7a9b0dfb-878f-4f75-8dc7-24177081c1d0",
- "object": "chat.completion",
- "created": 1724186442,
- "model": "/home/ubuntu/.cache/nexa/hub/official/Llama2-7b-function-calling/q3_K_M.gguf",
- "choices": [
- {
- "finish_reason": "tool_calls",
- "index": 0,
- "logprobs": null,
- "message": {
- "role": "assistant",
- "content": null,
- "tool_calls": [
- {
- "id": "call__0_UserDetail_cmpl-8d5cf645-7f35-4af2-a554-2ccea1a67bdd",
- "type": "function",
- "function": {
- "name": "UserDetail",
- "arguments": "{ \"name\": \"Jason\", \"age\": 25 }"
- }
- }
- ],
- "function_call": {
- "name": "",
- "arguments": "{ \"name\": \"Jason\", \"age\": 25 }"
- }
- }
- }
- ],
- "usage": {
- "completion_tokens": 15,
- "prompt_tokens": 316,
- "total_tokens": 331
- }
-}
-```
-
-
-4. Text-to-Image: /v1/txt2img
-
-Generates images based on a single prompt.
-
-#### Request body:
-
-```json
-{
- "prompt": "A girl, standing in a field of flowers, vivid",
- "image_path": "",
- "cfg_scale": 7,
- "width": 256,
- "height": 256,
- "sample_steps": 20,
- "seed": 0,
- "negative_prompt": ""
-}
-```
-
-#### Example Response:
-
-```json
-{
- "created": 1724186615.5426757,
- "data": [
- {
- "base64": "base64_of_generated_image",
- "url": "path/to/generated_image"
- }
- ]
-}
-```
-
-
-5. Image-to-Image: /v1/img2img
-
-Modifies existing images based on a single prompt.
-
-#### Request body:
-
-```json
-{
- "prompt": "A girl, standing in a field of flowers, vivid",
- "image_path": "path/to/image",
- "cfg_scale": 7,
- "width": 256,
- "height": 256,
- "sample_steps": 20,
- "seed": 0,
- "negative_prompt": ""
-}
-```
-
-#### Example Response:
-
-```json
-{
- "created": 1724186615.5426757,
- "data": [
- {
- "base64": "base64_of_generated_image",
- "url": "path/to/generated_image"
- }
- ]
-}
-```
-
-
-6. Audio Transcriptions: /v1/audio/transcriptions
-
-Transcribes audio files to text.
-
-#### Parameters:
-
-- `beam_size` (integer): Beam size for transcription (default: 5)
-- `language` (string): Language code (e.g., 'en', 'fr')
-- `temperature` (number): Temperature for sampling (default: 0)
-
-#### Request body:
-
-```
-{
- "file" (form-data): The audio file to transcribe (required)
-}
-```
-
-#### Example Response:
-
-```json
-{
- "text": " And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country."
-}
-```
-
-
-7. Audio Translations: /v1/audio/translations
-
-Translates audio files to text in English.
-
-#### Parameters:
-
-- `beam_size` (integer): Beam size for transcription (default: 5)
-- `temperature` (number): Temperature for sampling (default: 0)
-
-#### Request body:
-
-```
-{
- "file" (form-data): The audio file to transcribe (required)
-}
-```
-
-#### Example Response:
-
-```json
-{
- "text": " Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday"
-}
-```
-
-
## Acknowledgements
We would like to thank the following projects:
diff --git a/SERVER.md b/SERVER.md
new file mode 100644
index 00000000..53114bfb
--- /dev/null
+++ b/SERVER.md
@@ -0,0 +1,317 @@
+## Start Local Server
+
+You can start a local server using models on your local computer with the `nexa server` command. Here's the usage syntax:
+
+```
+usage: nexa server [-h] [--host HOST] [--port PORT] [--reload] model_path
+```
+
+### Options:
+
+- `--host`: Host to bind the server to
+- `--port`: Port to bind the server to
+- `--reload`: Enable automatic reloading on code changes
+
+### Example Commands:
+
+```
+nexa server gemma
+nexa server llama2-function-calling
+nexa server sd1-5
+nexa server faster-whisper-large
+```
+
+By default, `nexa server` runs GGUF models. To run ONNX models, add `onnx` after `nexa server`.
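+
+For example, to bind to a specific address and port, or to serve an ONNX model (the model names and port below are illustrative):
+
+```bash
+nexa server llama2 --host 0.0.0.0 --port 8000
+nexa server onnx Meta-Llama-3-8B-Instruct:onnx-cpu-int8
+```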
+
+## API Endpoints
+
+
+### 1. Text Generation: /v1/completions
+Generates text based on a single prompt.
+
+#### Request body:
+
+```json
+{
+ "prompt": "Tell me a story",
+ "temperature": 1,
+ "max_new_tokens": 128,
+ "top_k": 50,
+ "top_p": 1,
+ "stop_words": ["string"]
+}
+```
+
+#### Example Response:
+
+```json
+{
+ "result": "Once upon a time, in a small village nestled among rolling hills..."
+}
+```
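+
+A minimal request sketch using `curl`, assuming the server was started with `--host 127.0.0.1 --port 8000` (adjust the base URL to your flags):
+
+```bash
+curl -X POST http://127.0.0.1:8000/v1/completions \
+  -H "Content-Type: application/json" \
+  -d '{"prompt": "Tell me a story", "temperature": 1, "max_new_tokens": 128, "top_k": 50, "top_p": 1}'
+```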
+
+
+### 2. Chat Completions: /v1/chat/completions
+
+Handles chat completions with support for conversation history.
+
+#### Request body:
+
+```json
+{
+ "messages": [
+ {
+ "role": "user",
+ "content": "Tell me a story"
+ }
+ ],
+ "max_tokens": 128,
+ "temperature": 0.1,
+ "stream": false,
+ "stop_words": []
+}
+```
+
+#### Example Response:
+
+```json
+{
+ "id": "f83502df-7f5a-4825-a922-f5cece4081de",
+ "object": "chat.completion",
+ "created": 1723441724.914671,
+ "choices": [
+ {
+ "message": {
+ "role": "assistant",
+ "content": "In the heart of a mystical forest..."
+ }
+ }
+ ]
+}
+```
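+
+Conversation history is carried by appending prior turns to `messages`; a sketch, with host and port assumed as above:
+
+```bash
+curl -X POST http://127.0.0.1:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+        "messages": [
+          {"role": "user", "content": "Tell me a story"},
+          {"role": "assistant", "content": "In the heart of a mystical forest..."},
+          {"role": "user", "content": "Continue the story"}
+        ],
+        "max_tokens": 128,
+        "temperature": 0.1,
+        "stream": false
+      }'
+```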
+
+
+### 3. Function Calling: /v1/function-calling
+
+Calls the most appropriate function based on the user's prompt.
+
+#### Request body:
+
+```json
+{
+ "messages": [
+ {
+ "role": "user",
+ "content": "Extract Jason is 25 years old"
+ }
+ ],
+ "tools": [
+ {
+ "type": "function",
+ "function": {
+ "name": "UserDetail",
+ "parameters": {
+ "properties": {
+ "name": {
+ "description": "The user's name",
+ "type": "string"
+ },
+ "age": {
+ "description": "The user's age",
+ "type": "integer"
+ }
+ },
+ "required": ["name", "age"],
+ "type": "object"
+ }
+ }
+ }
+ ],
+ "tool_choice": "auto"
+}
+```
+
+#### Function format:
+
+```json
+{
+ "type": "function",
+ "function": {
+ "name": "function_name",
+ "description": "function_description",
+ "parameters": {
+ "type": "object",
+ "properties": {
+ "property_name": {
+ "type": "string | number | boolean | object | array",
+ "description": "string"
+ }
+ },
+ "required": ["array_of_required_property_names"]
+ }
+ }
+}
+```
+
+#### Example Response:
+
+```json
+{
+ "id": "chatcmpl-7a9b0dfb-878f-4f75-8dc7-24177081c1d0",
+ "object": "chat.completion",
+ "created": 1724186442,
+ "model": "/home/ubuntu/.cache/nexa/hub/official/Llama2-7b-function-calling/q3_K_M.gguf",
+ "choices": [
+ {
+ "finish_reason": "tool_calls",
+ "index": 0,
+ "logprobs": null,
+ "message": {
+ "role": "assistant",
+ "content": null,
+ "tool_calls": [
+ {
+ "id": "call__0_UserDetail_cmpl-8d5cf645-7f35-4af2-a554-2ccea1a67bdd",
+ "type": "function",
+ "function": {
+ "name": "UserDetail",
+ "arguments": "{ \"name\": \"Jason\", \"age\": 25 }"
+ }
+ }
+ ],
+ "function_call": {
+ "name": "",
+ "arguments": "{ \"name\": \"Jason\", \"age\": 25 }"
+ }
+ }
+ }
+ ],
+ "usage": {
+ "completion_tokens": 15,
+ "prompt_tokens": 316,
+ "total_tokens": 331
+ }
+}
+```
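+
+Since the request body is long, it can be kept in a file and passed to `curl` with `-d @`; `function_call.json` is a hypothetical file holding the request body shown above (host and port assumed as before):
+
+```bash
+curl -X POST http://127.0.0.1:8000/v1/function-calling \
+  -H "Content-Type: application/json" \
+  -d @function_call.json
+```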
+
+
+### 4. Text-to-Image: /v1/txt2img
+
+Generates images based on a single prompt.
+
+#### Request body:
+
+```json
+{
+ "prompt": "A girl, standing in a field of flowers, vivid",
+ "image_path": "",
+ "cfg_scale": 7,
+ "width": 256,
+ "height": 256,
+ "sample_steps": 20,
+ "seed": 0,
+ "negative_prompt": ""
+}
+```
+
+#### Example Response:
+
+```json
+{
+ "created": 1724186615.5426757,
+ "data": [
+ {
+ "base64": "base64_of_generated_image",
+ "url": "path/to/generated_image"
+ }
+ ]
+}
+```
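+
+A sketch that extracts the first generated image from the response shown above; it assumes `jq` and a `base64` that accepts `-d` are installed, with host and port as before:
+
+```bash
+curl -s -X POST http://127.0.0.1:8000/v1/txt2img \
+  -H "Content-Type: application/json" \
+  -d '{"prompt": "A girl, standing in a field of flowers, vivid", "width": 256, "height": 256, "sample_steps": 20, "cfg_scale": 7}' \
+  | jq -r '.data[0].base64' | base64 -d > output.png
+```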
+
+
+### 5. Image-to-Image: /v1/img2img
+
+Modifies existing images based on a single prompt.
+
+#### Request body:
+
+```json
+{
+ "prompt": "A girl, standing in a field of flowers, vivid",
+ "image_path": "path/to/image",
+ "cfg_scale": 7,
+ "width": 256,
+ "height": 256,
+ "sample_steps": 20,
+ "seed": 0,
+ "negative_prompt": ""
+}
+```
+
+#### Example Response:
+
+```json
+{
+ "created": 1724186615.5426757,
+ "data": [
+ {
+ "base64": "base64_of_generated_image",
+ "url": "path/to/generated_image"
+ }
+ ]
+}
+```
+
+
+### 6. Audio Transcriptions: /v1/audio/transcriptions
+
+Transcribes audio files to text.
+
+#### Parameters:
+
+- `beam_size` (integer): Beam size for transcription (default: 5)
+- `language` (string): Language code (e.g., 'en', 'fr')
+- `temperature` (number): Temperature for sampling (default: 0)
+
+#### Request body:
+
+- `file` (form-data, required): The audio file to transcribe
+
+#### Example Response:
+
+```json
+{
+ "text": " And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country."
+}
+```
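+
+Because the endpoint takes a multipart form, the file is sent with `-F`; `speech.wav` is a hypothetical file, and passing the optional parameters as form fields is an assumption (they may be query parameters instead):
+
+```bash
+curl -X POST http://127.0.0.1:8000/v1/audio/transcriptions \
+  -F "file=@speech.wav" \
+  -F "beam_size=5" \
+  -F "language=en"
+```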
+
+
+### 7. Audio Translations: /v1/audio/translations
+
+Translates audio files to text in English.
+
+#### Parameters:
+
+- `beam_size` (integer): Beam size for transcription (default: 5)
+- `temperature` (number): Temperature for sampling (default: 0)
+
+#### Request body:
+
+- `file` (form-data, required): The audio file to translate
+
+#### Example Response:
+
+```json
+{
+ "text": " Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday"
+}
+```
+