Update llm_inference.md
Signed-off-by: Michael Yuan <[email protected]>
juntao committed Jul 29, 2024
1 parent 2cf37a8 commit f373a49
Showing 1 changed file with 2 additions and 3 deletions.
docs/develop/rust/wasinn/llm_inference.md

@@ -119,10 +119,9 @@ You can configure the chat inference application through CLI options.

The `--prompt-template` option is perhaps the most interesting. It allows the application to support different open source LLM models beyond llama2. Check out more prompt templates [here](https://github.com/LlamaEdge/LlamaEdge/tree/main/api-server/chat-prompts).

-The `--ctx-size` option specifies the context windows size of the application. It is limited by the model's intrinsic context window size. If you increase the `--ctx-size`, make sure that you also
-explicitly specify the `--batch-size` to a reasonable value (e.g., `--batch-size 512`).
+The `--ctx-size` option specifies the context window size of the application. It is limited by the model's intrinsic context window size.

-The following command tells WasmEdge to print out logs and statistics of the model at runtime.
+The `--log-stat` option tells WasmEdge to print out logs and statistics of the model at runtime.

```bash
wasmedge --dir .:. --nn-preload default:GGML:AUTO:Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf \
```
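
Taken together, the flags discussed above go into a single `wasmedge` run. Below is a minimal sketch of such an invocation; the `llama-chat.wasm` application name, the `llama-3-chat` template value, and the `4096` context size are assumptions drawn from typical LlamaEdge examples, not part of this commit:

```bash
# Illustrative invocation combining the options discussed in this commit.
# The wasm app name, template name, and ctx size below are assumptions,
# not taken from the diff itself.
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf \
  llama-chat.wasm \
  --prompt-template llama-3-chat \
  --ctx-size 4096 \
  --log-stat
```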
