Update model download link
Signed-off-by: alabulei1 <[email protected]>
alabulei1 authored Oct 30, 2023
1 parent e549761 commit 6d118bc
Showing 1 changed file with 5 additions and 5 deletions.
10 changes: 5 additions & 5 deletions docs/develop/rust/wasinn/llm-inference.md
@@ -23,14 +23,14 @@ cd chat
Next, let's get the model. In this example, we are going to use the llama2 7b chat model in GGUF format. You can also use other kinds of llama2 models; check out [here](https://github.com/second-state/llama-utils/blob/main/chat/README.md#get-model) for how to get them.

```
-git clone curl -LO https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_M.gguf
+git clone curl -LO https://huggingface.co/wasmedge/llama2/blob/main/llama-2-7b-chat-q5_k_m.gguf
```
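Note: the `git clone` prefix in the command above looks like a leftover; `curl -LO <url>` on its own is what downloads the file. A minimal sketch of the intended download, using the new link from this commit (and, if the result turns out to be an HTML page rather than the model, swapping `/blob/` for `/resolve/` in a Hugging Face URL usually yields the raw file):

```
# Download the quantized 7B chat model (URL taken from the added line above)
curl -LO https://huggingface.co/wasmedge/llama2/blob/main/llama-2-7b-chat-q5_k_m.gguf

# Optional sanity check: the .gguf file should be several gigabytes in size
ls -lh llama-2-7b-chat-q5_k_m.gguf
```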

Run the inference application in WasmEdge.

```
wasmedge --dir .:. \
---nn-preload default:GGML:CPU:llama-2-7b-chat.Q5_K_M.gguf \
+--nn-preload default:GGML:CPU:llama-2-7b-chat-q5_k_m.gguf \
llama-chat.wasm --model-alias default --prompt-template llama-2-chat
```
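For reference, the `--nn-preload` value in the command above is a colon-separated tuple of alias, backend, execution target, and model file. The annotated sketch below restates the same invocation with each piece called out (comments only; same flags as above):

```
# --nn-preload <alias>:<backend>:<target>:<model file>
#   default  -> the alias later referenced by --model-alias
#   GGML     -> the WASI-NN backend used for GGUF models
#   CPU      -> the execution target
#   llama-2-7b-chat-q5_k_m.gguf -> the model file downloaded earlier
wasmedge --dir .:. \
  --nn-preload default:GGML:CPU:llama-2-7b-chat-q5_k_m.gguf \
  llama-chat.wasm --model-alias default --prompt-template llama-2-chat
```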
After executing the command, you may need to wait a moment for the input prompt to appear. You can enter your question once you see the `[USER]:` prompt:
@@ -72,13 +72,13 @@ The output WASM file is `target/wasm32-wasi/release/llama-chat.wasm`.
We also need to get the model. Here we use the llama-2-13b model.

```
-curl -LO https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q5_K_M.gguf
+curl -LO https://huggingface.co/wasmedge/llama2/blob/main/llama-2-13b-q5_k_m.gguf
```
Next, use WasmEdge to load the llama-2-13b model and then ask the model to write code by chatting.

```
wasmedge --dir .:. \
---nn-preload default:GGML:CPU:llama-2-13b-chat.Q5_K_M.gguf \
+--nn-preload default:GGML:CPU:llama-2-13b-q5_k_m.gguf \
llama-chat.wasm --model-alias default --prompt-template llama-2-chat
```
After executing the command, you may need to wait a moment for the input prompt to appear. You can enter your question once you see the `[USER]:` prompt:
@@ -130,7 +130,7 @@ You can make the inference program run faster by AOT compiling the wasm file first.
```
wasmedge compile llama-chat.wasm llama-chat.wasm
wasmedge --dir .:. \
---nn-preload default:GGML:CPU:llama-2-13b-chat.Q5_K_M.gguf \
+--nn-preload default:GGML:CPU:llama-2-13b-q5_k_m.gguf \
llama-chat.wasm --model-alias default --prompt-template llama-2-chat
```
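The `wasmedge compile` step above writes the AOT output over the original file. If you would rather keep the original wasm intact, a small variant works just as well; the output name `llama-chat-aot.wasm` here is only an illustrative choice:

```
# Compile ahead-of-time into a separate file instead of overwriting the input
wasmedge compile llama-chat.wasm llama-chat-aot.wasm

# Run the AOT-compiled module with the same flags as before
wasmedge --dir .:. \
  --nn-preload default:GGML:CPU:llama-2-13b-q5_k_m.gguf \
  llama-chat-aot.wasm --model-alias default --prompt-template llama-2-chat
```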

