fix rebase

Signed-off-by: ezirmusitua <[email protected]>
WasmEdge · Nov 20, 2023 · 80f89dd · 80f89dd
1 parent 39c7744
commit 80f89dd

Showing 4 changed files with 64 additions and 53 deletions.
diff --git a/docs/contribute/plugin/intro.md b/docs/contribute/plugin/intro.md
@@ -77,14 +77,14 @@ There are several plug-in releases with the WasmEdge official releases. Please c
 
 | Plug-in | Rust Crate | Released Platforms | Build Steps |
 | --- | --- | --- | --- |
-| WasmEdge-Process | [wasmedge_process_interface][] | `manylinux2014 x86_64`, `manylinux2014 aarch64`, and `ubuntu 20.04 x86_64` (since `0.10.0`) | [Build Wtih WasmEdge-Process](../source/plugin/process.md) |
+| [WasmEdge-Process](../source/plugin/process.md) | [wasmedge_process_interface][] | `manylinux2014 x86_64`, `manylinux2014 aarch64`, and `ubuntu 20.04 x86_64` (since `0.10.0`) | [Build With WasmEdge-Process](../source/plugin/process.md) |
 | [WASI-Crypto][] | [wasi-crypto][] | `manylinux2014 x86_64`, `manylinux2014 aarch64`, and `ubuntu 20.04 x86_64` (since `0.10.1`) | [Build With WASI-Crypto](../source/plugin/wasi_crypto.md) |
 | [WASI-NN with OpenVINO backend](../../develop/rust/wasinn/openvino.md) | [wasi-nn][] | `ubuntu 20.04 x86_64` (since `0.10.1`) | [Build With WASI-NN](../source/plugin/wasi_nn.md#build-wasmedge-with-wasi-nn-openvino-backend) |
 | [WASI-NN with PyTorch backend](../../develop/rust/wasinn/pytorch.md) | [wasi-nn][] | `ubuntu 20.04 x86_64` (since `0.11.1`) | [Build With WASI-NN](../source/plugin/wasi_nn#build-wasmedge-with-wasi-nn-pytorch-backend) |
 | [WASI-NN with TensorFlow-Lite backend](../../develop/rust/wasinn/tensorflow_lite.md) | [wasi-nn][] | `manylinux2014 x86_64`, `manylinux2014 aarch64`, and `ubuntu 20.04 x86_64` (since `0.11.2`) | [Build With WASI-NN](../source/plugin/wasi_nn#build-wasmedge-with-wasi-nn-tensorflow-lite-backend) |
-| WasmEdge-Image | [wasmedge_tensorflow_interface][] | `manylinux2014 x86_64`, `manylinux2014 aarch64`, `ubuntu 20.04 x86_64`, `darwin x86_64`, and `darwin arm64` (since `0.13.0`) | [Build With WasmEdge-Image](../source/plugin/image.md) |
-| WasmEdge-Tensorflow | [wasmedge_tensorflow_interface][] | `manylinux2014 x86_64`, `manylinux2014 aarch64`, `ubuntu 20.04 x86_64`, `darwin x86_64`, and `darwin arm64` (since `0.13.0`) | [Build With WasmEdge-Tensorflow](../source/plugin/tensorflow.md) |
-| WasmEdge-TensorflowLite | [wasmedge_tensorflow_interface][] | `manylinux2014 x86_64`, `manylinux2014 aarch64`, `ubuntu 20.04 x86_64`, `darwin x86_64`, and `darwin arm64` (since `0.13.0`) | [Build With WasmEdge-TensorflowLite](../source/plugin/tensorflowlite.md) |
+| [WasmEdge-Image](../source/plugin/image.md) | [wasmedge_tensorflow_interface][] | `manylinux2014 x86_64`, `manylinux2014 aarch64`, `ubuntu 20.04 x86_64`, `darwin x86_64`, and `darwin arm64` (since `0.13.0`) | [Build With WasmEdge-Image](../source/plugin/image.md) |
+| [WasmEdge-Tensorflow](../source/plugin/tensorflow.md) | [wasmedge_tensorflow_interface][] | `manylinux2014 x86_64`, `manylinux2014 aarch64`, `ubuntu 20.04 x86_64`, `darwin x86_64`, and `darwin arm64` (since `0.13.0`) | [Build With WasmEdge-Tensorflow](../source/plugin/tensorflow.md) |
+| [WasmEdge-TensorflowLite](../source/plugin/tensorflowlite.md) | [wasmedge_tensorflow_interface][] | `manylinux2014 x86_64`, `manylinux2014 aarch64`, `ubuntu 20.04 x86_64`, `darwin x86_64`, and `darwin arm64` (since `0.13.0`) | [Build With WasmEdge-TensorflowLite](../source/plugin/tensorflowlite.md) |
 
 <!-- prettier-ignore -->
 :::note
@@ -94,4 +94,4 @@ Due to the `OpenVINO` dependency, we only release the WASI-NN plug-in for the `O
 [wasmedge_process_interface]: https://crates.io/crates/wasmedge_process_interface
 [wasmedge_tensorflow_interface]: https://crates.io/crates/wasmedge_tensorflow_interface
 [wasi-crypto]: https://crates.io/crates/wasi-crypto
-[wasi-nn]: https://crates.io/crates/wasi-nn
+[wasi-nn]: https://crates.io/crates/wasi-nn
diff --git a/docs/develop/rust/wasinn/llm_inference.md b/docs/develop/rust/wasinn/llm_inference.md
@@ -13,47 +13,34 @@ WasmEdge now supports Llama2, Codellama-instruct, BELLE-Llama, Mistral-7b-instru
 Besides the [regular WasmEdge and Rust requirements](../../rust/setup.md), please make sure that you have the [Wasi-NN plugin with ggml installed](../../../start/install.md#wasi-nn-plug-in-with-ggml-backend).
 
 ## Quick start
+
 Because the example already includes a WASM file compiled from the Rust code, we can use the WasmEdge CLI to run it directly. First, download the compiled `llama-chat.wasm` file from the `llama-utils` repo.
 
 ```bash
-git clone https://github.com/second-state/llama-utils.git
-cd chat
+curl -LO https://github.com/second-state/llama-utils/raw/main/chat/llama-chat.wasm
 ```
 
 Next, let's get the model. In this example, we are going to use the llama2 7b chat model in GGUF format. You can also use other kinds of llama2 models, check out [here](https://github.com/second-state/llama-utils/blob/main/chat/README.md#get-model).
 
 ```bash
-git clone curl -LO https://huggingface.co/wasmedge/llama2/blob/main/llama-2-7b-chat-q5_k_m.gguf
+curl -LO https://huggingface.co/wasmedge/llama2/resolve/main/llama-2-7b-chat-q5_k_m.gguf
 ```
 
 Run the inference application in WasmEdge.
 
 ```bash
-wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf \
-  llama-chat.wasm --prompt-template llama-2-chat
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf llama-chat.wasm
 ```
 
 After executing the command, you may need to wait a moment for the input prompt to appear. You can enter your question once you see the `[USER]:` prompt:
 
 ```bash
 [USER]:
 I have two apples, each costing 5 dollars. What is the total cost of these apple
-*** [prompt begin] ***
-<s>[INST] <<SYS>>
-You are a helpful, respectful and honest assistant. Always answer as short as possible, while being safe. <</SYS>>
-
-I have two apples, each costing 5 dollars. What is the total cost of these apple [/INST]
-*** [prompt end] ***
 [ASSISTANT]:
 The total cost of the two apples is 10 dollars.
 [USER]:
 How about four apples?
-*** [prompt begin] ***
-<s>[INST] <<SYS>>
-You are a helpful, respectful and honest assistant. Always answer as short as possible, while being safe. <</SYS>>
-
-I have two apples, each costing 5 dollars. What is the total cost of these apple [/INST] The total cost of the two apples is 10 dollars. </s><s>[INST] How about four apples? [/INST]
-*** [prompt end] ***
 [ASSISTANT]:
 The total cost of four apples is 20 dollars.
 ```
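The `*** [prompt begin] ***` log lines removed in the hunk above show how the chat application wraps the conversation in the llama-2-chat template before each inference. As a rough illustration only, the sketch below rebuilds that prompt string from a system message, the finished turns, and the new question. The function name `build_llama2_prompt` and its signature are hypothetical and are not taken from the `llama-utils` source.

```rust
/// Hypothetical helper: assemble a llama-2-chat prompt in the layout shown
/// in the logged prompt strings. Not the actual llama-utils implementation.
fn build_llama2_prompt(system: &str, history: &[(&str, &str)], user: &str) -> String {
    let mut prompt = String::new();
    // Finished (question, answer) turns, followed by the still-open question.
    let turns = history
        .iter()
        .map(|(q, a)| (*q, Some(*a)))
        .chain(std::iter::once((user, None)));
    for (i, (question, answer)) in turns.enumerate() {
        prompt.push_str("<s>[INST] ");
        if i == 0 {
            // Only the first turn carries the system message.
            prompt.push_str(&format!("<<SYS>>\n{system} <</SYS>>\n\n"));
        }
        prompt.push_str(&format!("{question} [/INST]"));
        if let Some(answer) = answer {
            // A finished turn is closed with the assistant's answer.
            prompt.push_str(&format!(" {answer} </s>"));
        }
    }
    prompt
}

fn main() {
    let system = "You are a helpful, respectful and honest assistant.";
    let history = [(
        "I have two apples, each costing 5 dollars. What is the total cost of these apple",
        "The total cost of the two apples is 10 dollars.",
    )];
    println!("{}", build_llama2_prompt(system, &history, "How about four apples?"));
}
```

Other templates selected with `--prompt-template` would need their own layout; this sketch covers only the llama-2-chat shape visible in the log.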
@@ -73,30 +60,32 @@ Second, use `cargo` to build the example project.
 cargo build --target wasm32-wasi --release
 ```
 
-The output WASM file is `target/wasm32-wasi/release/llama-chat.wasm`. Next, use WasmEdge to load the llama-2-7b model and then ask the model to questions.
+The output WASM file is `target/wasm32-wasi/release/llama-chat.wasm`. 
+
+We also need to get the model. Here we use the llama-2-13b model.
 
 ```bash
-wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf llama-chat.wasm
+curl -LO https://huggingface.co/wasmedge/llama2/resolve/main/llama-2-13b-q5_k_m.gguf
+```
+
+Next, use WasmEdge to load the llama-2-13b model and then ask the model questions.
+
+```bash
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-13b-q5_k_m.gguf llama-chat.wasm
 ```
 
 After executing the command, you may need to wait a moment for the input prompt to appear. You can enter your question once you see the `[USER]:` prompt:
 
 ```bash
 [USER]:
 Who is Robert Oppenheimer?
-*** [prompt begin] ***
-<s>[INST] <<SYS>>
-You are a helpful, respectful and honest assistant. Always answer as short as possible, while being safe. <</SYS>>
-
-Who is Robert Oppenheimer? [/INST]
-*** [prompt end] ***
 [ASSISTANT]:
 Robert Oppenheimer was an American theoretical physicist and director of the Manhattan Project, which developed the atomic bomb during World War II. He is widely regarded as one of the most important physicists of the 20th century and is known for his contributions to the development of quantum mechanics and the theory of the atomic nucleus. Oppenheimer was also a prominent figure in the post-war nuclear weapons debate and was a strong advocate for international cooperation on nuclear weapons control.
 ```
 
-## Optional: run the model with different CLI
+## Options
 
-We also have CLI options for more information.
+You can configure the chat inference application through CLI options.
 
 ```bash
   -m, --model-alias <ALIAS>
@@ -119,19 +108,33 @@ We also have CLI options for more information.
           Print prompt strings to stdout
       --log-stat
           Print statistics to stdout
-      --log-enable
+      --log-all
           Print all log information to stdout
       --stream-stdout
           Print the output to stdout in the streaming way
   -h, --help
           Print help
 ```
 
-For example, the following command tells WasmEdge to print out logs and statistics of the model at runtime.
+The `--prompt-template` option is perhaps the most interesting. It lets the application support open-source LLMs beyond llama2.
+
+| Template name | Model | Download |
+| ------------ | ------------------------------ | --- |
+| llama-2-chat | [The standard llama2 chat model](https://ai.meta.com/llama/) | [7b](https://huggingface.co/wasmedge/llama2/resolve/main/llama-2-7b-chat-q5_k_m.gguf) | 
+| codellama-instruct | [CodeLlama](https://about.fb.com/news/2023/08/code-llama-ai-for-coding/) | [7b](https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GGUF/resolve/main/codellama-7b-instruct.Q5_K_M.gguf) |
+| mistral-instruct-v0.1 | [Mistral](https://mistral.ai/) | [7b](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q5_K_M.gguf) |
+| mistrallite | [Mistral Lite](https://huggingface.co/amazon/MistralLite) | [7b](https://huggingface.co/TheBloke/MistralLite-7B-GGUF/resolve/main/mistrallite.Q5_K_M.gguf) |
+| openchat | [OpenChat](https://github.com/imoneoi/openchat) | [7b](https://huggingface.co/TheBloke/openchat_3.5-GGUF/resolve/main/openchat_3.5.Q5_K_M.gguf) |
+| belle-llama-2-chat | [BELLE](https://github.com/LianjiaTech/BELLE) | [13b](https://huggingface.co/second-state/BELLE-Llama2-13B-Chat-0.4M-GGUF/resolve/main/BELLE-Llama2-13B-Chat-0.4M-ggml-model-q4_0.gguf) |
+| vicuna-chat | [Vicuna](https://lmsys.org/blog/2023-03-30-vicuna/) | [7b](https://huggingface.co/TheBloke/vicuna-7B-v1.5-GGUF/resolve/main/vicuna-7b-v1.5.Q5_K_M.gguf) |
+| chatml | [ChatML](https://huggingface.co/chargoddard/rpguild-chatml-13b) | [13b](https://huggingface.co/TheBloke/rpguild-chatml-13B-GGUF/resolve/main/rpguild-chatml-13b.Q5_K_M.gguf) |
+
+
+Furthermore, the following command tells WasmEdge to print out logs and statistics of the model at runtime.
 
 ```
 wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf \
-  llama-chat.wasm --prompt-template llama-2-chat --log-enable
+  llama-chat.wasm --prompt-template llama-2-chat --log-stat
 ..................................................................................................
 llama_new_context_with_model: n_ctx      = 512
 llama_new_context_with_model: freq_base  = 10000.0
@@ -149,25 +152,35 @@ llama_print_timings:       total time =   25104.57 ms
 Ah, a fellow Peanuts enthusiast! Snoopy is Charlie Brown's lovable and imaginative beagle, known for his wild and wacky adventures in the comic strip and television specials. He's a loyal companion to Charlie Brown and the rest of the Peanuts gang, and his antics often provide comic relief in the series. Is there anything else you'd like to know about Snoopy? 🐶
 ```
 
-## Improve performance
+## Improving performance
 
 You can make the inference program run faster by AOT compiling the wasm file first.
 
 ```bash
 wasmedge compile llama-chat.wasm llama-chat.wasm
-wasmedge --dir .:.  --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf llama-chat.wasm
+wasmedge --dir .:.  --nn-preload default:GGML:AUTO:llama-2-13b-q5_k_m.gguf llama-chat.wasm
 ```
 
 ## Understand the code
 
-The [main.rs](https://github.com/second-state/llama-utils/blob/main/chat/src/main.rs
-) is the full Rust code to create an interactive chatbot using a LLM. The Rust program manages the user input, tracks the conversation history, transforms the text into the llama2 and other model’s chat templates, and runs the inference operations using the WASI NN standard API.
+The [main.rs](https://github.com/second-state/llama-utils/blob/main/chat/src/main.rs) file is the full Rust code for an interactive chatbot built on an LLM. The Rust program manages the user input, tracks the conversation history, transforms the text into the chat templates of llama2 and other models, and runs the inference operations using the WASI NN standard API. The code logic for the chat interaction is somewhat complex, so this section uses the [simple example](https://github.com/second-state/llama-utils/tree/main/simple) to explain how to set up and perform one inference round trip. Here is how you use the simple example.
+
+```bash
+# Download the compiled simple inference wasm
+curl -LO https://github.com/second-state/llama-utils/raw/main/simple/llama-simple.wasm
+
+# Give it a prompt and ask it to use the model to complete it.
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf llama-simple.wasm \
+  --prompt 'Robert Oppenheimer most important achievement is ' --ctx-size 4096
+
+output: in 1942, when he led the team that developed the first atomic bomb, which was dropped on Hiroshima, Japan in 1945.
+```
 
 First, let's parse command line arguments to customize the chatbot's behavior using `Command` struct. It extracts the following parameters: `prompt` (a prompt that guides the conversation), `model_alias` (a list for the loaded model), and `ctx_size` (the size of the chat context). 
 
 ```rust
 fn main() -> Result<(), String> {
-    let matches = Command::new("Llama API Server")
+    let matches = Command::new("Simple LLM inference")
         .arg(
             Arg::new("prompt")
                 .short('p')
@@ -247,7 +260,7 @@ Next, execute the model inference.
     context.compute().expect("Failed to complete inference");
 ```
 
-After the inference is finished, extract the result from the computation context and losing invalid UTF8 sequences handled by converting the output to a string using `String::from_utf8_lossy`.
+After the inference is finished, extract the result from the computation context and convert the output to a string with `String::from_utf8_lossy`, which handles any invalid UTF-8 sequences by replacing them.
 
 ```rust
   let mut output_buffer = vec![0u8; *CTX_SIZE.get().unwrap()];
@@ -265,8 +278,8 @@ println!("\nprompt: {}", &prompt);
 println!("\noutput: {}", output);
 ```
 
-The code explanation above is simple [one time chat with llama 2 model](https://github.com/second-state/llama-utils/tree/main/simple). But we have more!
+## Resources
 
-* If you're looking for continuous conversations with llama 2 models, please check out the source code [here](https://github.com/second-state/llama-utils/tree/main/chat).
-* If you want to construct OpenAI-compatible APIs specifically for your llama2 model, or the Llama2 model itself, please check out the surce code [here](https://github.com/second-state/llama-utils/tree/main/api-server).
-* For the reason why we need to run LLama2 model with WasmEdge, please check out [this article](https://medium.com/stackademic/fast-and-portable-llama2-inference-on-the-heterogeneous-edge-a62508e82359).
+* If you're looking for multi-turn conversations with llama 2 models, please check out the source code of the [chat example](https://github.com/second-state/llama-utils/tree/main/chat) mentioned above.
+* If you want to construct OpenAI-compatible APIs specifically for your llama2 model, or the Llama2 model itself, please check out the source code [for the API server](https://github.com/second-state/llama-utils/tree/main/api-server).
+* To learn more, please check out [this article](https://medium.com/stackademic/fast-and-portable-llama2-inference-on-the-heterogeneous-edge-a62508e82359).
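The `llama_inference.md` changes above describe converting the output buffer to a string with `String::from_utf8_lossy`. The standalone sketch below shows what that call does when the bytes are not valid UTF-8. The helper name `buffer_to_string` and the explicit `len` argument are assumptions made for illustration; the real example reads the size from the WASI-NN computation context.

```rust
/// Hypothetical helper: turn the first `len` bytes of an output buffer
/// into a String, replacing invalid UTF-8 instead of returning an error.
fn buffer_to_string(buffer: &[u8], len: usize) -> String {
    // Clamp the length so an oversized value cannot cause a slice panic.
    let end = len.min(buffer.len());
    // Invalid sequences become U+FFFD (the replacement character).
    String::from_utf8_lossy(&buffer[..end]).to_string()
}

fn main() {
    // "Hi", one byte that is never valid in UTF-8, then unused zero padding.
    let buffer = [b'H', b'i', 0xFF, 0, 0];
    let text = buffer_to_string(&buffer, 3);
    println!("{text}");
}
```

The lossy conversion is a reasonable fit here because a fixed-size buffer can cut a multi-byte character in half, and a strict `String::from_utf8` would then reject the whole output.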
diff --git a/docs/start/wasmedge/extensions/plugins.md b/docs/start/wasmedge/extensions/plugins.md
@@ -26,8 +26,6 @@ The following lists are the WasmEdge official released plug-ins. Users can insta
 | WasmEdge-eBPF                                                                                                                    | A native library for inferring eBPF applications                                                                                                                                                          | `manylinux2014 x86_64`, `manylinux2014 aarch64`, `ubuntu 20.04 x86_64`, `darwin x86_64`, and `darwin arm64` (since `0.13.0`) | Rust                                                                   |
 | WasmEdge-rusttls                                                                                                                 | A native library for inferring Rust and TLS Library                                                                                                                                                       | `manylinux2014 x86_64`, `manylinux2014 aarch64`, `ubuntu 20.04 x86_64`, `darwin x86_64`, and `darwin arm64` (since `0.13.0`) | [Rust](https://crates.io/crates/wasmedge_rustls_api)    
 
-
-
 ## Old WasmEdge Extensions
 
 Besides the plug-ins, WasmEdge provides the extensions before the `0.13.0` versions. Noticed that the extensions are replaced by the corresponding plug-ins after the `0.13.0` version.
@@ -37,4 +35,4 @@ The latest version supporting the extensions is `0.12.1`. This chapter will be d
 | Extension | Description | Platform Support | Language support |
 | --- | --- | --- | --- |
 | [Image processing](https://github.com/second-state/WasmEdge-image) | A native library to manipulate images for AI inference tasks. Migrated into the plug-in after WasmEdge `0.13.0`. | `manylinux2014 x86_64`, `manylinux2014 aarch64`, `android aarch64`, `ubuntu 20.04 x86_64`, and `darwin x86_64` | [Rust](https://crates.io/crates/wasmedge_tensorflow_interface) (0.2.2) |
-| [TensorFlow and Tensorflow-Lite](https://github.com/second-state/WasmEdge-tensorflow) | A native library to inferring TensorFlow and TensorFlow-Lite models. Migrated into the plug-in after WasmEdge `0.13.0`. | `manylinux2014 x86_64`, `manylinux2014 aarch64` (TensorFlow-Lite only), `android aarch64` (TensorFlow-Lite only), `ubuntu 20.04 x86_64`, and `darwin x86_64` | [Rust](https://crates.io/crates/wasmedge_tensorflow_interface) (0.2.2) |
+| [TensorFlow and Tensorflow-Lite](https://github.com/second-state/WasmEdge-tensorflow) | A native library to inferring TensorFlow and TensorFlow-Lite models. Migrated into the plug-in after WasmEdge `0.13.0`. | `manylinux2014 x86_64`, `manylinux2014 aarch64` (TensorFlow-Lite only), `android aarch64` (TensorFlow-Lite only), `ubuntu 20.04 x86_64`, and `darwin x86_64` | [Rust](https://crates.io/crates/wasmedge_tensorflow_interface) (0.2.2) |