diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/develop/rust/wasinn/llm_inference.md b/i18n/zh/docusaurus-plugin-content-docs/current/develop/rust/wasinn/llm_inference.md
index 63f9fc0e..1a10f9eb 100644
--- a/i18n/zh/docusaurus-plugin-content-docs/current/develop/rust/wasinn/llm_inference.md
+++ b/i18n/zh/docusaurus-plugin-content-docs/current/develop/rust/wasinn/llm_inference.md
@@ -29,19 +29,33 @@ git clone curl -LO https://huggingface.co/wasmedge/llama2/blob/main/llama-2-7b-c
 Run the inference application in WasmEdge.
 
 ```bash
-wasmedge --dir .:. \
-  --nn-preload default:GGML:CPU:llama-2-7b.Q5_K_M.gguf llama-chat.wasm default \
-  --prompt 'Robert Oppenheimer most important achievement is ' \
-  --ctx-size 4096
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf \
+  llama-chat.wasm --prompt-template llama-2-chat
 ```
 
-After executing the command, you may need to wait a moment for the input prompt to appear. Once the execution is complete, the following output will be generated.
+After executing the command, you may need to wait a moment for the input prompt to appear. You can enter your question once you see the `[USER]:` prompt:
 
 ```bash
-Robert Oppenheimer most important achievement is
-1945 Manhattan Project.
-Robert Oppenheimer was born in New York City on April 22, 1904. He was the son of Julius Oppenheimer, a wealthy German-Jewish textile merchant, and Ella Friedman Oppenheimer.
-Robert Oppenheimer was a brilliant student. He attended the Ethical Culture School in New York City and graduated from the Ethical Culture Fieldston School in 1921. He then attended Harvard University, where he received his bachelor's degree.
+[USER]:
+I have two apples, each costing 5 dollars. What is the total cost of these apple
+*** [prompt begin] ***
+[INST] <<SYS>>
+You are a helpful, respectful and honest assistant. Always answer as short as possible, while being safe. <</SYS>>
+
+I have two apples, each costing 5 dollars. What is the total cost of these apple [/INST]
+*** [prompt end] ***
+[ASSISTANT]:
+The total cost of the two apples is 10 dollars.
+[USER]:
+How about four apples?
+*** [prompt begin] ***
+[INST] <<SYS>>
+You are a helpful, respectful and honest assistant. Always answer as short as possible, while being safe. <</SYS>>
+
+I have two apples, each costing 5 dollars. What is the total cost of these apple [/INST] The total cost of the two apples is 10 dollars. [INST] How about four apples? [/INST]
+*** [prompt end] ***
+[ASSISTANT]:
+The total cost of four apples is 20 dollars.
 ```
 
 ## Build and run
@@ -70,57 +84,78 @@ curl -LO https://huggingface.co/wasmedge/llama2/blob/main/llama-2-13b-q5_k_m.ggu
 Next, use WasmEdge to load the llama-2-13b model and then ask the model to questions.
 
 ```bash
-wasmedge --dir .:. \
-  --nn-preload default:GGML:CPU:llama-2-13b.Q5_K_M.gguf llama-chat.wasm default \
-  --prompt 'Robert Oppenheimer most important achievement is ' \
-  --ctx-size 4096
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-13b-chat-q5_k_m.gguf \
+  llama-chat.wasm --prompt-template llama-2-chat
 ```
 
 After executing the command, you may need to wait a moment for the input prompt to appear. You can enter your question once you see the `[USER]:` prompt:
 
 ```bash
-Robert Oppenheimer most important achievement is
-1945 Manhattan Project.
-Robert Oppenheimer was born in New York City on April 22, 1904. He was the son of Julius Oppenheimer, a wealthy German-Jewish textile merchant, and Ella Friedman Oppenheimer.
-Robert Oppenheimer was a brilliant student. He attended the Ethical Culture School in New York City and graduated from the Ethical Culture Fieldston School in 1921. He then attended Harvard University, where he received his bachelor's degree
+[USER]:
+Who is Robert Oppenheimer?
+*** [prompt begin] ***
+[INST] <<SYS>>
+You are a helpful, respectful and honest assistant. Always answer as short as possible, while being safe. <</SYS>>
+
+Who is Robert Oppenheimer? [/INST]
+*** [prompt end] ***
+[ASSISTANT]:
+Robert Oppenheimer was an American theoretical physicist and director of the Manhattan Project, which developed the atomic bomb during World War II. He is widely regarded as one of the most important physicists of the 20th century and is known for his contributions to the development of quantum mechanics and the theory of the atomic nucleus. Oppenheimer was also a prominent figure in the post-war nuclear weapons debate and was a strong advocate for international cooperation on nuclear weapons control.
 ```
 
-## Optional: Configure the model
-
-You can use environment variables to configure the model execution.
+## Optional: run the model with different CLI
 
-| Option |Default |Function |
-| -------|-----------|----- |
-| LLAMA_LOG | 0 |The backend will print diagnostic information when this value is set to 1|
-|LLAMA_N_CTX |512| The context length is the max number of tokens in the entire conversation|
-|LLAMA_N_PREDICT |512|The number of tokens to generate in each response from the model|
-
-For example, the following command specifies a context length of 4k tokens, which is standard for llama2, and the max number of tokens in each response to be 1k. It also tells WasmEdge to print out logs and statistics of the model at runtime.
+We also have CLI options for more information.
+```bash
+  -m, --model-alias
+          Model alias [default: default]
+  -c, --ctx-size
+          Size of the prompt context [default: 4096]
+  -n, --n-predict
+          Number of tokens to predict [default: 1024]
+  -g, --n-gpu-layers
+          Number of layers to run on the GPU [default: 100]
+  -b, --batch-size
+          Batch size for prompt processing [default: 4096]
+  -r, --reverse-prompt
+          Halt generation at PROMPT, return control.
+  -s, --system-prompt
+          System prompt message string [default: "[Default system message for the prompt template]"]
+  -p, --prompt-template