Skip to content

Commit

Permalink
doc: fix c++ serving example (flexflow#1372)
Browse files Browse the repository at this point in the history
Co-authored-by: Gabriele Oliaro <[email protected]>
  • Loading branch information
chenzhuofu and goliaro authored Apr 30, 2024
1 parent 002fdf0 commit d54e4b6
Show file tree
Hide file tree
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion .github/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -178,7 +178,7 @@ A C++ example is available at [this folder](../inference/spec_infer/). After bui
For example, you can use the following command line to serve a LLaMA-7B or LLaMA-13B model on 4 GPUs and use two collectively boost-tuned LLaMA-68M models for speculative inference.

```bash
./inference/spec_infer/spec_infer -ll:gpu 4 -ll:fsize 14000 -ll:zsize 30000 -llm-model meta-llama/Llama-2-7b-hf -ssm-model JackFram/llama-68m -prompt /path/to/prompt.json -tensor-parallelism-degree 4 --fusion
./inference/spec_infer/spec_infer -ll:gpu 4 -ll:cpu 4 -ll:fsize 14000 -ll:zsize 30000 -llm-model meta-llama/Llama-2-7b-hf -ssm-model JackFram/llama-68m -prompt /path/to/prompt.json -tensor-parallelism-degree 4 --fusion
```
</details>

Expand Down
2 changes: 1 addition & 1 deletion SERVE.md
Original file line number Diff line number Diff line change
Expand Up @@ -126,7 +126,7 @@ A C++ example is available at [this folder](../inference/spec_infer/). After bui
For example, you can use the following command line to serve a LLaMA-7B or LLaMA-13B model on 4 GPUs and use two collectively boost-tuned LLaMA-68M models for speculative inference.

```bash
./inference/spec_infer/spec_infer -ll:gpu 4 -ll:fsize 14000 -ll:zsize 30000 -llm-model meta-llama/Llama-2-7b-hf -ssm-model JackFram/llama-68m -prompt /path/to/prompt.json -tensor-parallelism-degree 4 --fusion
./inference/spec_infer/spec_infer -ll:gpu 4 -ll:cpu 4 -ll:fsize 14000 -ll:zsize 30000 -llm-model meta-llama/Llama-2-7b-hf -ssm-model JackFram/llama-68m -prompt /path/to/prompt.json -tensor-parallelism-degree 4 --fusion
```
</details>

Expand Down

0 comments on commit d54e4b6

Please sign in to comment.