[WIP][Doc]Add documentation for using EAGLE in vLLM #11417

[Draft] wants to merge 76 commits into base: main
241f0f0
Documentation for using EAGLE in vLLM
sroy745 Dec 22, 2024
52 changes: 52 additions & 0 deletions docs/source/usage/spec_decode.rst
@@ -161,6 +161,58 @@ A variety of speculative models of this type are available on HF hub:
* `granite-7b-instruct-accelerator <https://huggingface.co/ibm-granite/granite-7b-instruct-accelerator>`_
* `granite-20b-code-instruct-accelerator <https://huggingface.co/ibm-granite/granite-20b-code-instruct-accelerator>`_

Speculating using EAGLE-based draft models
-------------------------------------------

The following code configures vLLM to use speculative decoding where proposals are generated by
an `EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency) <https://arxiv.org/pdf/2401.15077>`_ based draft model.

.. code-block:: python

    from vllm import LLM, SamplingParams

    prompts = [
        "The future of AI is",
    ]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

    llm = LLM(
        model="meta-llama/Meta-Llama-3.1-70B-Instruct",
        tensor_parallel_size=4,
        speculative_model="ibm-fms/llama3-70b-accelerator",
        speculative_draft_tensor_parallel_size=1,
    )
    outputs = llm.generate(prompts, sampling_params)

    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

A few important things to consider when using EAGLE-based draft models:

1. EAGLE-based draft models currently need to be run without tensor parallelism, although
it is possible to run the main model with tensor parallelism (see the example above). Since the
speculative models are relatively small, significant speedups are still observed. This
limitation will be addressed in a future release.

2. The EAGLE draft models available in this Hugging Face repository cannot be used directly
with vLLM because their layer names and model definition differ from what vLLM expects. To use
these models with vLLM, convert them with the provided script. Note that this script does not
modify the model's weights.
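The conversion can be thought of as a key-remapping pass over the checkpoint's state dict. The sketch below is illustrative only: the helper name ``remap_keys`` and the example prefix mapping are hypothetical, and the actual script's mapping may differ.

.. code-block:: python

    def remap_keys(state_dict: dict, prefix_map: dict) -> dict:
        """Return a new state dict with key prefixes rewritten.

        Values (the weight tensors) are carried over unchanged; only the
        key names are rewritten according to prefix_map.
        """
        remapped = {}
        for key, value in state_dict.items():
            new_key = key
            for old_prefix, new_prefix in prefix_map.items():
                if key.startswith(old_prefix):
                    new_key = new_prefix + key[len(old_prefix):]
                    break
            remapped[new_key] = value
        return remapped

    # Illustrative example: rewrite "layers.*" keys to "model.layers.*".
    original = {
        "layers.0.self_attn.q_proj.weight": 0,
        "embed_tokens.weight": 1,
    }
    converted = remap_keys(original, {"layers.": "model.layers."})

Because only key names change, the converted checkpoint is bit-identical in its weights to the original.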


3. When using EAGLE-based draft models with vLLM, the observed speedup is lower than what is
expected for EAGLE-based speculative decoding. This issue is under investigation and is tracked
in `issue #9565 <https://github.com/vllm-project/vllm/issues/9565>`_.
Known differences between the vLLM implementation of EAGLE-based speculation and the original EAGLE implementation include:

a. ......
b. .....
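As a rough sanity check on expected gains, the speculative decoding literature gives the expected number of tokens generated per target-model forward pass as ``(1 - alpha**(k + 1)) / (1 - alpha)``, where ``alpha`` is the per-token acceptance rate and ``k`` the number of draft tokens. The helper below is an illustrative sketch (not part of vLLM) for estimating that quantity:

.. code-block:: python

    def expected_tokens_per_step(alpha: float, k: int) -> float:
        """Expected tokens produced per target-model verification step
        when proposing k draft tokens with per-token acceptance rate
        alpha (standard speculative sampling estimate)."""
        if alpha == 1.0:
            # All drafts accepted: k draft tokens plus one bonus token.
            return float(k + 1)
        return (1 - alpha ** (k + 1)) / (1 - alpha)

    # With alpha = 0.8 and k = 4, each verification step yields about
    # 3.36 tokens on average instead of 1.
    print(round(expected_tokens_per_step(0.8, 4), 2))

Comparing this estimate against measured throughput is one way to quantify the gap described above.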

A variety of EAGLE draft models are available on HF hub:


Lossless guarantees of Speculative Decoding
-------------------------------------------
In vLLM, speculative decoding aims to enhance inference efficiency while maintaining accuracy. This section addresses the lossless guarantees of