
Commit

chore: add notes for performance
Signed-off-by: Aaron Pham <[email protected]>
aarnphm committed Dec 3, 2024
1 parent 59221e6 commit 4ee464a
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions vllm/model_executor/guided_decoding/xgrammar_decoding.py
@@ -233,6 +233,8 @@ def __call__(self, input_ids: list[int],
 
         for i, matcher in enumerate(self.matchers):
             if not matcher.is_terminated():
+                # @ubospica: ideally, fill_next_token_bitmask should be parallelized with model decoding
+                # See https://github.com/vllm-project/vllm/pull/10785/files#r1864278303
                 matcher.fill_next_token_bitmask(self.token_bitmask, i)
 
         # token_bitmask is a CPU tensor for use with accept_token and

Check failure (GitHub Actions / ruff (3.12)): vllm/model_executor/guided_decoding/xgrammar_decoding.py:236:81: E501 Line too long (104 > 80)
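
Context for the note added in this commit: fill_next_token_bitmask runs on the CPU, so in principle it can be overlapped with the GPU forward pass instead of running serially before it. Below is a minimal sketch of that idea, not the vLLM implementation; the matchers list, token_bitmask tensor, and run_model_forward callable are assumptions for illustration that mirror the names used in the diff.

    from concurrent.futures import ThreadPoolExecutor

    def decode_step_with_overlap(matchers, token_bitmask, run_model_forward):
        """Fill grammar bitmasks on CPU threads while the model forward runs."""
        with ThreadPoolExecutor() as pool:
            # Kick off fill_next_token_bitmask for every matcher that is still active.
            futures = [
                pool.submit(matcher.fill_next_token_bitmask, token_bitmask, i)
                for i, matcher in enumerate(matchers)
                if not matcher.is_terminated()
            ]
            # The (GPU-bound) forward pass proceeds while the CPU fills bitmasks.
            logits = run_model_forward()
            # All bitmasks must be ready before they are applied to the logits.
            for future in futures:
                future.result()
        return logits, token_bitmask

Whether this overlap actually helps depends on how much of fill_next_token_bitmask releases the GIL and on how the engine schedules the forward pass; see the PR discussion linked in the added comment for the upstream conversation.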
