Commit 58e85eb: a bit faster

robertgshaw2-neuralmagic committed Nov 17, 2024
1 parent 75c44b4 commit 58e85eb
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions vllm/v1/worker/tpu_model_runner.py
@@ -376,6 +376,7 @@ def execute_model(
)

# NOTE: TPU<>CPU sync happens here.
# It is important to call .cpu() first to avoid compilation on hotpath.

Check failure on line 379 in vllm/v1/worker/tpu_model_runner.py
GitHub Actions / ruff (3.12): Ruff (E501)
vllm/v1/worker/tpu_model_runner.py:379:81: E501 Line too long (83 > 80)
token_ids = selected_token_ids.cpu()[:num_decodes]
sampled_token_ids_list = token_ids.tolist()
sampled_token_ids[:num_decodes] = token_ids
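
The intent behind the new comment can be shown with a minimal sketch (not the vLLM source; the tensor shapes, `num_decodes`, and the output buffer are illustrative stand-ins): move the sampler output to the host once with `.cpu()`, then do all slicing and list conversion on that CPU copy so no further device work, and no extra torch/XLA compilation, lands on the hot path.

```python
import torch

# Illustrative sizes; the real runner derives these from the batch.
num_decodes = 4
max_num_reqs = 8

# Stand-in for the sampler output. On a real TPU this would be an XLA
# device tensor; a CPU tensor is used here so the sketch runs anywhere.
selected_token_ids = torch.randint(0, 32_000, (max_num_reqs,))
sampled_token_ids = torch.zeros(max_num_reqs, dtype=torch.long)

# Single TPU<>CPU sync point: .cpu() materialises the data on the host,
# so the slicing, tolist(), and assignment below are pure host-side work
# and cannot trigger additional compilation on the device.
token_ids = selected_token_ids.cpu()[:num_decodes]
sampled_token_ids_list = token_ids.tolist()
sampled_token_ids[:num_decodes] = token_ids
```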
@@ -407,6 +408,7 @@ def execute_model(
is_prompt=True
)
# NOTE: TPU<>CPU sync happens here.
# It is important to call .cpu() first to avoid compilation on hotpath.

Check failure on line 411 in vllm/v1/worker/tpu_model_runner.py
GitHub Actions / ruff (3.12): Ruff (E501)
vllm/v1/worker/tpu_model_runner.py:411:81: E501 Line too long (83 > 80)
token_id = selected_token_ids.cpu()[prompt_len - 1].item()
sampled_token_ids[num_decodes + idx] = token_id
req_state = self.requests[req_id]
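
The prompt-side hunk applies the same idea inside the per-prompt loop. A hedged sketch follows; the loop structure, prompt lengths, and tensor shapes are assumed for illustration, and only the `.cpu()[prompt_len - 1].item()` pattern comes from the diff.

```python
import torch

num_decodes = 4
# Hypothetical prompt lengths and a stand-in sampler output per prompt;
# on a real TPU these tensors would live on the XLA device.
prompt_lens = [5, 9, 3]
outputs_per_prompt = [torch.randint(0, 32_000, (16,)) for _ in prompt_lens]
sampled_token_ids = torch.zeros(num_decodes + len(prompt_lens),
                                dtype=torch.long)

for idx, (prompt_len, selected_token_ids) in enumerate(
        zip(prompt_lens, outputs_per_prompt)):
    # Copy to the host first, then index and call .item() on the CPU
    # tensor. Calling .item() on the device tensor directly would force a
    # sync (and potentially extra compilation under torch/XLA) per prompt.
    token_id = selected_token_ids.cpu()[prompt_len - 1].item()
    sampled_token_ids[num_decodes + idx] = token_id
```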
