Commit

fixed accuracy bug
robertgshaw2-redhat committed Nov 17, 2024
1 parent 1af03e0 commit 02ee304
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion vllm/v1/worker/tpu_model_runner.py
@@ -305,7 +305,7 @@ def _prepare_decode_inputs(self, num_decodes: int) -> DecodeInputData:
         slot_mapping = block_number * self.block_size + block_offsets
         # Set an out of range value for the padding tokens so that they
         # are ignored when inserting into the KV cache.
-        slot_mapping[-num_decodes:] = _PAD_SLOT_ID
+        slot_mapping[num_decodes:] = _PAD_SLOT_ID
         slot_mapping = slot_mapping[:padded_batch_size]
 
         # BLOCK_TABLE [batch, max_num_blocks_per_req]
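
The difference between the two slices can be reproduced outside vLLM. The sketch below is a minimal illustration using plain Python lists, not the runner's actual tensors: the helper names `pad_slots_old` and `pad_slots_new`, the sample slot values, and the sentinel value are all assumptions made for this example. It assumes the first `num_decodes` entries hold real decode slots and the remaining entries are padding.

```python
# Hypothetical sentinel for illustration; the real _PAD_SLOT_ID is defined in vLLM.
_PAD_SLOT_ID = -1


def pad_slots_old(slot_mapping, num_decodes):
    """Pre-fix behavior: overwrite the LAST num_decodes entries."""
    out = list(slot_mapping)
    tail_len = len(out[-num_decodes:])
    out[-num_decodes:] = [_PAD_SLOT_ID] * tail_len
    return out


def pad_slots_new(slot_mapping, num_decodes):
    """Post-fix behavior: overwrite everything AFTER the first num_decodes entries."""
    out = list(slot_mapping)
    out[num_decodes:] = [_PAD_SLOT_ID] * (len(out) - num_decodes)
    return out


# Three real decode requests padded out to a batch of eight (sample values).
slots = [160, 161, 162, 0, 0, 0, 0, 0]
old = pad_slots_old(slots, 3)
new = pad_slots_new(slots, 3)

# Six real decode requests in a batch of eight: the old slice reaches back
# into the real entries because it counts from the end.
slots_full = [160, 161, 162, 163, 164, 165, 0, 0]
old_full = pad_slots_old(slots_full, 6)
new_full = pad_slots_new(slots_full, 6)

print(old)       # padding at indices 3 and 4 still points at slot 0
print(new)       # every padding entry carries the sentinel
print(old_full)  # real entries at indices 2..5 were overwritten
print(new_full)  # real entries preserved
```

Under these assumptions, the old slice fails in two ways: with few decodes, some padding entries keep a valid-looking slot and could be written into the KV cache; with many decodes, real slots are replaced by the sentinel. Both would plausibly surface as the accuracy problem named in the commit title.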