forked from vllm-project/vllm
Commit
MQ engine: remove guided decoding init from the client
Currently with MQLLMEngine, we are initializing LogitsProcessors on the client side, pickling the entire list of LogitsProcessors, and sending them over ZeroMQ to the engine.

This was put in place so that the expensive initialization of the Outlines LogitsProcessor (tens of seconds) could happen in a thread, so that the client could defer submitting the request to the engine until the initialization had completed.

This became an issue because recent (Rust-based) Outlines does not support pickle serialization, but this has been resolved by dottxt-ai/outlines-core#99.

However, this approach is also not desirable in the case of XGrammar, because the initialization is not expensive (hundreds of milliseconds) and the serialization is just unnecessary complexity.

So, let's remove the code on the client side of MQLLMEngine that special-cases the creation of logits_processors based on guided decoding params. This will now happen on the engine side once again.

Signed-off-by: Mark McLoughlin <[email protected]>
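The change in data flow can be illustrated with a small, self-contained sketch. Every name in it (`GuidedLogitsProcessor`, `build_processors`, `client_encode_old`, `client_encode_new`, `engine_decode_new`) is hypothetical and is not vLLM's actual API; the `threading.Lock` held by the processor is only a stand-in for state that pickle cannot serialize. It contrasts pickling ready-built processors on the client with sending plain guided decoding params and building the processors on the engine side.

```python
import pickle
import threading
from typing import Any, Dict, List, Optional


class GuidedLogitsProcessor:
    """Hypothetical processor holding state that pickle cannot serialize
    (a lock here, standing in for compiled grammar state)."""

    def __init__(self, guided: Dict[str, Any]) -> None:
        self.guided = guided
        self._lock = threading.Lock()  # _thread.lock objects are not picklable

    def __call__(self, token_ids: List[int], logits: List[float]) -> List[float]:
        # A real processor would mask tokens the grammar disallows;
        # this sketch returns the logits unchanged.
        return logits


def build_processors(guided: Optional[Dict[str, Any]]) -> List[GuidedLogitsProcessor]:
    """Hypothetical helper: build processors from plain guided decoding params."""
    return [GuidedLogitsProcessor(guided)] if guided else []


def client_encode_old(prompt: str, guided: Optional[Dict[str, Any]]) -> bytes:
    """Old flow: build processors on the client and pickle them with the request."""
    return pickle.dumps({"prompt": prompt, "logits_processors": build_processors(guided)})


def client_encode_new(prompt: str, guided: Optional[Dict[str, Any]]) -> bytes:
    """New flow: ship only plain, picklable params across the socket."""
    return pickle.dumps({"prompt": prompt, "guided": guided})


def engine_decode_new(payload: bytes) -> List[GuidedLogitsProcessor]:
    """Engine side: unpickle the params and build the processors locally."""
    request = pickle.loads(payload)
    return build_processors(request["guided"])


if __name__ == "__main__":
    guided = {"regex": r"[0-9]+"}
    try:
        client_encode_old("Count:", guided)
    except Exception as exc:  # pickling the processor fails on the unpicklable lock
        print("old flow failed:", type(exc).__name__)
    procs = engine_decode_new(client_encode_new("Count:", guided))
    print("new flow built", len(procs), "processor(s)")
```

In the sketch, only plain dictionaries cross the process boundary, so whether a backend's processor objects can be pickled no longer matters.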
Showing 2 changed files with 5 additions and 42 deletions.