
Add loop.pre_setup to allow fine-grained LitAPI validation based on inference loop #393

Merged
aniketmaurya merged 4 commits into main from aniket/pre-setup-loop on Dec 11, 2024

Conversation

aniketmaurya (Collaborator) commented Dec 11, 2024

What does this PR do?

This PR allows fine-grained validation of a LitAPI based on the inference loop in use.

Default loops keep the same validation as today, while new inference loops such as continuous batching can add extra checks (for example, verifying that a lit_api.predict_start method exists) without forcing a single global validation structure.
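
As an illustrative sketch (the exact pre_setup signature below is an assumption based on this PR's description, not the finalized API), a custom loop could hook in its own checks like this:

from litserve import LitAPI
from litserve.loops import ContinuousBatchingLoop

class MyBatchingLoop(ContinuousBatchingLoop):
    def pre_setup(self, lit_api: LitAPI, spec=None):
        # Run the default validation, then add a loop-specific requirement.
        super().pre_setup(lit_api, spec)
        if not hasattr(lit_api, "predict_start"):
            raise ValueError("MyBatchingLoop requires LitAPI to implement predict_start")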

For example, the following code fails under the current global validation, which forces predict to be implemented as a generator:

from vllm import LLMEngine, EngineArgs, SamplingParams
from litserve.loops import ContinuousBatchingLoop, Output
from litserve.utils import LitAPIStatus
import litserve as ls

class LitVLLM(ls.LitAPI):
    def setup(self, device):
        engine_args = EngineArgs(model="meta-llama/Llama-3.2-1B", device=device, max_model_len=2048)
        self.engine = LLMEngine.from_engine_args(engine_args)

    def decode_request(self, request: dict) -> dict:
        return request
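
    # Note: no `predict` generator is implemented; the current global validation
    # rejects this API even though the custom loop below never calls predict.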

class vLLMLoop(ContinuousBatchingLoop):
    def __init__(self, max_sequence_length: int = 2048):
        super().__init__(max_sequence_length)
        self.uids = {}  # Maps vLLM request_id (str) to original uid

    def add_request(self, uid: str, request, lit_api: LitVLLM, lit_spec) -> None:
        super().add_request(uid, request, lit_api, lit_spec)
        request_id = str(uid)
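        # Build per-request sampling parameters from the decoded request dict.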
        sampling_params = SamplingParams(
            temperature=request.get("temperature", 0.0),
            max_tokens=request.get("max_tokens", 10)
        )
        lit_api.engine.add_request(
            request_id=request_id,
            prompt=request["prompt"],
            params=sampling_params
        )
        self.uids[request_id] = uid

    def step(self, prev_outputs, lit_api: LitVLLM, lit_spec):
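        # Advance the vLLM engine by one scheduling step; this returns the
        # newest outputs for all in-flight requests.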
        request_outputs = lit_api.engine.step()
        outputs = []

        for request_output in request_outputs:
            if request_output.request_id not in self.uids:
                continue

            original_uid = self.uids[request_output.request_id]
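            # Decode only the most recently generated token so the response streams incrementally.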
            token_id = request_output.outputs[0].token_ids[-1]
            output = lit_api.engine.get_tokenizer().decode([token_id])
            outputs.append(Output(
                uid=original_uid,
                output=output,
                status=LitAPIStatus.OK
            ))

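            # When vLLM marks the request finished, emit an end-of-stream marker and stop tracking it.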
            if request_output.finished:
                outputs.append(Output(
                    uid=original_uid,
                    output="",
                    status=LitAPIStatus.FINISH_STREAMING
                ))
                del self.uids[request_output.request_id]

        return outputs

if __name__ == "__main__":
    loop = vLLMLoop()
    api = LitVLLM()
    server = ls.LitServer(api, loop=loop, max_batch_size=4, stream=True, timeout=-1)
    server.run()

Before submitting
  • Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@aniketmaurya changed the title from "Sanitize LitAPI with loop.pre_setup to allow fine-grained LitAPI validation based on inference loop" to "Add loop.pre_setup to allow fine-grained LitAPI validation based on inference loop" on Dec 11, 2024

codecov bot commented Dec 11, 2024

Codecov Report

Attention: Patch coverage is 74.41860% with 11 lines in your changes missing coverage. Please review.

Project coverage is 88%. Comparing base (75c6d0e) to head (75aac9a).
Report is 1 commit behind head on main.

Additional details and impacted files
@@         Coverage Diff         @@
##           main   #393   +/-   ##
===================================
  Coverage    88%    88%           
===================================
  Files        25     25           
  Lines      1756   1765    +9     
===================================
+ Hits       1548   1558   +10     
+ Misses      208    207    -1     

Review threads: src/litserve/api.py (outdated, resolved), src/litserve/loops.py (3 threads, resolved)
@aniketmaurya enabled auto-merge (squash) on December 11, 2024 16:31
@aniketmaurya merged commit 35129f7 into main on Dec 11, 2024
21 checks passed
@aniketmaurya deleted the aniket/pre-setup-loop branch on December 11, 2024 17:07